Pandas str.contains: matching strings that contain all given characters

Question

3

Is it possible to use str.contains to search for strings that contain all of the given characters?

This works:

df["col1"].str.contains("A")

This also works if I want to find strings containing at least one of the given characters:

df["col1"].str.contains("A|B")

However, if I want to find strings that contain all of the given characters, this approach doesn't work:

df["col1"].str.contains("A&B")

It returns all False. Any suggestions? Thanks!

- jjjayn

3 Answers

4

Another approach:

df['col1'].apply(set('AB').issubset)

Some sample timings:

import pandas as pd
import numpy as np

strings = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', 'CABA', 'dog', 'cat'])
%timeit strings.apply(set('AB').issubset)
# 10000 loops, best of 3: 102 µs per loop
%timeit strings.str.contains('A.*B|B.*A')
# 10000 loops, best of 3: 149 µs per loop
%timeit strings.str.contains('A') & strings.str.contains('B')
# 1000 loops, best of 3: 712 µs per loop
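
For readers who want to see which rows the set-based approach actually selects, here is a minimal sketch on the same sample Series. It relies on the fact that iterating a Python string yields its characters, so it only suits single-character keys; it also assumes the column has no missing values, since a NaN is not iterable and `issubset` would raise a `TypeError`. The names `required` and `mask` are illustrative.

```python
import pandas as pd

strings = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', 'CABA', 'dog', 'cat'])

# Every character in this set must appear in the string (case-sensitive).
required = set('AB')

# issubset accepts any iterable; iterating a str yields its characters,
# so this checks that each required character occurs somewhere in s.
mask = strings.apply(required.issubset)

print(strings[mask].tolist())  # ['CABA']
```

Note that 'Aaba' and 'Baca' are not matched because the comparison is case-sensitive: each contains only one of the two uppercase letters.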

- Jon Clements

1

Nice one! One of the fun things about being here is learning other (better) ways of solving problems. - meloncholy

0

If you're looking for a large (or initially unknown) set of characters, you can do this slightly more generally.

pd.DataFrame({key: df.col1.str.contains(key) for key in 'AB'}).all(axis=1)

There may be a better way to do this (there usually is in pandas :), but it gave me performance comparable to @behzad.nouri's answer on a 5-million-row DataFrame.
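
The same idea can be wrapped in a small function when the keys are arbitrary substrings rather than single characters. This is only a sketch: `contains_all` is a hypothetical helper name, the sample frame is borrowed from the accepted answer's example, and `regex=False` is an assumption that the keys should be matched literally. It also assumes `keys` is non-empty, since `reduce` has nothing to combine otherwise.

```python
from functools import reduce
import operator

import pandas as pd


def contains_all(series, keys):
    """Hypothetical helper: True where a string contains every key."""
    # regex=False treats each key as a literal substring, not a pattern.
    masks = [series.str.contains(key, regex=False) for key in keys]
    # Combine the per-key boolean masks with element-wise AND.
    return reduce(operator.and_, masks)


df = pd.DataFrame({'col1': ['wAxyzBw', 'wBxyzAw', 'wAxyz', 'wBxyz']})
result = contains_all(df['col1'], ['A', 'B'])
print(result.tolist())  # [True, True, False, False]
```

Reducing a list of masks avoids building an intermediate DataFrame, though whether that matters for performance would need to be measured on your own data.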

- meloncholy

Content provided by Stack Overflow.

- behzad.nouri · Accepted Answer

Either

df['col1'].str.contains('A.*B|B.*A')

or

df['col1'].str.contains('A') & df['col1'].str.contains('B')

Example:

>>> df
      col1
0  wAxyzBw
1  wBxyzAw
2    wAxyz
3    wBxyz
>>> df['col1'].str.contains('A.*B|B.*A')
0     True
1     True
2    False
3    False
Name: col1, dtype: bool
>>> df['col1'].str.contains('A') & df['col1'].str.contains('B')
0     True
1     True
2    False
3    False
Name: col1, dtype: bool
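
As a side note on why the 'A&B' pattern from the question returns all False: `str.contains` treats its argument as a regular expression by default, and `&` has no special meaning in regex syntax, so the pattern only matches the literal three-character text "A&B". The short sketch below illustrates this; the extra row 'A&B' is added purely for demonstration and is not part of the original example.

```python
import pandas as pd

# Sample data from the example above, plus one illustrative row
# that contains the literal text "A&B".
df = pd.DataFrame({'col1': ['wAxyzBw', 'wBxyzAw', 'wAxyz', 'wBxyz', 'A&B']})

# '&' is an ordinary character in a regex, so this looks for "A&B" verbatim.
literal = df['col1'].str.contains('A&B')
print(literal.tolist())  # [False, False, False, False, True]

# Combining two boolean masks with the & operator gives the intended AND.
both = df['col1'].str.contains('A') & df['col1'].str.contains('B')
print(both.tolist())  # [True, True, False, False, True]
```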