How to filter rows containing specific string values with an AND operator

Question

6

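My question is an extension of this well-answered question:

How to filter rows containing a string pattern from a Pandas dataframe

The answer posted there, reproduced below, keeps the rows whose string contains the word "ball":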

In [3]: df[df['ids'].str.contains("ball")]
Out[3]:
     ids  vals
0  aball     1
1  bball     2
3  fball     4

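My question is: what if my data contains long sentences, and I want to identify the strings that contain both of the words "ball" and "field"? That is, rows containing only "ball" or only "field" should be discarded, and rows containing both words should be kept.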

- Mars

2

As an aside, if you are searching for a fixed string (i.e. not a regular expression), you can usually get some speed-up with df['ids'].str.contains("ball", regex=False). - Alex Riley
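To illustrate the comment above, here is a minimal sketch of how regex=False changes matching and how it combines with an AND. The sample data and the "a.b" pattern are made up for illustration and are not from the original question.

```python
import pandas as pd

# Hypothetical sample data, chosen to show the difference between the two modes.
df = pd.DataFrame({"ids": ["ball on the field", "a.b field", "axb field", "ball only"]})

# Default (regex=True): "." matches any character, so "a.b" also matches "axb".
as_regex = df["ids"].str.contains("a.b")

# regex=False: the pattern is treated as a literal substring.
as_literal = df["ids"].str.contains("a.b", regex=False)

# Literal matching combines with & exactly like regex matching does.
both = (
    df["ids"].str.contains("ball", regex=False)
    & df["ids"].str.contains("field", regex=False)
)
print(df[both])
```

Whether the speed-up is noticeable depends on the data size and the pandas version, so it is worth timing on your own data.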

4 Answers

2

If you have more than two words, you can use this approach (note that it is not as fast as foxyblue's method):

l = ['ball', 'field']
df.ids.apply(lambda x: all(y in x for y in l))
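
The expression above returns a boolean mask rather than the filtered rows. A minimal sketch of using it to select rows might look like the following; the sample data and the isinstance guard for missing values are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical sample data, including one missing value.
df = pd.DataFrame({"ids": ["ball is field", "ball is wa", "doll is field", None]})
words = ["ball", "field"]

# all(...) is True only when every word occurs in the string.
# The isinstance guard treats missing (non-string) values as non-matching.
mask = df["ids"].apply(lambda s: isinstance(s, str) and all(w in s for w in words))

filtered = df[mask]
print(filtered)
```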

- BENY

0

You can use np.logical_and.reduce to combine one str.contains mask per word, which handles any number of words:

df[np.logical_and.reduce([df['ids'].str.contains(w) for w in ['ball', 'field']])]

In [96]: df
Out[96]:
             ids
0  ball is field
1     ball is wa
2  doll is field

In [97]: df[np.logical_and.reduce([df['ids'].str.contains(w) for w in ['ball', 'field']])]
Out[97]:
             ids
0  ball is field

- Zero

0

Another approach, using a regular expression:

In [409]: df
Out[409]:
               ids
0   ball and field
1  ball, just ball
2      field alone
3   field and ball

In [410]: pat = r'(?:ball.*field|field.*ball)'

In [411]: df[df['ids'].str.contains(pat)]
Out[411]:
               ids
0   ball and field
3   field and ball
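
For more than two words, the alternation above could be generated rather than written by hand. The following is a rough sketch under that assumption; the words list and the use of itertools.permutations are illustrative and not part of the original answer.

```python
import itertools
import re

import pandas as pd

df = pd.DataFrame(
    {"ids": ["ball and field", "ball, just ball", "field alone", "field and ball"]}
)
words = ["ball", "field"]  # hypothetical list; extend as needed

# Build one alternative per ordering of the words, e.g. "ball.*field|field.*ball".
# re.escape keeps any regex metacharacters in the words literal.
pat = "|".join(
    ".*".join(re.escape(w) for w in perm)
    for perm in itertools.permutations(words)
)

filtered = df[df["ids"].str.contains(pat)]
print(filtered)
```

Two caveats: the number of alternatives grows factorially with the number of words, and each ordering requires non-overlapping occurrences, so words that overlap (such as "ab" and "bc" in "abc") would not match, unlike with separate str.contains masks combined by &.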

- MaxU - stand with Ukraine

- foxyblue · Accepted Answer

df[df['ids'].str.contains("ball")]

Would become:

df[df['ids'].str.contains("ball") & df['ids'].str.contains("field")]

If you prefer tidier code:

contains_balls = df['ids'].str.contains("ball")
contains_fields = df['ids'].str.contains("field")

filtered_df = df[contains_balls & contains_fields]
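
A self-contained version of the snippet above might look like the sketch below. The sample data is made up, and na=False is an addition that is not in the original answer: it makes missing values count as non-matching, so the mask stays purely boolean.

```python
import pandas as pd

# Hypothetical sample data, including one missing value.
df = pd.DataFrame({"ids": ["ball on the field", "ball only", "field only", None]})

# na=False: a missing value is treated as "does not contain the word".
contains_balls = df["ids"].str.contains("ball", na=False)
contains_fields = df["ids"].str.contains("field", na=False)

# & is the element-wise AND; only rows where both masks are True are kept.
filtered_df = df[contains_balls & contains_fields]
print(filtered_df)
```

Note that & must be used here rather than the Python keyword and, because and tries to evaluate a whole Series as a single truth value and raises a ValueError.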