How to filter DataFrame rows by the length of a column value

Question

3

I have a DataFrame in which one column contains the following strings:

import pandas as pd

df = pd.DataFrame(['Hello world', 'World is good', 'Worldisnice hello'], columns=['A'])

df
                     A
0         'Hello world'
1       'World is good'
2   'Worldisnice hello'

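I am trying to get the rows that contain a word that is 11 characters long.

I am using the following code, but it checks the length of the whole string rather than the length of each word in the column.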

df = df[df['A'].apply(lambda x: len(x) == 11)]

This gives the following result:

df
                     A
0         'Hello world'

The output should be:

df
                     A
0   'Worldisnice hello'

That is the only row containing a word that is 11 characters long.

Thank you.

- Nacho

3 Answers

1

Another approach:

df[df.A.str.split().map(lambda x: any(len(y) == 11 for y in x))]

This gives:

                   A
2  Worldisnice hello

- PieCot

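Not sure this really gains anything... and it is about 4x slower :-) - Danail Petrov

Are you sure it is 4x slower when there is a reasonable number of rows? - PieCot

That is what %timeit says. - Danail Petrov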
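
The timing claim in these comments can be checked on a larger frame. The following is only a rough sketch: the row count of 30,000 and the helper names `with_apply` and `with_split_map` are assumptions made for illustration, and the measured ratio will depend on the pandas version and the data.

```python
import timeit

import pandas as pd

base = ['Hello world', 'World is good', 'Worldisnice hello']
# 30,000 rows; the size is an arbitrary choice for this sketch
big = pd.DataFrame(base * 10_000, columns=['A'])


def with_apply(frame):
    # Accepted answer: split inside the lambda passed to apply
    return frame[frame['A'].apply(lambda x: any(len(y) == 11 for y in x.split()))]


def with_split_map(frame):
    # PieCot's answer: vectorized str.split() first, then map over the word lists
    return frame[frame.A.str.split().map(lambda x: any(len(y) == 11 for y in x))]


# Time five runs of each approach on the same data
t_apply = timeit.timeit(lambda: with_apply(big), number=5)
t_split = timeit.timeit(lambda: with_split_map(big), number=5)
print(f"apply + split in lambda: {t_apply:.3f} s")
print(f"str.split() + map:       {t_split:.3f} s")
```

Both helpers select the same rows, so any difference is purely in running time.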

1

I like to define simple filter functions explicitly. I think this is more readable and easier to maintain.

In [8]: def f(row):
   ...:     # True if any whitespace-separated word in column A has exactly 11 characters
   ...:     words = row.A.split()
   ...:     for w in words:
   ...:         if len(w) == 11:
   ...:             return True
   ...:     return False
   ...: 

In [9]: df.loc[df.apply(f, axis=1)]
Out[9]: 
                   A
2  Worldisnice hello

- alec_djinn
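
A named predicate like the one in this answer can also be applied to column A alone, which avoids building a row object for every row with axis=1. This is a minimal sketch, not part of the original answer; the function name `has_word_of_length` and its parameter `n` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(['Hello world', 'World is good', 'Worldisnice hello'], columns=['A'])


def has_word_of_length(text, n=11):
    """Return True if any whitespace-separated word in text has exactly n characters."""
    return any(len(word) == n for word in text.split())


# Apply the predicate to the column only, producing a boolean mask
mask = df['A'].apply(has_word_of_length)
result = df[mask]
print(result)
```

Making the target length a parameter means the same function can be reused for other lengths, for example `df['A'].apply(has_word_of_length, n=5)`.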

- Danail Petrov · Accepted Answer

len(x) in your code checks the length of the entire string.

>>> df.A.str.len()
 0    11
 1    13
 2    17

What you need to do is split the string into words and check whether any of the words has a length of 11. The following code does that.

>>> df[df['A'].apply(lambda x: any(len(y) == 11 for y in x.split()))]
                  A
2  Worldisnice hello
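
The same split-and-check idea can also be written as a regular expression passed to `Series.str.contains`. This is a sketch of an alternative and not something proposed in the thread; the pattern assumes that a "word" means a run of non-whitespace characters, which matches what `str.split()` produces.

```python
import pandas as pd

df = pd.DataFrame(['Hello world', 'World is good', 'Worldisnice hello'], columns=['A'])

# Exactly 11 non-whitespace characters that are neither preceded nor
# followed by another non-whitespace character
pattern = r'(?<!\S)\S{11}(?!\S)'

filtered = df[df['A'].str.contains(pattern)]
print(filtered)
```

The lookarounds matter: without them, `\S{11}` would also match the first 11 characters of a longer word.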