Filter DataFrame rows by the length of a column value

Question

17

I have a pandas DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [np.nan, 1], ['test string1', 5]], columns=['A', 'B'])

df
              A  B
0             1  2
1           NaN  1
2  test string1  5

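I'm using pandas 0.20. What is the most efficient way to drop every row in which any column value is longer than 10 characters? For the example above, I expect the following output: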

df
              A  B
0             1  2
1           NaN  1

- D.prd

4 Answers

9

I had to cast the values to strings before I could use Diego's answer:

```
df = df[df['A'].apply(lambda x: len(str(x)) <= 10)]
```

- Elizabeth

3

In [42]: df
Out[42]:
              A  B                         C          D
0             1  2                         2 2017-01-01
1           NaN  1                       NaN 2017-01-02
2  test string1  5  test string1test string1 2017-01-03

In [43]: df.dtypes
Out[43]:
A            object
B             int64
C            object
D    datetime64[ns]
dtype: object

In [44]: df.loc[~df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10)).any(axis=1)]
Out[44]:
     A  B    C          D
0    1  2    2 2017-01-01
1  NaN  1  NaN 2017-01-02

Explanation:

df.select_dtypes(['object']) selects only the columns with the object (str) dtype:

In [45]: df.select_dtypes(['object'])
Out[45]:
              A                         C
0             1                         2
1           NaN                       NaN
2  test string1  test string1test string1

In [46]: df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10))
Out[46]:
       A      C
0  False  False
1  False  False
2   True   True

Now we can "aggregate" it row by row as follows:

In [47]: df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10)).any(axis=1)
Out[47]:
0    False
1    False
2     True
dtype: bool

Finally, we select only the rows where that value is False:

In [48]: df.loc[~df.select_dtypes(['object']).apply(lambda x: x.str.len().gt(10)).any(axis=1)]
Out[48]:
     A  B    C          D
0    1  2    2 2017-01-01
1  NaN  1  NaN 2017-01-02
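
The steps above can be collected into a small helper. This is a minimal sketch rather than the answer's own code: the function name drop_long_string_rows and the max_len parameter are assumptions introduced for illustration, and the sample frame mirrors the one shown in In [42].

```python
import numpy as np
import pandas as pd


def drop_long_string_rows(df, max_len=10):
    """Drop rows where any object-dtype column holds a string longer than max_len."""
    # Only object columns can hold Python strings here; numeric and datetime columns are skipped.
    obj_cols = df.select_dtypes(include=['object'])
    # .str.len() yields NaN for non-string cells, and NaN > max_len evaluates to False.
    too_long = obj_cols.apply(lambda col: col.str.len().gt(max_len))
    # Keep the rows in which no object column exceeds the limit.
    return df.loc[~too_long.any(axis=1)]


df = pd.DataFrame({
    'A': [1, np.nan, 'test string1'],
    'B': [2, 1, 5],
    'C': [2, np.nan, 'test string1test string1'],
    'D': pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-03']),
})

result = drop_long_string_rows(df)
print(result)
```

Passing axis as a keyword keeps the call valid on newer pandas releases, where the positional form is no longer accepted.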

- MaxU - stand with Ukraine

df[df.select_dtypes([object]).astype(str).applymap(len).le(10).all(axis=1)] - piRSquared

@piRSquared, thanks for the hint! Don't you think .applymap() would be slower? - MaxU - stand with Ukraine

I'm not sure. It may have been improved in recent versions. Mostly I'm just code golfing. - piRSquared

1

0.19.2 /gasp! You've been exposed :-) - piRSquared

1

@piRSquared, no, their length is NaN. What I mean is that df[col].str.len() returns NaN for NaN values. - MaxU - stand with Ukraine

2

Use the Series apply method to keep only the rows you want: df = df[df['A'].apply(lambda x: len(x) <= 10)]

This snippet keeps the rows whose value in column 'A' has a length of at most 10.
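
One caveat is worth illustrating: len() is only defined for sized objects, so on the question's column A, which mixes an integer, a NaN float, and a string, the lambda above should raise a TypeError. The sketch below rebuilds the question's frame and contrasts the raw call with a str() cast. The variable names raw_failed and filtered are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [np.nan, 1], ['test string1', 5]], columns=['A', 'B'])

# len() is undefined for int and float, so the raw lambda fails on mixed data.
try:
    df[df['A'].apply(lambda x: len(x) <= 10)]
    raw_failed = False
except TypeError:
    raw_failed = True

# Casting each value to str first makes len() applicable to every cell.
# NaN becomes the 3-character string 'nan', so that row is kept.
filtered = df[df['A'].apply(lambda x: len(str(x)) <= 10)]

print(raw_failed)
print(filtered)
```

If column A held only strings, the original lambda would work unchanged. The cast matters only for mixed or missing values.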

- Diego Aguado

Thanks, everyone! I went with the solution that works on all columns. - D.prd

- Zero · Accepted Answer

Based on column A only:

In [865]: df[~(df.A.str.len() > 10)]
Out[865]:
     A  B
0    1  2
1  NaN  1

Based on all columns:

In [866]: df[~df.applymap(lambda x: len(str(x)) > 10).any(axis=1)]
Out[866]:
     A  B
0    1  2
1  NaN  1
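
The column A filter is written as a negation, ~(len > 10), not as len <= 10. A likely reason is that .str.len() returns NaN for cells that are not strings, and every comparison against NaN is False. The sketch below, which assumes the question's sample frame, shows how the two masks differ. The names keep_negated and keep_direct are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [np.nan, 1], ['test string1', 5]], columns=['A', 'B'])

# Non-string cells (the integer 1 and NaN) get a length of NaN.
lengths = df['A'].str.len()

# NaN > 10 is False, so negating keeps the non-string rows.
keep_negated = ~(lengths > 10)

# NaN <= 10 is also False, so the direct form drops them.
keep_direct = lengths <= 10

print(df[keep_negated])
print(df[keep_direct])
```

With this data the negated mask keeps rows 0 and 1, while the direct mask keeps nothing.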