Filter a pandas DataFrame by any value in a list

Question

3

I have a pandas dataframe:

df
0       PL
1       PL
2       PL
3       IT
4       IT
        ..
4670    DE
4671    NO
4672    MT
4673    FI
4674    XX
Name: country_code, Length: 4675, dtype: object

I am filtering by the German country code "DE" as follows:

df = df[df.apply(lambda x: 'DE' in x)]

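If I want to filter on more than one country, I have to add each one by hand: .apply(lambda x: 'DE' in x or 'GB' in x). However, I would like to create a list of countries and generate this statement automatically.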

Something like this:

countries = ['DE', 'GB', 'IT']
df = df[df.apply(lambda x: any_item_in_countries_list in x)]

I suppose I could filter df three times and merge the pieces with concat(), but is there a more generic function to achieve this?

- oakca

2 Answers

1

If you have the column name, you can try this:

countries = ['DE', 'GB', 'IT']
df[df['country_code'].isin(countries)]
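
The output in the question looks like a pd.Series rather than a DataFrame, in which case there is no column to select. A minimal sketch of the same idea on a Series, assuming made-up sample data (the six values below are hypothetical, not the asker's real data):

```python
import pandas as pd

# Hypothetical sample standing in for the asker's Series
s = pd.Series(['PL', 'IT', 'DE', 'NO', 'GB', 'XX'], name='country_code')
countries = ['DE', 'GB', 'IT']

# Series.isin returns a boolean mask; indexing with it keeps the matching rows
filtered = s[s.isin(countries)]
print(filtered.tolist())  # ['IT', 'DE', 'GB']
```

The original index labels are preserved, so the result can still be aligned with the source data.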

- Sabil

1

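He showed a pd.Series; you can see the series name (or column name) at the bottom of his example. - Andreas

.apply(lambda x: x in ['DE', 'AT', 'GB']), would you like to run a benchmark? - oakca

Oakca, you can do this with Colab and Python by calling %timeit on the filter line. - Sabil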

- Andreas · Accepted Answer

You can use .isin():

df[df['country_code'].isin(['DE', 'GB', 'IT'])]
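
As a rough, self-contained sketch of how this relates to the lambda in the question: the mask from .isin() can be stored, reused, and negated with ~, and the hand-written "'DE' in x or 'GB' in x" can be generated from a list with any(). The sample DataFrame is hypothetical. Note that 'DE' in x is a substring test while .isin() matches whole values; for clean two-letter codes like these the two agree.

```python
import pandas as pd

# Hypothetical sample data; the real frame has 4675 rows
df = pd.DataFrame({'country_code': ['PL', 'IT', 'DE', 'NO', 'GB', 'XX']})
countries = ['DE', 'GB', 'IT']

# Exact-match mask, reusable for both selecting and excluding rows
mask = df['country_code'].isin(countries)
kept = df[mask]       # rows whose code is in the list
dropped = df[~mask]   # rows whose code is not in the list

# The question's lambda, generalized to a list (substring test, row by row)
mask_any = df['country_code'].apply(lambda x: any(c in x for c in countries))

print(kept['country_code'].tolist())     # ['IT', 'DE', 'GB']
print(dropped['country_code'].tolist())  # ['PL', 'NO', 'XX']
```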

Performance comparison:

import timeit
import pandas as pd
df = pd.DataFrame({'country_code': ['DE', 'GB', 'IT', 'MT', 'FI', 'XX'] * 1000})

%timeit df[df['country_code'].isin(['DE', 'GB', 'IT'])]
409 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['country_code'].apply(lambda x: x in ['DE', 'AT', 'GB'])
1.35 ms ± 474 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
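
%timeit only works in IPython or a notebook. Outside of one, a comparable measurement could be sketched with the standard timeit module, as below. The helper names with_isin and with_apply are made up for illustration, both variants use the same list and both apply the filter so the comparison is like for like, and the timings will vary by machine, so none are shown.

```python
import timeit
import pandas as pd

df = pd.DataFrame({'country_code': ['DE', 'GB', 'IT', 'MT', 'FI', 'XX'] * 1000})
countries = ['DE', 'GB', 'IT']

def with_isin():
    # Vectorized membership test
    return df[df['country_code'].isin(countries)]

def with_apply():
    # Python-level membership test, called once per row
    return df[df['country_code'].apply(lambda x: x in countries)]

n = 100  # number of loops per measurement
t_isin = timeit.timeit(with_isin, number=n) / n
t_apply = timeit.timeit(with_apply, number=n) / n

print(f"isin:  {t_isin * 1e6:.0f} µs per loop")
print(f"apply: {t_apply * 1e6:.0f} µs per loop")
```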