If you want a general solution that does not even need to know which columns are specified in filter_dict, you can use a double reduce:
from functools import reduce
from operator import invert


def filter_df(df, filter_dict, option='keep'):
    # Inner reduce: AND together the column conditions of one dictionary.
    # Outer reduce: OR together the resulting per-dictionary masks.
    slice_vector = reduce(
        lambda x, y: x | y,
        [reduce(lambda x, y: x & y,
                [df[col] == val for col, val in el.items()])
         for el in filter_dict])
    if option == 'keep':
        return df.loc[slice_vector]
    elif option == 'exclude':
        # invert(slice_vector) is equivalent to ~slice_vector
        return df.loc[invert(slice_vector)]
    else:
        raise NotImplementedError(
            f"Option {option} not implemented. Please choose between 'keep' and 'exclude'.")
Let's apply it to several test cases:
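import pandas as pd

data = {"id": [1, 2, 3], "a": ["HH", "HH", "W"], "b": ["DOG", "CAT", "DOG"]}
df = pd.DataFrame(data)

# Keep rows where a == 'HH', or where a == 'W' and b == 'DOG' (all three rows match)
filter_dict_1 = [{'a': 'HH'}, {'a': 'W', 'b': 'DOG'}]
df1 = filter_df(df, filter_dict_1, "keep")
print(df1)

# Exclude rows where a == 'HH' and b == 'CAT' (drops the row with id 2)
filter_dict_2 = [{'a': 'HH', 'b': 'CAT'}]
df2 = filter_df(df, filter_dict_2, "exclude")
print(df2)

# Exclude rows matching either dictionary (only the row with id 3 remains)
filter_dict_3 = [{'a': 'HH', 'b': 'CAT'}, {"a": 'HH'}]
df3 = filter_df(df, filter_dict_3, "exclude")
print(df3)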
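The idea is to first build one boolean vector per dictionary. Each of these vectors is created by combining that dictionary's individual column conditions with &, and the resulting vectors are then combined with | to produce the final filter vector.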
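To make the two levels of combination visible, here is a minimal sketch that unrolls the double reduce into explicit loops and prints the intermediate masks. The names per_dict_masks and final_mask are illustrative only and do not appear in the answer above. The sample frame is the same one used in the test cases.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3],
                   "a": ["HH", "HH", "W"],
                   "b": ["DOG", "CAT", "DOG"]})
filter_dict = [{"a": "HH", "b": "CAT"}, {"a": "W", "b": "DOG"}]

# Step 1: one boolean Series per dictionary, ANDing all of its conditions.
per_dict_masks = []  # hypothetical name, for illustration
for el in filter_dict:
    mask = pd.Series(True, index=df.index)
    for col, val in el.items():
        mask &= df[col] == val
    per_dict_masks.append(mask)

# Step 2: OR the per-dictionary masks into the final filter vector.
final_mask = pd.Series(False, index=df.index)  # hypothetical name
for mask in per_dict_masks:
    final_mask |= mask

print(per_dict_masks[0].tolist())  # [False, True, False]
print(per_dict_masks[1].tolist())  # [False, False, True]
print(final_mask.tolist())         # [False, True, True]
```

Because the loops start from an all-True and an all-False Series, this variant also tolerates an empty filter list (nothing is selected), whereas reduce without an initial value raises a TypeError on an empty sequence. Whether that behavior is desirable depends on your use case.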