Filtering a DataFrame in Pandas with a list of conditions

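I would like a function that takes a list of conditions of any length and joins all of them with an ampersand (&). Sample code is below.
import pandas as pd

df = pd.DataFrame(columns=['Sample', 'DP','GQ', 'AB'],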
         data=[
               ['HG_12_34', 200, 35, 0.4],
               ['HG_12_34_2', 50, 45, 0.9],
               ['KD_89_9', 76, 67, 0.7],
               ['KD_98_9_2', 4, 78, 0.02],
               ['LG_3_45', 90, 3, 0.8],
               ['LG_3_45_2', 15, 12, 0.9]
               ])


def some_func(df, cond_list):

    # wrap ampersand between multiple conditions
    all_conds = ?

    return df[all_conds]

cond1 = df['DP'] > 40
cond2 = df['GQ'] > 40
cond3 = df['AB'] < 0.4


some_func(df, [cond1, cond2]) # should return df[cond1 & cond2]
some_func(df, [cond1, cond3, cond2]) # should return df[cond1 & cond3 & cond2]

I would appreciate any help with this.

1 Answer

You can use functools.reduce for this:
from functools import reduce

def some_func(df, cond_list):
    return df[reduce(lambda x, y: x & y, cond_list)]

Or, as @AryaMcCarthy suggests, you can use and_ from the operator module:

from functools import reduce
from operator import and_

def some_func(df, cond_list):
    return df[reduce(and_, cond_list)]

Or, as @ayhan suggests, use NumPy, whose logical_and ufunc has its own reduce method:

from numpy import logical_and

def some_func(df, cond_list):
    return df[logical_and.reduce(cond_list)]

All three versions produce the following output for your sample input:

>>> some_func(df, [cond1, cond2])
       Sample  DP  GQ   AB
1  HG_12_34_2  50  45  0.9
2     KD_89_9  76  67  0.7
>>> some_func(df, [cond1, cond2, cond3])
Empty DataFrame
Columns: [Sample, DP, GQ, AB]
Index: []
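As a quick sanity check, the three variants can be compared on the sample data. The sketch below is self-contained; `filter_all` is a hypothetical helper name (not part of the original answer), and seeding the reduction with an all-True mask is an assumption about what an empty condition list should mean (here: keep every row).

```python
from functools import reduce
from operator import and_

import numpy as np
import pandas as pd

df = pd.DataFrame(
    columns=['Sample', 'DP', 'GQ', 'AB'],
    data=[
        ['HG_12_34', 200, 35, 0.4],
        ['HG_12_34_2', 50, 45, 0.9],
        ['KD_89_9', 76, 67, 0.7],
        ['KD_98_9_2', 4, 78, 0.02],
        ['LG_3_45', 90, 3, 0.8],
        ['LG_3_45_2', 15, 12, 0.9],
    ],
)

cond1 = df['DP'] > 40
cond2 = df['GQ'] > 40
cond3 = df['AB'] < 0.4


def filter_all(df, cond_list):
    """Return the rows of df that satisfy every condition in cond_list."""
    # Assumption: an empty list means "no filtering", so the reduction
    # starts from a mask that is True for every row.
    keep_all = pd.Series(True, index=df.index)
    return df[reduce(and_, cond_list, keep_all)]


# The three variants from the answer select the same rows.
with_lambda = df[reduce(lambda x, y: x & y, [cond1, cond2])]
with_and = df[reduce(and_, [cond1, cond2])]
with_numpy = df[np.logical_and.reduce([cond1, cond2])]
assert with_lambda.equals(with_and)
assert with_and.equals(with_numpy)
```

Without the initial value, `reduce(and_, [])` raises a `TypeError`, so the seed only matters if the caller may pass an empty list.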

It would probably be better to use operator.and_ rather than your own lambda. - Arya McCarthy
@AryaMcCarthy: yes, that is indeed cleaner. - Willem Van Onsem
Or, with numpy: np.logical_and.reduce([cond1, cond2, cond3]) - ayhan
