Performing a logical AND across multiple columns in pandas

Question

6

I have a data frame (edata) that looks like this:

Domestic   Catsize    Type   Count
   1          0         1      1
   1          1         1      8
   1          0         2      11
   0          1         3      14
   1          1         4      21
   0          1         4      31

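From this data frame I want to compute the sum of Count over all rows where the logical AND of Domestic and Catsize is zero (0), that is, for these combinations: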

Domestic   Catsize   Domestic AND Catsize
   1          0               0
   0          1               0
   0          0               0

The code I use to do this is:

g=edata.groupby('Type')
q3=g.apply(lambda x:x[((x['Domestic']==0) & (x['Catsize']==0) |
                       (x['Domestic']==0) & (x['Catsize']==1) |
                       (x['Domestic']==1) & (x['Catsize']==0)
                       )]
            ['Count'].sum()
           )

q3

Type
1     1
2    11
3    14
4    31

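This code works, but the number of conditions grows quickly as the number of variables in the data frame increases. Is there a smarter way to write the condition, so that sum() is applied whenever the AND of two (or more) variables is zero?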

- eshfaq ahmad

3 Answers

4

Generalize with np.logical_and.reduce:

import numpy as np
columns = ['Domestic', 'Catsize']
# keep rows where the AND across `columns` is False, then sum Count per Type
df[~np.logical_and.reduce(df[columns], axis=1)].groupby('Type')['Count'].sum()

Type
1     1
2    11
3    14
4    31
Name: Count, dtype: int64

To add the result back to the data frame as a column, broadcast it with map:

u = df[~np.logical_and.reduce(df[columns], axis=1)].groupby('Type')['Count'].sum()
df['NewCol'] = df.Type.map(u)

df
   Domestic  Catsize  Type  Count  NewCol
0         1        0     1      1       1
1         1        1     1      8       1
2         1        0     2     11      11
3         0        1     3     14      14
4         1        1     4     21      31
5         0        1     4     31      31

- cs95

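Can logical_and be used with numeric variables? For example, if the Catsize column holds values such as 0, 2, 4, 5, 6, 8. - eshfaq ahmad

@eshfaqahmad Convert the columns to boolean first: df[col] = df[col].astype(bool) - cs95

Thank you very much for the reply. I tried it, but got the error KeyError: '[-1 -1 -2 -1 -2 -1 ] not in index'. I had renamed the column "Catsize" to "Legs" and changed its values to 0, 0, 2, 4, 4, 5. - eshfaq ahmad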

1

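@eshfaqahmad Hi, I suggest you open a new question. You will get help faster that way. - cs95

I already asked a question 3 days ago, but it has not received any answers or comments yet. - eshfaq ahmad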
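
The exchange above is about columns that hold arbitrary numbers rather than 0/1 flags. The following is a minimal sketch of the bool-conversion idea, assuming a hypothetical Legs column with the values mentioned in the comments (0, 0, 2, 4, 4, 5) in place of Catsize; the name mask is illustrative only.

```python
import pandas as pd

# Sample data from the question, with Catsize replaced by a hypothetical
# numeric Legs column (values taken from the comment thread).
df = pd.DataFrame({
    'Domestic': [1, 1, 1, 0, 1, 0],
    'Legs':     [0, 0, 2, 4, 4, 5],
    'Type':     [1, 1, 2, 3, 4, 4],
    'Count':    [1, 8, 11, 14, 21, 31],
})

columns = ['Domestic', 'Legs']

# astype(bool) maps 0 -> False and any non-zero value -> True,
# so all(axis=1) is a row-wise logical AND and ~ is a logical NOT.
mask = ~df[columns].astype(bool).all(axis=1)

# Sum Count per Type over the rows where the AND is False.
result = df[mask].groupby('Type')['Count'].sum()
print(result)
```

The mask here is a boolean Series, so `~` acts as a logical NOT. The KeyError quoted in the comments is consistent with `~` having been applied to an integer array instead (bitwise NOT turns 0 into -1 and 1 into -2), although the code that produced it is not shown in the thread.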

0

How about this?

columns = ['Domestic', 'Catsize']
df.loc[~df[columns].prod(axis=1).astype(bool), 'Count']

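Then you can do whatever you like with the result.

The product works nicely for logical AND. For logical OR, you can use sum(axis=1) instead, applying the appropriate negation beforehand.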
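
A small sketch of the product/sum remark, using the sample data from the question. It assumes the columns hold only 0/1 (or other non-negative) values, since negative numbers could cancel to a zero sum; the names and_mask and or_mask are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Domestic': [1, 1, 1, 0, 1, 0],
    'Catsize':  [0, 1, 0, 1, 1, 1],
    'Type':     [1, 1, 2, 3, 4, 4],
    'Count':    [1, 8, 11, 14, 21, 31],
})
columns = ['Domestic', 'Catsize']

# Row-wise AND: the product is non-zero only if every column is non-zero.
and_mask = df[columns].prod(axis=1).astype(bool)

# Row-wise OR: the sum is non-zero if at least one column is non-zero
# (assumes non-negative values).
or_mask = df[columns].sum(axis=1).astype(bool)

# Compare against the built-in boolean reductions on the same data.
print((and_mask == df[columns].astype(bool).all(axis=1)).all())
print((or_mask == df[columns].astype(bool).any(axis=1)).all())

# Counts of the rows where the AND is zero.
selected = df.loc[~and_mask, 'Count'].tolist()
print(selected)
```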

- qbit

- jpp · Accepted Answer

You can filter first using the negation of pd.DataFrame.all:

cols = ['Domestic', 'Catsize']
# ~all(axis=1) is True for rows where at least one of `cols` is zero
res = df[~df[cols].all(axis=1)].groupby('Type')['Count'].sum()

print(res)
# Type
# 1     1
# 2    11
# 3    14
# 4    31
# Name: Count, dtype: int64
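
For completeness, here is a self-contained sketch that rebuilds the sample data from the question and checks that the three explicit conditions select the same rows as the negated all(). The variable names explicit and compact are illustrative and not part of the original posts.

```python
import pandas as pd

# Sample data from the question.
edata = pd.DataFrame({
    'Domestic': [1, 1, 1, 0, 1, 0],
    'Catsize':  [0, 1, 0, 1, 1, 1],
    'Type':     [1, 1, 2, 3, 4, 4],
    'Count':    [1, 8, 11, 14, 21, 31],
})

# The three explicit cases listed in the question.
explicit = (
    ((edata['Domestic'] == 0) & (edata['Catsize'] == 0))
    | ((edata['Domestic'] == 0) & (edata['Catsize'] == 1))
    | ((edata['Domestic'] == 1) & (edata['Catsize'] == 0))
)

# The same selection written as NOT (Domestic AND Catsize).
cols = ['Domestic', 'Catsize']
compact = ~edata[cols].all(axis=1)

# Check that both masks select identical rows.
print((explicit == compact).all())

res = edata[compact].groupby('Type')['Count'].sum()
print(res)
```

With this form, handling more variables only means adding their names to cols; the condition itself does not grow.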