pandas - Check for non-unique values within each group of a DataFrame

Question

7

I have a simple DataFrame df:

a,b
1,2
1,3
1,4
1,2
2,1
2,2
2,3
2,5
2,5

I want to check whether, within each group of a, there are duplicate entries in b. So far I have done the following:

g = df.groupby('a')['b'].unique()

This returns:

a
1       [2, 3, 4]
2    [1, 2, 3, 5]

But what I want is a list of the values of b that occur more than once within each group of a. In this case, the expected output would be:

a
1    [2]
2    [5]

- Fabio Lamanna

2 Answers

9

We can use duplicated.

print(df[df.duplicated()].drop_duplicates())

- akrun

1

I don't think the "de-duplicate" call is needed. Wouldn't the subset df1[df1.duplicated()] be correct? - Pierre L
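To make the duplicated approach concrete, here is a minimal, self-contained sketch. It rebuilds the question's sample data by hand and then appends one extra (2, 5) row (the `extra` frame is a hypothetical addition, not part of the original question) to show when the trailing drop_duplicates() makes a difference.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "b": [2, 3, 4, 2, 1, 2, 3, 5, 5],
})

# duplicated() flags every row whose (a, b) pair already appeared earlier.
repeats = df[df.duplicated()]

# drop_duplicates() collapses the flagged rows to one row per repeated pair.
dup_pairs = repeats.drop_duplicates()
print(dup_pairs)

# Hypothetical extension: add a third (2, 5) row to the sample data.
extra = pd.concat(
    [df, pd.DataFrame({"a": [2], "b": [5]})],
    ignore_index=True,
)

# Without drop_duplicates(), the pair (2, 5) is now listed twice.
print(extra[extra.duplicated()])
print(extra[extra.duplicated()].drop_duplicates())
```

On the original sample the two forms give the same rows, because no pair occurs more than twice. Once a pair occurs three or more times, duplicated() alone reports it more than once, so drop_duplicates() is what keeps the result to one row per repeated pair.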


- Lee · Accepted Answer

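# count how often each value of b occurs within each group of a
g = df.groupby('a')['b'].value_counts()
# keep only the (a, b) pairs that occur more than once
g.where(g > 1).dropna()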
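The value_counts result is a Series of counts indexed by (a, b), not the per-group lists shown in the question's expected output. One possible way to reshape it is sketched below, assuming the index levels are named a and b as they are for this groupby; the variable names `counts`, `repeated`, `pairs`, and `result` are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "b": [2, 3, 4, 2, 1, 2, 3, 5, 5],
})

# Occurrences of each b value within each group of a.
counts = df.groupby("a")["b"].value_counts()

# Boolean indexing keeps only pairs seen more than once
# and preserves the integer dtype of the counts.
repeated = counts[counts > 1]

# Turn the (a, b) MultiIndex into ordinary columns,
# then collect the repeated b values into one list per group.
pairs = repeated.index.to_frame(index=False)
result = pairs.groupby("a")["b"].agg(list)
print(result)
```

Groups of a that contain no repeated value of b do not appear in `result` at all, which may or may not be what a caller wants.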