Sort by occurrence count in descending order and group

Question

3

Hello, I would like to drop the rows whose entries occur fewer than a certain number of times, for example:

import pandas as pd
df = pd.DataFrame({'a': [1,2,3,2], 'b':[4,5,6,7], 'c':[0,1,3,2]})
df

If a value occurs fewer than two times in column 'a', I want to drop every row that contains it.
Expected output:

   a  b  c
1  2  5  1
3  2  7  2

What I know so far: the occurrences can be found with condition = df['a'].value_counts() < 2, which gives me something like:

2    False
3    True
1    True
Name: a, dtype: bool

But I don't know where to go from there to drop the rows.
Thanks in advance!

- LSF

3 Answers

2

You can try something like this: get the length of each group, transform it back onto the original index, and index df with the result.

df[df.groupby("a").transform(len)["b"] >= 2]


    a   b   c
1   2   5   1
3   2   7   2

Breaking it down into individual steps, you get:

df.groupby("a").transform(len)["b"]

0    1
1    2
2    1
3    2
Name: b, dtype: int64

These are the group sizes, transformed back onto your original index.

df.groupby("a").transform(len)["b"] >=2

0    False
1     True
2    False
3     True
Name: b, dtype: bool

We turn that into a boolean index and use it to index the original DataFrame.
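
The steps above can be combined into one runnable sketch. The variable names (group_sizes, mask, result) are illustrative and not part of the answer, and transform("size") on column 'b' is used here on the assumption that it gives the same per-row group sizes as transform(len)["b"].

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 2], 'b': [4, 5, 6, 7], 'c': [0, 1, 3, 2]})

# Size of the group each row belongs to, aligned with df's own index.
group_sizes = df.groupby("a")["b"].transform("size")

# True for rows whose value in 'a' occurs at least twice.
mask = group_sizes >= 2

# Boolean indexing keeps only the rows where the mask is True.
result = df[mask]
print(result)
```

Because the mask shares df's index, the original row labels and integer dtypes are preserved.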

- Sven Harris

2

res = df[df.groupby('a')['b'].transform('size') >= 2]

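The transform method takes the result of df.groupby('a')['b'].size() and maps it onto a Series aligned with df['a'].

Alternatively, map the counts from value_counts back onto column 'a':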

s = df['a'].value_counts()
res = df[df['a'].map(s) >= 2]

print(res)

   a  b  c
1  2  5  1
3  2  7  2
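
If the threshold changes often, the value_counts and map approach can be wrapped in a small helper. This is only a sketch: the function name drop_rare and its parameters are hypothetical and not part of pandas or of the answer.

```python
import pandas as pd

def drop_rare(frame, column, min_count):
    """Keep rows whose value in `column` occurs at least `min_count` times.

    Hypothetical helper for illustration only.
    """
    # Occurrence count per distinct value, indexed by the value itself.
    counts = frame[column].value_counts()
    # map looks up each row's value in `counts`, giving a row-aligned Series.
    return frame[frame[column].map(counts) >= min_count]

df = pd.DataFrame({'a': [1, 2, 3, 2], 'b': [4, 5, 6, 7], 'c': [0, 1, 3, 2]})
res = drop_rare(df, 'a', 2)
print(res)
```

Calling drop_rare(df, 'a', 1) would return every row, since each value occurs at least once.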

- jpp


- Khalil Al Hooti · Accepted Answer

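You can use df.where together with dropna. The mask has to line up with the rows of df, so map the counts back onto column 'a' first; passing df['a'].value_counts() < 2 directly is aligned on row labels rather than on the values of 'a', and only happens to give the right rows for this particular data.

df.where(df['a'].map(df['a'].value_counts()) >= 2).dropna()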

     a   b   c
1   2.0 5.0 1.0
3   2.0 7.0 2.0