How do I select the top n rows of each group after a group by in Pandas?

Question

I have a pandas DataFrame with the following columns:

 open_year, open_month, type, col1, col2, ....

I want to find the top types in each (year, month), so I first compute the count of each type within each (year, month):

freq_df = df.groupby(['open_year','open_month','type']).size().reset_index()
freq_df.columns = ['open_year','open_month','type','count']

I want to find the top n types for each (year, month) based on their frequency (i.e. count). How can I do that?

I can use nlargest, but then I lose the type:

freq_df.groupby(['open_year','open_month'])['count'].nlargest(5)

The result does not include the type column.

- HHH

Please read [mcve] and edit your question accordingly. It will make the question more useful to the community and easier to answer. - piRSquared

It complains: Cannot access callable attribute 'nlargest' of 'DataFrameGroupBy' objects, try using the 'apply' method. - HHH

1 Answer

- cs95 · Accepted Answer

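I'd suggest sorting by count in descending order first, and then calling GroupBy.head. Because head returns whole rows, the type column is kept.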

(freq_df.sort_values('count', ascending=False)
        .groupby(['open_year','open_month'], sort=False).head(5)
)
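
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The sample data and the variable names `n` and `top_n` are made up for illustration and are not from the question; only the column names follow the original post.

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only).
df = pd.DataFrame({
    "open_year":  [2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017],
    "open_month": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "type":       ["a", "a", "a", "b", "b", "c", "x", "y", "y"],
})

# Count how often each type occurs within each (year, month).
freq_df = (
    df.groupby(["open_year", "open_month", "type"])
      .size()
      .reset_index(name="count")
)

n = 2  # number of top types to keep per (year, month); assumed value

# Sort by count descending, then keep the first n rows of each group.
# head() returns full rows, so the 'type' column is preserved.
top_n = (
    freq_df.sort_values("count", ascending=False)
           .groupby(["open_year", "open_month"], sort=False)
           .head(n)
)
print(top_n)
```

Note that when counts are tied within a group, which of the tied rows ends up in the top n depends on the sort order, so add a secondary sort key if you need deterministic tie-breaking.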