How to use groupby to get all rows corresponding to the maximum value of a column?

Question

3

Given a DataFrame df in the following format:

    Election Yr. Party States  Votes
0           2000     A      a     50
1           2000     A      b     30
2           2000     B      a     40
3           2000     B      b     50
4           2000     C      a     30
5           2000     C      b     40
6           2005     A      a     50
7           2005     A      b     30
8           2005     B      a     40
9           2005     B      b     50
10          2005     C      a     30
11          2005     C      b     40

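I want to get the party that received the most votes in each year. I have grouped by "Election Yr." and "Party" with the code below, then used .sum() to get each party's total votes for each year.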

df = df.groupby(['Election Yr.', 'Party']).sum()

How do I get the party with the most votes for each year? I can't figure it out.

Thank you very much for your support.

- Deepak

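I think what you are looking for is the idxmax solution in this answer: https://dev59.com/6Gkv5IYBdhLWcg3w_lzE - user14518362

@user14518362 - That is not what the OP asked for: "maximum votes per year". - not_speshal

I tried that, but it only gives the row with the overall maximum. I need the row with the maximum for each year. - Deepak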

3 Answers

1

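1. Using an inner join

You can start from df as it was before the first groupby. Compute the total votes per year-party pair, then get the maximum number of votes per year and merge on the year-votes combination to get the party that received the most votes each year.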

import pandas as pd

# Original data
df = pd.DataFrame({'Election Yr.':[2000,2000,2000,2000,2000,2000,2005,2005,2005,2005,2005,2005],
                   'Party':['A','A','B','B','C','C','A','A','B','B','C','C',],
                   'Votes':[50,30,40,50,30,40,50,30,40,50,30,40]})

# Get number of votes per year-party
df = df.groupby(['Election Yr.','Party'])['Votes'].sum().reset_index()

# Get max number of votes per year
max_ = df.groupby('Election Yr.')['Votes'].max().reset_index()

# Merge on key
max_ = max_.merge(df, on=['Election Yr.','Votes'])

# Results
print(max_)

>    Election Yr.  Votes Party
> 0          2000     90     B
> 1          2005     90     B
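
The merge above keeps every party tied for a year's maximum. As a rough sketch of the same idea without a join, each row can be compared against its own group's maximum using groupby(...).transform('max'). The names totals, year_max and winners are illustrative and not part of the original answer.

```python
import pandas as pd

# Same sample data as in the question (without the States column)
df = pd.DataFrame({
    "Election Yr.": [2000] * 6 + [2005] * 6,
    "Party": ["A", "A", "B", "B", "C", "C"] * 2,
    "Votes": [50, 30, 40, 50, 30, 40] * 2,
})

# Total votes per year-party pair
totals = df.groupby(["Election Yr.", "Party"], as_index=False)["Votes"].sum()

# Broadcast each year's maximum back onto every row of that year
year_max = totals.groupby("Election Yr.")["Votes"].transform("max")

# Keep every row whose total equals the yearly maximum (ties are all kept)
winners = totals[totals["Votes"] == year_max].reset_index(drop=True)

print(winners)
```

With the sample data this should select party B for both years; if two parties shared the top total in a year, both rows would be kept.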

2. Sort and keep the first observation

Another approach is to sort by the number of votes within each year:

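# Total votes per year-party pair
df = df.groupby(['Election Yr.','Party'])['Votes'].sum().reset_index()
# Sort descending so each year's top party comes first
df = df.sort_values(['Election Yr.','Votes'], ascending=False)
# Keep the first (highest-vote) row of each year
print(df.groupby('Election Yr.').first().reset_index())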

>    Election Yr. Party  Votes
> 0          2000     B     90
> 1          2005     B     90

- Arturo Sbr

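Thanks, Arturo. Considering a different dataset, how can I get the top 10 parties with the most votes for each year? - Deepak

You can use the second approach and try head(10) instead of first(). See this post. - Arturo Sbr

When I use head(10) to print the top ten parties for each year, why does it print the most recent year at the top? I want the older years at the top. How can I do that? - Deepak

Sort the new data by year: df.sort_values('Election Yr.', ascending=True). Make sure to call reset_index() after head(10), i.e. df.groupby(....).head(10).reset_index(), and then sort again. - Arturo Sbr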
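
The top-N variant discussed in these comments can be sketched as follows. This is only an illustration under assumptions: the sample data has three parties, so n = 2 stands in for 10, and the names totals and top_n are hypothetical. Sorting the year ascending and the votes descending in a single sort_values call puts older years first without a second sort.

```python
import pandas as pd

# Same sample data as in the question (without the States column)
df = pd.DataFrame({
    "Election Yr.": [2000] * 6 + [2005] * 6,
    "Party": ["A", "A", "B", "B", "C", "C"] * 2,
    "Votes": [50, 30, 40, 50, 30, 40] * 2,
})

# Total votes per year-party pair
totals = df.groupby(["Election Yr.", "Party"], as_index=False)["Votes"].sum()

n = 2  # illustrative value; use 10 for a larger dataset

top_n = (
    totals
    # Year ascending (older first), votes descending within each year
    .sort_values(["Election Yr.", "Votes"], ascending=[True, False])
    # head(n) keeps the first n rows of each year in the sorted order
    .groupby("Election Yr.")
    .head(n)
    .reset_index(drop=True)
)

print(top_n)
```

Passing a list to ascending gives one sort direction per column, which is why the years stay in chronological order while the votes are ranked from highest to lowest.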

-1

Here you can see the total number of votes each party (A, B, C) received in each election year.

- Vicky Raghuwanshi

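Thanks. Considering a different dataset, how can I get the top 10 parties with the most votes for each year? - Deepak

Don't post images, and explain what you are doing. - Ashok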

- not_speshal · Accepted Answer

Try a combination of groupby and idxmax:

gb = df.groupby(["Election Yr.", "Party"]).sum()
result = gb.loc[gb.groupby("Election Yr.")["Votes"].idxmax()].reset_index()
>>> result
   Election Yr. Party  Votes
0          2000     B     90
1          2005     B     90
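
For reference, a self-contained sketch of the idxmax approach, with the intermediate step made visible. It assumes the sample data from the question without the States column; the names labels and result are illustrative. After the first groupby the index is a (year, party) MultiIndex, so idxmax returns one (year, party) label per year.

```python
import pandas as pd

# Same sample data as in the question (without the States column)
df = pd.DataFrame({
    "Election Yr.": [2000] * 6 + [2005] * 6,
    "Party": ["A", "A", "B", "B", "C", "C"] * 2,
    "Votes": [50, 30, 40, 50, 30, 40] * 2,
})

# Total votes per year-party pair; the index is a (year, party) MultiIndex
gb = df.groupby(["Election Yr.", "Party"])[["Votes"]].sum()

# For each year, the index label (year, party) of the largest total
labels = gb.groupby("Election Yr.")["Votes"].idxmax()

# Select those rows and turn the index levels back into columns
result = gb.loc[labels.tolist()].reset_index()

print(result)
```

Note that idxmax returns only the first label when several parties tie for the maximum, so this keeps one row per year, whereas the merge-based answer keeps all tied rows.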