Extracting a MultiIndex level from a Pandas groupby


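How can I efficiently extract, after a groupby, the second index level (the Sales Officer) that holds the maximum customer count within each group?

Suppose there is a DataFrame df containing several states and 10 sales officers per state (named Officer 1 to Officer 10). The column Current Status always has the value Customer.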

    State List  Sales Officer   Current Status
0   UP          Officer 4       Customer
1   MH          Officer 5       Customer
2   AP          Officer 6       Customer
3   AN          Officer 2       Customer
4   GJ          Officer 3       Customer
.... so on

The expected output lists, for each state, the sales officer(s) with the highest number of customers:
State List     Sales Officer      
AN             Officer 6          403
AP             Officer 1          266
               Officer 8          266
... and so on

So far I have done the following:

df.groupby(['State List', 'Sales Officer'])['Current Status'].count()#.reset_index()

which gives me:
State List  Sales Officer
AN          Officer 1        376
            Officer 10       401
            Officer 2        353
            Officer 3        373
            Officer 4        375
            Officer 5        382
            Officer 6        403
            Officer 7        400
            Officer 8        385
            Officer 9        378
AP          Officer 1        266
            Officer 10       228
            Officer 2        240
            Officer 3        248
            Officer 4        235
            Officer 5        229
            Officer 6        242
            Officer 7        238
            Officer 8        266
            Officer 9        243

Now I am stuck on finding, for each State List value, the Sales Officer with the most customers. Any ideas?

1 Answer


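Use boolean indexing with max computed through transform, which returns a Series of the same size as the original, so every row can be compared against the maximum of its own group: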

s = df.groupby(['State List', 'Sales Officer'])['Current Status'].count()
df = s[s == s.groupby('State List').transform('max')]
print (df)
State List  Sales Officer
AN          Officer 6        403
AP          Officer 1        266
            Officer 8        266
Name: a, dtype: int64
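
For anyone who wants to try this without the original data, here is a minimal sketch on made-up rows. The data and the names toy, counts, group_max and top are illustrative assumptions; only the column names come from the question. Because the mask compares every row with its group maximum, tied officers should all be kept.

```python
import pandas as pd

# Toy data (hypothetical): AN has a single top officer, AP has a tie.
toy = pd.DataFrame({
    "State List": ["AN", "AN", "AN", "AP", "AP", "AP", "AP"],
    "Sales Officer": ["Officer 6", "Officer 6", "Officer 1",
                      "Officer 1", "Officer 8", "Officer 1", "Officer 8"],
    "Current Status": ["Customer"] * 7,
})

# Count customers per (state, officer) pair; the result has a MultiIndex.
counts = toy.groupby(["State List", "Sales Officer"])["Current Status"].count()

# Broadcast each state's maximum back onto every (state, officer) row.
group_max = counts.groupby(level="State List").transform("max")

# Keep only the rows whose count equals the maximum of their state.
top = counts[counts == group_max]
print(top)
```

If a flat table is preferred over a MultiIndex Series, top.reset_index(name="count") should give one row per state and officer, where "count" is just a column name chosen here.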

Details:

print (s.groupby('State List').transform('max'))
State List  Sales Officer
AN          Officer 1        403
            Officer 10       403
            Officer 2        403
            Officer 3        403
            Officer 4        403
            Officer 5        403
            Officer 6        403
            Officer 7        403
            Officer 8        403
            Officer 9        403
AP          Officer 1        266
            Officer 10       266
            Officer 2        266
            Officer 3        266
            Officer 4        266
            Officer 5        266
            Officer 6        266
            Officer 7        266
            Officer 8        266
            Officer 9        266
Name: a, dtype: int64
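
To see why transform is used here rather than a plain aggregation, the following sketch contrasts the two on a few of the counts shown above. The variable names counts, per_state and broadcast are chosen for illustration only.

```python
import pandas as pd

# A small subset of the counts from the question, rebuilt by hand.
counts = pd.Series(
    [376, 403, 266, 266],
    index=pd.MultiIndex.from_tuples(
        [("AN", "Officer 1"), ("AN", "Officer 6"),
         ("AP", "Officer 1"), ("AP", "Officer 8")],
        names=["State List", "Sales Officer"],
    ),
)

# Plain aggregation: one row per state, so it cannot be compared
# element-wise with counts without realigning the index.
per_state = counts.groupby(level="State List").max()

# transform: same index as counts, with the state maximum repeated,
# so counts == broadcast is a valid boolean mask.
broadcast = counts.groupby(level="State List").transform("max")

print(per_state)
print(broadcast)
```

The aggregated result has one entry per state, while the transformed result keeps one entry per (state, officer) pair, which is what makes the boolean mask line up row by row.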

Thanks. However, for state "AP" there are 2 officers with the same highest count, and I need to get both names, as I mentioned in the expected output. - sync11
