How do I perform multiple groupby operations in Pandas?

I have a DataFrame of transactions together with each customer's social group:

print(df.sample(10))


           Shop  Transaction_value Social Group
7           KFC                  7         Rich
22  Burger King                342         Rich
19  Burger King                  6         Rich
5           KFC                  2         Poor
14    McDonalds                245         Rich
2           KFC                  3         Poor
16    McDonalds                 56         Poor
6           KFC                  6         Poor
20  Burger King                 23         Poor
8           KFC                  5         Poor

I used groupby to count the transactions per shop and social group, which shows the most common social group for each shop:

(df.groupby(['Shop', 'Social Group'])['Transaction_value'].count())

Shop         Social Group
Burger King  Poor            7
             Rich            3
KFC          Poor            6
             Rich            3
McDonalds    Poor            3
             Rich            6

I want to divide these counts by the value_counts() of each social group:

df['Social Group'].value_counts()

Poor    16
Rich    12

So wherever Poor appears in my first groupby result, I want to divide by 16, and wherever Rich appears, I want to divide by 12.

That would give me a result like this (rounded to two decimals):

Shop         Social Group
Burger King  Poor            0.44
             Rich            0.25
KFC          Poor            0.38
             Rich            0.25
McDonalds    Poor            0.19
             Rich            0.50

I tried to do this with div(). I expected the indexes of the two objects to align, but it does not work:

(df.groupby(['Shop', 'Social Group'])['Transaction_value']
 .count()
 .div(df['Social Group'].value_counts()))

ValueError: cannot join with no overlapping index names

Can I do what I want with built-in pandas functions?

I think I could do it with a for loop, but that would take a long time.

My DataFrame:

df.to_dict()

{'Shop': {0: 'KFC',
  1: 'KFC',
  2: 'KFC',
  3: 'KFC',
  4: 'KFC',
  5: 'KFC',
  6: 'KFC',
  7: 'KFC',
  8: 'KFC',
  9: 'McDonalds',
  10: 'McDonalds',
  11: 'McDonalds',
  12: 'McDonalds',
  13: 'McDonalds',
  14: 'McDonalds',
  15: 'McDonalds',
  16: 'McDonalds',
  17: 'McDonalds',
  18: 'Burger King',
  19: 'Burger King',
  20: 'Burger King',
  21: 'Burger King',
  22: 'Burger King',
  23: 'Burger King',
  24: 'Burger King',
  25: 'Burger King',
  26: 'Burger King',
  27: 'Burger King'},
 'Transaction_value': {0: 1,
  1: 2,
  2: 3,
  3: 34,
  4: 2,
  5: 2,
  6: 6,
  7: 7,
  8: 5,
  9: 4,
  10: 3,
  11: 2,
  12: 12,
  13: 31,
  14: 245,
  15: 123,
  16: 56,
  17: 67,
  18: 68,
  19: 6,
  20: 23,
  21: 44,
  22: 342,
  23: 234,
  24: 3,
  25: 234,
  26: 666,
  27: 88},
 'Social Group': {0: 'Poor',
  1: 'Rich',
  2: 'Poor',
  3: 'Poor',
  4: 'Rich',
  5: 'Poor',
  6: 'Poor',
  7: 'Rich',
  8: 'Poor',
  9: 'Rich',
  10: 'Rich',
  11: 'Rich',
  12: 'Rich',
  13: 'Rich',
  14: 'Rich',
  15: 'Poor',
  16: 'Poor',
  17: 'Poor',
  18: 'Poor',
  19: 'Rich',
  20: 'Poor',
  21: 'Poor',
  22: 'Rich',
  23: 'Poor',
  24: 'Poor',
  25: 'Rich',
  26: 'Poor',
  27: 'Poor'}}

1 Answer

You are close. Pass level=1 so that the division aligns on the second level of the MultiIndex:

s = df['Social Group'].value_counts()
s1 = df.groupby(['Shop', 'Social Group'])['Transaction_value'].count().div(s, level=1)
print(s1)

Shop         Social Group
Burger King  Poor            0.4375
             Rich            0.2500
KFC          Poor            0.3750
             Rich            0.2500
McDonalds    Poor            0.1875
             Rich            0.5000
dtype: float64
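
For reference, here is a self-contained sketch of the approach above. The DataFrame is rebuilt from the question's df.to_dict() output; the variable names counts, totals and share are illustrative choices, not names from the original posts.

```python
import pandas as pd

# Data rebuilt from the question's df.to_dict() output (rows 0-27).
shops = ["KFC"] * 9 + ["McDonalds"] * 9 + ["Burger King"] * 10
values = [1, 2, 3, 34, 2, 2, 6, 7, 5,
          4, 3, 2, 12, 31, 245, 123, 56, 67,
          68, 6, 23, 44, 342, 234, 3, 234, 666, 88]
groups = (["Poor", "Rich", "Poor", "Poor", "Rich", "Poor", "Poor", "Rich", "Poor"]
          + ["Rich"] * 6 + ["Poor"] * 3
          + ["Poor", "Rich", "Poor", "Poor", "Rich",
             "Poor", "Poor", "Rich", "Poor", "Poor"])
df = pd.DataFrame({"Shop": shops,
                   "Transaction_value": values,
                   "Social Group": groups})

# Transactions per (Shop, Social Group): a Series with a two-level MultiIndex.
counts = df.groupby(["Shop", "Social Group"])["Transaction_value"].count()

# Total transactions per social group: a Series with a flat index.
totals = df["Social Group"].value_counts()

# level=1 aligns the flat index of `totals` with the second MultiIndex level.
share = counts.div(totals, level=1)
print(share)
```

Because every count is divided by its own group's total, the shares within each social group add up to 1 across the three shops.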

Thanks. div() is really useful here. Is there also a multiply() version of div()? Or subtract() and add()? - SCool

@SCool - Yes, those functions exist: Series.add, Series.sub, Series.mul, Series.div - jezrael
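
As a small illustration of the comment above, the other arithmetic methods accept the same level argument as div(). This sketch uses made-up toy data; the names s, per_inner, outer and inner are hypothetical and do not come from the question.

```python
import pandas as pd

# Hypothetical toy data, not taken from the question.
idx = pd.MultiIndex.from_product([["A", "B"], ["x", "y"]],
                                 names=["outer", "inner"])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# One value per label of the "inner" level.
per_inner = pd.Series([10.0, 100.0], index=pd.Index(["x", "y"], name="inner"))

# Each method broadcasts `per_inner` across the "inner" level of `s`.
added = s.add(per_inner, level="inner")
subtracted = s.sub(per_inner, level="inner")
multiplied = s.mul(per_inner, level="inner")
divided = s.div(per_inner, level="inner")

print(multiplied)
```

A level can be given either by name, as here, or by position, as with level=1 in the answer.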
