How to compute aggregate summary statistics in a Pandas dataframe

Question

5

I have a Pandas dataframe similar to this:

>>> df = pd.DataFrame(data=np.array([['red', 'cup', 1.50], ['blue', 'jug', 2.40], ['red', 'cup', 1.75], ['blue', 'cup', 2.30]]),
...                   columns=['colour', 'item', 'price'])
>>> df
  colour item price
0    red  cup   1.5
1   blue  jug   2.4
2    red  cup  1.75
3   blue  cup   2.3

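What is the most concise way to compute summary statistics of price for every combination of colour and item that occurs in the data?

The desired output looks like this: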

colour     item      mean     stdev
red        cup       1.625    0.176
blue       jug       2.4      NA
blue       cup       2.3      NA

- Javide

2 Answers

2

You can combine groupby with .agg and pass it the mean and std functions:

print(df.groupby(['colour', 'item']).agg({'price':['mean', 'std']}).reset_index())

  colour item  price          
                mean       std
0   blue  cup  2.300       NaN
1   blue  jug  2.400       NaN
2    red  cup  1.625  0.176777
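
The result above carries a two-level column header (price / mean, std). If a single header row is preferred, the levels can be joined. This is only a sketch: it assumes price is already numeric, builds the data from a dict for illustration, and the joined names price_mean and price_std are my own choice rather than part of the answer.

```python
import pandas as pd

# Illustrative data matching the question, with price stored as floats.
df = pd.DataFrame({
    'colour': ['red', 'blue', 'red', 'blue'],
    'item': ['cup', 'jug', 'cup', 'cup'],
    'price': [1.50, 2.40, 1.75, 2.30],
})

out = df.groupby(['colour', 'item']).agg({'price': ['mean', 'std']}).reset_index()

# Join the two header levels. The key columns have an empty second level,
# so strip the trailing underscore they would otherwise receive.
out.columns = ['_'.join(col).strip('_') for col in out.columns]
print(out)
```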

- Erfan

- BENY · Accepted Answer

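Note that the way you built the dataframe forces the 'price' column to strings rather than numbers, because a NumPy array holds only a single dtype.

Convert the column to numeric first: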

df.price=pd.to_numeric(df.price)
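
To check the coercion described above, the column can be tested before and after the conversion. The sketch below reuses the construction from the question. The variable names and the dict-based alternative df2 are illustrative additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same construction as in the question: a mixed NumPy array gets one
# common dtype, so every value (including the prices) becomes a string.
df = pd.DataFrame(
    data=np.array([['red', 'cup', 1.50], ['blue', 'jug', 2.40],
                   ['red', 'cup', 1.75], ['blue', 'cup', 2.30]]),
    columns=['colour', 'item', 'price'],
)
numeric_before = pd.api.types.is_numeric_dtype(df['price'])

# Parse the strings back into floats.
df['price'] = pd.to_numeric(df['price'])
numeric_after = pd.api.types.is_numeric_dtype(df['price'])
print(numeric_before, numeric_after)

# Alternative: build from a dict so each column keeps its own dtype
# and no conversion step is needed.
df2 = pd.DataFrame({
    'colour': ['red', 'blue', 'red', 'blue'],
    'item': ['cup', 'jug', 'cup', 'cup'],
    'price': [1.50, 2.40, 1.75, 2.30],
})
```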

After the groupby, I would use describe:

df.groupby(['colour','item']).price.describe()# you can add reset_index() here
             count   mean       std  min     25%    50%     75%   max
colour item                                                          
blue   cup     1.0  2.300       NaN  2.3  2.3000  2.300  2.3000  2.30
       jug     1.0  2.400       NaN  2.4  2.4000  2.400  2.4000  2.40
red    cup     2.0  1.625  0.176777  1.5  1.5625  1.625  1.6875  1.75

Or you can use agg:

df.groupby(['colour','item']).price.agg(['std','mean'])
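
To get column names close to the desired output in the question (mean and stdev), named aggregation is one option. This is a sketch rather than part of the original answer: it assumes pandas 0.25 or later and a numeric price column, and the data is rebuilt from a dict for illustration.

```python
import pandas as pd

# Illustrative data matching the question, with price stored as floats.
df = pd.DataFrame({
    'colour': ['red', 'blue', 'red', 'blue'],
    'item': ['cup', 'jug', 'cup', 'cup'],
    'price': [1.50, 2.40, 1.75, 2.30],
})

# Named aggregation: each keyword becomes an output column, mapped to
# a (source column, aggregation function) pair.
summary = (
    df.groupby(['colour', 'item'], as_index=False)
      .agg(mean=('price', 'mean'), stdev=('price', 'std'))
)
print(summary)
```

Groups with a single row get NaN for stdev, since the sample standard deviation is undefined for one observation.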