Pandas mean() function with a MultiIndex

Question

I have a df:

CU           Parameters           1       2       3
379-H   Output Energy, (Wh/h)   0.045   0.055   0.042
349-J   Output Energy, (Wh/h)   0.001   0.003   0
625-H   Output Energy, (Wh/h)   2.695   1.224   1.272
626-F   Output Energy, (Wh/h)   1.381   1.494   1.3

I want to create two different dataframes by grouping on the level-0 index (CU) and taking the mean of the column values:

df1: (379-H and 625-H)

Parameters                1     2      3
Output Energy, (Wh/h)    1.37   0.63   0.657

df2: (the rest)

Parameters                 1     2      3
Output Energy, (Wh/h)     0.69  0.74   0.65

I can get the mean over all of them by grouping on level 1:

df = df.apply(pd.to_numeric, errors='coerce').dropna(how='all').groupby(level=1).mean()
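What the chain above does can be sketched on hypothetical raw data. The assumptions here are that the values arrive as strings (as they might from a CSV or Excel import) and that one row, labelled "Notes", is entirely non-numeric. `pd.to_numeric(..., errors='coerce')` turns unparseable entries into NaN, `dropna(how='all')` drops rows where every value is NaN, and `groupby(level=1).mean()` then averages over all CUs for each Parameters label.

```python
import pandas as pd

# Hypothetical raw data: numbers stored as strings, plus one all-text row
idx = pd.MultiIndex.from_tuples(
    [
        ("379-H", "Output Energy, (Wh/h)"),
        ("349-J", "Output Energy, (Wh/h)"),
        ("625-H", "Output Energy, (Wh/h)"),
        ("626-F", "Output Energy, (Wh/h)"),
        ("626-F", "Notes"),
    ],
    names=["CU", "Parameters"],
)
raw = pd.DataFrame(
    [
        ["0.045", "0.055", "0.042"],
        ["0.001", "0.003", "0"],
        ["2.695", "1.224", "1.272"],
        ["1.381", "1.494", "1.3"],
        ["n/a", "n/a", "n/a"],
    ],
    index=idx,
    columns=["1", "2", "3"],
)

# Coerce every column to numbers; unparseable strings become NaN
numeric = raw.apply(pd.to_numeric, errors="coerce")
# Drop rows that are NaN in every column (here, the "Notes" row)
numeric = numeric.dropna(how="all")
# Average across all CUs for each Parameters label (level 1)
result = numeric.groupby(level=1).mean()
print(result)
```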

But how do I group them by level 0?

Solution:

lightsonly = ["379-H", "625-H"]
df = df.apply(pd.to_numeric, errors='coerce').dropna(how='all')
mask = df.index.get_level_values(0).isin(lightsonly)
df1 = df[mask].groupby(level=1).mean()
df2 = df[~mask].groupby(level=1).mean()

- warrenfitzhenry
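A self-contained sketch of the solution above. It rebuilds the sample data by hand, so the string column labels "1", "2", "3" and the float values are assumptions about the original df. It selects levels by name ("CU", "Parameters"), which is equivalent to level=0 and level=1 here. The exact means in column 2 work out to 0.6395 and 0.7485; the expected tables in the question show them truncated to 0.63 and 0.74.

```python
import pandas as pd

# Sample data from the question (string column labels are an assumption)
idx = pd.MultiIndex.from_product(
    [["379-H", "349-J", "625-H", "626-F"], ["Output Energy, (Wh/h)"]],
    names=["CU", "Parameters"],
)
df = pd.DataFrame(
    [[0.045, 0.055, 0.042],
     [0.001, 0.003, 0.0],
     [2.695, 1.224, 1.272],
     [1.381, 1.494, 1.3]],
    index=idx,
    columns=["1", "2", "3"],
)

lightsonly = ["379-H", "625-H"]
# Boolean array: True where the CU (level 0) label is in the list
mask = df.index.get_level_values("CU").isin(lightsonly)

# Average each subset separately, keeping Parameters (level 1) as the row label
df1 = df[mask].groupby(level="Parameters").mean()
df2 = df[~mask].groupby(level="Parameters").mean()
print(df1)
print(df2)
```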

3 Answers

Consider the dataframe df, assuming CU and Parameters are in the index.

                                 1      2      3
CU    Parameters                                
379-H Output Energy, (Wh/h)  0.045  0.055  0.042
349-J Output Energy, (Wh/h)  0.001  0.003  0.000
625-H Output Energy, (Wh/h)  2.695  1.224  1.272
626-F Output Energy, (Wh/h)  1.381  1.494  1.300

Then we can group by whether each first-level value is in the list ['379-H', '625-H'], i.e. by the resulting True/False values.

m = {True: 'Main', False: 'Rest'}
l = ['379-H', '625-H']
g = df.index.get_level_values('CU').isin(l)
df.groupby(g).mean().rename(index=m)

          1       2      3
Rest  0.691  0.7485  0.650
Main  1.370  0.6395  0.657

- piRSquared
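A variant of the same idea, sketched with the sample df rebuilt by hand (string column labels assumed): `np.where` turns the boolean mask straight into string labels, so the `rename` step is no longer needed. Group keys come out sorted, so 'Main' precedes 'Rest' here.

```python
import numpy as np
import pandas as pd

# Sample data from the question (string column labels are an assumption)
idx = pd.MultiIndex.from_product(
    [["379-H", "349-J", "625-H", "626-F"], ["Output Energy, (Wh/h)"]],
    names=["CU", "Parameters"],
)
df = pd.DataFrame(
    [[0.045, 0.055, 0.042],
     [0.001, 0.003, 0.0],
     [2.695, 1.224, 1.272],
     [1.381, 1.494, 1.3]],
    index=idx,
    columns=["1", "2", "3"],
)

l = ["379-H", "625-H"]
# Label each row directly: 'Main' if its CU is in l, otherwise 'Rest'
labels = np.where(df.index.get_level_values("CU").isin(l), "Main", "Rest")
# Group by the label array (same length as the index); keys are sorted
out = df.groupby(labels).mean()
print(out)
```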

#Use a lambda function to change index to 2 groups and then groupby using the modified index.
df.groupby(by=lambda x:'379-H,625-H' if x[0] in ['379-H','625-H'] else 'Others').mean()
Out[22]: 
                 1       2      3
379-H,625-H  1.370  0.6395  0.657
Others       0.691  0.7485  0.650

- Allen Qin
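The callable passed to `groupby` receives each index label, and for a MultiIndex that label is a (CU, Parameters) tuple, which is why `x[0]` is the CU. A minimal sketch that moves the lambda into a named function plus a lookup dict, so that more than two groups could be added; `groups`, `cu_group`, and the group names are hypothetical, and the sample df is rebuilt by hand.

```python
import pandas as pd

# Sample data from the question (string column labels are an assumption)
idx = pd.MultiIndex.from_product(
    [["379-H", "349-J", "625-H", "626-F"], ["Output Energy, (Wh/h)"]],
    names=["CU", "Parameters"],
)
df = pd.DataFrame(
    [[0.045, 0.055, 0.042],
     [0.001, 0.003, 0.0],
     [2.695, 1.224, 1.272],
     [1.381, 1.494, 1.3]],
    index=idx,
    columns=["1", "2", "3"],
)

# Hypothetical lookup table: CU label -> group name; unlisted CUs go to 'Others'
groups = {"379-H": "Lights", "625-H": "Lights"}

def cu_group(key):
    # Called with each index label; for a MultiIndex it is a (CU, Parameters) tuple
    cu, _parameters = key
    return groups.get(cu, "Others")

out = df.groupby(cu_group).mean()
print(out)
```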

- jezrael · Accepted Answer

Use get_level_values + isin to build an index of True and False values, then aggregate with mean and rename the two labels with a dict.

d = {True: '379-H and 625-H', False: 'the rest'}
df.index = df.index.get_level_values(0).isin(['379-H', '625-H'])
df = df.groupby(level=0).mean().rename(d)  # df.mean(level=0) was removed in pandas 2.0
print (df)
                     1       2      3
the rest         0.691  0.7485  0.650
379-H and 625-H  1.370  0.6395  0.657

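For separate DataFrames you can also use boolean indexing, applied to the original df (before its index is overwritten as above):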

mask = df.index.get_level_values(0).isin(['379-H', '625-H'])

df1 = df[mask].mean().rename('379-H and 625-H').to_frame().T
print (df1)
                    1       2      3
379-H and 625-H  1.37  0.6395  0.657

df2 = df[~mask].mean().rename('the rest').to_frame().T
print (df2)
              1       2     3
the rest  0.691  0.7485  0.65
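Both frames can also come from a single groupby and then be split by label. A minimal sketch reusing the label dict from this answer, with the sample df rebuilt by hand (string column labels assumed). Double brackets in `.loc` return a one-row DataFrame rather than a Series; renaming before selecting avoids `.loc[[True]]`, which pandas would interpret as a boolean mask instead of a label.

```python
import pandas as pd

# Sample data from the question (string column labels are an assumption)
idx = pd.MultiIndex.from_product(
    [["379-H", "349-J", "625-H", "626-F"], ["Output Energy, (Wh/h)"]],
    names=["CU", "Parameters"],
)
df = pd.DataFrame(
    [[0.045, 0.055, 0.042],
     [0.001, 0.003, 0.0],
     [2.695, 1.224, 1.272],
     [1.381, 1.494, 1.3]],
    index=idx,
    columns=["1", "2", "3"],
)

d = {True: "379-H and 625-H", False: "the rest"}
mask = df.index.get_level_values(0).isin(["379-H", "625-H"])
# One groupby over the boolean mask, then give the False/True keys readable names
res = df.groupby(mask).mean().rename(index=d)
# Double brackets select by label and keep a one-row DataFrame
df1 = res.loc[["379-H and 625-H"]]
df2 = res.loc[["the rest"]]
print(df1)
print(df2)
```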

Another NumPy solution using the DataFrame constructor:

a1 = df[mask].values.mean(axis=0)
#alternatively
#a1 = df.values[mask].mean(axis=0)
df1 = pd.DataFrame(a1.reshape(-1, len(a1)), index=['379-H and 625-H'], columns=df.columns)
print (df1)
                    1       2      3
379-H and 625-H  1.37  0.6395  0.657
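In the NumPy version, `mean(axis=0, keepdims=True)` keeps the reduced axis, so the result already has shape (1, number of columns) and the `reshape` call can be dropped. A sketch with the sample df rebuilt by hand (string column labels assumed); it uses `to_numpy()`, which the pandas docs recommend over `.values`.

```python
import pandas as pd

# Sample data from the question (string column labels are an assumption)
idx = pd.MultiIndex.from_product(
    [["379-H", "349-J", "625-H", "626-F"], ["Output Energy, (Wh/h)"]],
    names=["CU", "Parameters"],
)
df = pd.DataFrame(
    [[0.045, 0.055, 0.042],
     [0.001, 0.003, 0.0],
     [2.695, 1.224, 1.272],
     [1.381, 1.494, 1.3]],
    index=idx,
    columns=["1", "2", "3"],
)

mask = df.index.get_level_values(0).isin(["379-H", "625-H"])
# keepdims=True keeps the reduced row axis, so a1 has shape (1, n_columns)
a1 = df.to_numpy()[mask].mean(axis=0, keepdims=True)
df1 = pd.DataFrame(a1, index=["379-H and 625-H"], columns=df.columns)
print(df1)
```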