How to remove the multilevel index in a Pandas pivot table

19
I have a given data frame as below:
import pandas as pd
df = {'TYPE' : pd.Series(['Advisory','Advisory1','Advisory2','Advisory3']),
 'CNTRY' : pd.Series(['IND','FRN','IND','FRN']),
 'VALUE' : pd.Series([1., 2., 3., 4.])}
df = pd.DataFrame(df)
df = pd.pivot_table(df,index=["CNTRY"],columns=["TYPE"]).reset_index()

After pivoting, how can I convert df into a data frame in the format below, i.e. drop the VALUE level from the column MultiIndex?

Type|CNTRY|Advisory|Advisory1|Advisory2|Advisory3
0     FRN     NaN      2.0      NaN     4.0 
1     IND     1.0      NaN      3.0     NaN 
3 Answers

41
You can add the parameter values, so that the columns are no longer a MultiIndex:
df = pd.pivot_table(df,index="CNTRY",columns="TYPE", values='VALUE').reset_index()
print (df)
TYPE CNTRY  Advisory  Advisory1  Advisory2  Advisory3
0      FRN       NaN        2.0        NaN        4.0
1      IND       1.0        NaN        3.0        NaN

To remove the name of the columns axis (the TYPE label in the header), use rename_axis:

df = pd.pivot_table(df,index="CNTRY",columns="TYPE", values='VALUE') \
       .reset_index().rename_axis(None, axis=1)
print (df)
  CNTRY  Advisory  Advisory1  Advisory2  Advisory3
0   FRN       NaN        2.0        NaN        4.0
1   IND       1.0        NaN        3.0        NaN
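
To see what rename_axis actually changes, here is a minimal sketch using the question's data. The variable names wide, flat, name_before and name_after are only illustrative. The point is that the "TYPE" label in the header is the name of the columns Index, not a column of its own, so clearing it leaves the data untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "TYPE": ["Advisory", "Advisory1", "Advisory2", "Advisory3"],
    "CNTRY": ["IND", "FRN", "IND", "FRN"],
    "VALUE": [1.0, 2.0, 3.0, 4.0],
})

# Passing values= keeps the columns a single-level Index.
wide = pd.pivot_table(df, index="CNTRY", columns="TYPE", values="VALUE").reset_index()
name_before = wide.columns.name   # the label printed in the header corner

# rename_axis(None, axis=1) clears that label and nothing else.
flat = wide.rename_axis(None, axis=1)
name_after = flat.columns.name

print(name_before, name_after)
print(list(flat.columns))
```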

But maybe pivot is all you need:

df = df.pivot(index="CNTRY",columns="TYPE", values='VALUE') \
       .reset_index().rename_axis(None, axis=1)
print (df)
  CNTRY  Advisory  Advisory1  Advisory2  Advisory3
0   FRN       NaN        2.0        NaN        4.0
1   IND       1.0        NaN        3.0        NaN

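pivot_table is only needed when some (CNTRY, TYPE) pairs are duplicated, because it aggregates the duplicates with the default aggregate function mean: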

df = {'TYPE' : pd.Series(['Advisory','Advisory1','Advisory2','Advisory1']),
 'CNTRY' : pd.Series(['IND','FRN','IND','FRN']),
 'VALUE' : pd.Series([1., 1., 3., 4.])}
df = pd.DataFrame(df)
print (df)
  CNTRY       TYPE  VALUE
0   IND   Advisory    1.0
1   FRN  Advisory1    1.0 <-same FRN and Advisory1 
2   IND  Advisory2    3.0
3   FRN  Advisory1    4.0 <-same FRN and Advisory1 

# fill_value=0 shows 0 instead of NaN for the missing combinations
print (df.pivot_table(index="CNTRY", columns="TYPE", values='VALUE', fill_value=0))
TYPE   Advisory  Advisory1  Advisory2
CNTRY                                
FRN         0.0        2.5        0.0
IND         1.0        0.0        3.0

An alternative with groupby, an aggregate function and unstack:

df = df.groupby(["CNTRY","TYPE"])['VALUE'].mean().unstack(fill_value=0) \
       .reset_index().rename_axis(None, axis=1)
print (df)
  CNTRY  Advisory  Advisory1  Advisory2
0   FRN       0.0        2.5        0.0
1   IND       1.0        0.0        3.0
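
The difference between pivot and pivot_table on duplicated pairs can be checked directly. This is a small sketch, assuming a frame with FRN/Advisory1 appearing twice (values 1.0 and 4.0); the names dup, pivot_failed and means are made up for the example.

```python
import pandas as pd

dup = pd.DataFrame({
    "TYPE": ["Advisory", "Advisory1", "Advisory2", "Advisory1"],
    "CNTRY": ["IND", "FRN", "IND", "FRN"],
    "VALUE": [1.0, 1.0, 3.0, 4.0],
})

# pivot only reshapes, so a duplicated (CNTRY, TYPE) pair makes it raise.
try:
    dup.pivot(index="CNTRY", columns="TYPE", values="VALUE")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table aggregates the duplicates instead (mean is the default).
means = dup.pivot_table(index="CNTRY", columns="TYPE", values="VALUE", aggfunc="mean")
print(means.loc["FRN", "Advisory1"])
```

So if the data is known to have unique pairs, pivot is enough and also acts as a check; otherwise pick the aggregation explicitly with aggfunc.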

4
You can do this with set_index and unstack.
df.set_index(['CNTRY', 'TYPE']).VALUE.unstack().reset_index()

TYPE CNTRY  Advisory  Advisory1  Advisory2  Advisory3
0      FRN       NaN        2.0        NaN        4.0
1      IND       1.0        NaN        3.0        NaN
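
Broken into steps, the one-liner above could look like the sketch below (the intermediate names s, wide and flat are just for illustration). Note that unstack, like pivot, assumes every (CNTRY, TYPE) pair occurs only once.

```python
import pandas as pd

df = pd.DataFrame({
    "TYPE": ["Advisory", "Advisory1", "Advisory2", "Advisory3"],
    "CNTRY": ["IND", "FRN", "IND", "FRN"],
    "VALUE": [1.0, 2.0, 3.0, 4.0],
})

# Step 1: a Series of VALUE indexed by the (CNTRY, TYPE) pairs.
s = df.set_index(["CNTRY", "TYPE"])["VALUE"]

# Step 2: move the inner index level (TYPE) into the columns.
wide = s.unstack()

# Step 3: turn CNTRY back into a column and clear the "TYPE" header label.
flat = wide.reset_index().rename_axis(None, axis=1)
print(flat)
```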

1

df.columns = df.columns.droplevel(level=1)

Change the level to suit your needs.
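
As a hedged illustration with the question's data: when pivot_table is called without values, the outer column level (level 0) is the one that holds VALUE, so in that case it is level 0 that gets dropped, and doing it before reset_index keeps the CNTRY label intact. The names wide, flat and levels_before are assumptions made for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "TYPE": ["Advisory", "Advisory1", "Advisory2", "Advisory3"],
    "CNTRY": ["IND", "FRN", "IND", "FRN"],
    "VALUE": [1.0, 2.0, 3.0, 4.0],
})

# Without values=, the columns become a two-level MultiIndex: (VALUE, TYPE).
wide = pd.pivot_table(df, index=["CNTRY"], columns=["TYPE"])
levels_before = wide.columns.nlevels

# Drop the outer level that only repeats "VALUE".
wide.columns = wide.columns.droplevel(0)

# Then restore CNTRY as a column and clear the "TYPE" header label.
flat = wide.reset_index().rename_axis(None, axis=1)
print(flat)
```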


1
This worked for me. Thanks! - Buzzy Hopewell
