Pandas - 多级索引下的unstack/pivot操作

3

我有一个已经处理过的DataFrame,想要进行透视操作,但是使用两个列作为索引无法实现。

import pandas as pd
df = pd.DataFrame({'A': {0: 'XYZ', 1: 'XYZ', 2: 'XYZ', 3: 'XYZ', 4: 'XYZ', 5: 'XYZ', 6: 'XYZ', 7: 'XYZ', 8: 'XYZ', 9: 'XYZ', 10: 'ABC', 11: 'ABC', 12: 'ABC', 13: 'ABC', 14: 'ABC', 15: 'ABC', 16: 'ABC', 17: 'ABC', 18: 'ABC', 19: 'ABC'}, 'B': {0: '01/01/2017', 1: '02/01/2017', 2: '03/01/2017', 3: '04/01/2017', 4: '05/01/2017', 5: '01/01/2017', 6: '02/01/2017', 7: '03/01/2017', 8: '04/01/2017', 9: '05/01/2017', 10: '01/01/2017', 11: '02/01/2017', 12: '03/01/2017', 13: '04/01/2017', 14: '05/01/2017', 15: '01/01/2017', 16: '02/01/2017', 17: '03/01/2017', 18: '04/01/2017', 19: '05/01/2017'}, 'C': {0: 'Price', 1: 'Price', 2: 'Price', 3: 'Price', 4: 'Price', 5: 'Trading', 6: 'Trading', 7: 'Trading', 8: 'Trading', 9: 'Trading', 10: 'Price', 11: 'Price', 12: 'Price', 13: 'Price', 14: 'Price', 15: 'Trading', 16: 'Trading', 17: 'Trading', 18: 'Trading', 19: 'Trading'}, 'D': {0: '100', 1: '101', 2: '102', 3: '103', 4: '104', 5: 'Yes', 6: 'Yes', 7: 'Yes', 8: 'Yes', 9: 'Yes', 10: '50', 11: nan, 12: '48', 13: '47', 14: '46', 15: 'Yes', 16: 'No', 17: 'Yes', 18: 'Yes', 19: 'Yes'}})

所以:
A   B   C   D
XYZ 01/01/2017  Price   100
XYZ 02/01/2017  Price   101
XYZ 03/01/2017  Price   102
XYZ 04/01/2017  Price   103
XYZ 05/01/2017  Price   104
XYZ 01/01/2017  Trading Yes
XYZ 02/01/2017  Trading Yes
XYZ 03/01/2017  Trading Yes
XYZ 04/01/2017  Trading Yes
XYZ 05/01/2017  Trading Yes
ABC 01/01/2017  Price   50
ABC 02/01/2017  Price   
ABC 03/01/2017  Price   48
ABC 04/01/2017  Price   47
ABC 05/01/2017  Price   46
ABC 01/01/2017  Trading Yes
ABC 02/01/2017  Trading No
ABC 03/01/2017  Trading Yes
ABC 04/01/2017  Trading Yes
ABC 05/01/2017  Trading Yes

Would become:

A   B   Trading Price
ABC 01/01/2017  Yes 50
    02/01/2017  No  
    03/01/2017  Yes 48
    04/01/2017  Yes 47
    05/01/2017  Yes 46
XYZ 01/01/2017  Yes 100
    02/01/2017  Yes 101
    03/01/2017  Yes 102
    04/01/2017  Yes 103
    05/01/2017  Yes 104

或者:

    ABC     XYZ 
    Trading Price   Trading Price
01/01/2017  Yes 50  Yes 100
02/01/2017  No      Yes 101
03/01/2017  Yes 48  Yes 102
04/01/2017  Yes 47  Yes 103
05/01/2017  Yes 46  Yes 104

我认为这可以通过透视表简单完成,但是出现了错误:

df.pivot(index=['A', 'B'], columns = ['C'], values = ['D'] )
Traceback (most recent call last):

  File "<ipython-input-41-afcc34979ff8>", line 1, in <module>
    df.pivot(index=['A', 'B'], columns = ['C'], values = ['D'] )

  File "C:\Miniconda\lib\site-packages\pandas\core\frame.py", line 3951, in pivot
    return pivot(self, index=index, columns=columns, values=values)

  File "C:\Miniconda\lib\site-packages\pandas\core\reshape\reshape.py", line 377, in pivot
    index=MultiIndex.from_arrays([index, self[columns]]))

  File "C:\Miniconda\lib\site-packages\pandas\core\series.py", line 248, in __init__
    raise_cast_failure=True)

  File "C:\Miniconda\lib\site-packages\pandas\core\series.py", line 3027, in _sanitize_array
    raise Exception('Data must be 1-dimensional')

Exception: Data must be 1-dimensional

在R中,这可以很快地通过gather/spread完成。
谢谢!
2个回答

2

你想要这样吗?

In [23]: df.pivot_table(index=['A','B'], columns='C', values='D', aggfunc='first')
Out[23]:
C              Price Trading
A   B
ABC 01/01/2017    50     Yes
    02/01/2017   NaN      No
    03/01/2017    48     Yes
    04/01/2017    47     Yes
    05/01/2017    46     Yes
XYZ 01/01/2017   100     Yes
    02/01/2017   101     Yes
    03/01/2017   102     Yes
    04/01/2017   103     Yes
    05/01/2017   104     Yes

非常感谢!我想我漏掉了 'aggfunc' 参数,这可能导致了问题。我用 unstack() 发布了一个答案,但您的方法更加优雅。 - Yona
'df.pivot_table(index=['B'], columns=['A', 'C'], values='D', aggfunc='first')' 将是我其他问题的答案。 - Yona
@Yona,很高兴我能帮到你 :) - MaxU - stand with Ukraine

1
我发现以下的内容是可能的:

df.set_index(['A', 'C', 'B']).unstack().T
Out[59]: 
A              ABC           XYZ        
C            Price Trading Price Trading
  B                                     
D 01/01/2017    50     Yes   100     Yes
  02/01/2017   NaN      No   101     Yes
  03/01/2017    48     Yes   102     Yes
  04/01/2017    47     Yes   103     Yes
  05/01/2017    46     Yes   104     Yes

并且:

df.set_index(['A', 'B', 'C']).unstack()
Out[61]: 
                   D        
C              Price Trading
A   B                       
ABC 01/01/2017    50     Yes
    02/01/2017   NaN      No
    03/01/2017    48     Yes
    04/01/2017    47     Yes
    05/01/2017    46     Yes
XYZ 01/01/2017   100     Yes
    02/01/2017   101     Yes
    03/01/2017   102     Yes
    04/01/2017   103     Yes
    05/01/2017   104     Yes

1
你可以使用以下代码来去除MultiIndex列:df.set_index(['A','B','C'])['D'].unstack() - Scott Boston

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接