Grouping and pivoting a pandas table

5
This should be quick, but none of the pivoting/grouping I have tried produces the result I need.
I have a table like this:
        Letter  Period  Amount
YrMnth
2014-12      B       6       0
2014-12      C       8       1
2014-12      C       9       2
2014-12      C      10       3
2014-12      C       6       4
2014-12      C      12       5
2014-12      C       7       6
2014-12      C      11       7
2014-12      D       9       8
2014-12      D      10       9
2014-12      D       1      10
2014-12      D       8      11
2014-12      D       6      12
2014-12      D      12      13
2014-12      D       7      14
2014-12      D      11      15
2014-12      D       4      16
2014-12      D       3      17
2015-01      B       7      18
2015-01      B       8      19
2015-01      B       1      20
2015-01      B      10      21
2015-01      B      11      22
2015-01      B       6      23
2015-01      B       9      24
2015-01      B       3      25
2015-01      B       5      26
2015-01      C      10      27

I want to pivot it so that the index is essentially YrMnth and Letter, with Period as the columns and Amount as the values.

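I roughly understand how pivoting works, but I get an error when I try to pivot on more than one index column. I moved the index into a regular column and tried the following: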

In [76]: df.pivot(index=['YrMnth','Letter'], values='Amount', columns='Period')

But I got this error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-76-fc2a4c5f244d> in <module>()
----> 1 df.pivot(index=['YrMnth','Letter'], values='Amount', columns='Period')

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in pivot(self, index, columns, values)
   3761         """
   3762         from pandas.core.reshape import pivot
-> 3763         return pivot(self, index=index, columns=columns, values=values)
   3764
   3765     def stack(self, level=-1, dropna=True):

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/reshape.pyc in pivot(self, index, columns, values)
    331         indexed = Series(self[values].values,
    332                          index=MultiIndex.from_arrays([index,
--> 333                                                        self[columns]]))
    334         return indexed.unstack(columns)
    335

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __init__(self, data, index, dtype, name, copy, fastpath)
    225                                        raise_cast_failure=True)
    226
--> 227                 data = SingleBlockManager(data, index, fastpath=True)
    228
    229         generic.NDFrame.__init__(self, data, fastpath=True)

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, block, axis, do_integrity_check, fastpath)
   3734             block = make_block(block,
   3735                                placement=slice(0, len(axis)),
-> 3736                                ndim=1, fastpath=True)
   3737
   3738         self.blocks = [block]

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in make_block(values, placement, klass, ndim, dtype, fastpath)
   2452
   2453     return klass(values, ndim=ndim, fastpath=fastpath,
-> 2454                  placement=placement)
   2455
   2456

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, values, placement, ndim, fastpath)
     85             raise ValueError('Wrong number of items passed %d,'
     86                              ' placement implies %d' % (
---> 87                                  len(self.values), len(self.mgr_locs)))
     88
     89     @property

ValueError: Wrong number of items passed 138, placement implies 2

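The index is effectively the first two columns (YrMnth and Letter), so if I group that way there should not be any duplicates. I just cannot figure out how to do it. - user1610719
1 Answer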
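The "no duplicates" claim in the comment above can be checked directly, and when it holds the same reshape can be done with `set_index` plus `unstack` instead of `pivot`. The following is only a minimal sketch: it assumes `YrMnth` has already been moved into a regular column (as the question describes), uses a handful of rows copied from the table, and the names `sample`, `keys`, and `wide` are illustrative rather than part of the original post.

```python
import pandas as pd

# A few rows copied from the question's table, with YrMnth as a regular column.
sample = pd.DataFrame({
    "YrMnth": ["2014-12", "2014-12", "2014-12", "2015-01", "2015-01"],
    "Letter": ["B", "C", "C", "B", "C"],
    "Period": [6, 8, 9, 7, 10],
    "Amount": [0, 1, 2, 18, 27],
})

keys = ["YrMnth", "Letter", "Period"]

# True if any (YrMnth, Letter, Period) combination appears more than once.
has_duplicates = sample.duplicated(subset=keys).any()
print("duplicates:", has_duplicates)

# With unique keys, moving them into the index and unstacking Period
# gives one row per (YrMnth, Letter) and one column per Period.
wide = sample.set_index(keys)["Amount"].unstack("Period")
print(wide)
```

Unlike `pivot_table`, `unstack` raises an error if the key combinations are not unique, so it doubles as a sanity check. As far as I know, newer pandas releases (1.1 and later) also accept a list for `index` in `df.pivot`, which the version shown in the traceback did not.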

4
If I understand you correctly, pivot_table may be closer to what you need:
df = df.pivot_table(index=["YrMnth", "Letter"], columns="Period", values="Amount")

This gives you:

Period          1   3   4   5   6   7   8   9   10  11  12
YrMnth  Letter                                            
2014-12 B      NaN NaN NaN NaN   0 NaN NaN NaN NaN NaN NaN
        C      NaN NaN NaN NaN   4   6   1   2   3   7   5
        D       10  17  16 NaN  12  14  11   8   9  15  13
2015-01 B       20  25 NaN  26  23  18  19  24  21  22 NaN
        C      NaN NaN NaN NaN NaN NaN NaN NaN  27 NaN NaN
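For anyone who wants to reproduce this without the full data set, here is a small self-contained sketch. It assumes `YrMnth` is a regular column and uses only a few rows taken from the question's table; the variable name `result` is illustrative.

```python
import pandas as pd

# Subset of the question's rows, with YrMnth as a regular column.
df = pd.DataFrame({
    "YrMnth": ["2014-12", "2014-12", "2014-12", "2014-12", "2015-01", "2015-01"],
    "Letter": ["B", "C", "C", "D", "B", "C"],
    "Period": [6, 8, 9, 9, 7, 10],
    "Amount": [0, 1, 2, 8, 18, 27],
})

# Rows are keyed by (YrMnth, Letter); each distinct Period becomes a column.
# Combinations that do not occur in the data are filled with NaN.
result = df.pivot_table(index=["YrMnth", "Letter"], columns="Period", values="Amount")
print(result)
```

Note that `pivot_table` aggregates with the mean by default, so if a (YrMnth, Letter, Period) combination ever appeared more than once, the values would be averaged silently rather than raising an error.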

As suggested in the comments:

 df = pd.pivot_table(df, index=["YrMnth", "Letter"], columns="Period", values="Amount")


Period          1   3   4   5   6   7   8   9   10  11  12
YrMnth  Letter                                            
2014-12 B      NaN NaN NaN NaN   0 NaN NaN NaN NaN NaN NaN
        C      NaN NaN NaN NaN   4   6   1   2   3   7   5
        D       10  17  16 NaN  12  14  11   8   9  15  13
2015-01 B       20  25 NaN  26  23  18  19  24  21  22 NaN
        C      NaN NaN NaN NaN NaN NaN NaN NaN  27 NaN NaN

If anyone wants to clarify why the former would fail, please do; it also produces the same result.
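One way to confirm that the method form and the function form agree is to compare their outputs directly. This is a minimal sketch on a few rows from the question, assuming `YrMnth` is a regular column; `via_method` and `via_function` are illustrative names.

```python
import pandas as pd

# Subset of the question's rows, with YrMnth as a regular column.
df = pd.DataFrame({
    "YrMnth": ["2014-12", "2014-12", "2014-12", "2015-01", "2015-01"],
    "Letter": ["B", "C", "D", "B", "C"],
    "Period": [6, 8, 9, 7, 10],
    "Amount": [0, 1, 8, 18, 27],
})

# Method form: called on the DataFrame itself.
via_method = df.pivot_table(index=["YrMnth", "Letter"], columns="Period", values="Amount")

# Function form: the DataFrame is passed as the first argument.
via_function = pd.pivot_table(df, index=["YrMnth", "Letter"], columns="Period", values="Amount")

# DataFrame.equals treats NaNs in matching positions as equal.
print(via_method.equals(via_function))
```

The two spellings are alternative entry points to the same operation, so either can be used; the function that behaves differently is `df.pivot`, which does no aggregation.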

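The correct syntax for pivot_table should be: df = pd.pivot_table(df, index=["YrMnth", "Letter"], columns="Period", values="Amount") - Fabio Lamanna
@Fabio, is there any difference? - Padraic Cunningham
df.pivot() and pandas.pivot_table() are different functions: see df.pivot() and pd.pivot_table() in the documentation. - Fabio Lamanna
@Fabio, they both give the same output; what is actually different? - Padraic Cunningham
I did not test the output; I only referred to the syntax given in the documentation. Maybe this link can help: https://dev59.com/u10Z5IYBdhLWcg3w4jhy. - Fabio Lamanna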