When I use groupby in pandas, where do my dimensions go?

I grouped my data with pandas and now want to iterate over each row. Where did the dimensions go?
import pandas

df = pandas.DataFrame.from_dict(
    {'category': {0: 'Apps', 1: 'Apps', 2: 'Apps', 3: 'Apps', 4: 'Apps', 5: 'Apps', 6: 'Apps', 7: 'Apps', 8: 'Apps', 9: 'Apps', 10: 'Apps', 11: 'Apps', 12: 'Apps', 13: 'Apps', 14: 'Apps'}, 'country': {0: 'N/A', 1: 'Australia', 2: 'Austria', 3: 'Belgium', 4: 'Brazil', 5: 'Canada', 6: 'China', 7: 'Dominican Republic', 8: 'Finland', 9: 'Greece', 10: 'Hungary', 11: 'India', 12: 'Indonesia', 13: 'Luxembourg', 14: 'Nepal'}, 'criteria': {0: 'referrer=direct', 1: 'referrer=direct', 2: 'referrer=direct', 3: 'referrer=direct', 4: 'referrer=direct', 5: 'referrer=direct', 6: 'referrer=direct', 7: 'referrer=direct', 8: 'referrer=direct', 9: 'referrer=direct', 10: 'referrer=direct', 11: 'referrer=direct', 12: 'referrer=direct', 13: 'referrer=direct', 14: 'referrer=direct'}, 'date': {0: '2013-11-05', 1: '2013-11-05', 2: '2013-11-05', 3: '2013-11-05', 4: '2013-11-05', 5: '2013-11-05', 6: '2013-11-05', 7: '2013-11-05', 8: '2013-11-05', 9: '2013-11-05', 10: '2013-11-05', 11: '2013-11-05', 12: '2013-11-05', 13: '2013-11-05', 14: '2013-11-05'}, 'cpc_cpm_revenue': {0: 0.001, 1: 0.01942, 2: 0.0050000000000000001, 3: 0.002, 4: 0.012200000000000001, 5: 0.020899999999999998, 6: 0.030499999999999999, 7: 0.001, 8: 0.0050000000000000001, 9: 0.019, 10: 0.012, 11: 0.017999999999999999, 12: 0.001, 13: 0.0040000000000000001, 14: 0.001}, 'impressions': {0: 1.0, 1: 12.0, 2: 1.0, 3: 2.0, 4: 14.0, 5: 17.0, 6: 31.0, 7: 1.0, 8: 5.0, 9: 19.0, 10: 12.0, 11: 18.0, 12: 1.0, 13: 1.0, 14: 1.0}, 'clicks': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0, 10: 0.0, 11: 0.0, 12: 0.0, 13: 0.0, 14: 0.0}, 'size': {0: '300x250', 1: '300x250', 2: '300x250', 3: '300x250', 4: '300x250', 5: '300x250', 6: '300x250', 7: '300x250', 8: '300x250', 9: '300x250', 10: '300x250', 11: '300x250', 12: '300x250', 13: '300x250', 14: '300x250'}}
)


df = df.groupby(by=['date','category','country','criteria','size']).sum()
print(df.columns)
# Index([u'clicks', u'cpc_cpm_revenue', u'impressions'], dtype=object)

So... wow... I'm a bit confused. I lost:
'date','category','country','criteria','size'
1 Answer


You haven't lost anything. You asked to group by five columns, ['date','category','country','criteria','size'], and that is exactly what you got. Those columns are now the index:

>>> df.head()
                                                       clicks  cpc_cpm_revenue  \
date       category country   criteria        size                               
2013-11-05 Apps     Australia referrer=direct 300x250       0          0.01942   
                    Austria   referrer=direct 300x250       0          0.00500   
                    Belgium   referrer=direct 300x250       0          0.00200   
                    Brazil    referrer=direct 300x250       0          0.01220   
                    Canada    referrer=direct 300x250       0          0.02090   

                                                       impressions  
date       category country   criteria        size                  
2013-11-05 Apps     Australia referrer=direct 300x250           12  
                    Austria   referrer=direct 300x250            1  
                    Belgium   referrer=direct 300x250            2  
                    Brazil    referrer=direct 300x250           14  
                    Canada    referrer=direct 300x250           17  

>>> df.columns
Index([clicks, cpc_cpm_revenue, impressions], dtype=object)
>>> df.index
MultiIndex
[(2013-11-05, Apps, Australia, referrer=direct, 300x250), (2013-11-05, Apps, Austria, referrer=direct, 300x250), (2013-11-05, Apps, Belgium, referrer=direct, 300x250), (2013-11-05, Apps, Brazil, referrer=direct, 300x250), (2013-11-05, Apps, Canada, referrer=direct, 300x250), (2013-11-05, Apps, China, referrer=direct, 300x250), (2013-11-05, Apps, Dominican Republic, referrer=direct, 300x250), (2013-11-05, Apps, Finland, referrer=direct, 300x250), (2013-11-05, Apps, Greece, referrer=direct, 300x250), (2013-11-05, Apps, Hungary, referrer=direct, 300x250), (2013-11-05, Apps, India, referrer=direct, 300x250), (2013-11-05, Apps, Indonesia, referrer=direct, 300x250), (2013-11-05, Apps, Luxembourg, referrer=direct, 300x250), (2013-11-05, Apps, N/A, referrer=direct, 300x250), (2013-11-05, Apps, Nepal, referrer=direct, 300x250)]
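Since the question also asks about iterating over rows, here is a minimal sketch of how the grouping keys can be read straight from the index while looping. It assumes a small made-up frame with only two key columns (the data and the name `grouped` are illustrative, not the asker's); `iterrows()` yields each row's index label, which for a MultiIndex is a tuple of the key values.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the question's).
df = pd.DataFrame({
    'date': ['2013-11-05'] * 3,
    'country': ['Austria', 'Brazil', 'Brazil'],
    'impressions': [1.0, 14.0, 3.0],
})

grouped = df.groupby(by=['date', 'country']).sum()

# The grouping keys live in the MultiIndex, so each row label is a tuple
# that can be unpacked alongside the row's values.
rows = []
for (date, country), row in grouped.iterrows():
    rows.append((date, country, row['impressions']))
    print(date, country, row['impressions'])
```

The keys are therefore still available during iteration even without converting them back to columns.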

If you want to turn them back into columns, you can call .reset_index():

>>> df = df.reset_index()
>>> df.head()
         date category    country         criteria     size  clicks  cpc_cpm_revenue  \
0  2013-11-05     Apps  Australia  referrer=direct  300x250       0          0.01942   
1  2013-11-05     Apps    Austria  referrer=direct  300x250       0          0.00500   
2  2013-11-05     Apps    Belgium  referrer=direct  300x250       0          0.00200   
3  2013-11-05     Apps     Brazil  referrer=direct  300x250       0          0.01220   
4  2013-11-05     Apps     Canada  referrer=direct  300x250       0          0.02090   

   impressions  
0           12  
1            1  
2            2  
3           14  
4           17  

Alternatively, as @Andy Hayden points out, don't make them the index in the first place:
>>> df = df.groupby(by=['date','category','country','criteria','size'], as_index=False).sum()
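To make the two options concrete, the sketch below runs both on a small made-up frame (the data and the names `via_reset` and `via_flag` are illustrative assumptions, not part of the original answer). Under a recent pandas version, both routes should leave the grouping keys as ordinary columns.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the question's).
df = pd.DataFrame({
    'date': ['2013-11-05'] * 3,
    'country': ['Austria', 'Brazil', 'Brazil'],
    'impressions': [1.0, 14.0, 3.0],
})
keys = ['date', 'country']

# Option 1: group (keys become the index), then move them back to columns.
via_reset = df.groupby(by=keys).sum().reset_index()

# Option 2: ask groupby not to put the keys into the index at all.
via_flag = df.groupby(by=keys, as_index=False).sum()

print(list(via_reset.columns))
print(list(via_flag.columns))
```

Which one to use is mostly a matter of taste; `as_index=False` saves a step when the index is never needed.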

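You can also use groupby(..., as_index=False), though in 0.12 this can be buggy when combined with apply; it is fixed in 0.13. - Andy Hayden
I hope 0.13 comes out soon: in the last week or two there have been half a dozen questions where the answer was "it's already fixed in master". :^) - DSM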
