Pandas groupby date

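I have a DataFrame of events. One or more events can occur on a given date, so the date can't be the index. The dates span several years. I want to group by year and month and count the values of Category. Thanks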

In [12]: df = pd.read_excel('Pandas_Test.xls', 'sheet1')
In [13]: df
Out[13]:
    EventRefNr     DateOccurence      Type Category
0        86596    2010-01-02 00:00:00     3    Small
1        86779    2010-01-09 00:00:00    13   Medium
2        86780    2010-02-10 00:00:00     6    Small
3        86781    2010-02-09 00:00:00    17    Small
4        86898    2010-02-10 00:00:00     6    Small
5        86898    2010-02-11 00:00:00     6    Small
6        86902    2010-02-17 00:00:00     9    Small
7        86908    2010-02-19 00:00:00     3   Medium
8        86908    2010-03-05 00:00:00     3   Medium
9        86909    2010-03-06 00:00:00     8    Small
10       86930    2010-03-12 00:00:00    29    Small
11       86934    2010-03-16 00:00:00     9    Small
12       86940    2010-04-08 00:00:00     9     High
13       86941    2010-04-09 00:00:00    17    Small
14       86946    2010-04-14 00:00:00    10    Small
15       86950    2011-01-19 00:00:00    12    Small
16       86956    2011-01-24 00:00:00    13    Small
17       86959    2011-01-27 00:00:00    17    Small

I tried:

df.groupby(df['DateOccurence'])
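
A quick sketch of why that call alone doesn't answer the question. It assumes the sample rows are rebuilt inline rather than read from Pandas_Test.xls, keeping only DateOccurence and Category. Grouping on the raw column makes one group per distinct timestamp; to get months, the key has to be reduced first. Using `dt.to_period("M")` for that is my choice, not something from the thread.

```python
import pandas as pd

# Sample rows from the question (DateOccurence and Category only)
dates = ["2010-01-02", "2010-01-09", "2010-02-10", "2010-02-09", "2010-02-10",
         "2010-02-11", "2010-02-17", "2010-02-19", "2010-03-05", "2010-03-06",
         "2010-03-12", "2010-03-16", "2010-04-08", "2010-04-09", "2010-04-14",
         "2011-01-19", "2011-01-24", "2011-01-27"]
cats = ["Small", "Medium", "Small", "Small", "Small", "Small", "Small", "Medium",
        "Medium", "Small", "Small", "Small", "High", "Small", "Small",
        "Small", "Small", "Small"]
df = pd.DataFrame({"DateOccurence": pd.to_datetime(dates), "Category": cats})

# Grouping on the raw column: one group per distinct date (2010-02-10 occurs twice)
by_day = df.groupby("DateOccurence")
print(by_day.ngroups)  # 17

# A monthly Period key collapses the dates into one group per year-month
by_month = df.groupby(df["DateOccurence"].dt.to_period("M"))
print(by_month.ngroups)  # 5
```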

Can you show the code you tried? - Jeff
2 Answers

For splitting by month and year, I often add extra columns to the DataFrame that break the date into its parts:

df['year'] = [t.year for t in df.DateOccurence]
df['month'] = [t.month for t in df.DateOccurence]
df['day'] = [t.day for t in df.DateOccurence]
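
If DateOccurence already has a datetime64 dtype (an assumption; `pd.to_datetime` can convert it otherwise), the `.dt` accessor builds the same columns without a Python-level loop. A minimal sketch on three of the question's rows:

```python
import pandas as pd

# Three rows taken from the question's sample data, with a datetime64 column
df = pd.DataFrame({
    "DateOccurence": pd.to_datetime(["2010-01-02", "2010-02-10", "2011-01-27"]),
    "Category": ["Small", "Small", "Small"],
})

# Vectorized equivalents of the list comprehensions
df["year"] = df["DateOccurence"].dt.year
df["month"] = df["DateOccurence"].dt.month
df["day"] = df["DateOccurence"].dt.day

# Same values as the per-element loop
assert df["year"].tolist() == [t.year for t in df.DateOccurence]
print(df[["year", "month", "day"]].values.tolist())  # [[2010, 1, 2], [2010, 2, 10], [2011, 1, 27]]
```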

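It costs some space (extra columns on df) but less time than a DatetimeIndex (less processing in the groupby); it's up to you, though. A DatetimeIndex is the more pandas-like way of handling this.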

Once you have the year, month and day columns, you can do whatever groupby operations you need:

df.groupby(['year', 'month']).Category.value_counts()
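
A self-contained sketch of the year/month count, assuming the question's rows rebuilt inline (DateOccurence and Category only). It uses `SeriesGroupBy.value_counts()` instead of `.apply(pd.value_counts)`, since recent pandas versions deprecate the top-level `pd.value_counts`, and `unstack(fill_value=0)` to turn the categories into columns:

```python
import pandas as pd

# Sample rows from the question (DateOccurence and Category only)
dates = ["2010-01-02", "2010-01-09", "2010-02-10", "2010-02-09", "2010-02-10",
         "2010-02-11", "2010-02-17", "2010-02-19", "2010-03-05", "2010-03-06",
         "2010-03-12", "2010-03-16", "2010-04-08", "2010-04-09", "2010-04-14",
         "2011-01-19", "2011-01-24", "2011-01-27"]
cats = ["Small", "Medium", "Small", "Small", "Small", "Small", "Small", "Medium",
        "Medium", "Small", "Small", "Small", "High", "Small", "Small",
        "Small", "Small", "Small"]
df = pd.DataFrame({"DateOccurence": pd.to_datetime(dates), "Category": cats})
df["year"] = df["DateOccurence"].dt.year
df["month"] = df["DateOccurence"].dt.month

# Count each Category within every (year, month) group, then pivot the
# Category level into columns; missing combinations become 0
counts = (df.groupby(["year", "month"])["Category"]
            .value_counts()
            .unstack(fill_value=0))
print(counts)
```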

To group by month across all years:
df.groupby('month').Category.value_counts()
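
The same sketch grouped on month alone, again assuming the question's rows rebuilt inline, so January 2010 and January 2011 are pooled into one row:

```python
import pandas as pd

# Sample rows from the question (DateOccurence and Category only)
dates = ["2010-01-02", "2010-01-09", "2010-02-10", "2010-02-09", "2010-02-10",
         "2010-02-11", "2010-02-17", "2010-02-19", "2010-03-05", "2010-03-06",
         "2010-03-12", "2010-03-16", "2010-04-08", "2010-04-09", "2010-04-14",
         "2011-01-19", "2011-01-24", "2011-01-27"]
cats = ["Small", "Medium", "Small", "Small", "Small", "Small", "Small", "Medium",
        "Medium", "Small", "Small", "Small", "High", "Small", "Small",
        "Small", "Small", "Small"]
df = pd.DataFrame({"DateOccurence": pd.to_datetime(dates), "Category": cats})
df["month"] = df["DateOccurence"].dt.month

# Month only: the year is ignored, so the same month of different years is combined
by_month = df.groupby("month")["Category"].value_counts().unstack(fill_value=0)
print(by_month)
```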

Or, with Andy Hayden's DatetimeIndex (see his comment on the other answer):
di = pd.DatetimeIndex(df.DateOccurence)
df.groupby(di.month).Category.value_counts()

Just pick whichever approach suits your needs better.

You can apply value_counts to the SeriesGroupBy (for the column):

In [11]: g = df.groupby('DateOccurence')

In [12]: g.Category.apply(pd.value_counts)
Out[12]: 
DateOccurence        
2010-01-02     Small     1
2010-01-09     Medium    1
2010-02-09     Small     1
2010-02-10     Small     2
2010-02-11     Small     1
2010-02-17     Small     1
2010-02-19     Medium    1
2010-03-05     Medium    1
2010-03-06     Small     1
2010-03-12     Small     1
2010-03-16     Small     1
2010-04-08     High      1
2010-04-09     Small     1
2010-04-14     Small     1
2011-01-19     Small     1
2011-01-24     Small     1
2011-01-27     Small     1
dtype: int64

I actually expected this to return the following DataFrame, but you need to unstack it:

In [13]: g.Category.apply(pd.value_counts).unstack(-1).fillna(0)
Out[13]: 
               High  Medium  Small
DateOccurence                     
2010-01-02        0       0      1
2010-01-09        0       1      0
2010-02-09        0       0      1
2010-02-10        0       0      2
2010-02-11        0       0      1
2010-02-17        0       0      1
2010-02-19        0       1      0
2010-03-05        0       1      0
2010-03-06        0       0      1
2010-03-12        0       0      1
2010-03-16        0       0      1
2010-04-08        1       0      0
2010-04-09        0       0      1
2010-04-14        0       0      1
2011-01-19        0       0      1
2011-01-24        0       0      1
2011-01-27        0       0      1

If several different categories occur on the same date, they end up on the same row...
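
A sketch of the per-date table under the same inline-data assumption. Passing `fill_value=0` to `unstack` fills the missing date/category combinations during the reshape, so the counts stay integers; calling `.fillna(0)` afterwards typically leaves float columns, because the NaNs that `unstack` introduces force a float dtype. `pd.crosstab`, a different function than the one used in this answer, builds the same table in one call:

```python
import pandas as pd

# Sample rows from the question (DateOccurence and Category only)
dates = ["2010-01-02", "2010-01-09", "2010-02-10", "2010-02-09", "2010-02-10",
         "2010-02-11", "2010-02-17", "2010-02-19", "2010-03-05", "2010-03-06",
         "2010-03-12", "2010-03-16", "2010-04-08", "2010-04-09", "2010-04-14",
         "2011-01-19", "2011-01-24", "2011-01-27"]
cats = ["Small", "Medium", "Small", "Small", "Small", "Small", "Small", "Medium",
        "Medium", "Small", "Small", "Small", "High", "Small", "Small",
        "Small", "Small", "Small"]
df = pd.DataFrame({"DateOccurence": pd.to_datetime(dates), "Category": cats})

# One row per date, one column per category; gaps filled with 0 while unstacking
wide = df.groupby("DateOccurence")["Category"].value_counts().unstack(fill_value=0)

# Same date-by-category count table via crosstab
table = pd.crosstab(df["DateOccurence"], df["Category"])
print(wide.loc[pd.Timestamp("2010-02-10")])
```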

Nice, now how do I group by month? - ArtDijk
@ArtDijk I think the trick here is to use a DatetimeIndex: di = pd.DatetimeIndex(df.DateOccurence); g = df.groupby([di.month, di.year]) - Andy Hayden
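
A runnable sketch of Andy Hayden's suggestion, again with the question's rows rebuilt inline. Renaming the two keys to `month` and `year` is my addition, so the result's index levels aren't both named DateOccurence. Because month comes first, the rows are ordered month-major, so January 2010 and January 2011 sit next to each other:

```python
import pandas as pd

# Sample rows from the question (DateOccurence and Category only)
dates = ["2010-01-02", "2010-01-09", "2010-02-10", "2010-02-09", "2010-02-10",
         "2010-02-11", "2010-02-17", "2010-02-19", "2010-03-05", "2010-03-06",
         "2010-03-12", "2010-03-16", "2010-04-08", "2010-04-09", "2010-04-14",
         "2011-01-19", "2011-01-24", "2011-01-27"]
cats = ["Small", "Medium", "Small", "Small", "Small", "Small", "Small", "Medium",
        "Medium", "Small", "Small", "Small", "High", "Small", "Small",
        "Small", "Small", "Small"]
df = pd.DataFrame({"DateOccurence": pd.to_datetime(dates), "Category": cats})

di = pd.DatetimeIndex(df["DateOccurence"])
# Keys from the comment (month, then year), renamed so the levels are distinguishable
g = df.groupby([di.month.rename("month"), di.year.rename("year")])
counts = g["Category"].value_counts().unstack(fill_value=0)
print(counts)
```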

Content from Stack Overflow (the original is in English).