I have a CSV file with the following contents:
Date,Sentiment
2014-01-03,0.4
2014-01-04,-0.03
2014-01-09,0.0
2014-01-10,0.07
2014-01-12,0.0
2014-02-24,0.0
2014-02-25,0.0
2014-02-25,0.0
2014-02-26,0.0
2014-02-28,0.0
2014-03-01,0.1
2014-03-02,-0.5
2014-03-03,0.0
2014-03-08,-0.06
2014-03-11,-0.13
2014-03-22,0.0
2014-03-23,0.33
2014-03-23,0.3
2014-03-25,-0.14
2014-03-28,-0.25
etc
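My goal is to aggregate the data by month and compute the average for each month. The dates may not start on the 1st of a month or in January. The catch is that I have a lot of data, spanning several years. So I want to find the earliest date (month) and, starting from there, number the months consecutively and compute the average for each one. For example: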
Month count, average
1, 0.4 (<= the earliest month)
2, -0.3
3, 0.0
...
12, 0.1
13, -0.4 (<= new year but counting of month is continuing)
14, 0.3
I am using Pandas to open the CSV file:
data = pd.read_csv("pks.csv", sep=",")
The dates are in data['Date'] and the values are in data['Sentiment']. Any ideas on how to do this?
df.resample('M').mean() - mallet
df.resample('M', on="date").mean() - EMT
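One possible way to get the running month count described in the question is sketched below. It is only a sketch under a few assumptions: the Date column parses cleanly as dates, the helper name monthly_means is made up for illustration, and the sample rows are a handful taken from the question plus one hypothetical 2015 row added purely to show the count continuing past month 12. The data is inlined so the snippet runs without pks.csv.

```python
import io

import pandas as pd

# A few rows from the question, inlined so the sketch runs without pks.csv.
# The 2015-01-05 row is hypothetical, added only to show the year rollover.
csv_text = """Date,Sentiment
2014-01-03,0.4
2014-01-04,-0.03
2014-02-24,0.0
2014-03-01,0.1
2014-03-02,-0.5
2015-01-05,0.2
"""

# parse_dates turns the Date column into datetime64 values.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])


def monthly_means(df):
    """Average Sentiment per month, numbering months from the earliest one."""
    # year * 12 + month gives a number that keeps increasing across years.
    absolute_month = df["Date"].dt.year * 12 + df["Date"].dt.month
    # Shift so that the earliest month in the data becomes month 1.
    month_count = absolute_month - absolute_month.min() + 1
    return df["Sentiment"].groupby(month_count.rename("Month count")).mean()


result = monthly_means(data)
print(result)
```

Months with no rows simply do not appear in the result. The resample approach from the comments should also work once the dates are parsed, with two caveats: the column in this file is named Date rather than date, and recent pandas versions prefer the 'ME' alias over 'M' for month-end resampling. Unlike the groupby above, resample labels each row with the month-end date instead of a running count and includes empty months as NaN.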