Calculating a daily climatology using pandas (Python).

Question

7

I want to calculate a daily climatology using pandas. My code is below:

import random

import pandas as pd

dates      = pd.date_range('1950-01-01', '1953-12-31', freq='D')
rand_data  = [int(1000*random.random()) for i in range(len(dates))]
cum_data   = pd.Series(rand_data, index=dates)
cum_data.to_csv('test.csv', sep="\t")

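cum_data is the data (strictly a Series rather than a dataframe), indexed by daily dates from 1 January 1950 to 31 December 1953. I want to create a new vector of length 365 whose first element is the average of rand_data for 1 January of 1950, 1951, 1952 and 1953, and so on for the second element...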

Any suggestions on how I can do this using pandas?

- user308827

4 Answers

4

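In case it helps, I'd like to post my solution for getting a climatology series with the same index and length as the original time series.

I use joris' solution to get a "model climatology" of 365/366 elements, then build the series I want by taking the values from this model climatology and the time index from my original time series. This way, things like leap years are taken care of automatically.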

#I start with my time series named 'serData'.
#I apply joris' solution to it, getting a 'model climatology' of length 365 or 366.
serClimModel = serData.groupby([serData.index.month, serData.index.day]).mean()

#Now I build the climatology series, taking values from serClimModel depending on the index of serData.
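#(On Python 3, zip() returns an iterator, so it is wrapped in list() to get a list of (month, day) keys.)
serClimatology = serClimModel.loc[list(zip(serData.index.month, serData.index.day))]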

#Now serClimatology has a time index like this: [1,1] ... [12,31].
#So, as a final step, I take as time index the one of serData.
serClimatology.index = serData.index
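
As a possible shortcut for the steps above, here is a minimal sketch that uses groupby(...).transform("mean") rather than the lookup used in this answer. It is a swapped-in technique, not the author's method: transform broadcasts each (month, day) mean back onto the original rows, so the result already carries the index of serData. The sample data (a simple counter) is an assumption made only so the result is easy to check.

```python
import numpy as np
import pandas as pd

# Assumed sample data: a counter 0, 1, 2, ... over four years of daily dates.
dates = pd.date_range("1950-01-01", "1953-12-31", freq="D")
serData = pd.Series(np.arange(len(dates), dtype="float64"), index=dates)

# Group by (month, day) and broadcast each group's mean back to every row.
keys = [serData.index.month, serData.index.day]
serClimatology = serData.groupby(keys).transform("mean")

# serClimatology has the same DatetimeIndex and length as serData.
print(serClimatology.head())
```

Feb 29 only occurs once in this period, so its climatological value is simply the single 1952 observation.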

- DarioZapp

3

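@joris. Thanks. Your answer was just what I needed to compute a daily climatology with pandas, but you stopped short of the final step: re-mapping the (month, day) index back to a day-of-year index that covers all years including leap years, i.e. 1 through 366. So I thought I'd share my solution for other users. 1950 through 1953 is 4 years with one leap year, 1952. Note that the results may differ, since each run uses different random values.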

...   
from datetime import date
doy = []
doy_mean = []
doy_size = []
for name, group in cum_data.groupby([cum_data.index.month, cum_data.index.day]):
  (mo, dy) = name
  # Note: can use any leap year here.
  yrday = (date(1952, mo, dy)).timetuple().tm_yday
  doy.append(yrday)
  doy_mean.append(group.mean())
  doy_size.append(group.count())
  # Note: useful climatology stats are also available via group.describe(), returned as a Series
  #desc = group.describe()
  # desc["mean"], desc["min"], desc["max"], desc["std"], quartiles, etc.

# we lose the counts here.
new_cum_data  = pd.Series(doy_mean, index=doy)
print(new_cum_data.loc[366])
>> 634.5

pd_dict = {}
pd_dict["mean"] = doy_mean
pd_dict["size"] = doy_size
cum_data_df = pd.DataFrame(data=pd_dict, index=doy)

print(cum_data_df.loc[366])
>> mean    634.5
>> size      4.0
>> Name: 366, dtype: float64
# and just to check Feb 29
print(cum_data_df.loc[60])
>> mean    343
>> size      1
>> Name: 60, dtype: float64
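
The loop above can probably be condensed. The following is a minimal sketch, not the code from this answer: it gets the mean and count in one agg call and then re-maps the (month, day) index to a day of year using the leap year 1952, as the loop does. The seeded random data and the name stats are assumptions for illustration.

```python
from datetime import date

import numpy as np
import pandas as pd

# Assumed sample data: seeded random integers so the run is repeatable.
dates = pd.date_range("1950-01-01", "1953-12-31", freq="D")
rng = np.random.default_rng(0)
cum_data = pd.Series(rng.integers(0, 1000, size=len(dates)), index=dates)

# Mean and count per (month, day) group in a single step.
stats = cum_data.groupby([cum_data.index.month, cum_data.index.day]).agg(["mean", "count"])

# Re-map (month, day) to day of year; any leap year works as the reference year.
stats.index = [date(1952, int(mo), int(dy)).timetuple().tm_yday for mo, dy in stats.index]
stats.index.name = "doy"

print(stats.loc[60])   # Feb 29: a single observation
print(stats.loc[366])  # Dec 31: four observations
```

Unlike the plain Series of means, this keeps the counts next to the means, which makes the thin Feb 29 sample easy to spot.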

- Eric Bridger

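Hi @user308827, I'm using your new code to compute the daily climatology. But what if I want the harmonic mean? I used doy_harmonic_mean.append(group.statistics.harmonic_mean()), but I get AttributeError: 'Series' object has no attribute 'statistics'. How can I apply the harmonic mean in this code? - Javier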
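
On the harmonic mean question: a pandas Series has no statistics attribute, which is why that call fails. One way around it, sketched here under assumptions, is to pass a small helper to agg. The helper harmonic_mean below is hypothetical (written for this example, not a pandas method) and assumes strictly positive values, since a zero would make the reciprocal infinite.

```python
import pandas as pd

# Assumed sample data: strictly positive values 1, 2, 3, ... over four years.
dates = pd.date_range("1950-01-01", "1953-12-31", freq="D")
values = pd.Series(range(1, len(dates) + 1), index=dates, dtype="float64")


def harmonic_mean(group):
    """Hypothetical helper: n divided by the sum of reciprocals (values must be > 0)."""
    return len(group) / (1.0 / group).sum()


# Apply the helper to every (month, day) group.
doy_harmonic_mean = values.groupby([values.index.month, values.index.day]).agg(harmonic_mean)

print(doy_harmonic_mean.head())
```

Inside the loop from the answer, the equivalent would be doy_harmonic_mean.append(harmonic_mean(group)).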

0

Grouping by month and day is a good solution. However, the neat idea of groupby(dayofyear) is still workable if you use xarray.CFTimeIndex instead of pandas.DatetimeIndex, i.e.:

Drop Feb 29 using

rand_data=rand_data[~((rand_data.index.month==2) & (rand_data.index.day==29))]

Replace the index of the data above with an xarray.CFTimeIndex, i.e.:

index = xarray.cftime_range('1950-01-01', '1953-12-31', freq='D', calendar = 'noleap')

index = index[~((index.month==2)&(index.day==29))]

rand_data['time']=index

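Now the 60th dayofyear is March 1 for both leap and non-leap years, and there are 365 dayofyear values in total. Computing the daily climatological mean with groupby on dayofyear is then correct.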
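
For readers without xarray, the same 365-day result can be sketched in pandas alone. This is a swapped-in technique, not the CFTimeIndex approach described above: after dropping Feb 29, every date after February in a leap year has its day of year shifted back by one. The names no_feb29, noleap_doy and climatology, and the counter data, are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Assumed sample data: a counter 0, 1, 2, ... over four years of daily dates.
dates = pd.date_range("1950-01-01", "1953-12-31", freq="D")
data = pd.Series(np.arange(len(dates), dtype="float64"), index=dates)

# Drop Feb 29, as in the answer above.
no_feb29 = data[~((data.index.month == 2) & (data.index.day == 29))]

# Day of year on a 365-day calendar: in leap years, dates after February move back by one.
idx = no_feb29.index
shift = (idx.is_leap_year & (idx.month > 2)).astype(int)
noleap_doy = np.asarray(idx.dayofyear) - shift

# Group on the adjusted day of year; day 60 is March 1 in every year.
climatology = no_feb29.groupby(noleap_doy).mean()

print(len(climatology))
```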

- QuanLiu


- joris · Accepted Answer

You can group by the day of the year and take the mean of each group:

cum_data.groupby(cum_data.index.dayofyear).mean()

However, you have to be aware of leap years, which can cause problems with this approach. As an alternative, you can also group by month and day:

In [13]: cum_data.groupby([cum_data.index.month, cum_data.index.day]).mean()
Out[13]:
1  1     462.25
   2     631.00
   3     615.50
   4     496.00
...
12  28    378.25
    29    427.75
    30    528.50
    31    678.50
Length: 366, dtype: float64
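
To make the leap-year caveat concrete, here is a small sketch (the counter data is an assumption) showing that March 1 falls on a different dayofyear in 1952 than in the other years. Grouping by dayofyear therefore mixes calendar dates after February, and day 366 comes from 1952 alone.

```python
import pandas as pd

dates = pd.date_range("1950-01-01", "1953-12-31", freq="D")

# Day of year of March 1 in each year: 61 in the leap year 1952, 60 otherwise.
march_first = dates[(dates.month == 3) & (dates.day == 1)]
doy_of_march_first = {int(y): int(d) for y, d in zip(march_first.year, march_first.dayofyear)}
print(doy_of_march_first)

# Assumed sample data: a counter, so group sizes are easy to inspect.
data = pd.Series(range(len(dates)), index=dates, dtype="float64")

# Number of observations behind each group under the two groupings.
n_by_doy = data.groupby(data.index.dayofyear).count()
n_by_month_day = data.groupby([data.index.month, data.index.day]).count()

print(n_by_doy.loc[366])              # only 1952 has a day 366
print(n_by_month_day.loc[(12, 31)])   # Dec 31 is present in all four years
```

Both groupings return 366 groups here, but only the (month, day) version keeps each group tied to a single calendar date.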