I have a DataFrame with an irregularly spaced, timezone-aware DatetimeIndex and two value columns:
In: df.head()
Out:
v1 v2
2014-01-18 00:00:00.842537+01:00 130107 7958
2014-01-18 00:00:00.858443+01:00 130251 7958
2014-01-18 00:00:00.874054+01:00 130476 7958
2014-01-18 00:00:00.889617+01:00 130250 7958
2014-01-18 00:00:00.905163+01:00 130327 7958
In: df.index
Out:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-18 00:00:00.842537984, ..., 2014-01-18 00:10:00.829031936]
Length: 38558, Freq: None, Timezone: Europe/Berlin
If I resample this DataFrame at any frequency, the timezone is preserved:
In : df_3.resample('1S', 'mean',).head()
Out:
v1 v2
2014-01-18 00:00:00+01:00 130311.090909 7958.000000
2014-01-18 00:00:01+01:00 130385.125000 7958.000000
2014-01-18 00:00:02+01:00 130332.593750 7957.000000
2014-01-18 00:00:03+01:00 130377.061538 7957.307692
2014-01-18 00:00:04+01:00 130384.171875 7957.640625
As soon as any "loffset" is introduced, the timestamps are additionally shifted back by one hour:
In : df_3.resample('1S', 'mean', loffset='1S').head()
Out:
v1 v2
2014-01-17 23:00:01+01:00 130311.090909 7958.000000
2014-01-17 23:00:02+01:00 130385.125000 7958.000000
2014-01-17 23:00:03+01:00 130332.593750 7957.000000
2014-01-17 23:00:04+01:00 130377.061538 7957.307692
2014-01-17 23:00:05+01:00 130384.171875 7957.640625
This happens even when an "empty" offset is specified explicitly:
In : df_3.resample('1S', 'mean', loffset='0S').head()
Out:
v1 v2
2014-01-17 23:00:01+01:00 130311.090909 7958.000000
2014-01-17 23:00:02+01:00 130385.125000 7958.000000
2014-01-17 23:00:03+01:00 130332.593750 7957.000000
2014-01-17 23:00:04+01:00 130377.061538 7957.307692
2014-01-17 23:00:05+01:00 130384.171875 7957.640625
To keep the timestamps correct, I have to add that hour to the offset:
In : df_3.resample('1S', 'mean', loffset='1H1S').head()
Out:
v1 v2
2014-01-18 00:00:01+01:00 130311.090909 7958.000000
2014-01-18 00:00:02+01:00 130385.125000 7958.000000
2014-01-18 00:00:03+01:00 130332.593750 7957.000000
2014-01-18 00:00:04+01:00 130377.061538 7957.307692
2014-01-18 00:00:05+01:00 130384.171875 7957.640625
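An alternative to padding "loffset" with an extra hour is to resample without any offset and then shift the resulting index explicitly. The following is only a minimal sketch under stated assumptions: it uses the newer method-chaining form `resample(...).mean()` rather than the call style shown above, and the frame `df_demo` with its values is made up for illustration (it is not the real data). Adding a `pd.Timedelta` to a timezone-aware index moves every label by that absolute amount and keeps the timezone.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: irregular, tz-aware timestamps.
rng = np.random.default_rng(0)
gaps_ms = rng.integers(10, 20, size=300)           # irregular gaps of 10-19 ms
offsets = pd.to_timedelta(np.cumsum(gaps_ms), unit="ms")
index = offsets + pd.Timestamp("2014-01-18 00:00:00", tz="Europe/Berlin")

df_demo = pd.DataFrame(
    {"v1": rng.integers(130000, 130500, size=300), "v2": 7958},
    index=index,
)

# Resample without any offset; the bin labels stay in Europe/Berlin.
resampled = df_demo.resample("1s").mean()

# Apply the desired label offset afterwards instead of passing loffset.
shifted = resampled.copy()
shifted.index = resampled.index + pd.Timedelta(seconds=1)

print(resampled.head())
print(shifted.head())
```

Comparing the two printed frames shows whether the labels moved by exactly one second, which makes an unexpected extra hour easy to spot.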
Why does this happen? Am I missing something?