Resampling a pandas DataFrame with loffset introduces an additional offset of one hour

Question

5

I have a DataFrame with an irregularly spaced, timezone-aware DatetimeIndex and two value columns:

In:  df.head()
Out: 
                                      v1    v2
2014-01-18 00:00:00.842537+01:00  130107  7958
2014-01-18 00:00:00.858443+01:00  130251  7958
2014-01-18 00:00:00.874054+01:00  130476  7958
2014-01-18 00:00:00.889617+01:00  130250  7958
2014-01-18 00:00:00.905163+01:00  130327  7958

In:  df.index
Out:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-18 00:00:00.842537984, ..., 2014-01-18 00:10:00.829031936]
Length: 38558, Freq: None, Timezone: Europe/Berlin

If I resample this DataFrame at any frequency, the timezone is preserved:

In : df_3.resample('1S', 'mean',).head()
Out: 
                                      v1           v2
2014-01-18 00:00:00+01:00  130311.090909  7958.000000
2014-01-18 00:00:01+01:00  130385.125000  7958.000000
2014-01-18 00:00:02+01:00  130332.593750  7957.000000
2014-01-18 00:00:03+01:00  130377.061538  7957.307692
2014-01-18 00:00:04+01:00  130384.171875  7957.640625

As soon as any loffset is introduced, the timestamps are additionally shifted by minus one hour:

In : df_3.resample('1S', 'mean', loffset='1S').head()
Out: 
                                      v1           v2
2014-01-17 23:00:01+01:00  130311.090909  7958.000000
2014-01-17 23:00:02+01:00  130385.125000  7958.000000
2014-01-17 23:00:03+01:00  130332.593750  7957.000000
2014-01-17 23:00:04+01:00  130377.061538  7957.307692
2014-01-17 23:00:05+01:00  130384.171875  7957.640625

This happens even when a zero offset is specified explicitly:

In : df_3.resample('1S', 'mean', loffset='0S').head()
Out: 
                                      v1           v2
2014-01-17 23:00:01+01:00  130311.090909  7958.000000
2014-01-17 23:00:02+01:00  130385.125000  7958.000000
2014-01-17 23:00:03+01:00  130332.593750  7957.000000
2014-01-17 23:00:04+01:00  130377.061538  7957.307692
2014-01-17 23:00:05+01:00  130384.171875  7957.640625

To keep the correct timestamps, I have to add that hour to the offset myself:

In : df_3.resample('1S', 'mean', loffset='1H1S').head()
Out: 
                                      v1           v2
2014-01-18 00:00:01+01:00  130311.090909  7958.000000
2014-01-18 00:00:02+01:00  130385.125000  7958.000000
2014-01-18 00:00:03+01:00  130332.593750  7957.000000
2014-01-18 00:00:04+01:00  130377.061538  7957.307692
2014-01-18 00:00:05+01:00  130384.171875  7957.640625

Why is that? Am I missing something?

- Julius Bullinger

1

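There are currently some open issues around resampling and extra binning: https://github.com/pydata/pandas/issues/4197. If you are willing to investigate and try to track down (or, better, fix) the problem, that would be much appreciated! You can comment directly on that issue. - Jeff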

3

This looks like a bug report rather than a question, so it should be filed as a GitHub issue: https://github.com/pydata/pandas/issues - Andy Hayden

1 Answer

- Julius Bullinger · Accepted Answer

To answer my own question, since it still gets visited frequently: this was indeed a bug, and it was fixed in version 0.16.
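
For readers on a recent pandas, here is a minimal sketch of the same resampling without loffset. It rests on a few assumptions: as far as I know, the loffset argument was deprecated in pandas 1.1 and later removed, and the suggested replacement is to add the offset to the resulting index yourself. The data below is a made-up stand-in for the original df_3 (only the column names v1 and v2 and the Europe/Berlin timezone come from the question), and the names `resampled` and `shifted` are purely illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_3: irregularly spaced samples in Europe/Berlin.
rng = np.random.default_rng(0)
offsets_ms = np.cumsum(rng.integers(10, 20, size=300))  # strictly increasing gaps
start = pd.Timestamp("2014-01-18 00:00:00", tz="Europe/Berlin")
index = start + pd.to_timedelta(offsets_ms, unit="ms")
df = pd.DataFrame(
    {"v1": rng.integers(130000, 131000, size=300), "v2": 7958},
    index=index,
)

# Resample to one-second bins; the timezone of the index is preserved.
resampled = df.resample("1s").mean()

# Assumed replacement for loffset='1S': shift the bin labels after resampling.
shifted = resampled.copy()
shifted.index = resampled.index + pd.Timedelta(seconds=1)

print(resampled.head())
print(shifted.head())
```

Shifting the labels after the fact keeps the binning itself untouched, so the aggregated values are identical in both frames and only the index moves.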