Pandas DatetimeIndex TypeError

Question

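I am trying to do something similar to the approach in this question: Pandas resampling with custom volume weighted aggregation, but I am getting a TypeError related to my index.

I have the following data: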

                         Dates       P   Q
0   2020-09-07 01:20:24.738686  7175.0  21
1   2020-09-07 01:45:27.540590  7150.0   7
2   2020-09-07 03:48:49.120607  7125.0   4
3   2020-09-07 04:45:50.972042  7125.0   6
4   2020-09-07 05:36:23.139612  7125.0   2

I checked the types with print(df.dtypes), which returns:

Dates    datetime64[ns]
P               float64
Q                 int64
dtype: object

I then set the dates as the index with:

df = df.set_index(pd.DatetimeIndex(df['Dates']))

Then I dropped the Dates column to make the frame easier to read:

df = df.drop(['Dates'], axis=1)

This gives:

                                 P   Q
Dates                                 
2020-09-07 01:20:24.738686  7175.0  21
2020-09-07 01:45:27.540590  7150.0   7
2020-09-07 03:48:49.120607  7125.0   4
2020-09-07 04:45:50.972042  7125.0   6
2020-09-07 05:36:23.139612  7125.0   2

I then tried to resample:

def vwap(data):
    price = data.P
    quantity = data.Q

    top = sum(price * quantity)
    bottom = sum(quantity)

    return top / bottom

df2 = df.resample("5h",axis=1).apply(vwap)

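This raises TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

Looking at other similarly named Stack Overflow questions, the problem there was mostly that the dates only looked like datetimes but were not actually stored as datetimes. That is not the case here, since we saw earlier that the Dates column has dtype datetime64[ns].

Also, if I run print(df.index.dtype), I get:

datetime64[ns]

Any suggestions? I am happy to clarify anything or provide more code if needed.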

- F1rools22

It is because you are resampling on axis=1; remove that argument and it will work. - Erfan
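A minimal sketch of why axis=1 leads to that error, assuming the frame from the question is rebuilt by hand (the construction below is illustrative, not the asker's original loading code). With axis=1 the resampler looks at the column labels rather than the row labels, and those are the strings "P" and "Q" held in a plain Index:

```python
import pandas as pd

# Sample data copied from the question, with Dates already set as the index.
df = pd.DataFrame(
    {"P": [7175.0, 7150.0, 7125.0, 7125.0, 7125.0], "Q": [21, 7, 4, 6, 2]},
    index=pd.DatetimeIndex(
        [
            "2020-09-07 01:20:24.738686",
            "2020-09-07 01:45:27.540590",
            "2020-09-07 03:48:49.120607",
            "2020-09-07 04:45:50.972042",
            "2020-09-07 05:36:23.139612",
        ],
        name="Dates",
    ),
)

# Row labels are datetimes, so resampling along the rows (the default) is valid.
rows_are_datetimes = isinstance(df.index, pd.DatetimeIndex)

# Column labels are plain strings, so resampling along the columns is not.
columns_are_datetimes = isinstance(df.columns, pd.DatetimeIndex)

print(type(df.index).__name__, rows_are_datetimes)
print(type(df.columns).__name__, columns_are_datetimes)
```

So the check on df.index.dtype in the question was correct; it just inspected the axis that the resample call was not using.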

@Erfan That gets the resampling to start working, but now I get the error 'AttributeError: 'Series' object has no attribute 'P''. Maybe I misunderstand how resampling works, but when I run print(data) I get:

2020-09-07 01:20:24.738686    7175.0
2020-09-07 01:45:27.540590    7150.0
2020-09-07 03:48:49.120607    7125.0
2020-09-07 04:45:50.972042    7125.0
Name: P, dtype: float64

One would think I need the rows of data (which I believe is what axis=1 gives me), not the columns (which I believe is what I get when no axis is specified). - F1rools22

1 Answer


- Erfan · Accepted Answer

Drop the axis=1 argument and use pd.Grouper:

df.groupby(pd.Grouper(freq="5h")).apply(vwap)

Dates
2020-09-07 00:00:00    7157.236842
2020-09-07 05:00:00    7125.000000
dtype: float64
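For anyone who wants to run this end to end, here is a self-contained sketch. It assumes the five sample rows from the question and reuses the asker's vwap function unchanged; the DataFrame construction itself is an assumption made for illustration. Grouping with pd.Grouper passes each 5-hour block to vwap as a DataFrame, so data.P and data.Q are both available:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Dates": pd.to_datetime(
            [
                "2020-09-07 01:20:24.738686",
                "2020-09-07 01:45:27.540590",
                "2020-09-07 03:48:49.120607",
                "2020-09-07 04:45:50.972042",
                "2020-09-07 05:36:23.139612",
            ]
        ),
        "P": [7175.0, 7150.0, 7125.0, 7125.0, 7125.0],
        "Q": [21, 7, 4, 6, 2],
    }
).set_index("Dates")


def vwap(data):
    """Volume-weighted average price of one group of rows."""
    price = data.P
    quantity = data.Q

    top = sum(price * quantity)
    bottom = sum(quantity)

    return top / bottom


# Group the rows into 5-hour bins on the DatetimeIndex; no axis argument.
result = df.groupby(pd.Grouper(freq="5h")).apply(vwap)
print(result)
```

The first bin covers the four trades before 05:00, giving 271975 / 38, and the second bin holds the single 05:36 trade.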

If you want a DataFrame with an informative column name, use reset_index:

df.groupby(pd.Grouper(freq="5h")).apply(vwap).reset_index(name="vwap")

                Dates         vwap
0 2020-09-07 00:00:00  7157.236842
1 2020-09-07 05:00:00  7125.000000
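As a possible alternative that is not part of the answer above, the same figure can be computed without a custom function by resampling the price-times-quantity Series and the quantity Series separately and dividing the sums. This is only a sketch under the assumption that the data matches the sample in the question; the names notional and vwap_5h are made up for illustration:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"P": [7175.0, 7150.0, 7125.0, 7125.0, 7125.0], "Q": [21, 7, 4, 6, 2]},
    index=pd.DatetimeIndex(
        [
            "2020-09-07 01:20:24.738686",
            "2020-09-07 01:45:27.540590",
            "2020-09-07 03:48:49.120607",
            "2020-09-07 04:45:50.972042",
            "2020-09-07 05:36:23.139612",
        ],
        name="Dates",
    ),
)

# Price times quantity per trade (hypothetical helper name).
notional = df["P"] * df["Q"]

# Sum both Series over the same 5-hour bins, then divide bin by bin.
vwap_5h = (notional.resample("5h").sum() / df["Q"].resample("5h").sum()).rename("vwap")
print(vwap_5h)
```

Because both sides are resampled along the rows, each step works on a single Series and never needs to see two columns at once, which sidesteps the AttributeError mentioned in the comments.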