使用自定义函数在 pandas 上计算每日聚合数据

4
我是一名有用的助手,可以为您进行文本翻译。以下是您需要翻译的内容:

我刚接触Pandas,并且作为练习,我正在移动一些旧代码/解决方案以便学习。在这种情况下,我尝试计算产品价格的综合指数,之前是在SQL中计算的。

这是数据框中的数据:

    id  weight      date        price
0   1   0.002796    2005-11-15  0.998298
1   1   0.002796    2005-11-16  1.014242
2   1   0.002796    2005-11-17  1.016452
3   1   0.002796    2005-11-18  1.026396
4   1   0.002796    2005-11-19  1.026047
5   1   0.002796    2005-11-20  1.024285
6   1   0.002796    2005-11-21  1.018764
7   1   0.002796    2005-11-22  1.033175
8   1   0.002796    2005-11-23  1.058509
9   1   0.002796    2005-11-24  1.061231
10  1   0.002796    2005-11-25  1.058137
11  1   0.002796    2005-11-26  0.999380
12  1   0.002796    2005-11-27  0.990504
13  1   0.002796    2005-11-28  0.993764
14  1   0.002796    2005-11-29  0.978754
15  1   0.002796    2005-11-30  0.992070
...     ...     ...     ...     ...
4085    1   0.002796    2017-01-21  0.857420
4086    1   0.002796    2017-01-22  0.848195
4087    1   0.002796    2017-01-23  0.791784
4088    1   0.002796    2017-01-24  0.846603
4089    1   0.002796    2017-01-25  0.878104
4090    1   0.002796    2017-01-26  0.806651
4091    1   0.002796    2017-01-27  0.849316
4092    1   0.002796    2017-01-28  0.826550
4093    1   0.002796    2017-01-29  0.848651
4094    1   0.002796    2017-01-30  0.829643
4095    1   0.002796    2017-01-31  0.837094
4096    1   0.002796    2017-02-01  0.846572
4097    1   0.002796    2017-02-02  0.800163
4098    1   0.002796    2017-02-03  0.820356
4099    1   0.002796    2017-02-04  0.818924
4100    1   0.002796    2017-02-05  0.822157
4101    1   0.002796    2017-02-06  0.787123
4102    1   0.002796    2017-02-07  0.796264
4103    1   0.002796    2017-02-08  0.797241
4104    1   0.002796    2017-02-09  0.818499
4105    1   0.002796    2017-02-10  0.810928

合成指数是使用每日回报率计算的,其中当天的回报率为:

Rt =(Price_day / Price_day_before)- 1

我一直在阅读有关pandas、时间序列等方面的内容,但我很难理解这里要执行的具体操作;这是一个滚动操作吗?如何获取给定日期和前一日期的数据?

3个回答

3

代码如下:

df['return'] = df['price'] / df['price'].shift(1) - 1

在 Python 中,df.shift(1) 函数相当于 SQL 中的 LEAD()LEG() 函数。


2

如果我理解正确,您想使用 pct_change() 方法:

In [196]: x
Out[196]:
    id    weight        date     price
0    1  0.002796  2005-11-15  0.998298
1    1  0.002796  2005-11-16  1.014242
2    1  0.002796  2005-11-17  1.016452
3    1  0.002796  2005-11-18  1.026396
4    1  0.002796  2005-11-19  1.026047
5    1  0.002796  2005-11-20  1.024285
6    1  0.002796  2005-11-21  1.018764
7    1  0.002796  2005-11-22  1.033175
8    1  0.002796  2005-11-23  1.058509
9    1  0.002796  2005-11-24  1.061231
10   1  0.002796  2005-11-25  1.058137
11   1  0.002796  2005-11-26  0.999380
12   1  0.002796  2005-11-27  0.990504
13   1  0.002796  2005-11-28  0.993764
14   1  0.002796  2005-11-29  0.978754
15   1  0.002796  2005-11-30  0.992070

In [197]: x['Rt'] = x['price'].pct_change()

In [198]: x
Out[198]:
    id    weight        date     price        Rt
0    1  0.002796  2005-11-15  0.998298       NaN
1    1  0.002796  2005-11-16  1.014242  0.015971
2    1  0.002796  2005-11-17  1.016452  0.002179
3    1  0.002796  2005-11-18  1.026396  0.009783
4    1  0.002796  2005-11-19  1.026047 -0.000340
5    1  0.002796  2005-11-20  1.024285 -0.001717
6    1  0.002796  2005-11-21  1.018764 -0.005390
7    1  0.002796  2005-11-22  1.033175  0.014146
8    1  0.002796  2005-11-23  1.058509  0.024521
9    1  0.002796  2005-11-24  1.061231  0.002572
10   1  0.002796  2005-11-25  1.058137 -0.002915
11   1  0.002796  2005-11-26  0.999380 -0.055529
12   1  0.002796  2005-11-27  0.990504 -0.008882
13   1  0.002796  2005-11-28  0.993764  0.003291
14   1  0.002796  2005-11-29  0.978754 -0.015104
15   1  0.002796  2005-11-30  0.992070  0.013605

另一种解决方案(使用 shift() 方法):

In [199]: x['Rt2'] = x['price'] / x['price'].shift() - 1

In [200]: x
Out[200]:
    id    weight        date     price        Rt       Rt2
0    1  0.002796  2005-11-15  0.998298       NaN       NaN
1    1  0.002796  2005-11-16  1.014242  0.015971  0.015971
2    1  0.002796  2005-11-17  1.016452  0.002179  0.002179
3    1  0.002796  2005-11-18  1.026396  0.009783  0.009783
4    1  0.002796  2005-11-19  1.026047 -0.000340 -0.000340
5    1  0.002796  2005-11-20  1.024285 -0.001717 -0.001717
6    1  0.002796  2005-11-21  1.018764 -0.005390 -0.005390
7    1  0.002796  2005-11-22  1.033175  0.014146  0.014146
8    1  0.002796  2005-11-23  1.058509  0.024521  0.024521
9    1  0.002796  2005-11-24  1.061231  0.002572  0.002572
10   1  0.002796  2005-11-25  1.058137 -0.002915 -0.002915
11   1  0.002796  2005-11-26  0.999380 -0.055529 -0.055529
12   1  0.002796  2005-11-27  0.990504 -0.008882 -0.008882
13   1  0.002796  2005-11-28  0.993764  0.003291  0.003291
14   1  0.002796  2005-11-29  0.978754 -0.015104 -0.015104
15   1  0.002796  2005-11-30  0.992070  0.013605  0.013605

哇,我甚至不知道pct_change()这个函数。文档并没有详细说明计算方法,但结果与操作本身是一致的。 - Alberto
一旦您添加了shift(),我将为您提供最佳答案,这正是我所考虑的操作类型。太棒了! - Alberto
@mydaemon,很高兴我能帮到你 :) - MaxU - stand with Ukraine

2
100 开始生成一个合成指数。
df['PrcRtnIdx'] = df.price.pct_change().fillna(0).add(1).cumprod().mul(100)

print(df)

    id    weight        date     price   PrcRtnIdx
0    1  0.002796  2005-11-15  0.998298  100.000000
1    1  0.002796  2005-11-16  1.014242  101.597118
2    1  0.002796  2005-11-17  1.016452  101.818495
3    1  0.002796  2005-11-18  1.026396  102.814590
4    1  0.002796  2005-11-19  1.026047  102.779631
5    1  0.002796  2005-11-20  1.024285  102.603131
6    1  0.002796  2005-11-21  1.018764  102.050089
7    1  0.002796  2005-11-22  1.033175  103.493646
8    1  0.002796  2005-11-23  1.058509  106.031365
9    1  0.002796  2005-11-24  1.061231  106.304029
10   1  0.002796  2005-11-25  1.058137  105.994102
11   1  0.002796  2005-11-26  0.999380  100.108384
12   1  0.002796  2005-11-27  0.990504   99.219271
13   1  0.002796  2005-11-28  0.993764   99.545827
14   1  0.002796  2005-11-29  0.978754   98.042268
15   1  0.002796  2005-11-30  0.992070   99.376138

回复@mydaemon的内容

df['PrcRtnIdx'] = \
    df.groupby('id').price.apply(
        lambda x: x.pct_change().fillna(0).add(1).cumprod().mul(100))

先生,您让我大吃一惊 :-O。这是否意味着针对P0=100的每个合成指数都进行了整个计算?那么每个产品ID呢? - Alberto

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接