How to vectorize cumulative operations in Pandas

Question



Is there a way to vectorize the "Value at End of Period" (VEoP) column?

import pandas as pd

terms = pd.date_range(start = '2022-01-01', periods=12, freq='YS', normalize=True)
df = pd.DataFrame({
    'Return':   [1.063, 1.053, 1.008, 0.98, 1.04, 1.057, 1.073, 1.027, 1.025, 1.068, 1.001, 0.983],
    'Cashflow': [6, 0, 0, 8, -1, -1, -1, -1, -1, -1, -1, -1]
    },index=terms.strftime('%Y'))
df.index.name = 'Date'

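df['VEoP'] = 0.0  # float column: VEoP values are not integers
col = df.columns.get_loc('VEoP')
for y in range(len(df)):
    prev = 0 if y == 0 else df.iat[y - 1, col]
    # write through df.iat instead of chained df['VEoP'].iloc[y] = ..., which does not update df under Copy-on-Write
    df.iat[y, col] = (prev + df['Cashflow'].iat[y]) * df['Return'].iat[y]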

df

    Return  Cashflow    VEoP
Date                          
2022  1.0630         6  6.3780
2023  1.0530         0  6.7160
2024  1.0080         0  6.7698
2025  0.9800         8 14.4744
2026  1.0400        -1 14.0133
2027  1.0570        -1 13.7551
2028  1.0730        -1 13.6862
2029  1.0270        -1 13.0288
2030  1.0250        -1 12.3295
2031  1.0680        -1 12.0999
2032  1.0010        -1 11.1110
2033  0.9830        -1  9.9391
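For reference, the loop computes the recurrence below. The symbols are introduced here for illustration only: V_t is VEoP, C_t is Cashflow and R_t is Return in year t, with t = 1 for 2022. The first rows of the output can be checked by hand:

```latex
\begin{aligned}
V_t &= (V_{t-1} + C_t)\,R_t, \qquad V_0 = 0 \\
V_1 &= (0 + 6) \times 1.063 = 6.378 \\
V_2 &= (6.378 + 0) \times 1.053 = 6.716034 \\
V_3 &= (6.716034 + 0) \times 1.008 \approx 6.769762 \\
V_4 &= (6.769762 + 8) \times 0.98 \approx 14.474367
\end{aligned}
```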

- Ralf Klüber

1 Answer

Content provided by Stack Overflow.

- EliadL · Accepted Answer

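Vectorization is limited when each value depends on the previous one, because such a computation is inherently sequential and cannot be parallelized.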
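That limit applies to general recurrences. As an aside that is not part of the original answer, this particular recurrence is linear in the previous value, V_t = (V_{t-1} + C_t) · R_t, so it happens to have a closed form: with P_t = R_1 · R_2 ⋯ R_t, it unrolls to V_t = P_t · Σ_{k≤t} C_k · R_k / P_k, which NumPy can evaluate with cumprod and cumsum. The function names below are hypothetical. The sketch assumes no return is zero (it divides by P_k), and on very long series the cumulative product may lose precision or overflow, so the loop-based approaches remain the safer general default:

```python
import numpy as np


def veop_loop(cashflow, ret):
    """Reference: the sequential recurrence V_t = (V_{t-1} + C_t) * R_t with V_0 = 0."""
    out = np.empty(len(cashflow))
    prev = 0.0
    for i, (c, r) in enumerate(zip(cashflow, ret)):
        prev = (prev + c) * r
        out[i] = prev
    return out


def veop_closed_form(cashflow, ret):
    """Vectorized closed form, valid only because the recurrence is linear.

    V_t = P_t * sum_{k<=t} C_k * R_k / P_k, where P_t = R_1 * ... * R_t.
    Assumes no return is zero; P_t can under/overflow on very long inputs.
    """
    cashflow = np.asarray(cashflow, dtype=float)
    ret = np.asarray(ret, dtype=float)
    growth = np.cumprod(ret)  # P_t
    return growth * np.cumsum(cashflow * ret / growth)


returns = [1.063, 1.053, 1.008, 0.98, 1.04, 1.057, 1.073, 1.027, 1.025, 1.068, 1.001, 0.983]
cashflows = [6, 0, 0, 8, -1, -1, -1, -1, -1, -1, -1, -1]

print(np.round(veop_closed_form(cashflows, returns), 4))
```

With the question's DataFrame this would be df['VEoP'] = veop_closed_form(df['Cashflow'], df['Return']).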

For a general recurrence like this, a non-vectorized solution can therefore be written with itertools.accumulate:

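from itertools import accumulate

df['VEoP'] = list(accumulate(
    df.to_records(),
    lambda prev_veop, new: (prev_veop + new.Cashflow) * new.Return,
    initial=0,  # seed value 0 (Python 3.8+); it is emitted as the first element
))[1:]          # drop the seed so there is one value per row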
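The role of initial=0 and the trailing [1:] is easier to see in a small plain-Python sketch that uses only the first four rows of the question as (cashflow, return) tuples instead of DataFrame records; step is an illustrative name. Note that accumulate(..., initial=...) requires Python 3.8 or later:

```python
from itertools import accumulate

returns = [1.063, 1.053, 1.008, 0.98]
cashflows = [6, 0, 0, 8]


def step(prev_veop, new):
    """One year of the recurrence: add the cashflow, then apply the return."""
    cashflow, ret = new
    return (prev_veop + cashflow) * ret


running = list(accumulate(zip(cashflows, returns), step, initial=0))
# initial=0 is yielded first, so running has len(cashflows) + 1 items
veop = running[1:]  # drop the seed, leaving one value per year
print(veop)
```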

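Its performance is on par with a NumPy "vectorized" version built with np.frompyfunc, which still calls the Python function once per element: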

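import numpy as np

df['VEoP'] = np.frompyfunc(
    lambda prev_veop, new: (prev_veop + new.Cashflow) * new.Return,
    2, 1,  # nin, nout
).accumulate(
    [0, *df.to_records()],  # prepend the seed value 0
    dtype=object,  # temporary object dtype so the lambda receives whole records
).astype(float)[1:]  # convert back to float and drop the seed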
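To see what np.frompyfunc(...).accumulate does in isolation, here is a minimal sketch on plain (cashflow, return) tuples rather than DataFrame records. The names step, seq and running are illustrative. The object array is filled element by element so that each tuple is stored as a single object instead of being expanded into a row:

```python
import numpy as np

returns = [1.063, 1.053, 1.008, 0.98]
cashflows = [6, 0, 0, 8]


def step(prev_veop, new):
    cashflow, ret = new
    return (prev_veop + cashflow) * ret


# frompyfunc wraps a Python function as a ufunc on object arrays (nin=2, nout=1)
step_ufunc = np.frompyfunc(step, 2, 1)

# Seed 0.0 followed by one (cashflow, return) tuple per year
seq = np.empty(len(returns) + 1, dtype=object)
seq[0] = 0.0
for i, pair in enumerate(zip(cashflows, returns), start=1):
    seq[i] = pair

# accumulate: running[0] = seq[0], running[i] = step(running[i-1], seq[i])
running = step_ufunc.accumulate(seq)
veop = running.astype(float)[1:]  # object -> float, then drop the seed
print(veop)
```

The output of a frompyfunc ufunc has object dtype, which is why the answer converts it with .astype(float) before assigning it to the column.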

This can be broken down into smaller logical pieces:

import numpy as np

def get_ufunc(func, nin, nout):  return np.frompyfunc(func, nin, nout)
def get_binary_ufunc(func):      return get_ufunc(func, nin=2, nout=1)
def accum(func):                 return get_binary_ufunc(func).accumulate
def accum_float(func, x):        return accum(func)(x, dtype=object).astype(float)
def accum_float_from_0(func, x): return accum_float(func, [0, *x])[1:]

def calc_veop(prev_veop, new):   return (prev_veop + new.Cashflow) * new.Return
def accum_veop(records):         return accum_float_from_0(calc_veop, records)

df['VEoP'] = accum_veop(df.to_records())
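A rough way to check the performance claim on your own machine is to time the plain loop, itertools.accumulate and the np.frompyfunc accumulation side by side. This sketch is an assumption-laden illustration, not part of the answer: it uses synthetic random data, hypothetical helper names (step, with_loop, with_accumulate, with_frompyfunc), and (cashflow, return) tuples instead of DataFrame records. Timings depend on the machine and NumPy version, so none are quoted here:

```python
import timeit
from itertools import accumulate

import numpy as np

# Synthetic data for timing only (not from the question)
rng = np.random.default_rng(0)
n = 10_000
returns = rng.uniform(0.98, 1.02, n).tolist()
cashflows = rng.integers(-1, 2, n).tolist()  # values in {-1, 0, 1}
pairs = list(zip(cashflows, returns))


def step(prev_veop, new):
    cashflow, ret = new
    return (prev_veop + cashflow) * ret


def with_loop():
    out, prev = [], 0.0
    for pair in pairs:
        prev = step(prev, pair)
        out.append(prev)
    return out


def with_accumulate():
    return list(accumulate(pairs, step, initial=0.0))[1:]


# Object array built once, outside the timed function: seed 0.0, then one tuple per year
step_ufunc = np.frompyfunc(step, 2, 1)
seq = np.empty(n + 1, dtype=object)
seq[0] = 0.0
for i, pair in enumerate(pairs, start=1):
    seq[i] = pair


def with_frompyfunc():
    return step_ufunc.accumulate(seq).astype(float)[1:]


if __name__ == "__main__":
    for fn in (with_loop, with_accumulate, with_frompyfunc):
        seconds = timeit.timeit(fn, number=20)
        print(f"{fn.__name__:>16}: {seconds:.3f} s for 20 runs")
```

All three call the Python step function once per element, so they would be expected to land in the same ballpark, which is the point the answer makes about frompyfunc.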

See the NumPy documentation for np.frompyfunc and np.ufunc.accumulate for more details.