Pandas rolling apply performance is slow.

Question

The code in question:

import numpy as np
# dd: maximum drawdown of a window (largest drop relative to the running maximum)
dd = lambda x: np.nanmax(1.0 - x / np.fmax.accumulate(x))
df.rolling(window=period, min_periods=1).apply(dd)

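Running the two lines above takes a long time, especially with the latest pandas version (1.4.0). The DataFrame has only 3000 rows and 2000 columns. The same code ran much faster with an earlier pandas version (0.23.x). I have already tried suggestions from other questions, such as "pandas groupby/apply slow performance", but none of them helped much. `period` is an integer variable with the value 250.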
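For reference, `dd` computes the maximum drawdown of each window: every value is divided by the running maximum up to that point (`np.fmax.accumulate`, which skips NaNs), and the largest relative drop is returned. A minimal sketch on hand-picked arrays (the values are illustrative only):

```python
import numpy as np

# Same function as in the question: largest relative drop from the running maximum.
dd = lambda x: np.nanmax(1.0 - x / np.fmax.accumulate(x))

x = np.array([1.0, 2.0, 1.0, 3.0, 1.5])
print(np.fmax.accumulate(x))  # running maximum: [1. 2. 2. 3. 3.]
print(dd(x))                  # 0.5 (1 - 1.0/2.0, and also 1 - 1.5/3.0)

# fmax ignores NaN, so a leading NaN does not poison the running maximum
y = np.array([np.nan, 2.0, 1.0])
print(dd(y))                  # 0.5 (1 - 1.0/2.0)
```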

- jagpreet

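Since it seems to behave differently in older pandas versions, have you already opened an issue on their GitHub? - FlyingTeller

@FlyingTeller Not yet. I don't even know what difference in behavior you are referring to. - jagpreet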

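I mean your sentence "the same code ran much faster with an earlier pandas version (0.23.x)". To me that sounds like you are not doing anything wrong; rather, the new pandas version introduced changes that made it slower. - FlyingTeller

Thanks @FlyingTeller, I have raised the issue at the link you provided. - jagpreet

What does "much faster" mean? Did you measure the times? You could show the timing results. - furas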
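One way to produce the numbers asked for above is to time the call with `time.perf_counter`, on a smaller frame first. A minimal sketch; the frame size and `period` here are arbitrary assumptions chosen so it finishes quickly, and `raw=True` (which passes NumPy arrays instead of Series to the function) is included for comparison:

```python
import time

import numpy as np
import pandas as pd

dd = lambda x: np.nanmax(1.0 - x / np.fmax.accumulate(x))

# Assumed test size: much smaller than the 3000 x 2000 frame in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((200, 10)))
period = 20

timings = {}
results = {}
for raw in (False, True):
    start = time.perf_counter()
    results[raw] = df.rolling(window=period, min_periods=1).apply(dd, raw=raw)
    timings[raw] = time.perf_counter() - start
    print(f"raw={raw}: {timings[raw]:.3f} s")

# Both variants must agree; only the speed should differ.
pd.testing.assert_frame_equal(results[False], results[True])
```

Running the same script under both pandas versions gives a like-for-like comparison to attach to the report.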

2 Answers

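Take a look at the parallel-pandas library. It lets you parallelize the apply method on a rolling window. Thanks to Michael Szczesny for the dd_numba function. I used the DataFrame size from your question.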

import pandas as pd
import numpy as np
from time import monotonic
from parallel_pandas import ParallelPandas


def dd_numba(x):
    res = np.empty_like(x)
    res[0] = x[0]
    for i in range(1, len(res)):
        if res[i - 1] > x[i] or np.isnan(x[i]):
            res[i] = res[i - 1]
        else:
            res[i] = x[i]
    return np.nanmax(1.0 - x / res)


if __name__ == '__main__':
    # initialize parallel-pandas
    ParallelPandas.initialize(n_cpu=4, split_factor=1)
    df = pd.DataFrame(np.random.rand(3000, 2000))
    period = 250
    dd = lambda x: np.nanmax(1.0 - x / np.fmax.accumulate(x))

    start = monotonic()
    res = df.rolling(window=period, min_periods=1).apply(dd)
    print(f'synchronous time took: {monotonic() - start:.1f} s.')

    start = monotonic()
    res = df.rolling(window=period, min_periods=1).apply(dd, raw=True)
    print(f'with raw=True time took: {monotonic() - start:.1f} s.')

    start = monotonic()
    res = df.rolling(window=period, min_periods=1).apply(dd_numba, raw=True, engine='numba')
    print(f'numba engine time took: {monotonic() - start:.1f} s.')

    start = monotonic()
    res = df.rolling(window=period, min_periods=1).p_apply(dd, raw=True)
    print(f'parallel with raw=True time took: {monotonic() - start:.1f} s.')
    start = monotonic()
    res = df.rolling(window=period, min_periods=1).p_apply(dd_numba,  raw=True, engine='numba')
    print(f'parallel with raw=True and numba time took: {monotonic() - start:.1f} s.')

Output:
synchronous time took: 994.6 s.
with raw=True time took: 48.6 s.
numba engine time took: 9.8 s.
parallel with raw=True time took: 13.5 s.
parallel with raw=True and numba time took: 1.5 s.

994.6 / 1.5 ≈ 663x speedup.
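The drop from 994.6 s to 48.6 s between the first two runs comes only from `raw=True`: with the default `raw=False`, pandas builds a `Series` for every window before calling the function, while `raw=True` passes a plain NumPy `ndarray`. A small sketch that records what the function receives (the `probe_types` and `probe` helpers are hypothetical, for illustration only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6.0).reshape(3, 2))


def probe_types(raw):
    """Return the set of argument types that rolling.apply passes to the function."""
    seen = []

    def probe(x):
        seen.append(type(x))
        return 0.0  # the return value is irrelevant here

    df.rolling(window=2, min_periods=1).apply(probe, raw=raw)
    return set(seen)


print(probe_types(raw=False))  # {<class 'pandas.core.series.Series'>}
print(probe_types(raw=True))   # {<class 'numpy.ndarray'>}
```

Constructing one Series per window (3000 × 2000 windows here) is pure overhead when the function only needs the values.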

- padu

- Michael Szczesny · Accepted Answer

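These are not solutions, mostly workarounds for simple cases like the example function. But they confirm that df.rolling.apply is far from optimal in speed.

For obvious reasons, a much smaller dataset is used.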

import pandas as pd
import numpy as np

df = pd.DataFrame(
    np.random.rand(200,100)
)
period = 10
res = [0,0]

Runtime with pandas v1.3.5

%%timeit -n1 -r1
dd=lambda x: np.nanmax(1.0 - x / np.fmax.accumulate(x))
res[0] = df.rolling(window=period, min_periods=1).apply(dd)
# 1 loop, best of 1: 8.72 s per loop

Compared with a numpy implementation

from numpy.lib.stride_tricks import sliding_window_view as window

%%timeit
x = window(np.vstack([np.full((period-1,df.shape[1]), np.nan),df.to_numpy()]), period, axis=0)
res[1] = np.nanmax(1.0 - x / np.fmax.accumulate(x, axis=-1), axis=-1)
# 100 loops, best of 5: 3.39 ms per loop

np.testing.assert_allclose(res[0], res[1])

8.72 * 1000 / 3.39 ≈ 2572x speedup.
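The numpy version emulates `min_periods=1` by stacking `period - 1` rows of NaN on top of the data, so that `sliding_window_view` yields one full-length window per original row, with shape `(rows, cols, period)`; `np.fmax.accumulate` and `np.nanmax` then skip the padding. The same idea wrapped in a function (the name `rolling_dd_numpy` is hypothetical; `sliding_window_view` needs NumPy 1.20 or later), checked against `rolling.apply` on a small random frame:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_dd_numpy(arr, period):
    """Rolling max drawdown over axis 0, matching rolling(period, min_periods=1)."""
    pad = np.full((period - 1, arr.shape[1]), np.nan)
    padded = np.vstack([pad, arr])                    # (rows + period - 1, cols)
    x = sliding_window_view(padded, period, axis=0)   # view, (rows, cols, period)
    return np.nanmax(1.0 - x / np.fmax.accumulate(x, axis=-1), axis=-1)


rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((50, 4)))
period = 10

dd = lambda x: np.nanmax(1.0 - x / np.fmax.accumulate(x))
expected = df.rolling(window=period, min_periods=1).apply(dd, raw=True)
result = rolling_dd_numpy(df.to_numpy(), period)
print(result.shape)  # (50, 4)
np.testing.assert_allclose(result, expected.to_numpy())
```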

Processing the columns in chunks

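# Split the columns into 100 chunks so the (rows, cols, period) temporaries stay small
l = []
for arr in np.array_split(df.to_numpy(), 100, 1):
    # Pad with period-1 NaN rows to emulate min_periods=1, then take sliding windows
    x = window(np.vstack([np.full((period-1,arr.shape[1]), np.nan),arr]), period, axis=0)
    l.append(np.nanmax(1.0 - x / np.fmax.accumulate(x, axis=-1), axis=-1))
res[1] = np.hstack(l)
# 1 loop, best of 5: 9.15 s per loop for df.shape (2000,2000)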
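Chunking is what keeps memory bounded: `sliding_window_view` itself is only a view, but each arithmetic step (`np.fmax.accumulate`, the division, the subtraction) materializes a float64 array of shape `(rows, cols, period)`. A back-of-the-envelope sketch, assuming `period = 250` as in the question (the helper name `temp_bytes` is hypothetical):

```python
def temp_bytes(rows, cols, period, itemsize=8):
    """Size in bytes of one float64 temporary of shape (rows, cols, period)."""
    return rows * cols * period * itemsize


rows, cols, period = 2000, 2000, 250
full = temp_bytes(rows, cols, period)
chunk = temp_bytes(rows, cols // 100, period)  # 100 column chunks, as in the loop above

print(f"whole frame: {full / 1e9:.1f} GB per temporary")   # 8.0 GB
print(f"one chunk:   {chunk / 1e6:.1f} MB per temporary")  # 80.0 MB
```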

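Using the `pandas` `numba` engine

We can get even faster with pandas' support for numba-jitted functions. Unfortunately, numba v0.55.1 cannot compile ufunc.accumulate, so we have to write our own implementation of np.fmax.accumulate (I can't guarantee my implementation is correct). Note that the first call is slower because the function has to be compiled.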

def dd_numba(x):
    res = np.empty_like(x)
    res[0] = x[0]
    for i in range(1, len(res)):
        if res[i-1] > x[i] or np.isnan(x[i]):
            res[i] = res[i-1]
        else:
            res[i] = x[i]
    return np.nanmax(1.0 - x / res)

df.rolling(window=period, min_periods=1).apply(dd_numba, engine='numba', raw=True)

We keep the familiar pandas interface, and on df.shape (2000,2000) it is about 1.16x faster than my chunked numpy approach.
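Since the hand-written running maximum in `dd_numba` stands in for `np.fmax.accumulate`, the two can be checked against each other in plain Python (no numba needed), including NaNs. A minimal sketch; `running_fmax` is a hypothetical name for the loop extracted from `dd_numba`:

```python
import numpy as np


def running_fmax(x):
    """Loop from dd_numba: running maximum that skips NaN values."""
    res = np.empty_like(x)
    res[0] = x[0]
    for i in range(1, len(res)):
        if res[i - 1] > x[i] or np.isnan(x[i]):
            res[i] = res[i - 1]
        else:
            res[i] = x[i]
    return res


rng = np.random.default_rng(2)
for _ in range(1000):
    x = rng.random(rng.integers(1, 30))
    x[rng.random(x.size) < 0.3] = np.nan  # sprinkle NaNs, including leading ones
    np.testing.assert_array_equal(running_fmax(x), np.fmax.accumulate(x))
print("running_fmax matches np.fmax.accumulate")
```

A leading NaN stays NaN until the first real value (`NaN > v` is False), which is the same rule `np.fmax` applies.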
