Erik Rigtorp posted a trick for computing rolling statistics efficiently with NumPy:
A loop in Python are however very slow compared to a loop in C code.
Fortunately there is a trick to make NumPy perform this looping
internally in C code. This is achieved by adding an extra dimension
with the same size as the window and an appropriate stride:
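import numpy as np

def rolling_window(a, window):
    # Add a trailing axis of length `window`; reusing the last stride makes
    # each row start one element after the previous one.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)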
With this function, you can do the following:
winlen = 5
values = np.array([160, 140, 152, 142, 143, 186, 152, 145, 165, 152, 143, 148, 196, 152, 145, 157, 152])
rolling_values = rolling_window(values, winlen + 1)
rolling_indices = np.arange(winlen, values.shape[0])
mask = np.all(rolling_values[:, [-1]] > rolling_values[:, :-1], axis=1)
indices = rolling_indices[mask]
print(indices)
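As a cross-check, the same result can be obtained with a plain Python loop. This is only a reference sketch for verifying the vectorized version on small inputs; the name loop_indices is made up for this illustration, and the loop is the slow approach the quote above warns about.

```python
import numpy as np

values = np.array([160, 140, 152, 142, 143, 186, 152, 145, 165,
                   152, 143, 148, 196, 152, 145, 157, 152])
winlen = 5

# For each position i >= winlen, check whether values[i] is strictly
# greater than every one of the winlen values before it.
loop_indices = [
    i for i in range(winlen, len(values))
    if values[i] > values[i - winlen:i].max()
]
print(loop_indices)  # [5, 12]
```

The values at positions 5 and 12 (186 and 196) are the only ones that exceed all five of their predecessors.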
Explanation:
rolling_window turns values into an array of the following form:
print(rolling_values)
array([[160, 140, 152, 142, 143, 186],
       [140, 152, 142, 143, 186, 152],
       [152, 142, 143, 186, 152, 145],
       [142, 143, 186, 152, 145, 165],
       [143, 186, 152, 145, 165, 152],
       [186, 152, 145, 165, 152, 143],
       [152, 145, 165, 152, 143, 148],
       [145, 165, 152, 143, 148, 196],
       [165, 152, 143, 148, 196, 152],
       [152, 143, 148, 196, 152, 145],
       [143, 148, 196, 152, 145, 157],
       [148, 196, 152, 145, 157, 152]])
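Each row contains one element (starting from the sixth) together with the five elements that precede it. Thanks to the stride trick, this representation needs no more memory than the original array.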
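The memory claim can be checked directly. The sketch below repeats rolling_window so that it runs on its own, and uses a small int64 array (chosen here only for illustration) to show that the windowed array is a view onto the original buffer.

```python
import numpy as np

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = np.arange(8, dtype=np.int64)
view = rolling_window(a, 3)

print(view.shape)                 # (6, 3)
print(view.strides)               # (8, 8): both axes step by one 8-byte element
print(np.shares_memory(a, view))  # True: no data was copied

# Because the rows overlap in memory, one write to `a` appears in several rows.
a[2] = 99
print(view[0, 2], view[1, 1], view[2, 0])  # 99 99 99
```

The flip side of sharing memory is that writing into the view would also modify the original array, so it is safest to treat the result as read-only.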
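Now we can check, for each row, whether the last element is greater than all the elements before it, and look up the corresponding indices.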
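The comparison step can also be wrapped in a small helper. The sketch below assumes NumPy 1.20 or newer, where numpy.lib.stride_tricks.sliding_window_view is available as a ready-made, read-only alternative to the hand-written as_strided call. The function name new_high_indices is hypothetical and not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def new_high_indices(values, winlen):
    """Return indices i where values[i] exceeds each of the winlen values before it."""
    windows = sliding_window_view(values, winlen + 1)  # shape (n - winlen, winlen + 1)
    last = windows[:, -1:]       # last column, kept 2-D so it broadcasts per row
    previous = windows[:, :-1]   # the winlen elements preceding it
    mask = np.all(last > previous, axis=1)
    return np.arange(winlen, len(values))[mask]

values = np.array([160, 140, 152, 142, 143, 186, 152, 145, 165,
                   152, 143, 148, 196, 152, 145, 157, 152])
print(new_high_indices(values, 5))  # [ 5 12]
```

Slicing with -1: instead of -1 keeps the last column two-dimensional, which plays the same role as the [-1] list index in the code above.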