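TL;DR: in Cython, why (or when) is iterating over a NumPy array faster than iterating over a Python list?
More generally:
I've used Cython before and got large speedups over naive Python implementations, but figuring out exactly what needs to be done seems non-trivial.
Consider the following three implementations of a sum() function. They live in a Cython file called 'cy' (obviously there's np.sum(), but that's beside my point...).
Naive Python: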
def sum_naive(A):
    s = 0
    for a in A:
        s += a
    return s
Cython, with a function that expects a Python list:
def sum_list(A):
    cdef unsigned long s = 0
    for a in A:
        s += a
    return s
Cython, with a function that expects a NumPy array:
cimport numpy as np  # needed for the np.ndarray[...] buffer type below

def sum_np(np.ndarray[np.int64_t, ndim=1] A):
    cdef unsigned long s = 0
    for a in A:
        s += a
    return s
In terms of running time, I would expect sum_np < sum_list < sum_naive. However, the following script shows the opposite (for completeness, I added np.sum()):
N = 1000000
v_np = np.array(range(N))
v_list = range(N)
%timeit cy.sum_naive(v_list)
%timeit cy.sum_naive(v_np)
%timeit cy.sum_list(v_list)
%timeit cy.sum_np(v_np)
%timeit v_np.sum()
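For anyone who wants to reproduce the comparison outside IPython, where %timeit is not available, a rough equivalent with the standard-library timeit module might look like the sketch below. It is only a sketch under a few assumptions: it times a pure-Python copy of sum_naive and np.sum() so that it runs without compiling the Cython file, the helper name best_ms is made up for illustration, and N is smaller than the value used above to keep it quick. Also note that on Python 3 range(N) is lazy, so list(range(N)) is needed to get an actual list.

```python
import timeit

import numpy as np


def sum_naive(A):
    # Pure-Python copy of the function above, so the sketch runs without Cython.
    s = 0
    for a in A:
        s += a
    return s


def best_ms(func, arg, repeat=3, number=1):
    # Hypothetical helper: best time over `repeat` runs, in milliseconds per call.
    times = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(times) / number * 1e3


N = 100000  # assumption: smaller than the 1000000 used above, to keep this quick
v_list = list(range(N))              # list(...) is required on Python 3
v_np = np.arange(N, dtype=np.int64)  # explicit dtype to match np.int64_t

if __name__ == "__main__":
    print("sum_naive(list):  %.2f ms" % best_ms(sum_naive, v_list))
    print("sum_naive(array): %.2f ms" % best_ms(sum_naive, v_np))
    print("np.sum(array):    %.2f ms" % best_ms(np.sum, v_np))
```

To time the compiled functions the same way, cy.sum_list and cy.sum_np could be passed to best_ms in place of sum_naive once the module is built and imported.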
With results:
In [18]: %timeit cyMatching.sum_naive(v_list)
100 loops, best of 3: 18.7 ms per loop
In [19]: %timeit cyMatching.sum_naive(v_np)
1 loops, best of 3: 389 ms per loop
In [20]: %timeit cyMatching.sum_list(v_list)
10 loops, best of 3: 82.9 ms per loop
In [21]: %timeit cyMatching.sum_np(v_np)
1 loops, best of 3: 1.14 s per loop
In [22]: %timeit v_np.sum()
1000 loops, best of 3: 659 us per loop
What's going on? Why is Cython + NumPy slow?
P.S.
I do use
#cython:boundscheck=False
#cython:wraparound=False