Optimizing Cython code for numpy variance calculation

Question

3

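I am trying to optimize my Cython code, and there seems to be a lot of room for improvement. Here is part of the profile from the %prun extension in the IPython notebook: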

 7016695 function calls in 18.475 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   400722    7.723    0.000   15.086    0.000 _methods.py:73(_var)
   814815    4.190    0.000    4.190    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    1.855    1.855   18.475   18.475 {_cython_magic_aed83b9d1a706200aa6cef0b7577cf41.knn_alg}
   403683    0.838    0.000    1.047    0.000 _methods.py:39(_count_reduce_items)
   813031    0.782    0.000    0.782    0.000 {numpy.core.multiarray.array}
   398748    0.611    0.000   15.485    0.000 fromnumeric.py:2819(var)
   804405    0.556    0.000    1.327    0.000 numeric.py:462(asanyarray)

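Since my program spends nearly 8 seconds (the 7.723 s tottime of _var) just computing the variance, I am hoping to speed this up.

I am using np.var() to compute the variance of a 1D array of length 404, roughly 1000 times. I checked the C standard library and unfortunately it has no such function, and I would rather not write my own in C.

1. Are there any other options?

2. Is there any way to reduce the time spent in the second item of the profile (the ufunc reduce calls)?

Here is my code, in case it helps: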

cpdef knn_alg(np.ndarray[double, ndim=2] temp, np.ndarray[double, ndim=1] jan1, int L, int w, int B):

    cdef np.ndarray[double, ndim=3] lnn = np.zeros((L+1,temp.shape[1],365))

    lnn = lnn_alg(temp, L, w)

    cdef np.ndarray[double, ndim=2] sim = np.zeros((len(temp),temp.shape[1]))
    cdef np.ndarray[double, ndim=2] a = np.zeros((L+1,lnn.shape[1]))
    cdef int b
    cdef np.ndarray[double, ndim=2] c = np.zeros((L,lnn.shape[1]-3))
    cdef np.ndarray[double, ndim=2] lnn_scale = np.zeros((L,lnn.shape[1]))
    cdef np.ndarray[double, ndim=2] cov_t = np.zeros((3,3))
    cdef np.ndarray[double, ndim=2] dk = np.zeros((L,4))
    cdef int random_selection
    cdef np.ndarray[double, ndim=1] day_month
    cdef int day_of_year
    cdef np.ndarray[double, ndim=2] lnn_scaled
    cdef np.ndarray[double, ndim=2] temp_scaled
    cdef np.ndarray[double, ndim=2] eig_vec
    cdef double PC_t
    cdef np.ndarray[double, ndim=1] PC_l
    cdef double K
    cdef np.ndarray[double, ndim=2] knn
    cdef np.ndarray[double, ndim=1] val
    cdef np.ndarray[double, ndim=1] pn
    cdef double rand_num
    cdef int nn
    cdef int index
    cdef int inc
    cdef int i

    sim[0,:] = jan1

    for i in range(1,len(temp),B):

        #If leap day then randomly select feb 28 or mar 1
        if temp[i,4]==2 and temp[i,3]==29:
            #randint excludes the upper bound, so use 2 to draw either 0 or 1
            random_selection = np.random.randint(0,2)
            #Float entries match the declared double dtype; feb 29 does not exist in 2014
            day_month = np.array([[28.0,2.0],[1.0,3.0]])[random_selection]
        else:
            day_month = temp[i,3:5]

        #Convert day month to day of year for L+1 nearest neighbors selection
        current = datetime.datetime(2014, (<int>day_month[1]), (<int>day_month[0]))
        day_of_year = current.timetuple().tm_yday - 1

        #Take out current day from L+1 nearest neighbors
        a = lnn[:,:,day_of_year]
        b = np.where((a[:,3:6] == temp[i,3:6]).all(axis=-1))[0][0]
        c = np.delete(a,(b), axis=0)

        #Scale and center data from nearest neighbors and spatially averaged historical data
        lnn_scaled = scale(c[:,0:3])
        temp_scaled = scale(temp[:,0:3])

        #Calculate covariance matrix of nearest neighbors
        cov_t[:,:] = np.cov(lnn_scaled.T)

        #Calculate eigenvalues and vectors of covariance matrix
        eig_vec = eig(cov_t)[1]

        #Calculate principal components of scaled L nearest neighbors and of the current day
        PC_t = np.dot(temp_scaled[i],eig_vec[0])
        PC_l = np.dot(lnn_scaled,eig_vec[0])

        #Calculate Mahalanobis distance
        dk = np.zeros((404,4))
        dk[:,0] = np.array([sqrt((PC_t-pc)**2/np.var(PC_l)) for pc in PC_l])
        dk[:,1:4] = c[:,3:6]

        #Extract K nearest neighbors
        dk = dk[dk[:,0].argsort()]
        K = round(sqrt(L),0)
        knn = dk[0:(<int>K)]

        #Create probability density function
        val = np.array([1.0/k for k in range(1,len(knn)+1)])
        #Normalize by the exact sum; truncating it to int would leave the weights unnormalized
        wk = val/val.sum()
        pn = wk.cumsum()

        #Select next days value from KNNs using probability density function with random value
        rand_num = np.random.rand(1)[0]
        nn = (abs(pn-rand_num)).argmin()
        index = np.where((temp[:,3:6] == knn[nn,1:4]).all(axis=-1))[0][0]

        if i+B > len(temp):
            inc = len(temp) - i
        else:
            inc = B

        if (index+B > len(temp)):
            index = len(temp)-B

        sim[i:i+inc,:] = temp[index:index+inc,:]

    return sim

The variance calculation is on this line:

 dk[:,0] = np.array([sqrt((PC_t-pc)**2/np.var(PC_l)) for pc in PC_l])

Any suggestions would be very helpful, as I am fairly new to Cython.

- pbreach

2

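You can compile with cython -a file.pyx to see where the bottlenecks are (the lines marked in yellow); the highlighting reflects how many Python API calls each line makes... - Saullo G. P. Castro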

1

I do the same thing in the IPython notebook, except that for a specific code cell it is %%cython -a. - pbreach

2 Answers

1

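To make sure your for loop is being translated properly, look at the generated C code (see the Cython documentation).

If that does not help, you may need to make sure pc is declared with cdef, so that no Python object is referenced (again, see the Cython documentation).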

dk[:,0] = np.array([sqrt((PC_t-pc)**2/np.var(PC_l)) for pc in PC_l])

- Ryan

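I tried declaring pc as a double, and also writing a regular for loop instead of the list comprehension, with the loop index declared as an int. Neither gave a noticeable speedup, so I suppose I will have to look at the C code as you suggested. - pbreach

After reading some documentation, I think the problem may stem from using datetime and pandas objects inside the loop, which prevents the whole loop from being compiled to C. I also noticed that this loop runs at the same speed as its pure-Python equivalent, which may support that theory. - pbreach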
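
If the datetime call inside the loop is a concern, as the comment above suspects, the day-of-year conversion can be done with plain integer arithmetic instead. A minimal sketch, assuming a fixed non-leap year (the question hard-codes 2014); DAYS_BEFORE_MONTH and day_of_year are hypothetical names, not part of the original code:

```python
import datetime

# Cumulative number of days before each month in a non-leap year
# (assumption: the year is fixed to 2014, as in the question).
DAYS_BEFORE_MONTH = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334)


def day_of_year(month, day):
    """Zero-based day of year, matching timetuple().tm_yday - 1 for 2014."""
    return DAYS_BEFORE_MONTH[month - 1] + day - 1


# Compare against the datetime-based conversion used in the question.
current = datetime.datetime(2014, 3, 1)
assert day_of_year(3, 1) == current.timetuple().tm_yday - 1
```

In Cython, the same table could be stored in a typed C array so that the lookup involves no Python objects at all.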

Content provided by Stack Overflow.

- pbreach · Accepted Answer

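I looked into this calculation, and I think it was slow because I was using np.var(), which is a Python (or numpy) function and prevents the loop from being compiled to C. If anyone knows how to get around this while still using numpy, please let me know.

What I ended up doing was recoding this calculation: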

dk[:,0] = np.array([sqrt((PC_t-pc)**2/np.var(PC_l)) for pc in PC_l])

as a standalone function:

cimport cython
cimport numpy as np
import numpy as np
from libc.math cimport sqrt as csqrt
from libc.math cimport pow as cpow

@cython.boundscheck(False)
@cython.cdivision(True)
cdef cy_mahalanobis(np.ndarray[double, ndim=1] PC_l, double PC_t):
    cdef unsigned int i,j,L
    L = PC_l.shape[0]
    cdef np.ndarray[double] dk = np.zeros(L)
    cdef double x,total,mean,var

    #First pass: mean of PC_l
    total = 0
    for i in range(L):
        x = PC_l[i]
        total = total + x

    mean = total / L

    #Second pass: population variance (divides by L, like np.var by default)
    total = 0
    for i in range(L):
        x = cpow(PC_l[i]-mean,2)
        total = total + x

    var = total / L

    #Distance of every neighbor from PC_t, scaled by the variance computed once above
    for j in range(L):
        dk[j] = csqrt(cpow(PC_t-PC_l[j],2)/var)

    return dk

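Since the loops no longer call any Python functions (numpy included), they can be compiled entirely to C (no yellow lines appear for them with the annotate option, cython -a file.pyx, or %%cython -a in the IPython notebook).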
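
For checking the compiled function's output, the two-pass calculation in cy_mahalanobis can be mirrored in plain Python. This is only a reference sketch: mahalanobis_reference and the sample values are hypothetical, and the standard-library statistics module is used purely as a cross-check.

```python
import math
import statistics


def mahalanobis_reference(pc_l, pc_t):
    """Plain-Python mirror of the two-pass variance and distance loops."""
    n = len(pc_l)
    mean = sum(pc_l) / n                          # first pass: mean
    var = sum((x - mean) ** 2 for x in pc_l) / n  # second pass: population variance
    return [math.sqrt((pc_t - x) ** 2 / var) for x in pc_l]


# Hypothetical sample data, chosen only for illustration.
sample = [1.0, 2.0, 4.0, 7.0]
distances = mahalanobis_reference(sample, 3.0)

# The variance divides by n (not n - 1), so it agrees with statistics.pstdev.
expected = [abs(3.0 - x) / statistics.pstdev(sample) for x in sample]
assert all(math.isclose(d, e) for d, e in zip(distances, expected))
```

Dividing by n rather than n - 1 is what keeps the result consistent with np.var, whose default is the population variance.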

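Overall, my code got an order of magnitude faster! Coding this by hand was well worth the effort! My Cython (and Python, for that matter) is not the greatest, so further suggestions or answers are welcome.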
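
On the open question of staying within numpy: the profile lists 398748 calls to var, which is exactly 987 × 404, so np.var(PC_l) appears to be re-evaluated for every element of the 404-long array rather than once per loop iteration. A minimal sketch of computing it once and vectorizing the rest, assuming PC_l is a 1-D float array and PC_t a scalar as in the question (the function names and random test data are hypothetical):

```python
import numpy as np


def mahalanobis_loop(PC_l, PC_t):
    # Pattern from the question: np.var(PC_l) is recomputed for each element.
    return np.array([np.sqrt((PC_t - pc) ** 2 / np.var(PC_l)) for pc in PC_l])


def mahalanobis_vectorized(PC_l, PC_t):
    # sqrt((PC_t - pc)**2 / var) equals |PC_t - pc| / std, with std computed once.
    return np.abs(PC_t - PC_l) / np.std(PC_l)


# Call count from the profile divided by the array length stated in the question.
iterations, leftover = divmod(398748, 404)

# Hypothetical data with the same length as in the question.
rng = np.random.default_rng(0)
PC_l = rng.normal(size=404)
PC_t = 0.3

assert np.allclose(mahalanobis_loop(PC_l, PC_t), mahalanobis_vectorized(PC_l, PC_t))
```

Whether this matches the hand-written C loops in speed is untested here; it only removes the repeated variance calls and the Python-level comprehension.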