Optimizing my Cython/NumPy code? Only a 30% performance improvement so far.

Question

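Is there a step I am missing that would speed this up? I am trying to implement the algorithm described in the book Tuning, Timbre, Spectrum, Scale. If all else fails, could I write this part of the code in C and call it from Python?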

import numpy as np
cimport numpy as np

# DTYPE = np.float
ctypedef np.float_t DTYPE_t

np.seterr(divide='raise', over='raise', under='ignore', invalid='raise')

"""
I define a timbre as the following 2d numpy array:
[[f0, a0], [f1, a1], [f2, a2]...] where f describes the frequency
of the given partial and a is its amplitude from 0 to 1. Phase is ignored.
"""

#Test Timbre
# cdef np.ndarray[DTYPE_t,ndim=2] t1 = np.array( [[440,1],[880,.5],[(440*3),.333]])

# Calculates the inherent dissonance of one timbre of the above form
# using the diss2Partials function
cdef DTYPE_t diss1Timbre(np.ndarray[DTYPE_t,ndim=2] t):
    cdef DTYPE_t runningDiss1
    runningDiss1 = 0.0
    cdef unsigned int len = np.shape(t)[0]
    cdef unsigned int i
    cdef unsigned int j
    for i from 0 <= i < len:
        for j from i+1 <= j < len:
            runningDiss1 += diss2Partials(t[i], t[j])
    return runningDiss1

# Calculates the dissonance between two timbres of the above form 
cdef DTYPE_t diss2Timbres(np.ndarray[DTYPE_t,ndim=2] t1, np.ndarray[DTYPE_t,ndim=2] t2):
    cdef DTYPE_t runningDiss2
    runningDiss2 = 0.0
    cdef unsigned int len1 = np.shape(t1)[0]
    cdef unsigned int len2 = np.shape(t2)[0]
    runningDiss2 += diss1Timbre(t1)
    runningDiss2 += diss1Timbre(t2)
    cdef unsigned int i1
    cdef unsigned int i2
    for i1 from 0 <= i1 < len1:
        for i2 from 0 <= i2 < len2:
            runningDiss2 += diss2Partials(t1[i1], t2[i2])
    return runningDiss2

cdef inline DTYPE_t float_min(DTYPE_t a, DTYPE_t b): return a if a <= b else b

# Calculates the dissonance of two partials of the form [f,a]
cdef DTYPE_t diss2Partials(np.ndarray[DTYPE_t,ndim=1] p1, np.ndarray[DTYPE_t,ndim=1] p2):
    cdef DTYPE_t f1 = p1[0]
    cdef DTYPE_t f2 = p2[0]
    cdef DTYPE_t a1 = abs(p1[1])
    cdef DTYPE_t a2 = abs(p2[1])

    # Swap if necessary to ensure that f2 >= f1:
    if (f2 < f1):
        (f1,f2,a1,a2) = (f2,f1,a2,a1)

    # Constants of the dissonance curves
    cdef DTYPE_t _xStar
    _xStar = 0.24
    cdef DTYPE_t _s1
    _s1 = 0.021
    cdef DTYPE_t _s2
    _s2 = 19
    cdef DTYPE_t _b1
    _b1 = 3.5
    cdef DTYPE_t _b2
    _b2 = 5.75

    cdef DTYPE_t a = float_min(a1,a2)
    cdef DTYPE_t s = _xStar/(_s1*f1 + _s2)
    return (a * (np.exp(-_b1*s*(f2-f1)) - np.exp(-_b2*s*(f2-f1)) ) )

cpdef dissTimbreScale(np.ndarray[DTYPE_t,ndim=2] t,np.ndarray[DTYPE_t,ndim=1] s):
    cdef DTYPE_t currDiss
    currDiss = 0.0
    cdef unsigned int i
    for i from 0 <= i < s.size:
        currDiss += diss2Timbres(t, transpose(t,s[i]))
    return currDiss

cdef np.ndarray[DTYPE_t,ndim=2] transpose(np.ndarray[DTYPE_t,ndim=2] t, DTYPE_t ratio):
    return np.dot(t, np.array([[ratio,0],[0,1]]))

Link to code: Cython Code

- Chironex

See http://docs.cython.org/src/tutorial/external.html and http://docs.cython.org/src/tutorial/clibraries.html - reve_etrange

No, I am only using Cython to speed up an algorithm that was originally written in pure Python. - Chironex

For converting Python to C, see the SO question on simple wrapping of C code with Cython. - denis

3 Answers

I suggest profiling your code to see which function takes the most time. If it turns out to be diss2Timbres, you could consider using the numexpr package.

I once compared the performance of a function in Python/Cython and in numexpr (link to SO). Depending on the array size, numexpr outperformed both Cython and Fortran.
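The profiling step suggested above can be sketched with the standard-library cProfile module. This is only an illustration under stated assumptions: diss2_partials and diss1_timbre below are plain-Python stand-ins for the question's functions (same constants, list-of-lists timbre), and profile_report is a hypothetical helper, not part of any library. For the compiled module itself, Cython's profile=True compiler directive is, as far as I know, what makes cdef functions visible to cProfile.

```python
import cProfile
import io
import math
import pstats


def diss2_partials(f1, a1, f2, a2):
    """Plain-Python stand-in for diss2Partials, using the question's constants."""
    if f2 < f1:
        f1, f2, a1, a2 = f2, f1, a2, a1
    s = 0.24 / (0.021 * f1 + 19)
    d = f2 - f1
    return min(abs(a1), abs(a2)) * (math.exp(-3.5 * s * d) - math.exp(-5.75 * s * d))


def diss1_timbre(t):
    """Stand-in for diss1Timbre: sum the dissonance over all pairs of partials."""
    total = 0.0
    for i in range(len(t)):
        for j in range(i + 1, len(t)):
            total += diss2_partials(t[i][0], t[i][1], t[j][0], t[j][1])
    return total


def profile_report(func, *args, top=5):
    """Hypothetical helper: run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


# A made-up 40-partial harmonic timbre, just to give the profiler some work.
timbre = [[440.0 * k, 1.0 / k] for k in range(1, 41)]
result, report = profile_report(diss1_timbre, timbre)
print(report)
```

The report lists the functions with the largest cumulative time first, which is usually enough to tell whether the pairwise loop or the per-pair function dominates.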

Note: I just noticed that this post is really old...

- Moritz

In your code:

for i from 0 <= i < len:
    for j from i+1 <= j < len:
        runningDiss1 += diss2Partials(t[i], t[j])
return runningDiss1

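bounds checking is performed on every array lookup. Put the decorator @cython.boundscheck(False) before the function (this requires cimport cython), and make sure i and j are cast to an unsigned integer type before they are used as indices. See the Cython for NumPy tutorial for more information.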

- highBandWidth

- Justin Peel · Accepted Answer

Here are a few things I noticed:

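1. Use t1.shape[0] instead of np.shape(t1)[0], and do the same in the other places.
2. Don't use len as a variable name, because it is a built-in function in Python (not for speed, but as good programming practice). Use L or something similar.
3. Don't pass two-element arrays to functions unless you really need to. Cython performs a buffer check every time an array is passed. So instead of diss2Partials(t[i], t[j]), use diss2Partials(t[i,0], t[i,1], t[j,0], t[j,1]) and redefine diss2Partials accordingly.
4. Don't use abs, or at least not the Python built-in. It has to convert your C double to a Python float, call the abs function, and then convert the result back to a C double. It is probably better to write an inline function, as you did with float_min.
5. Calling np.exp has a similar cost to calling abs. Change np.exp to exp and add from libc.math cimport exp to the imports at the top.
6. Get rid of the transpose function completely. The matrix multiplication slows things down, and there is no need for a matrix multiplication here at all. Rewrite your dissTimbreScale function to create an empty matrix, say t2. Before the current loop, set the second column of t2 to the second column of t (preferably with a loop, although you can probably get away with a NumPy operation here). Then, inside the current loop, add a loop that sets the first column of t2 to the first column of t multiplied by s[i]. That is all the matrix multiplication was really doing. Finally, pass t2 as the second argument to diss2Timbres instead of the matrix returned by transpose.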

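Do 1-5 first, because they are fairly easy. Step 6 may take more time, effort, and experimentation, but I suspect it could also give a significant speedup.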
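To check the arithmetic behind steps 3 and 6, here is a minimal plain-Python sketch (not the Cython code itself, and not the answerer's own implementation). It assumes timbres are lists of [frequency, amplitude] pairs. The snake_case names and transpose_by_matmul are made up for illustration; the latter writes out by hand what np.dot(t, [[ratio, 0], [0, 1]]) computes, so the two approaches can be compared without NumPy.

```python
import math

# Constants copied from the question's diss2Partials.
X_STAR, S1, S2, B1, B2 = 0.24, 0.021, 19.0, 3.5, 5.75


def diss2_partials(f1, a1, f2, a2):
    """Step 3: take four scalars instead of two 2-element arrays."""
    a1, a2 = abs(a1), abs(a2)
    if f2 < f1:
        f1, f2, a1, a2 = f2, f1, a2, a1
    s = X_STAR / (S1 * f1 + S2)
    diff = f2 - f1
    return min(a1, a2) * (math.exp(-B1 * s * diff) - math.exp(-B2 * s * diff))


def diss1_timbre(t):
    """Dissonance among the partials of a single timbre."""
    n = len(t)
    return sum(diss2_partials(t[i][0], t[i][1], t[j][0], t[j][1])
               for i in range(n) for j in range(i + 1, n))


def diss2_timbres(t1, t2):
    """Dissonance within each timbre plus all cross terms."""
    cross = sum(diss2_partials(p[0], p[1], q[0], q[1]) for p in t1 for q in t2)
    return diss1_timbre(t1) + diss1_timbre(t2) + cross


def transpose_by_matmul(t, ratio):
    """Reference only: t multiplied by [[ratio, 0], [0, 1]], written out by hand."""
    m = [[ratio, 0.0], [0.0, 1.0]]
    return [[row[0] * m[0][c] + row[1] * m[1][c] for c in range(2)] for row in t]


def diss_timbre_scale(t, scale):
    """Step 6: reuse one buffer t2 and rescale only its frequency column."""
    t2 = [[0.0, a] for _, a in t]      # amplitude column is copied once
    total = 0.0
    for ratio in scale:
        for k, (f, _) in enumerate(t):
            t2[k][0] = f * ratio       # all the matrix product was really doing
        total += diss2_timbres(t, t2)
    return total


# Test timbre from the question, with an arbitrary example scale.
timbre = [[440.0, 1.0], [880.0, 0.5], [1320.0, 0.333]]
scale = [1.0, 1.25, 1.5, 2.0]

reference = sum(diss2_timbres(timbre, transpose_by_matmul(timbre, r)) for r in scale)
fast = diss_timbre_scale(timbre, scale)
print(math.isclose(fast, reference, rel_tol=1e-12))
```

Both paths should give the same total, since multiplying by a diagonal matrix only scales the frequency column. In the Cython version the same idea would apply to a preallocated typed array, so no temporary array is created per scale step.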