如何在Cython中动态声明2D C数组

Question

如何在Cython中动态声明2D C数组

10

我需要使用各种大小的2D numpy数组执行大量工作，并希望将这些计算转移到cython中。我的想法是，我的2D numpy数组将从python传递到cython，然后转换为c数组或内存视图，并在一系列其他c级函数中使用以进行计算。

经过一些分析，由于一些严重的python开销，我排除了在cython中使用numpy数组的可能性。使用内存视图速度要快得多，而且非常容易使用，但我认为使用c数组可以进一步提高速度。

这里有个问题 - 如何在cython中声明一个2D c数组，而不预定义其具有设置值的维度?例如，我可以用以下方式从numpy创建一个c数组：

narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]], dtype=np.dtype("i"))

cdef int c_arr[3][4]:
for i in range(3):
    for j in range(4):
        c_arr[i][j] = narr[i][j]

然后将它传递给一个函数：

cdef void somefunction(int c_Arr[3][4]):
    ...

但这意味着我只能使用固定大小的数组，在我的情况下是无用的。因此，我尝试了以下代码：

narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]], dtype=np.dtype("i"))

cdef int a = np.shape(narr)[0]
cdef int b = np.shape(narr)[1]

cdef int c_arr[a][b]:               # INCORRECT - EXAMPLE ONLY

for i in range(a):
    for j in range(b):
        c_arr[i][j] = narr[i][j]

带着将其传递给像这样的函数的意图：

cdef void somefunction(int a, int b, int c_Arr[a][b]):
    ...

但是它不起作用，编译失败并出现错误“在常量表达式中不允许”。我怀疑我需要以某种方式使用malloc/free？我查看了这个问题（如何在Cython中声明2D列表），但它没有提供我问题的答案。

更新：

事实证明，如果确保在cython中的memory-views关闭了indexError检查，则它们可以像c-arrays一样快，这可以通过使用cython编译器指令来完成：

# cython: boundscheck=False

感谢 @Veedrac 提供的提示！

- phil_thy

1

还要注意，每当你看到像“2700倍更快”这样的愚蠢数字时，那是因为其中一个被完全优化掉了。如果你真正使用输出，好处会小得多。 - Veedrac

啊，是的，好观点！此外，使用浮点数而不是整数也会产生影响... - phil_thy

除了“# cython：boundscheck = False”之外，您还可以通过“# cython：cython.wraparound = False”来进一步加速。 - Dane Lee

2个回答

1

经过 @Veedrac 无价的帮助（非常感谢！），我终于编写出了一段脚本，演示了在Cython中使用内存视图和C数组加速计算。它们都可以达到相似的速度，所以我个人认为使用内存视图要容易得多。

以下是一个示例Cython脚本，可以“接受”一个numpy数组并将其转换为内存视图或C数组，然后通过C级函数执行简单的数组求和：

# cython: boundscheck=False

cimport cython
import numpy as np
cimport numpy as np

from numpy import int32
from numpy cimport int32_t


#Generate numpy array:
narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]], dtype=np.dtype("i"))

cdef int a = np.shape(narr)[0]
cdef int b = np.shape(narr)[1]
cdef int i, j

testsum = np.sum(narr)
print "Test summation: np.sum(narr) =", testsum

#Generate the memory view:
cdef int [:,:] x_view = narr

#Generate the 2D c-array and its pointer:
cdef:
    int32_t[:, :] cython_view = narr
    int32_t *c_integers_array = &cython_view[0, 0]
    int32_t[4] *c_arr = <int32_t[4] *>c_integers_array


def test1():

    speed_test_mview(x_view)  

def test2():

    speed_test_carray(&c_arr[0][0], a, b)


cdef int speed_test_mview(int[:,:] x_view):

    cdef int n, i, j, thesum

    # Define the view:
    for n in range(10000):
        thesum = 0
        for i in range(a):
            for j in range(b):
                thesum += x_view[i, j]        


cdef int speed_test_carray(int32_t *c_Arr, int a, int b):

    cdef int n, i, j, thesum
    for n in range(10000):
        thesum = 0
        for i in range(a):
            for j in range(b):
                thesum += c_Arr[(i*b)+j]

然后使用IPython shell进行时间测试，结果显示速度相似：

import testlib as t
Test summation: np.sum(narr) = 136

%timeit t.test1()
10000000 loops, best of 3: 46.3 ns per loop

%timeit t.test2()
10000000 loops, best of 3: 46 ns per loop

另外，作为比较，在这个例子中使用numpy数组花费了125毫秒（未显示）。

- phil_thy

1

你正在使用 &c_arr[0][0] 将 c_arr 转换为 1D 数组，但其实你可以直接使用 c_integers_array，它们是完全一样的。c_arr 的优点在于你可以将其传递给接受 int[4] *array 或甚至 int array[4][4] 的函数。 - Veedrac

1

此外，如果您进行适当的计时以防止非边界检查版本完全被优化掉，则会注意到两个问题。第一个是访问全局变量a和b比执行cdef int a, b; a = x_view.shape[0]; b = x_view.shape[1]并循环访问它们要慢得多。第二个是边界检查大约有60％的开销（例如从1秒→ 1.6秒）。 - Veedrac

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Veedrac · Accepted Answer

你只需要停止进行边界检查：

with cython.boundscheck(False):
    thesum += x_view[i,j]

这基本上可以让速度达到同等水平。

如果你真的想从中获得C数组，请尝试：

import numpy as numpy
from numpy import int32
from numpy cimport int32_t

numpy_array = numpy.array([[]], dtype=int32)

cdef:
    int32_t[:, :] cython_view = numpy_array
    int32_t *c_integers_array = &cython_view[0, 0]
    int32_t[4] *c_2d_array = <int32_t[4] *>c_integers_array

首先，您需要获取一个Numpy数组。您可以使用该数组来获取一个内存视图。然后，您获取其数据的指针，并将其转换为所需步幅的指针。