Cython: How to make an array of a cdef class

14

I would like to have a Cython array of a cdef class:

cdef class Child:
    cdef int i

    def do(self):
        self.i += 1

cdef class Mother:
    cdef Child[:] array_of_child

    def __init__(self):
        for i in range(100):
            self.array_of_child[i] = Child()

1
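Is there a reason you think my answer has gone out of date? I have looked through the changelog and seen no obvious changes. I think this is unlikely to change, because there are good technical reasons for it. - DavidW
What do you mean by "cython array of a cdef class"? Does it correspond to Child a[n] in C++ (where n is some number), or to Child *? Your code clearly has some issues (it should be self.array_of_child, and the memory for array_of_child needs to be allocated and freed), so fixing those might express your goal more clearly. - ead
1 Answer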

15
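The answer is no: it is not really possible in a useful way (see the newsgroup post on a similar question).
It is not possible to have a direct array of Child objects (allocated in a single chunk). This is partly because, if somewhere else ever gets a reference to a Child in the array, that Child has to be kept alive (but not the whole array), which cannot be ensured if they are all allocated in the same chunk of memory. Additionally, if the array were resized (if this is a requirement), that would invalidate any other references to the objects within the array.
Therefore you are left with an array of pointers to Child. Such a structure would be fine, but internally it would look almost exactly like a Python list (so there is really no benefit to doing something more complicated in Cython...).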
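The lifetime argument above can be seen at the Python level. The following is a minimal sketch, not Cython: `Child` here is a plain-Python stand-in for the cdef class, and the immediate cleanup it relies on assumes CPython's reference counting.

```python
import gc
import weakref


class Child:
    """Plain-Python stand-in (an assumption) for the cdef class in the question."""

    def __init__(self):
        self.i = 0

    def do(self):
        self.i += 1


children = [Child() for _ in range(3)]  # the list stores references, not the objects
kept = children[0]                      # an outside reference to one element
probe = weakref.ref(children[1])        # watches an element nobody else holds

del children                            # drop the container
gc.collect()                            # harmless on CPython; helps other interpreters

kept.do()                               # the referenced element is still usable
print(kept.i, probe() is None)
```

The element that something else still refers to survives on its own, while the other elements are released with the container. That per-object lifetime is what a single contiguous block of Child objects could not provide.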
There are a few sensible workarounds:
  1. The workaround suggested in the newsgroup post is just to use a Python list. You could also use a NumPy array with dtype=object. If you need to access a cdef function in the class, you can do a cast first:

    cdef Child c = <Child?>a[0] # omit the ? if you don't want
                                # the overhead of checking the type.
    c.some_cdef_function()
    

    Internally, both of these options are stored as a C array of PyObject pointers to your Child objects, so they are not as inefficient as you probably assume.

  2. A further possibility might be to store your data as a C struct (cdef struct ChildStruct: ....), which can readily be stored as an array. When you need a Python interface to that struct, you can either define Child so it contains a copy of ChildStruct (but modifications won't propagate back to your original array), or a pointer to ChildStruct (but you need to be careful to ensure that the memory is not freed while the Child pointing to it is alive).

  3. You could use a NumPy structured array. This is pretty similar to using an array of C structs, except that NumPy handles the memory and provides a Python interface.

  4. The memoryview syntax in your question is valid: cdef Child[:] array_of_child. This can be initialized from a numpy array of dtype object:

    array_of_child = np.array([Child() for i in range(100)])
    

    In terms of data-structure, this is an array of pointers (i.e. the same as a Python list, but can be multi-dimensional). It avoids the need for <Child> casting. The important thing it doesn't do is any kind of type-checking - if you feed an object that isn't Child into the array then it won't notice (because the underlying dtype is object), but will give nonsense answers or segmentation faults.

    In my view this approach gives you a false sense of security about two things: first, that you have made a more efficient data structure (you haven't, it's basically the same as a list); second, that you have any kind of type safety. However, it does exist. (If you want to use memoryviews, e.g. for multi-dimensional arrays, it would probably be better to use a memoryview of type object, which is honest about the underlying dtype.)
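The copy-versus-pointer trade-off in workaround 2 can be sketched at the Python level with the standard-library ctypes module. This is only an analogy for the Cython struct approach, not Cython code: `ChildStruct` mirrors the struct named in the answer, while `ChildView`, `storage`, and `snapshot` are hypothetical names introduced for illustration.

```python
import ctypes


class ChildStruct(ctypes.Structure):
    """Plain-data record, loosely mirroring `cdef struct ChildStruct`."""
    _fields_ = [("i", ctypes.c_int)]


class ChildView:
    """Hypothetical Python-facing wrapper that points at one record in place."""

    def __init__(self, storage, index):
        self._storage = storage        # explicit reference: keeps the block alive
        self._record = storage[index]  # shares memory with the array, no copy

    def do(self):
        self._record.i += 1


storage = (ChildStruct * 100)()        # one contiguous, zero-initialised block

view = ChildView(storage, 5)
view.do()                              # writes through to storage[5]

snapshot = ChildStruct.from_buffer_copy(storage[5])  # independent copy of a record
snapshot.i += 10                       # does not propagate back to storage

print(storage[5].i, snapshot.i)
```

Note that the pointer-style wrapper has to keep the entire block alive for as long as it exists, which is the lifetime coupling the answer warns about. In Cython proper, that responsibility falls on you rather than on ctypes.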

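For workaround 3, a minimal sketch of a NumPy structured array holding the same data as the question's Child (a single integer field `i`). The names `child_dtype` and `children` and the choice of `np.int32` are assumptions made for illustration.

```python
import numpy as np

# One record per "child"; NumPy owns a single contiguous block of 100 records.
child_dtype = np.dtype([("i", np.int32)])
children = np.zeros(100, dtype=child_dtype)

children["i"] += 1       # vectorised counterpart of calling do() on every record
children["i"][3] += 5    # update a single record in place

print(children["i"][:5])
```

The records here are plain data with no methods, so behaviour such as `do()` has to live in functions that operate on the array rather than on the elements themselves.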
