NumPy error when converting an array of ctypes types to a void pointer

I want to send a list of strings to a C function:

from ctypes import c_double, c_void_p, Structure, cast, c_char_p, c_size_t, POINTER
import numpy as np


class FFIArray(Structure):
    """
    Convert sequence of structs or types to C-compatible void array

    """

    _fields_ = [("data", c_void_p), ("len", c_size_t)]

    @classmethod
    def from_param(cls, seq):
        """  Allow implicit conversions """
        return seq if isinstance(seq, cls) else cls(seq)

    def __init__(self, seq, data_type):
        array = np.ctypeslib.as_array((data_type * len(seq))(*seq))
        self._buffer = array.data
        self.data = cast(array.ctypes.data_as(POINTER(data_type)), c_void_p)
        self.len = len(array)


class Coordinates(Structure):

    _fields_ = [("lat", c_double), ("lon", c_double)]

    def __str__(self):
        return "Latitude: {}, Longitude: {}".format(self.lat, self.lon)


if __name__ == "__main__":
    tup = Coordinates(0.0, 1.0)
    coords = [tup, tup]
    a = b"foo"
    b = b"bar"
    words = [a, b]

    coord_array = FFIArray(coords, data_type=Coordinates)
    print(coord_array)
    word_array = FFIArray(words, c_char_p)
    print(word_array)

This works for e.g. c_double, but fails when I try it with c_char_p, with the following error (tested on Python 2.7.16 and 3.7.4, with NumPy 1.16.5 and 1.17.2):

Traceback (most recent call last):
  File "/Users/sth/dev/test/venv3/lib/python3.7/site-packages/numpy/core/_internal.py", line 600, in _dtype_from_pep3118
    dtype, align = __dtype_from_pep3118(stream, is_subdtype=False)
  File "/Users/sth/dev/test/venv3/lib/python3.7/site-packages/numpy/core/_internal.py", line 677, in __dtype_from_pep3118
    raise ValueError("Unknown PEP 3118 data type specifier %r" % stream.s)
ValueError: Unknown PEP 3118 data type specifier 'z'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "so_example.py", line 42, in <module>
    word_array = FFIArray(words, c_char_p)
  File "so_example.py", line 19, in __init__
    array = np.ctypeslib.as_array((data_type * len(seq))(*seq))
  File "/Users/sth/dev/test/venv3/lib/python3.7/site-packages/numpy/ctypeslib.py", line 523, in as_array
    return array(obj, copy=False)
ValueError: '<z' is not a valid PEP 3118 buffer format string

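Is there a better way to do this? I'm not wedded to using numpy either, although it is useful for converting numeric types and numpy arrays to _FFIArray elsewhere.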


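Interesting that c_char_p is the type that causes the problem. Are the Python strings passed in as UTF-8? Do you get the same error with c_wchar_p? - Nathan
@nathan (on Python 2.7.16) the strings are passed in as unicode. Switching to c_wchar_p makes no difference... - urschrei
That PEP error should have been patched according to Python bug 10744, but there is an interesting thread on the NumPy GitHub about ctypes bugs in older Python versions. - Nathan
Hmm, it looks like the bug is still present on Python 3.7.4 / NumPy 1.17.2. - urschrei
@urschrei, could you give us minimal code that reproduces the error? I can't reproduce it. Please also include the test strings you tried. - Himanshu Bansal

1 Answer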

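For reference: [Python.Docs]: ctypes - A foreign function library for Python.

I haven't gotten to the bottom of the NumPy error yet (so far I have reached the _multiarray_umath (C) sources, but I don't know how the functions from _internal.py get called).

In the meantime, here's a variant that doesn't use NumPy (it isn't needed in this case, but you mentioned that you use it in other parts, so this probably only solves part of your problem).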

code03.py:

#!/usr/bin/env python3

import sys
import ctypes
import numpy as np


class FFIArray(ctypes.Structure):
    """
    Convert sequence of structs or types to C-compatible void array
    """

    _fields_ = [
        ("data", ctypes.c_void_p),
        ("len", ctypes.c_size_t)
    ]

    @classmethod
    def from_param(cls, seq, data_type):
        """  Allow implicit conversions """
        return seq if isinstance(seq, cls) else cls(seq, data_type)

    def __init__(self, seq, data_type):
        self.len = len(seq)
        self._data_type = data_type
        self._DataTypeArr = self._data_type * self.len
        self.data = ctypes.cast(self._DataTypeArr(*seq), ctypes.c_void_p)

    def __str__(self):
        ret = super().__str__()  # Python 3
        #ret = super(FFIArray, self).__str__()  # !!! Python 2 !!!
        ret += "\nType: {0:s}\nLength: {1:d}\nElement Type: {2:}\nElements:\n".format(
            self.__class__.__name__, self.len, self._data_type)
        arr_data = self._DataTypeArr.from_address(self.data)
        for idx, item in enumerate(arr_data):
            ret += "  {0:d}: {1:}\n".format(idx, item)
        return ret


class Coordinates(ctypes.Structure):
    _fields_ = [
        ("lat", ctypes.c_double),
        ("lon", ctypes.c_double)
    ]

    def __str__(self):
        return "Latitude: {0:.3f}, Longitude: {1:.3f}".format(self.lat, self.lon)


def main():
    coord_list = [Coordinates(i + 1, i * 2) for i in range(4)]
    s0 = b"foo"
    s1 = b"bar"
    word_list = [s0, s1]

    coord_array = FFIArray(coord_list, data_type=Coordinates)
    print(coord_array)
    word_array = FFIArray(word_list, ctypes.c_char_p)
    print(word_array)


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    print("NumPy: {0:s}\n".format(np.version.version))
    main()
    print("\nDone.")

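Notes:

  • Fixed a bug in FFIArray.from_param (a missing argument)
  • Using NumPy from the initializer is rather awkward:
    1. Create a ctypes array from the bytes values
    2. Create an np array (from the previous step's result)
    3. Create a ctypes pointer (from the previous step's result)
  • Did some minor refactoring of the original code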
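
To illustrate the second note, here is a minimal sketch (not part of the code above; the sample values and variable names are made up for illustration). It assumes a numeric element type such as c_double, for which the NumPy route works, and suggests that the three-step round trip ends up at the same address that a direct ctypes.cast already provides:

```python
import ctypes
import numpy as np

values = [1.5, 2.5, 3.5]  # hypothetical sample data

# Step 1: a ctypes array built from the Python values
c_arr = (ctypes.c_double * len(values))(*values)

# Steps 2 and 3 (the route from the question): view the ctypes array as an
# np array, then turn that array back into a ctypes pointer
np_arr = np.ctypeslib.as_array(c_arr)
ptr_via_np = ctypes.cast(
    np_arr.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), ctypes.c_void_p)

# The direct route: cast the ctypes array itself
ptr_direct = ctypes.cast(c_arr, ctypes.c_void_p)

# Both pointers should refer to the buffer owned by c_arr
same_address = ptr_via_np.value == ptr_direct.value
print("Same address: {0:}".format(same_address))
```

Since as_array only creates a view over the existing ctypes buffer, the detour through NumPy adds nothing for sequences that start out as plain Python lists.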

Output:

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q058049957]> "e:\Work\Dev\VEnvs\py_064_03.07.03_test0\Scripts\python.exe" code03.py
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] 64bit on win32

NumPy: 1.16.2

<__main__.FFIArray object at 0x0000019CFEB63648>
Type: FFIArray
Length: 4
Element Type: <class '__main__.Coordinates'>
Elements:
  0: Latitude: 1.000, Longitude: 0.000
  1: Latitude: 2.000, Longitude: 2.000
  2: Latitude: 3.000, Longitude: 4.000
  3: Latitude: 4.000, Longitude: 6.000

<__main__.FFIArray object at 0x0000019CFEB637C8>
Type: FFIArray
Length: 2
Element Type: <class 'ctypes.c_char_p'>
Elements:
  0: b'foo'
  1: b'bar'


Done.


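Update #0

PEP 3118 defines a standard for accessing (sharing) memory. Part of it is the set of format string specifiers used to convert between a buffer's contents and the data they represent. These are listed in [Python.Docs]: PEP 3118 - Additions to the struct string-syntax, and they extend the ones from [Python 3.Docs]: struct - Format Characters.
ctypes types have an (!!!undocumented!!!) _type_ attribute, which I assume is used when converting from / to np: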

>>> import ctypes
>>>
>>> data_types = list()
>>>
>>> for attr_name in dir(ctypes):
...     attr = getattr(ctypes, attr_name, None)
...     if isinstance(attr, (type,)) and issubclass(attr, (ctypes._SimpleCData,)):
...         data_types.append((attr, attr_name))
...
>>> for data_type, data_type_name in data_types:
...     print("{0:} ({1:}) - {2:}".format(data_type, data_type_name, getattr(data_type, "_type_", None)))
...
<class 'ctypes.HRESULT'> (HRESULT) - l
<class '_ctypes._SimpleCData'> (_SimpleCData) - None
<class 'ctypes.c_bool'> (c_bool) - ?
<class 'ctypes.c_byte'> (c_byte) - b
<class 'ctypes.c_char'> (c_char) - c
<class 'ctypes.c_char_p'> (c_char_p) - z
<class 'ctypes.c_double'> (c_double) - d
<class 'ctypes.c_float'> (c_float) - f
<class 'ctypes.c_long'> (c_int) - l
<class 'ctypes.c_short'> (c_int16) - h
<class 'ctypes.c_long'> (c_int32) - l
<class 'ctypes.c_longlong'> (c_int64) - q
<class 'ctypes.c_byte'> (c_int8) - b
<class 'ctypes.c_long'> (c_long) - l
<class 'ctypes.c_double'> (c_longdouble) - d
<class 'ctypes.c_longlong'> (c_longlong) - q
<class 'ctypes.c_short'> (c_short) - h
<class 'ctypes.c_ulonglong'> (c_size_t) - Q
<class 'ctypes.c_longlong'> (c_ssize_t) - q
<class 'ctypes.c_ubyte'> (c_ubyte) - B
<class 'ctypes.c_ulong'> (c_uint) - L
<class 'ctypes.c_ushort'> (c_uint16) - H
<class 'ctypes.c_ulong'> (c_uint32) - L
<class 'ctypes.c_ulonglong'> (c_uint64) - Q
<class 'ctypes.c_ubyte'> (c_uint8) - B
<class 'ctypes.c_ulong'> (c_ulong) - L
<class 'ctypes.c_ulonglong'> (c_ulonglong) - Q
<class 'ctypes.c_ushort'> (c_ushort) - H
<class 'ctypes.c_void_p'> (c_void_p) - P
<class 'ctypes.c_void_p'> (c_voidp) - P
<class 'ctypes.c_wchar'> (c_wchar) - u
<class 'ctypes.c_wchar_p'> (c_wchar_p) - Z
<class 'ctypes.py_object'> (py_object) - O
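
As seen above, the specifiers of c_char_p and c_wchar_p are either not found in the standard or don't match it. At first glance this looks like a ctypes bug, since it doesn't respect the standard, but I wouldn't rush into claiming that (or submitting a bug report) before investigating further, especially because bugs have already been reported in this area: [Python.Bugs]: ctypes arrays have incorrect buffer information (PEP-3118).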
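
The mismatch can also be observed from the buffer side, without NumPy. The following is a minimal sketch (the helper is_struct_code is a made-up name, not a library function) that asks a ctypes array for its buffer format string via memoryview, and then checks whether the struct module recognizes the type code:

```python
import ctypes
import struct

double_arr = (ctypes.c_double * 2)(1.0, 2.0)
char_p_arr = (ctypes.c_char_p * 2)(b"foo", b"bar")

# The format string that a buffer consumer (such as NumPy) gets to see
double_format = memoryview(double_arr).format
char_p_format = memoryview(char_p_arr).format


def is_struct_code(code):
    """Hypothetical helper: True if the struct module accepts the format code."""
    try:
        struct.calcsize(code)
        return True
    except struct.error:
        return False


for fmt in (double_format, char_p_format):
    # The last character is the type code, anything before it is a prefix
    print("{0:} - known to struct: {1:}".format(fmt, is_struct_code(fmt[-1])))
```

This only checks the struct module's codes, not the PEP 3118 additions, so it is an approximation of what NumPy's parser accepts. It is nevertheless consistent with the traceback in the question, where the parser stops at the 'z' specifier.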

Below is another variant that also handles np arrays.

code04.py:

#!/usr/bin/env python3

import sys
import ctypes
import numpy as np


class FFIArray(ctypes.Structure):
    """
    Convert sequence of structs or types to C-compatible void array
    """

    _fields_ = [
        ("data", ctypes.c_void_p),
        ("len", ctypes.c_size_t)
    ]

    _special_np_types_mapping = {
        ctypes.c_char_p: "S",
        ctypes.c_wchar_p: "U",
    }

    @classmethod
    def from_param(cls, seq, data_type=ctypes.c_void_p):
        """  Allow implicit conversions """
        return seq if isinstance(seq, cls) else cls(seq, data_type=data_type)

    def __init__(self, seq, data_type=ctypes.c_void_p):
        self.len = len(seq)
        self.__data_type = data_type  # Used just to hold the value passed to the initializer
        if isinstance(seq, np.ndarray):
            arr = np.ctypeslib.as_ctypes(seq)
            self._data_type = arr._type_  # !!! data_type is ignored in this case !!!
            self._DataTypeArr = arr.__class__
            self.data = ctypes.cast(arr, ctypes.c_void_p)
        else:
            self._data_type = data_type
            self._DataTypeArr = self._data_type * self.len
            self.data = ctypes.cast(self._DataTypeArr(*seq), ctypes.c_void_p)

    def __str__(self):
        strings = [super().__str__()]  # Python 3
        #strings = [super(FFIArray, self).__str__()]  # !!! Python 2 (ugly) !!!
        strings.append("Type: {0:s}\nElement Type: {1:}{2:}\nElements ({3:d}):".format(
            self.__class__.__name__, self._data_type,
            "" if self._data_type == self.__data_type else " ({0:})".format(self.__data_type),
            self.len))
        arr_data = self._DataTypeArr.from_address(self.data)
        for idx, item in enumerate(arr_data):
            strings.append("  {0:d}: {1:}".format(idx, item))
        return "\n".join(strings) + "\n"

    def to_np(self):
        arr_data = self._DataTypeArr.from_address(self.data)
        if self._data_type in self._special_np_types_mapping:
            dtype = np.dtype(self._special_np_types_mapping[self._data_type] + str(max(len(item) for item in arr_data)))
            np_arr = np.empty(self.len, dtype=dtype)
            for idx, item in enumerate(arr_data):
                np_arr[idx] = item
            return np_arr
        else:
            return np.ctypeslib.as_array(arr_data)


class Coordinates(ctypes.Structure):
    _fields_ = [
        ("lat", ctypes.c_double),
        ("lon", ctypes.c_double)
    ]

    def __str__(self):
        return "Latitude: {0:.3f}, Longitude: {1:.3f}".format(self.lat, self.lon)


def main():
    coord_list = [Coordinates(i + 1, i * 2) for i in range(4)]
    s0 = b"foo"
    s1 = b"bar (beyond all recognition)"  # To avoid having 2 equal strings
    word_list = [s0, s1]

    coord_array0 = FFIArray(coord_list, data_type=Coordinates)
    print(coord_array0)

    word_array0 = FFIArray(word_list, data_type=ctypes.c_char_p)
    print(word_array0)
    print("to_np: {0:}\n".format(word_array0.to_np()))

    np_array_src = np.array([0, -3.141593, 2.718282, -0.577, 0.618])
    float_array0 = FFIArray.from_param(np_array_src, data_type=None)
    print(float_array0)
    np_array_dst = float_array0.to_np()
    print("to_np: {0:}".format(np_array_dst))
    print("Equal np arrays: {0:}\n".format(all(np_array_src == np_array_dst)))

    empty_array0 = FFIArray.from_param([])
    print(empty_array0)


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    print("NumPy: {0:s}\n".format(np.version.version))
    main()
    print("\nDone.")

Output:

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q058049957]> "e:\Work\Dev\VEnvs\py_064_03.07.03_test0\Scripts\python.exe" code04.py
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] 64bit on win32

NumPy: 1.16.2

<__main__.FFIArray object at 0x000002484A2265C8>
Type: FFIArray
Element Type: <class '__main__.Coordinates'>
Elements (4):
  0: Latitude: 1.000, Longitude: 0.000
  1: Latitude: 2.000, Longitude: 2.000
  2: Latitude: 3.000, Longitude: 4.000
  3: Latitude: 4.000, Longitude: 6.000

<__main__.FFIArray object at 0x000002484A2267C8>
Type: FFIArray
Element Type: <class 'ctypes.c_char_p'>
Elements (2):
  0: b'foo'
  1: b'bar (beyond all recognition)'

to_np: [b'foo' b'bar (beyond all recognition)']

<__main__.FFIArray object at 0x000002484A2264C8>
Type: FFIArray
Element Type: <class 'ctypes.c_double'> (None)
Elements (5):
  0: 0.0
  1: -3.141593
  2: 2.718282
  3: -0.577
  4: 0.618

to_np: [ 0.       -3.141593  2.718282 -0.577     0.618   ]
Equal np arrays: True

<__main__.FFIArray object at 0x000002484A226848>
Type: FFIArray
Element Type: <class 'ctypes.c_void_p'>
Elements (0):


Done.

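Of course, this is only one of the possibilities. Another one could involve using the (deprecated) [SciPy.Docs]: numpy.char.array, but I didn't want to overcomplicate things (without a clear scenario).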



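Update #1

Added the FFIArray to np array conversion (I'm not an np expert, so it might look cumbersome to someone who is). Strings need special handling.
I didn't post a new code version (the changes aren't very significant), but edited the previous one instead.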
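
As for why strings need special handling: a c_char_p array stores pointers, while an np bytes array stores fixed-width items inline, so the data has to be copied rather than viewed. A shorter alternative to the explicit loop in to_np might look like the sketch below (an assumption on my side, not what the code above does): let NumPy infer the item width from the bytes objects that iterating over the ctypes array yields.

```python
import ctypes
import numpy as np

words = [b"foo", b"bar (beyond all recognition)"]
char_p_arr = (ctypes.c_char_p * len(words))(*words)

# Iterating over a c_char_p array yields Python bytes objects (copies of the
# C strings), which NumPy can store in a fixed-width "S" array
np_words = np.array(list(char_p_arr))

print("dtype: {0:}, items: {1:}".format(np_words.dtype, np_words))
```

The resulting array owns a copy of the data, so changes made on the C side afterwards are not reflected in it.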

