Reading a struct in Python from a struct created in C

Question

18

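I am very new to Python and very rusty with C, so I apologize in advance if my question sounds dumb or lost.

I have a function in C that creates a .dat file containing data. I am opening that file in Python to read it. One of the things I need to read is a struct that was created in the C function and written out in binary. In my Python code I am at the appropriate line of the file to read the struct. I have tried unpacking the struct item by item and as a whole, both without success. Most of the items in the struct are declared as 'real' in the C code. I am working on this code with someone else; the main source code is his, and he declared the variables as 'real'. I need to put this in a loop because I want to read every file in the directory that ends in '.dat'. To start the loop I have: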

for files in os.listdir(path):
  if files.endswith(".dat"):
    part = open(path + files, "rb")
    for line in part:

I then read all of the lines that come before the one containing the struct. When I reach that line I have:

      part_struct = part.readline()
      r = struct.unpack('<d8', part_struct[0])

I am only trying to read the first item stored in the struct. I saw an example of this somewhere on here, but when I try it I get an error that reads:

struct.error: repeat count given without format specifier

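I will take any advice anyone can give me. I have been stuck on this for a few days and have tried many different things. To be honest, I don't think I understand the struct module well, although I have read as much as I could about it.

Thanks!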

- SchwabTheDeck

5 Answers

9

Some C code:

#include <stdio.h>
typedef struct { double v; int t; char c;} save_type;
int main() {
    save_type s = { 12.1f, 17, 's'};
    FILE *f = fopen("output", "wb"); /* binary mode, so no newline translation on Windows */
    fwrite(&s, sizeof(save_type), 1, f);
    fwrite(&s, sizeof(save_type), 1, f);
    fwrite(&s, sizeof(save_type), 1, f);
    fclose(f);
    return 0;
}

Some Python code:

import struct
with open('output', 'rb') as f:
    chunk = f.read(16)
    while chunk:  # read() returns b'' at end of file
        print(len(chunk))
        print(struct.unpack('dicccc', chunk))
        chunk = f.read(16)

Output:

16
(12.100000381469727, 17, b's', b'\x00', b'\x00', b'\x00')
16
(12.100000381469727, 17, b's', b'\x00', b'\x00', b'\x00')
16
(12.100000381469727, 17, b's', b'\x00', b'\x00', b'\x00')

There is also the padding issue, though. The padded size of save_type is 16, so we read 3 extra characters and ignore them.

- perreal

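Note that the size of the chunk depends on the native word length. In this case it appears to be 32-bit. The standard real data type takes 4 bytes. A double takes 8 bytes, an int takes 4 bytes and a char takes 1 byte, which with padding comes to 16 bytes. If you did this on a 64-bit machine the chunk would be 24 bytes, and the corresponding "format string" would be "dixxxxcxxxxxxx" or, equivalently, "d i4x c7x". See the Python struct documentation. - Alexander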
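
The record size is easy to check on the reading machine rather than infer from the word size (on most current 64-bit compilers int remains 4 bytes, so 16 is still the usual result for this struct). A minimal sketch using struct.calcsize with the save_type layout from the answer above; the native numbers depend on the platform, so only the packed size is stated in the comments:

```python
import struct

# Standard layouts (explicit byte order such as '<') never add padding.
packed_size = struct.calcsize('<dic')        # 8 + 4 + 1 = 13

# Native mode (the default) aligns each field the way the C compiler does,
# but it does not pad the end of the struct unless asked to.
native_size = struct.calcsize('dic')

# Ending the format with '0d' aligns the end to a double, which is how
# the trailing padding counted by sizeof(save_type) can be reproduced.
native_padded_size = struct.calcsize('dic0d')

print(packed_size, native_size, native_padded_size)
```

If native_padded_size does not match the sizeof reported by the C program, the file was probably written with a different compiler, ABI, or packing setting, and the format string needs explicit pad bytes.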

0

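The number in the format specifier is a repeat count, but it has to go before the letter, as in '<8d'. However, you said you only want to read one element of the struct, so I guess you just want '<d'. I suspect you were trying to specify the number of bytes to read as 8, but you don't need to do that: d already implies 8 bytes.

I also noticed you are using readline. That is not suitable for reading binary data. It reads up to the next newline byte (0x0A), which can occur anywhere in binary data. What you want is read(size), like this: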
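
As a small illustration of the repeat-count rule, the sketch below packs eight doubles into a bytes object (made-up sample data, not the asker's file) and unpacks them with the count in the right and the wrong place:

```python
import struct

# Eight little-endian doubles, 64 bytes in total (sample data only).
data = struct.pack('<8d', *[float(i) for i in range(8)])

# One double: pass an 8-byte slice, not a single element such as data[0].
first = struct.unpack('<d', data[:8])[0]

# Eight doubles: the repeat count goes before the format letter.
all_eight = struct.unpack('<8d', data)

# Count after the letter reproduces the error from the question.
try:
    struct.unpack('<d8', data[:8])
except struct.error as exc:
    message = str(exc)

print(first, all_eight, message)
```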

part_struct = part.read(8)
r = struct.unpack('<d', part_struct)
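
To see why readline is risky here, consider a record whose bytes happen to contain 0x0A. The sketch below uses an in-memory stream and a made-up record of one double and one int; the int value 10 is stored as a newline byte followed by three zero bytes:

```python
import io
import struct

# One double followed by one int; 10 is encoded as b'\n\x00\x00\x00'.
record = struct.pack('<di', 12.1, 10)
stream = io.BytesIO(record)

line = stream.readline()                      # stops right after the 0x0A byte
stream.seek(0)
chunk = stream.read(struct.calcsize('<di'))   # reads the full 12-byte record

print(len(line), len(chunk))   # 9 12
```

readline hands back only 9 of the 12 bytes, so unpacking its result would fail or, worse, silently misalign every later read.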

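Actually, you should be careful, because read can return less data than you request. If it does, you need to call it again until you have enough.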

part_struct = b''
while len(part_struct) < 8:
    data = part.read(8 - len(part_struct))
    if not data:
        raise IOError("unexpected end of file")
    part_struct += data
r = struct.unpack('<d', part_struct)

- morningstar

0

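Numpy can be used to read and write binary data. You just need to define a custom np.dtype instance that describes the memory layout of your C struct.

For example, here is some C++ code defining a struct (I am no C expert, but this should work just as well for C structs):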

struct MyStruct {
    uint16_t FieldA;
    uint16_t pad16[3];
    uint32_t FieldB;
    uint32_t pad32[2];
    char     FieldC[4];
    uint64_t FieldD;
    uint64_t FieldE;
};

void write_struct(const std::string& fname, MyStruct h) {
    // This function serializes a MyStruct instance and
    // writes the binary data to disk.
    std::ofstream ofp(fname, std::ios::out | std::ios::binary);
    ofp.write(reinterpret_cast<const char*>(&h), sizeof(h));

}

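Based on the advice I found at stackoverflow.com/a/5397638, I have included some padding in the struct (the pad16 and pad32 fields) so that serialization happens in a more predictable way. I think this is a C++ thing; it may not be necessary when using plain C structs.

Now, in Python, we create a numpy.dtype object describing the memory layout of MyStruct: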

import numpy as np

my_struct_dtype =  np.dtype([
    ("FieldA"            , np.uint16  ,       ),
    ("pad16"             , np.uint16  , (3,)  ),
    ("FieldB"            , np.uint32          ),
    ("pad32"             , np.uint32  , (2,)  ),
    ("FieldC"            , np.byte    , (4,)  ),
    ("FieldD"            , np.uint64          ),
    ("FieldE"            , np.uint64          ),
])
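
Before reading real data it can help to confirm that the dtype has the same size and offsets as the C++ struct. The sketch below repeats the dtype from this answer and inspects it; the second dtype is an assumption added for illustration, showing how align=True asks numpy to insert compiler-style padding for a struct (here the save_type layout from another answer) that carries no manual pad fields:

```python
import numpy as np

# The dtype from the answer: fields are laid out back to back.
my_struct_dtype = np.dtype([
    ("FieldA", np.uint16),
    ("pad16",  np.uint16, (3,)),
    ("FieldB", np.uint32),
    ("pad32",  np.uint32, (2,)),
    ("FieldC", np.byte,   (4,)),
    ("FieldD", np.uint64),
    ("FieldE", np.uint64),
])
record_size = my_struct_dtype.itemsize              # compare with sizeof(MyStruct)
offset_of_d = my_struct_dtype.fields["FieldD"][1]   # compare with offsetof(MyStruct, FieldD)

# Illustrative only: double, int, char with padding chosen by numpy.
save_type_dtype = np.dtype(
    [("v", np.float64), ("t", np.int32), ("c", "S1")], align=True
)

print(record_size, offset_of_d, save_type_dtype.itemsize)
```

Because every field of MyStruct already sits on a naturally aligned offset, the unaligned dtype matches it; align=True is only needed when the padding is left to the compiler.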

Then use numpy's fromfile to read the binary file in which you saved your C struct:

# read data
struct_data = np.fromfile(fpath, dtype=my_struct_dtype, count=1)[0]

FieldA         = struct_data["FieldA"]
FieldB         = struct_data["FieldB"]
FieldC         = struct_data["FieldC"]
FieldD         = struct_data["FieldD"]
FieldE         = struct_data["FieldE"]

if FieldA != expected_value_A:
    raise ValueError("Bad FieldA, got %d" % FieldA)
if FieldB != expected_value_B:
    raise ValueError("Bad FieldB, got %d" % FieldB)
if FieldC.tobytes() != b"expc":
    raise ValueError("Bad FieldC, got %s" % FieldC.tobytes().decode())
...

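In the call above, the count=1 argument makes the returned array have only one element; it means "read the first struct instance from the file". Note that I index with [0] to take that element out of the array.

If you have appended the data from many C structs to the same file, you can use fromfile(..., count=n) to read n struct instances into a numpy array of shape (n,). count=-1, which is the default for both np.fromfile and np.frombuffer, means "read all of the data" and gives a 1-dimensional array of shape (number_of_struct_instances,).

You can also use the offset keyword argument to control where in the file reading begins.

To conclude, here are some numpy functions that will be handy once your custom dtype has been defined: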

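Reading binary data as a numpy array:

np.frombuffer(bytes_data, dtype=...): interprets the given binary data (e.g. a Python bytes instance) as a numpy array of the given dtype. You can define a custom dtype that describes the memory layout of your C struct.
np.fromfile(filename, dtype=...): reads binary data from filename. Should give the same result as np.frombuffer(open(filename, "rb").read(), dtype=...).

Writing a numpy array as binary data:

ndarray.tobytes(): constructs a Python bytes instance containing the raw data of the given numpy array. If the array's dtype corresponds to a C struct, the bytes from ndarray.tobytes can be deserialized by C/C++ and interpreted as an (array of) instances of that struct.
ndarray.tofile(filename): writes the array's binary data to filename. This data can then be deserialized by C/C++. Equivalent to open(filename, "wb").write(a.tobytes()).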
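
A short round trip ties these four functions together. The record layout, field names, and file name below are made up for the example, and the offset argument of np.fromfile assumes a reasonably recent numpy (1.17 or later):

```python
import os
import tempfile
import numpy as np

# Hypothetical 16-byte record: double, int32, char, 3 explicit pad bytes.
rec_dtype = np.dtype([("v", "<f8"), ("t", "<i4"), ("c", "S1"), ("pad", "u1", (3,))])

records = np.zeros(3, dtype=rec_dtype)
records["v"] = [1.5, 2.5, 3.5]
records["t"] = [10, 20, 30]
records["c"] = [b"a", b"b", b"c"]

# In memory: array -> bytes -> array.
raw = records.tobytes()
from_bytes = np.frombuffer(raw, dtype=rec_dtype)

# On disk: write all records, then skip the first one and read two.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.dat")
    records.tofile(path)
    tail = np.fromfile(path, dtype=rec_dtype, count=2, offset=rec_dtype.itemsize)

print(len(raw), from_bytes["v"], tail["t"])
```

The offset is given in bytes, so multiplying the record index by dtype.itemsize is what selects a whole record.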

- Jasha

0

I ran into the same problem recently, so I wrote a module for the task and stored it here: http://pastebin.com/XJyZMyHX

Example code:

MY_STRUCT="""typedef struct __attribute__ ((__packed__)){
    uint8_t u8;
    uint16_t u16;
    uint32_t u32;
    uint64_t u64;
    int8_t i8;
    int16_t i16;
    int32_t i32;
    int64_t i64;
    long long int lli;
    float flt;
    double dbl;
    char string[12];
    uint64_t array[5];
} debugInfo;"""

PACKED_STRUCT='\x01\x00\x01\x00\x00\x01\x00\x00\x00\x00\x00\x01\x00\x00\x00\xff\x00\xff\x00\x00\xff\xff\x00\x00\x00\x00\xff\xff\xff\xff*\x00\x00\x00\x00\x00\x00\x00ff\x06@\x14\xaeG\xe1z\x14\x08@testString\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00'

if __name__ == '__main__':
    print "String:"
    print depack_bytearray_to_str(PACKED_STRUCT,MY_STRUCT,"<" )
    print "Bytes in Stuct:"+str(structSize(MY_STRUCT))
    nt=depack_bytearray_to_namedtuple(PACKED_STRUCT,MY_STRUCT,"<" )
    print "Named tuple nt:"
    print nt
    print "nt.string="+nt.string

The result should be:

String:
u8:1
u16:256
u32:65536
u64:4294967296
i8:-1
i16:-256
i32:-65536
i64:-4294967296
lli:42
flt:2.09999990463
dbl:3.01
string:u'testString\x00\x00'
array:(1, 2, 3, 4, 5)

Bytes in Stuct:102
Named tuple nt:
CStruct(u8=1, u16=256, u32=65536, u64=4294967296L, i8=-1, i16=-256, i32=-65536, i64=-4294967296L, lli=42, flt=2.0999999046325684, dbl=3.01, string="u'testString\\x00\\x00'", array=(1, 2, 3, 4, 5))
nt.string=u'testString\x00\x00'
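
The module itself lives only at the pastebin link, so the following is not topin89's implementation. It is a rough standard-library sketch of the same idea for the first four fields of the packed debugInfo struct, using the first 15 bytes of PACKED_STRUCT; the Header name is hypothetical:

```python
import struct
from collections import namedtuple

# First four fields of the packed struct: uint8, uint16, uint32, uint64.
Header = namedtuple("Header", "u8 u16 u32 u64")   # hypothetical name
header_format = struct.Struct("<BHIQ")            # '<' means little-endian, no padding

# The first 15 bytes of PACKED_STRUCT from the example above.
data = b'\x01\x00\x01\x00\x00\x01\x00\x00\x00\x00\x00\x01\x00\x00\x00'
header = Header._make(header_format.unpack(data))

print(header)
```

The '<' prefix matters here: because the C struct is declared packed, the format string must not let the struct module insert native alignment padding.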

- topin89

- jfs · Accepted Answer

You could use ctypes.Structure or struct.Struct to specify the format of the file. To read structures from the file produced by the C code in @perreal's answer:

"""
struct { double v; int t; char c;};
"""
from ctypes import *

class YourStruct(Structure):
    _fields_ = [('v', c_double),
                ('t', c_int),
                ('c', c_char)]

with open('c_structs.bin', 'rb') as file:
    result = []
    x = YourStruct()
    while file.readinto(x) == sizeof(x):
        result.append((x.v, x.t, x.c))

print(result)
# -> [(12.100000381469727, 17, 's'), (12.100000381469727, 17, 's'), ...]

See io.BufferedIOBase.readinto(). It is supported in Python 3, but it is undocumented in Python 2.7 for a default file object. struct.Struct requires the padding bytes (x) to be specified explicitly:

"""
struct { double v; int t; char c;};
"""
from struct import Struct

x = Struct('dicxxx')
with open('c_structs.bin', 'rb') as file:
    result = []
    while True:
        buf = file.read(x.size)
        if len(buf) != x.size:
            break
        result.append(x.unpack_from(buf))

print(result)

It produces the same output.
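
When the whole file fits in memory, the read loop can also be replaced by Struct.iter_unpack (Python 3.4 and later). This is a sketch on in-memory sample data rather than on c_structs.bin, and it assumes a little-endian layout with three explicit pad bytes ('<dic3x') so that the record size is 16 regardless of the platform running the example:

```python
import struct

# Assumed on-disk layout: double, int, char, 3 pad bytes, little-endian.
record = struct.Struct('<dic3x')

# Three sample records standing in for the file contents.
data = b''.join(record.pack(v, 17, b's') for v in (1.5, 2.5, 3.5))

# iter_unpack yields one tuple per record; the buffer length must be
# an exact multiple of record.size.
rows = list(record.iter_unpack(data))

print(record.size, rows)
```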

To avoid unnecessary copying, Array.from_buffer(mmap_file) can be used to get an array of structs from a file:

import mmap # Unix, Windows
from contextlib import closing

with open('c_structs.bin', 'rb') as file:
    with closing(mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_COPY)) as mm:
        result = (YourStruct * 3).from_buffer(mm) # without copying
        print("\n".join(map("{0.v} {0.t} {0.c}".format, result)))
        del result # release the buffer so that the mmap can be closed
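
With ctypes the padding comes from the platform's C ABI, so it can be worth printing the size and field offsets and comparing them with the C side. The sketch below redefines the same structure, inspects it, and uses from_buffer_copy for the case where the data is already a bytes object and a copy is acceptable; the sample values are made up:

```python
import ctypes

class YourStruct(ctypes.Structure):
    _fields_ = [('v', ctypes.c_double),
                ('t', ctypes.c_int),
                ('c', ctypes.c_char)]

# Size and offsets as ctypes sees them on this platform.
size = ctypes.sizeof(YourStruct)
offsets = (YourStruct.v.offset, YourStruct.t.offset, YourStruct.c.offset)

# Serialize one instance to bytes, then rebuild a struct from those bytes.
raw = bytes(YourStruct(1.5, 17, b's'))
copy = YourStruct.from_buffer_copy(raw)

print(size, offsets, (copy.v, copy.t, copy.c))
```

If size differs from the sizeof reported by the C program, the file was written with a different packing or ABI, and the field list needs to be adjusted before any of the readers above will line up.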