Switching endianness in the middle of a struct.unpack format string

Question

6

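I have a bunch of binary data (the contents of a video game save file, as it happens) in which part of the data contains both little-endian and big-endian integer values. Naively, without reading the docs carefully, I tried to unpack it this way...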

struct.unpack(
    '3sB<H<H<H<H4s<I<I32s>IbBbBbBbB12s20sBB4s',
    string_data
)

...and of course I got this cryptic error message:

struct.error: bad char in struct format

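The problem is that struct.unpack format strings do not expect individual fields to be marked with an endianness; a byte-order character is accepted only as the first character of the format string. The correct format string here would actually be something like this: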

struct.unpack(
    '<3sBHHHH4sII32sIbBbBbBbB12s20sBB4s',
    string_data
)

This works, except that it flips the byte order of the eleventh field (parsing it as little-endian, whereas I actually want it parsed as big-endian).
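To make both effects concrete, here is a minimal sketch using made-up sample bytes (not data from the actual save file). It assumes nothing beyond the standard struct module: a byte-order character is rejected anywhere but the first position, and the same four bytes decode to different integers depending on the order chosen.

```python
import struct

# Hypothetical sample: one byte followed by a 32-bit field (not real save data).
sample = bytes([0x07, 0x00, 0x00, 0x00, 0x01])

# A byte-order character anywhere but the first position is rejected.
try:
    struct.unpack('B>I', sample)
    mid_format_marker_ok = True
except struct.error:
    mid_format_marker_ok = False

# With a single leading marker, the whole format shares one byte order.
as_little = struct.unpack('<BI', sample)  # (7, 16777216)
as_big = struct.unpack('>BI', sample)     # (7, 1)
```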
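Is there an easy and/or "Pythonic" solution to my problem? I have already thought of three possible solutions, but none of them is particularly elegant. In the absence of better ideas I'll probably go with number 3: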

1. I could extract a substring and parse it separately:

(my.f1, my.f2, ...) = struct.unpack('<3sBHHHH4sII32sIbBbBbBbB12s20sBB4s', string_data)
(my.f11,) = struct.unpack('>I', string_data[56:60])

2. I could swap the bytes of the field after the fact:

(my.f1, my.f2, ...) = struct.unpack('<3sBHHHH4sII32sIbBbBbBbB12s20sBB4s', string_data)
my.f11 = swap32(my.f11)

3. I could just change my downstream code to expect this field to be represented differently. It's actually a bitmask, not an arithmetic integer, so it wouldn't be too hard to flip around all the bitmasks I'm using with it; but the big-endian versions of these bitmasks are more mnemonically relevant than the little-endian versions.

- Quuxplusone
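Option 1 can be generalized a little. The sketch below is one possible approach, not something the struct module provides: a hypothetical helper, `unpack_segments`, takes a list of format strings that each start with their own byte-order character and unpacks them back to back with `struct.unpack_from`. The split into three segments assumes the layout given in the question, and the demo data is zero-filled dummy bytes rather than a real save file.

```python
import struct

def unpack_segments(segments, data):
    """Unpack consecutive format segments, each with its own byte-order prefix.

    Hypothetical helper for illustration; it is not part of the struct module.
    """
    values = []
    offset = 0
    for fmt in segments:
        values.extend(struct.unpack_from(fmt, data, offset))
        offset += struct.calcsize(fmt)
    if offset != len(data):
        raise ValueError(f"expected {offset} bytes, got {len(data)}")
    return tuple(values)

# The question's layout, split around the one big-endian field.
SAVE_LAYOUT = [
    '<3sBHHHH4sII32s',      # little-endian fields, 56 bytes
    '>I',                   # the big-endian bitmask at bytes 56..59
    '<bBbBbBbB12s20sBB4s',  # remaining little-endian fields
]

# Dummy record: all zeros except the big-endian bitmask, set to 1.
dummy = bytearray(sum(struct.calcsize(f) for f in SAVE_LAYOUT))
dummy[56:60] = b'\x00\x00\x00\x01'
fields = unpack_segments(SAVE_LAYOUT, bytes(dummy))
bitmask = fields[10]  # eleventh field, decoded as big-endian
```

Because every segment starts with `<` or `>`, struct uses standard sizes with no alignment padding, so the segment sizes simply add up to the size of the single-format version.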

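I think there is a conceptual problem here. Endianness should not be mixed; the fix belongs in whatever produces the string you need to unpack. As for the downstream-code option: it works on an already converted int, which automatically uses the endianness of the machine it runs on. - CristiFati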

1

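@CristiFati: The string I'm unpacking comes from a save file format. I have no control over its encoding details and cannot change them. All I can do is try to deal with the encoding I'm given, and that encoding really does mix endianness this way. - Quuxplusone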

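As a broader example, the ISO 9660 file system encodes integers in both little-endian and big-endian form in some places. Usually this is so that you can pick whichever format is easier to handle on your architecture, but if you are checking the integrity of the data, decoding both and checking that they are equal could be useful. - penguin359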
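As a rough sketch of that integrity check: ISO 9660 stores such "both-endian" 32-bit values as eight bytes, the little-endian encoding followed by the big-endian one. The helper name `read_both_endian_u32` and the sample bytes below are made up for illustration.

```python
import struct

def read_both_endian_u32(data, offset=0):
    """Decode a 32-bit both-endian field and check that the two copies agree.

    Hypothetical helper for illustration only.
    """
    (little,) = struct.unpack_from('<I', data, offset)
    (big,) = struct.unpack_from('>I', data, offset + 4)
    if little != big:
        raise ValueError(f"both-endian mismatch: {little} != {big}")
    return little

# Made-up field holding the value 2048 in both byte orders.
field = struct.pack('<I', 2048) + struct.pack('>I', 2048)
value = read_both_endian_u32(field)
```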

1 Answer

- Claas · Answer 1

A bit late, but I just ran into the same problem. I solved it with a custom numpy dtype, which allows mixing elements with different byte orders (see https://numpy.org/doc/stable/reference/generated/numpy.dtype.html):

import numpy as np

t = np.dtype('>u4,<u4')  # compound type: two 4-byte unsigned ints with different byte orders
a = np.zeros(shape=1, dtype=t)  # array of length one with the above type
a[0][0] = 1  # assign the first (big-endian) uint
a[0][1] = 1  # assign the second (little-endian) uint
buf = a.tobytes()  # b'\x00\x00\x00\x01\x01\x00\x00\x00'
b = np.frombuffer(buf, dtype=t)  # yields one record, (1, 1)
c = np.frombuffer(buf, dtype=np.uint32)  # array([16777216, 1]) on a little-endian machine
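The same idea extends to named fields, which may read better for a save-file record. The sketch below is only an illustration: the field names (`magic`, `version`, `counter`, `flags`) and the sample bytes are invented, and the layout is a shortened stand-in for the one in the question rather than the real format.

```python
import numpy as np

# Hypothetical field names; only the idea (little-endian fields next to one
# big-endian 32-bit bitmask) follows the question.
record = np.dtype([
    ('magic', 'S3'),     # 3 raw bytes
    ('version', 'u1'),   # single byte, byte order irrelevant
    ('counter', '<u2'),  # little-endian 16-bit value
    ('flags', '>u4'),    # big-endian 32-bit bitmask
])

# Made-up sample record, 10 bytes long.
raw = (b'SAV' + bytes([2])
       + (0x0102).to_bytes(2, 'little')
       + (0x80000001).to_bytes(4, 'big'))

rec = np.frombuffer(raw, dtype=record)[0]
flags = int(rec['flags'])      # decoded as big-endian
counter = int(rec['counter'])  # decoded as little-endian
```

A dtype built from a list of tuples like this is packed without alignment padding by default, so its `itemsize` matches the byte count of the raw record.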