Iterating over a bytes object with a for loop in Python 3.x

Question


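This code works in Python 2.7 and fails in 3.5. I want to convert it to 3.5. The data type changes when the for loop statement runs, and that is where I am stuck. I am an experienced programmer but relatively new to Python, so the answer may be obvious; my Google-fu has failed to turn up an exact example or solution. So here we go:

Below is a snippet of this code, which works fine in 2.7: http://trac.nccoos.org/dataproc/browser/DPWP/trunk/DPWP/ADCP_splitter/pd0.py pd0.py opens a binary input stream, looks for the record type identifier bytes, and splits the data into two separate files containing the appropriate data. All of the data is binary.

In the code block below, header, length and ensemble are all bytes objects. In Python 3.5, something happens when the for loop iterates over them: it yields an int, which makes struct.unpack fail. You can see in the comments that I tried casting, indexing and so on, with no success. I would like a detailed explanation of what is going on here so that I can write more binary operations for 3.5 correctly.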

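The line that fails is value = struct.unpack('B', byte)[0].

Places I have looked for a solution:

- Reading about how bytes is defined (you can iterate over it, but I do not see how to make that work here)
- Lots of discussions of str->bytes and vice versa, none of which address this problem
- Reading about how unpack works (evidently unpack does not like unpacking an int)
- Guides on converting from Python 2.7 to 3.x
- Stack Overflow

Thanks in advance for your help. Here is the code: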

def __computeChecksum(header, length, ensemble):
    """Compute a checksum from header, length, and ensemble"""
    # these print as a byte (b'\x7f\x7f' or b'\x7fy') at this point
    print(header)  # header is a bytes object
    cs = 0   
    # so, when the first byte of header is assigned to byte, it gets cast to int.  Why, and how to prevent this?
    for byte in header:
        print(byte) # this prints as an integer at this point, 127 = 0x7F because a bytes object is an "immutable sequence of integers"
        print(type(byte)) # here byte is an int - we need it to be a bytes object for unpack to work
        value = struct.unpack('B', byte)[0]  # this is the line that gets TypeError: a bytes-like object is required, not 'int'
        # this does not work either - from examples online I thought that referencing the first in the array was the problem
        #value = struct.unpack('B', byte)  # this is the line that gets TypeError: a bytes-like object is required, not 'int'
        # this does not work, the error is unpack requires a bytes object of length 1, so the casting happened
        #value = struct.unpack('B', bytes(byte))[0] 
        # and this got the error a bytes-like object is required, not 'int', so the [0] reference generates an int
        # value = struct.unpack('B', bytes(byte)[0])[0] 
        cs += value
    for byte in length:
        value = struct.unpack('B', byte)[0]
        cs += value
    for byte in ensemble:
        value = struct.unpack('B', byte)[0]
        cs += value
    return cs & 0xffff

# convenience function reused for header, length, and checksum
def __nextLittleEndianUnsignedShort(file):
    """Get next little endian unsigned short from file"""
    raw = file.read(2)
    """for python 3.5, struct.unpack('<H', raw)[0] needs to return a
       byte, not an int
       Note that it's not a problem here, but in the next cell, when a for loop is involved, we get an error
    """
    return (raw, struct.unpack('<H', raw)[0])
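
For reference, a minimal sketch of what the two-byte read above produces in Python 3. The sample bytes are hypothetical and not taken from a real PD0 file. struct.unpack('<H', ...) takes a bytes object of length 2 and returns a tuple holding one int, so the helper already returns an int, and int.from_bytes gives the same number without struct:

```python
import struct

# Hypothetical two-byte field, little-endian encoding of 0x1234.
raw = b'\x34\x12'

# struct.unpack returns a tuple; element 0 is a plain int.
value_struct = struct.unpack('<H', raw)[0]

# int.from_bytes decodes the same two bytes without the struct module.
value_int = int.from_bytes(raw, byteorder='little')

print(type(value_struct), value_struct, value_int)
```

Both calls receive the whole bytes object, which is why this helper works unchanged in 3.5 while the per-byte loop does not.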

The code in the main program that calls the functions above:

while (header == wavesId) or (header == currentsId):
    print('recnum= ',recnum)
    # get ensemble length
    rawLength, length = __nextLittleEndianUnsignedShort(rawFile)
    # read up to the checksum
    rawEnsemble = rawFile.read(length-4)
    # get checksum
    rawChecksum, checksum = __nextLittleEndianUnsignedShort(rawFile)

    computedChecksum = __computeChecksum(rawHeader, rawLength, rawEnsemble)

    if checksum != computedChecksum:
        raise IOError('Checksum error')

Finally, the full text of the error.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-5e60bd9b9a54> in <module>()
     13     rawChecksum, checksum = __nextLittleEndianUnsignedShort(rawFile)
     14 
---> 15     computedChecksum = __computeChecksum(rawHeader, rawLength, rawEnsemble)
     16 
     17     if checksum != computedChecksum:

<ipython-input-3-414811fc52e4> in __computeChecksum(header, length, ensemble)
     16        print(byte) # this prints as an integer at this point, 127 = 0x7F because a bytes object is an "immutable sequence of integers"
     17        print(type(byte)) # here byte is an int - we need it to be a bytes object for unpack to work
---> 18        value = struct.unpack('B', byte)[0]  # this is the line that gets TypeError: a bytes-like object is required, not 'int'
     19        # this does not work either - from examples online I thought that referencing the first in the array was the problem
     20        #value = struct.unpack('B', byte)  # this is the line that gets TypeError: a bytes-like object is required, not 'int'

TypeError: a bytes-like object is required, not 'int'

The complete Python notebook is here: https://gist.github.com/mmartini-usgs/4795da39adc9905f70fd8c27a1bba3da

- Marinna Martini

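The code can be made to work by commenting out value = struct.unpack('B', byte)[0] and cs += value and replacing them with cs += byte. But is there a more direct fix for cases where you really want to iterate byte by byte? - Marinna Martini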
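
A short sketch of why the attempts in the question's comments behave as they do. The header value is the one printed in the question; the other names are illustrative. Indexing or iterating a bytes object yields an int, bytes(n) with an int argument builds n zero bytes rather than one byte with value n, and bytes([n]) is the form that produces a single byte:

```python
import struct

header = b'\x7f\x7f'   # value printed in the question

first = header[0]      # indexing (like iterating) gives an int: 127

# bytes(n) creates n zero bytes, so bytes(127) has length 127,
# which is why unpack complained about the length.
zeros = bytes(3)       # b'\x00\x00\x00'

# Wrapping the int in a list gives a bytes object of length 1.
one_byte = bytes([first])

# Unpacking that single byte just returns the int we started with.
value = struct.unpack('B', one_byte)[0]

print(first, zeros, one_byte, value)
```

Since the round trip returns the original int, adding the int directly to the checksum gives the same result as unpacking.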

2 Answers


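Without knowing what header is and how the data was read, it is hard to answer this question. In theory, this should not happen if the file is read with rb (read binary). (In fact, this was already mentioned in the comments.)

Here is a better explanation of the problem.

Iterating over individual bytes in Python 3

I would go with an if statement to handle the int, but you can also re-cast to bytes as in that answer. Also, take a look at numpy.fromfile. I find it easier to use.

PS: This is a very detailed post! If you follow SSCCE, you may get more meaningful answers. And you can still post a link to the full notebook, as you did ;-)

I would rewrite your question using just your comment:

When iterating over bytes in Python 3.x, I get int rather than bytes. Is it possible to get all bytes?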

In [0]: [byte for byte in b'\x7f\x7f']
Out[0]: [127, 127]
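
If length-1 bytes objects are really wanted instead of ints, one option is to slice rather than index, because a slice of a bytes object is itself a bytes object. A minimal sketch, where iter_single_bytes is a hypothetical helper name and not part of any library:

```python
def iter_single_bytes(data):
    """Yield each byte of data as a bytes object of length 1."""
    for i in range(len(data)):
        # data[i] would be an int; data[i:i + 1] stays a bytes object.
        yield data[i:i + 1]


chunks = list(iter_single_bytes(b'\x7f\x7f'))
print(chunks)
```

Each yielded item can be passed to struct.unpack('B', ...) as in the 2.7 code, although for a plain checksum the ints from direct iteration are already the values needed.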

- ocefpaf

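Thank you very much. I am still learning all of this, including Stack Overflow etiquette. numpy.fromfile is a great lead. So, thinking philosophically about Python, I want my code to be modular. For example, converting one binary format to another is what this code does, and numerical operations would come as a separate step in separate code. So wouldn't I want to avoid pulling in numpy and stick to the most basic Python? Python changes so quickly. I want to pull in only the minimum number of modules needed for what I am doing, so that I do not end up with the kind of mess my MATLAB code is in now. - Marinna Martini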


- Marinna Martini · Accepted Answer

The most elegant solution is simply:

ensemble = infile.read(ensemblelength)

def __computeChecksum(ensemble):
    cs = 0    
    for byte in range(len(ensemble)-2):
        cs += ensemble[byte]
    return cs & 0xffff
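
Because iterating a bytes object already yields ints, the same sum can also be written with the built-in sum over a slice. This is a sketch of an equivalent form, not the code from the notebook. The function names and sample bytes are made up for illustration, and it assumes (as the -2 above suggests) that the last two bytes are excluded from the sum:

```python
def compute_checksum(ensemble):
    """Sum every byte except the last two, keeping the low 16 bits."""
    return sum(ensemble[:-2]) & 0xFFFF


def compute_checksum_loop(ensemble):
    """Same calculation written as the explicit index loop."""
    cs = 0
    for index in range(len(ensemble) - 2):
        cs += ensemble[index]   # indexing bytes gives an int
    return cs & 0xFFFF


# Hypothetical six-byte sample; the last two bytes are skipped.
sample = bytes([0x7F, 0x7F, 0x04, 0x00, 0xAA, 0xBB])
print(compute_checksum(sample), compute_checksum_loop(sample))
```

Both versions return the same value for any input, including inputs shorter than two bytes, where the slice and the range are both empty.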