Iterating over a bytes object with a for loop in Python 3.x

4
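This code works in Python 2.7 and fails in 3.5. I would like to convert it to 3.5. I am stuck at the point where the data type changes inside the for loop. I am an experienced programmer but relatively new to Python, so this may be obvious, and my Google-fu has failed to find an exact example or solution. So here we go:
The following are snippets from this code, which works in 2.7: http://trac.nccoos.org/dataproc/browser/DPWP/trunk/DPWP/ADCP_splitter/pd0.py pd0.py opens a binary input stream, looks for a record-type identifier byte, and splits the data into two separate files containing the appropriate data. All of the data is binary.
In the code block below, header, length, and ensemble are all bytes objects. In Python 3.5, something happens when the for loop iterates: it produces an int, which makes struct.unpack fail. You can see in the comments that I have tried casting, indexing, and so on, without success. I would like a detailed understanding of what is happening here so that I can write more 3.5 binary operations correctly.
The line that fails is value = struct.unpack('B', byte)[0]
Places I have looked for a solution:
- Reading about how bytes is defined (you can iterate over it, but I don't see how to make that work)
- Many discussions of str -> bytes and vice versa, none of which address this problem
- Reading about how unpack works (unpack does not like unpacking an int, apparently)
- Guides on converting from Python 2.7 to 3.x
- Stack Overflow
Thanks in advance for your help. Here is the code: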
def __computeChecksum(header, length, ensemble):
    """Compute a checksum from header, length, and ensemble"""
    # these print as a byte (b'\x7f\x7f' or b'\x7fy') at this point
    print(header)  # header is a bytes object
    cs = 0   
    # so, when the first byte of header is assigned to byte, it gets cast to int.  Why, and how to prevent this?
    for byte in header:
        print(byte) # this prints as an integer at this point, 127 = 0x7F because a bytes object is an immutable sequence of integers
        print(type(byte)) # here byte is an int - we need it to be a bytes object for unpack to work
        value = struct.unpack('B', byte)[0]  # this is the line that gets TypeError: a bytes-like object is required, not 'int'
        # this does not work either - from examples online I thought that referencing the first in the array was the problem
        #value = struct.unpack('B', byte)  # this is the line that gets TypeError: a bytes-like object is required, not 'int'
        # this does not work, the error is unpack requires a bytes object of length 1, so the casting happened
        #value = struct.unpack('B', bytes(byte))[0] 
        # and this got the error a bytes-like object is required, not 'int', so the [0] reference generates an int
        # value = struct.unpack('B', bytes(byte)[0])[0] 
        cs += value
    for byte in length:
        value = struct.unpack('B', byte)[0]
        cs += value
    for byte in ensemble:
        value = struct.unpack('B', byte)[0]
        cs += value
    return cs & 0xffff

# convenience function reused for header, length, and checksum
def __nextLittleEndianUnsignedShort(file):
    """Get next little endian unsigned short from file"""
    raw = file.read(2)
    """for python 3.5, struct.unpack('<H', raw)[0] needs to return a
       byte, not an int
       Note that it's not a problem here, but in the next cell, when a for loop is involved, we get an error
    """
    return (raw, struct.unpack('<H', raw)[0])

Code in the main program that calls the functions above:

while (header == wavesId) or (header == currentsId):
    print('recnum= ',recnum)
    # get ensemble length
    rawLength, length = __nextLittleEndianUnsignedShort(rawFile)
    # read up to the checksum
    rawEnsemble = rawFile.read(length-4)
    # get checksum
    rawChecksum, checksum = __nextLittleEndianUnsignedShort(rawFile)

    computedChecksum = __computeChecksum(rawHeader, rawLength, rawEnsemble)

    if checksum != computedChecksum:
        raise IOError('Checksum error')

Finally, the full text of the error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-5e60bd9b9a54> in <module>()
     13     rawChecksum, checksum = __nextLittleEndianUnsignedShort(rawFile)
     14 
---> 15     computedChecksum = __computeChecksum(rawHeader, rawLength, rawEnsemble)
     16 
     17     if checksum != computedChecksum:

<ipython-input-3-414811fc52e4> in __computeChecksum(header, length, ensemble)
     16        print(byte) # this prints as an integer at this point, 127 = 0x7F because a bytes object is a "mutable sequence of integers"
     17        print(type(byte)) # here byte is an int - we need it to be a bytes object for unpack to work
---> 18        value = struct.unpack('B', byte)[0]  # this is the line that gets TypeError: a bytes-like object is required, not 'int'
     19        # this does not work either - from examples online I thought that referencing the first in the array was the problem
     20        #value = struct.unpack('B', byte)  # this is the line that gets TypeError: a bytes-like object is required, not 'int'

TypeError: a bytes-like object is required, not 'int'

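The complete Python notebook is here: https://gist.github.com/mmartini-usgs/4795da39adc9905f70fd8c27a1bba3da

This code can be made to work by commenting out value = struct.unpack('B', byte)[0] and cs += value and replacing them with cs += byte. However, is there a more direct solution for cases where one wants to iterate byte by byte? - Marinna Martini
2 Answers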

2
The most elegant solution is simply:
ensemble = infile.read(ensemblelength)

def __computeChecksum(ensemble):
    cs = 0    
    for byte in range(len(ensemble)-2):
        cs += ensemble[byte]
    return cs & 0xffff
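
Because iterating a bytes object in Python 3 already yields integers from 0 to 255, the same checksum can also be written with the built-in sum(). The following is a minimal sketch that keeps the three-argument shape from the question. The name compute_checksum and the sample values are assumptions made for illustration and do not come from pd0.py. It also assumes the checksum bytes have been read separately, as in the question, so nothing is trimmed from the end of ensemble (the loop above stops two bytes early, presumably because its buffer still includes the checksum).

```python
def compute_checksum(header, length, ensemble):
    """Sum every byte of the three buffers and keep the low 16 bits."""
    # Iterating a bytes object yields ints, so sum() works on it directly.
    return (sum(header) + sum(length) + sum(ensemble)) & 0xFFFF


# Made-up sample data, for illustration only.
header = b'\x7f\x7f'
length = b'\x04\x00'
ensemble = b'\x01\x02\x03'
checksum = compute_checksum(header, length, ensemble)
print(checksum)
```

No call to struct.unpack is needed here, since the value it would return for format 'B' is the same integer that iteration already provides.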

0

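It is hard to answer this without knowing what header is and how the data is read. In theory, this should not happen if the file is read with rb (read binary). (In fact, this was already mentioned in the comments.)

Here is a better explanation of the problem:

Iterating over individual bytes in Python 3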

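I would use an if statement to get the int, but you can also convert back to bytes as that answer does. Also, take a look at numpy.fromfile. I think it is easier to use.

P.S. This is a very detailed post! If you follow SSCCE, you may get more meaningful answers. And you can still post a link to the full notebook, as you did ;-)

I would rewrite your question using only your comments:

When iterating over bytes in Python 3.x, I get int instead of bytes. Is it possible to get bytes for every element?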
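
As a rough illustration of what "converting back to bytes" can look like, the sketch below shows two ways to get length-1 bytes objects out of a larger one, and why the bytes(byte) attempt in the question fails. The variable names and the sample value are assumptions chosen for the example.

```python
import struct

data = b'\x7f\x79'

# Iterating (or indexing one position) yields ints.
as_ints = list(data)

# Slicing keeps the bytes type, so each item is a length-1 bytes object.
as_bytes = [data[i:i + 1] for i in range(len(data))]

# bytes([n]) wraps a single int back into a length-1 bytes object.
rewrapped = [bytes([n]) for n in data]

# bytes(n) with a bare int creates n zero bytes instead, which is why
# struct.unpack('B', bytes(byte)) complains about the buffer length.
zeros = bytes(3)

# With length-1 bytes objects, the original unpack call works again.
values = [struct.unpack('B', b)[0] for b in as_bytes]
```

The unpacked values are the same integers that plain iteration produces, so the round trip through struct is only useful when some other code insists on receiving bytes.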

In [0]: [byte for byte in b'\x7f\x7f']
Out[0]: [127, 127]

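Thank you very much. I am still learning all of this, including Stack Overflow protocol. numpy.fromfile is a good lead. So, thinking philosophically about Python, I want to keep my code modular. For example, converting one binary format to another is what this code does, and numerical operations would then be a separate step in separate code. So would I rather not bring in numpy and stick with the most basic Python? Python changes so quickly. I want to pull in only the minimum number of modules needed for what I am doing, so that I do not end up with a mess like my current MATLAB code. - Marinna Martini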
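
For staying within the standard library, one possible sketch is shown below. It uses only struct and the built-in int.from_bytes, both of which are available in Python 3.5. The sample buffer and variable names are assumptions for illustration, not data from the ADCP files.

```python
import struct

# Made-up buffer: two header bytes followed by a little-endian length field.
data = b'\x7f\x7f\x04\x00'

# A repeat count in the format string unpacks every byte in one call.
values = struct.unpack('%dB' % len(data), data)

# int.from_bytes decodes a multi-byte field without struct at all.
length = int.from_bytes(data[2:4], byteorder='little')
```

Slicing (data[2:4]) is used rather than indexing so that the field stays a bytes object, which is what int.from_bytes expects.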
