Convert a Python long/int to a fixed-size byte array

Question


python arrays long-integer diffie-hellman rc4-cipher

62

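I'm trying to implement RC4 and DH key exchange in Python. The problem is that I have no idea how to convert the Python long integer from the key exchange into the byte array that the RC4 implementation needs. Is there a simple way to convert a long integer into a byte array of the required length?

Update: I forgot to mention that the numbers I'm dealing with are 768-bit unsigned integers.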

- cdecker

Not sure if this helps, but take a look at the struct module: http://docs.python.org/library/struct.html - Samus_

10 Answers

31

Everyone is overcomplicating this answer:

import sys

some_int = 2**255 + 1  # stand-in for any 256-bit integer
some_bytes = some_int.to_bytes(32, sys.byteorder)
my_bytearray = bytearray(some_bytes)

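You just need to know the number of bytes you are converting to. In my use cases I normally only use numbers this large for crypto, and at that point I have to worry about the modulus and so on anyway, so I don't think having to know the maximum number of bytes to return is a big problem.

Since you are doing 768-bit math, the argument would be 96 rather than 32.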
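As a minimal sketch of the 768-bit case: the value `shared_secret` below is a made-up stand-in for a real DH result, and big-endian order is an assumption (the snippet above uses `sys.byteorder`, which depends on the machine).

```python
# Hypothetical 768-bit value standing in for a DH shared secret.
shared_secret = (1 << 767) | 0x1234

# 768 bits / 8 bits per byte = 96 bytes.
key_bytes = shared_secret.to_bytes(96, "big")
key_array = bytearray(key_bytes)

print(len(key_array))   # 96
print(key_array[0])     # 128, because the top bit of the value is set
```

Passing an explicit `"big"` or `"little"` rather than `sys.byteorder` is probably the safer choice when both peers must derive the same key bytes.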

- sparticvs

1

In Python 3 this solution works great for 2048-bit integers. In Python 2.7 it only works with int (in Python 2.7 a 2048-bit integer is a long). - desowin

5

In Python 2.7, some_int.to_bytes(32, sys.byteorder) fails with AttributeError: 'int' object has no attribute 'to_bytes'. - oHo
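For interpreters that lack int.to_bytes, one possible fallback is to go through a zero-padded hex string. This is only a sketch: `int_to_fixed_bytes` is a hypothetical helper name, and it assumes non-negative values and big-endian output.

```python
from binascii import unhexlify

def int_to_fixed_bytes(val, num_bytes):
    """Big-endian, zero-padded encoding that avoids int.to_bytes."""
    if val < 0:
        raise ValueError("only non-negative integers are supported")
    # '*' takes the field width from the argument tuple: 2 hex digits per byte.
    hex_str = "%0*x" % (2 * num_bytes, val)
    if len(hex_str) > 2 * num_bytes:
        raise OverflowError("value does not fit in %d bytes" % num_bytes)
    return unhexlify(hex_str)

print(len(int_to_fixed_bytes(1 << 767, 96)))   # 96
```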

1

Not everyone knows... see @JackOConnor's answer. - Anonymous

21

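I haven't done any benchmarks, but this recipe "works for me".

The short version: use '%x' % val, then unhexlify the result. The devil is in the details, though: unhexlify requires an even number of hex digits, which %x does not guarantee. See the docstring and the liberal inline comments for details.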

from binascii import unhexlify

def long_to_bytes (val, endianness='big'):
    """
    Use :ref:`string formatting` and :func:`~binascii.unhexlify` to
    convert ``val``, a :func:`long`, to a byte :func:`str`.

    :param long val: The value to pack

    :param str endianness: The endianness of the result. ``'big'`` for
      big-endian, ``'little'`` for little-endian.

    If you want byte- and word-ordering to differ, you're on your own.

    Using :ref:`string formatting` lets us use Python's C innards.
    """

    # one (1) hex digit per four (4) bits
    width = val.bit_length()

    # unhexlify wants an even multiple of eight (8) bits, but we don't
    # want more digits than we need (hence the ternary-ish 'or')
    width += 8 - ((width % 8) or 8)

    # format width specifier: four (4) bits per hex digit
    fmt = '%%0%dx' % (width // 4)

    # prepend zero (0) to the width, to zero-pad the output
    s = unhexlify(fmt % val)

    if endianness == 'little':
        # see https://dev59.com/XXNA5IYBdhLWcg3wgeOk#931095
        s = s[::-1]

    return s

...and my nosetest unit tests ;-)

class TestHelpers (object):
    def test_long_to_bytes_big_endian_small_even (self):
        s = long_to_bytes(0x42)
        assert s == '\x42'

        s = long_to_bytes(0xFF)
        assert s == '\xff'

    def test_long_to_bytes_big_endian_small_odd (self):
        s = long_to_bytes(0x1FF)
        assert s == '\x01\xff'

        s = long_to_bytes(0x201FF)
        assert s == '\x02\x01\xff'

    def test_long_to_bytes_big_endian_large_even (self):
        s = long_to_bytes(0xab23456c8901234567)
        assert s == '\xab\x23\x45\x6c\x89\x01\x23\x45\x67'

    def test_long_to_bytes_big_endian_large_odd (self):
        s = long_to_bytes(0x12345678901234567)
        assert s == '\x01\x23\x45\x67\x89\x01\x23\x45\x67'

    def test_long_to_bytes_little_endian_small_even (self):
        s = long_to_bytes(0x42, 'little')
        assert s == '\x42'

        s = long_to_bytes(0xFF, 'little')
        assert s == '\xff'

    def test_long_to_bytes_little_endian_small_odd (self):
        s = long_to_bytes(0x1FF, 'little')
        assert s == '\xff\x01'

        s = long_to_bytes(0x201FF, 'little')
        assert s == '\xff\x01\x02'

    def test_long_to_bytes_little_endian_large_even (self):
        s = long_to_bytes(0xab23456c8901234567, 'little')
        assert s == '\x67\x45\x23\x01\x89\x6c\x45\x23\xab'

    def test_long_to_bytes_little_endian_large_odd (self):
        s = long_to_bytes(0x12345678901234567, 'little')
        assert s == '\x67\x45\x23\x01\x89\x67\x45\x23\x01'

- Tripp Lilley

1

With a value of 0 (Python 3.5) I ran into binascii.Error: Odd-length string. A quick fix is to replace s = unhexlify(fmt % val) with s = unhexlify('00') if fmt % val == '0' else unhexlify(fmt % val). - Kevin

This is more concise. - YoungCoder5
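Another way to handle zero, sketched here under the assumption that inputs are non-negative, is to clamp the bit length to at least 1 before rounding up to whole bytes. `long_to_bytes_zero_safe` is a hypothetical name and not part of the answer above.

```python
from binascii import unhexlify

def long_to_bytes_zero_safe(val, endianness="big"):
    # Treat 0 as needing one bit so that at least one byte is emitted.
    bits = max(val.bit_length(), 1)
    # Round up to a whole number of bytes.
    num_bytes = (bits + 7) // 8
    # Two hex digits per byte, zero-padded on the left.
    s = unhexlify("%0*x" % (2 * num_bytes, val))
    return s[::-1] if endianness == "little" else s

print(long_to_bytes_zero_safe(0))   # b'\x00'
```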

13

One-liner:

bytearray.fromhex('{:0192x}'.format(big_int))

192 is 768 / 4, because the OP wanted 768-bit numbers and each hex digit holds 4 bits. If you need a bigger bytearray, use a format string with a larger number. An example:

>>> big_int = 911085911092802609795174074963333909087482261102921406113936886764014693975052768158290106460018649707059449553895568111944093294751504971131180816868149233377773327312327573120920667381269572962606994373889233844814776702037586419
>>> bytearray.fromhex('{:0192x}'.format(big_int))
bytearray(b'\x96;h^\xdbJ\x8f3obL\x9c\xc2\xb0-\x9e\xa4Sj-\xf6i\xc1\x9e\x97\x94\x85M\x1d\x93\x10\\\x81\xc2\x89\xcd\xe0a\xc0D\x81v\xdf\xed\xa9\xc1\x83p\xdbU\xf1\xd0\xfeR)\xce\x07\xdepM\x88\xcc\x7fv\\\x1c\x8di\x87N\x00\x8d\xa8\xbd[<\xdf\xaf\x13z:H\xed\xc2)\xa4\x1e\x0f\xa7\x92\xa7\xc6\x16\x86\xf1\xf3')
>>> lepi_int = 0x963b685edb4a8f336f624c9cc2b02d9ea4536a2df669c19e9794854d1d93105c81c289cde061c0448176dfeda9c18370db55f1d0fe5229ce07de704d88cc7f765c1c8d69874e008da8bd5b3cdfaf137a3a48edc229a41e0fa792a7c61686f1f
>>> bytearray.fromhex('{:0192x}'.format(lepi_int))
bytearray(b'\tc\xb6\x85\xed\xb4\xa8\xf36\xf6$\xc9\xcc+\x02\xd9\xeaE6\xa2\xdff\x9c\x19\xe9yHT\xd1\xd91\x05\xc8\x1c(\x9c\xde\x06\x1c\x04H\x17m\xfe\xda\x9c\x187\r\xb5_\x1d\x0f\xe5"\x9c\xe0}\xe7\x04\xd8\x8c\xc7\xf7e\xc1\xc8\xd6\x98t\xe0\x08\xda\x8b\xd5\xb3\xcd\xfa\xf17\xa3\xa4\x8e\xdc"\x9aA\xe0\xfay*|aho\x1f')

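[My earlier answer used hex(). I corrected it with format() in order to handle integers whose hex representation has an odd number of digits. This fixes the ValueError that was reported earlier.]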
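To avoid hard-coding 192, the pad width can be derived from the bit size. The following is a rough sketch; `int_to_bytearray` and its `num_bits` parameter are hypothetical names introduced for illustration.

```python
def int_to_bytearray(n, num_bits):
    """Zero-padded big-endian bytearray; num_bits must be a multiple of 8."""
    if num_bits % 8:
        raise ValueError("num_bits must be a multiple of 8")
    hex_digits = num_bits // 4   # 4 bits per hex digit
    # The nested {w} field supplies the zero-pad width at run time.
    return bytearray.fromhex("{:0{w}x}".format(n, w=hex_digits))

print(len(int_to_bytearray(1, 768)))   # 96
```

Note that the format width is only a minimum: a value wider than `num_bits` is not truncated, so the result would come out longer than expected or, for an odd digit count, `fromhex` would raise ValueError.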

- Jess Austin

@MarioAlemi, the code in your comment is wrong. strip('0x') also strips trailing zeros, which leads to wrong results (and sometimes a ValueError)! - Lepi

2

@Jess Austin: your solution is completely wrong, because it only works when x consists of an even number of hex digits. For example:

x=0x963b685edb4a8f336f624c9cc2b02d9ea4536a2df669c19e9794854d1d93105c81c289cde061c0448176dfeda9c18370db55f1d0fe5229ce07de704d88cc7f765c1c8d69874e008da8bd5b3cdfaf137a3a48edc229a41e0fa792a7c61686f1fL

- Lepi

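@lepi, can you give an example? - Mario Alemi

@MarioAlemi bytearray.fromhex(hex(0x11000000).strip('0x').strip('L')) does not just remove the '0x' character sequence from the start; it removes every '0' and every 'x' character from both ends. When the number is not a long and has trailing zeros, those zeros are removed as well. - Lepi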
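The pitfall is easy to reproduce. This small Python 3 sketch (where hex() never appends an 'L') compares str.strip with format(), which produces no prefix to remove in the first place:

```python
n = 0x11000000

stripped = hex(n).strip("0x")   # strips '0' and 'x' characters from BOTH ends
formatted = format(n, "x")      # plain hex digits, no '0x' prefix

print(stripped)    # 11
print(formatted)   # 11000000
```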

1

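Right, I just wanted to point out that if you change the 192 in the format string to 191 (or any odd number), you get a ValueError. That's the part that tripped me up. - Justin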


8

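Converting a long/int to a byte array looks like exactly what struct.pack is for. For long integers that exceed 4 (8) bytes, you can try something like the following: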

>>> limit = 256*256*256*256 - 1
>>> i = 1234567890987654321
>>> parts = []
>>> while i:
        parts.append(i & limit)
        i >>= 32

>>> struct.pack('>' + 'L'*len(parts), *parts )
'\xb1l\x1c\xb1\x11"\x10\xf4'

>>> struct.unpack('>LL', '\xb1l\x1c\xb1\x11"\x10\xf4')
(2976652465L, 287445236)
>>> (287445236L << 32) + 2976652465L
1234567890987654321L
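The same word-splitting idea can be wrapped into a fixed-size helper. This is a sketch for Python 3 with a hypothetical name, `pack_words`; unlike the session above, which emits the least significant 32-bit word first, it reverses the words so the whole result is big-endian.

```python
import struct

def pack_words(value, num_bytes):
    """Big-endian encoding built from 32-bit words; num_bytes % 4 must be 0."""
    if num_bytes % 4:
        raise ValueError("num_bytes must be a multiple of 4")
    mask = 0xFFFFFFFF
    words = []
    for _ in range(num_bytes // 4):
        words.append(value & mask)   # lowest 32 bits first
        value >>= 32
    if value:
        raise OverflowError("value does not fit in num_bytes")
    # Most significant word first for big-endian output.
    return struct.pack(">%dI" % len(words), *reversed(words))

print(len(pack_words(1 << 767, 96)))   # 96
```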

- Roman Bodnarchuk

4

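But this doesn't solve the problem for big numbers (> 8 bytes), which are what crypto applications typically use. - interjay

It isn't written to be general; it's more of a fixed-size solution for things like representing every possible IP address or similar... - bigkahunaburger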

7

You can try using struct:

import struct
struct.pack('L',longvalue)

- Eduardo Ivanec

1

Sadly no: error: integer out of range for 'L' format code. It's 768 bits long, which is much larger than a 4-byte unsigned integer. - cdecker
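A short reproduction of that failure, as a sketch: `big_value` is a hypothetical 768-bit number, and the '>' prefix is added only to force the standard 4-byte size for 'L' regardless of platform.

```python
import struct

big_value = 1 << 767   # hypothetical 768-bit number

try:
    struct.pack(">L", big_value)   # 'L' holds only 32 bits in standard mode
    fits = True
except struct.error:
    fits = False

print(fits)   # False
```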

1

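Downvoted because Python's long int is an arbitrarily long integer. Think of it as an array of 32-bit (or some other size) integers. C's long, by contrast, is a fixed-size data type. With this response you are confusing the two. - Havok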

7

Little-endian; reverse the result or the range if you want big-endian:

def int_to_bytes(val, num_bytes):
    return [(val & (0xff << pos*8)) >> pos*8 for pos in range(num_bytes)]

Big-endian:

def int_to_bytes(val, num_bytes):
    return [(val & (0xff << pos*8)) >> pos*8 for pos in reversed(range(num_bytes))]
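The functions above return a list of ints. If an actual byte array is wanted, the same shift-and-mask idea can feed a bytearray directly. This is a sketch with a hypothetical name, `int_to_bytes_be`:

```python
def int_to_bytes_be(val, num_bytes):
    # Same shift-and-mask idea, big-endian, returned as a bytearray.
    return bytearray((val >> (8 * pos)) & 0xFF
                     for pos in reversed(range(num_bytes)))

value = 0x0102030405
print(list(int_to_bytes_be(value, 6)))   # [0, 1, 2, 3, 4, 5]
```

Like the originals, this silently drops high-order bits when `num_bytes` is too small, so callers may want to check the value's bit length first.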

- scornwell

3

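Basically what you need to do is convert the int/long into its base-256 representation, i.e. a number whose "digits" range from 0 to 255. Here's a fairly efficient way to do that: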

def base256_encode(n, minwidth=0): # int/long to byte array
    if n > 0:
        arr = []
        while n:
            n, rem = divmod(n, 256)
            arr.append(rem)
        b = bytearray(reversed(arr))
    elif n == 0:
        b = bytearray(b'\x00')
    else:
        raise ValueError

    if minwidth > 0 and len(b) < minwidth: # zero padding needed?
        b = bytearray(minwidth - len(b)) + b
    return b

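You may not need the reversed() call, depending on the byte order you want (dropping it would also require the padding to be done differently). Also note that, as written, it does not handle negative numbers.

You might also want to look at the similar, highly optimized long_to_bytes() function in the number.py module, which is part of the open-source Python Cryptography Toolkit. It actually converts the number into a string rather than a byte array, but that's a minor issue.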
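A little-endian variant might look like the sketch below. `base256_encode_le` is a hypothetical name; the differences from the function above are that the digits are kept in the order they are produced and that the zero padding goes on the right.

```python
def base256_encode_le(n, minwidth=0):
    """Little-endian base-256 digits of a non-negative int, as a bytearray."""
    if n < 0:
        raise ValueError("negative numbers are not supported")
    b = bytearray()
    while n:
        n, rem = divmod(n, 256)
        b.append(rem)                       # least significant digit first
    if not b:
        b.append(0)                         # represent zero as a single byte
    if len(b) < minwidth:
        b += bytearray(minwidth - len(b))   # pad with zeros on the right
    return b

print(list(base256_encode_le(0x0102, 4)))   # [2, 1, 0, 0]
```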

- martineau

2

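Python 2.7 does not implement the int.to_bytes() method.

I tried 3 methods:

1. Hex unpack/pack: very slow.
2. Byte shifting, 8 bits at a time: significantly faster.
3. Using a "C" module and packing into the lower (7 ia64 or 3 i32) bytes. This was about twice as fast as method 2. It is the fastest option, but still too slow.

All of these methods are very inefficient, for two reasons:

1. Python 2.7 does not support this useful operation.
2. C does not support extended-precision arithmetic using the carry/borrow/overflow flags available on most platforms.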

- GP Eckersley

0

import struct

i = 0x12345678
s = struct.pack('<I', i)        # 4 little-endian bytes
b = struct.unpack('BBBB', s)    # tuple of byte values: (0x78, 0x56, 0x34, 0x12)

- Artem Romanov


- Jack O'Connor · Accepted Answer

With Python 3.2 and later, you can use int.to_bytes and int.from_bytes: https://docs.python.org/3/library/stdtypes.html#int.to_bytes
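A brief round-trip sketch for the question's 768-bit case; the value `n` is made up for illustration, and big-endian order is an assumption:

```python
n = (1 << 767) + 12345            # hypothetical 768-bit value

data = n.to_bytes(96, "big")      # fixed size: 96 bytes
restored = int.from_bytes(data, "big")
print(restored == n)              # True

try:
    n.to_bytes(95, "big")         # one byte too few
except OverflowError as exc:
    print("too small:", exc)
```

to_bytes raises OverflowError rather than truncating when the requested length is too small, which makes a wrong size easy to notice.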