Byte operations (XOR) in Python

Question

python encryption byte type-conversion operation

25

#!/usr/bin/env python3

import binascii


var=binascii.a2b_qp("hello")
key=binascii.a2b_qp("supersecretkey")[:len(var)]

print(binascii.b2a_qp(var))
print(binascii.b2a_qp(key))


# here I want to do an XOR operation on the bytes in var and key and place them in 'encryption': encryption=var XOR key

print(binascii.b2a_qp(encrypted))

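If someone could enlighten me on how to accomplish this, I would be very happy. I am very new to data-type conversions, so reading through the Python wiki has not been as clear as I would like.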

- Jcov

Do you mean XOR-ing the var string with the key string? Note that they have different lengths. In Python, the XOR operator is ^. - Pynchia

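So using [:len(var)] to cut the key down to the same size as the var string doesn't work? I thought each character would be converted to a single byte, e.g. a=97=01100001. When I use encrypted = var ^ key, I get "TypeError: unsupported operand type(s) for ^: 'bytes' and 'bytes'". - Jcov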
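The TypeError quoted above arises because bytes objects do not define the ^ operator, while the individual elements of a bytes object in Python 3 are plain integers, which do. A minimal sketch of that distinction, assuming literal bytes values in place of the binascii calls from the question:

```python
var = b"hello"
key = b"supersecretkey"[:len(var)]

# Indexing (or iterating) a bytes object yields ints in Python 3.
first_xor = var[0] ^ key[0]
print(var[0], key[0], first_xor)  # 104 115 27

# The ^ operator is not defined between two bytes objects.
try:
    var ^ key
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Truncating the key with [:len(var)] is therefore fine; what is missing is the element-by-element XOR.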

3 Answers

25

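It looks like what you need to do is XOR each character in the message with the corresponding character in the key. To do that, however, you need some conversion using ord and chr, because you can only XOR numbers, not strings: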

>>> encrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(var, key) ] 
>>> encrypted
['\x1b', '\x10', '\x1c', '\t', '\x1d']

>>> decrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(encrypted, key) ]
>>> decrypted
['h', 'e', 'l', 'l', 'o']

>>> "".join(decrypted)
'hello'

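Note that binascii.a2b_qp("hello") just converts a string to another string (though possibly with a different encoding); in Python 3 the result is a bytes object.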

Your approach, and my code above, only work if the key is at least as long as the message. If needed, however, you can easily repeat the key using itertools.cycle:

>>> from itertools import cycle
>>> var="hello"
>>> key="xy"

>>> encrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(var, cycle(key)) ]
>>> encrypted
['\x10', '\x1c', '\x14', '\x15', '\x17']

>>> decrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(encrypted, cycle(key)) ]
>>> "".join(decrypted)
'hello'

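To address the issue of Unicode / multi-byte characters (raised in the comments below), you can convert the string (and the key) to bytes, zip them together, and then perform the XOR, something like this (the key is reset here to the one from the question, which is what the output below corresponds to):

>>> var=u"hello\u2764"
>>> var
'hello❤'
>>> key="supersecretkey"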

>>> encrypted = [ a ^ b for (a,b) in zip(bytes(var, 'utf-8'),cycle(bytes(key, 'utf-8'))) ]
>>> encrypted
[27, 16, 28, 9, 29, 145, 248, 199]

>>> decrypted = [ a ^ b for (a,b) in zip(bytes(encrypted), cycle(bytes(key, 'utf-8'))) ]
>>> decrypted
[104, 101, 108, 108, 111, 226, 157, 164]

>>> bytes(decrypted)
b'hello\xe2\x9d\xa4'

>>> bytes(decrypted).decode()
'hello❤'
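The session above can be wrapped into a small helper. This is only a sketch: the function name xor_with_repeating_key and the empty-key check are illustrative assumptions, not part of the original answer.

```python
from itertools import cycle


def xor_with_repeating_key(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key, repeating the key as needed."""
    if not key:
        # cycle() over an empty key would silently produce empty output.
        raise ValueError("key must not be empty")
    return bytes(a ^ b for a, b in zip(data, cycle(key)))


message = "hello\u2764"
key_bytes = "supersecretkey".encode("utf-8")

# Encode to UTF-8 first so that multi-byte characters become a byte stream.
encrypted = xor_with_repeating_key(message.encode("utf-8"), key_bytes)
# Applying the same operation again restores the original bytes.
decrypted = xor_with_repeating_key(encrypted, key_bytes).decode("utf-8")

print(list(encrypted))
print(decrypted)
```

Because XOR is its own inverse, the same function serves for both encryption and decryption.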

- DNA

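@DNA - Nice! But it fails for Unicode input... zip puts the characters into tuples, and then chr gets confused because the Unicode character is outside its range. For example, var=u'\u2764' causes an exception... ❤ - Hamy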

@Hamy You could try unichr() instead of chr() to get around that, but I haven't tried it... - DNA

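@DNA - Good thought, but I think it would XOR the wrong data: a two-byte Unicode character passed to ord would be XOR-ed with a one-byte ASCII character, combining only the low bits, whereas the goal is to treat both var and key as byte streams and XOR them bit by bit. For example, bin(ord(u'\u1000')) is 0b1000000000000, so if I OR it with a byte of all 1s as a stream operation, the high bit should be 1, but instead this happens: bin(ord('\xFF') | ord(u'\u1000')) is 0b1000011111111. - Hamy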

2

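In my opinion, this just emphasizes how tricky Python 2 can be for byte operations... The only quick fix I can think of is to check carefully that the inputs are str rather than unicode, e.g. if not isinstance(var, str) or not isinstance(key, str). - Hamy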

Note that the OP is using Python 3. - DNA
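The difference discussed in these comments, between operating on code points and operating on a byte stream, can be seen in Python 3 with a short sketch. The character is the one from the comment above; the variable names are illustrative.

```python
char = "\u1000"

# Code-point view: ord() gives one integer per character, which may exceed 255.
code_point = ord(char)
print(bin(code_point))                # 0b1000000000000
print(bin(code_point | ord("\xff")))  # 0b1000011111111

# Byte-stream view: the same character is three separate UTF-8 bytes,
# each of which fits in the range 0..255 and can be XOR-ed with a key byte.
utf8_bytes = char.encode("utf-8")
print(list(utf8_bytes))               # [225, 128, 128]
```

Encoding to bytes before the XOR keeps every operand within a single byte, which is what the bytes-based session in the answer relies on.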

2

You can use NumPy to do this faster.

import numpy as np
def encrypt(var, key):
    a = np.frombuffer(var, dtype = np.uint8)
    b = np.frombuffer(key, dtype = np.uint8)
    return (a^b).tobytes()

- Latze

- Vincent · Accepted Answer

Comparison of two Python 3 solutions

The first is based on zip:

def encrypt1(var, key):
    return bytes(a ^ b for a, b in zip(var, key))

The second uses int.from_bytes and int.to_bytes:

import sys

def encrypt2(var, key, byteorder=sys.byteorder):
    # Trim both inputs to the shorter of the two lengths.
    key, var = key[:len(var)], var[:len(key)]
    int_var = int.from_bytes(var, byteorder)
    int_key = int.from_bytes(key, byteorder)
    int_enc = int_var ^ int_key
    return int_enc.to_bytes(len(var), byteorder)

A quick test:

assert encrypt1(b'hello', b'supersecretkey') == b'\x1b\x10\x1c\t\x1d'
assert encrypt2(b'hello', b'supersecretkey') == b'\x1b\x10\x1c\t\x1d'
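Beyond the fixed expected value, a round-trip check is a useful complement: since XOR is its own inverse, applying either function twice with the same key should return the original message. The sketch below repeats the two functions so that it runs on its own; the short key b'xy' is an illustrative value showing that both functions truncate to the shorter input.

```python
import sys


def encrypt1(var, key):
    return bytes(a ^ b for a, b in zip(var, key))


def encrypt2(var, key, byteorder=sys.byteorder):
    key, var = key[:len(var)], var[:len(key)]
    int_var = int.from_bytes(var, byteorder)
    int_key = int.from_bytes(key, byteorder)
    return (int_var ^ int_key).to_bytes(len(var), byteorder)


message = b"hello"
key = b"supersecretkey"

# Encrypting the ciphertext again with the same key restores the message.
round_trips = [encrypt(encrypt(message, key), key) for encrypt in (encrypt1, encrypt2)]
print(round_trips)

# With a key shorter than the message, both functions only process
# as many bytes as the shorter input provides.
short1 = encrypt1(message, b"xy")
short2 = encrypt2(message, b"xy")
print(len(short1), len(short2))
```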

Performance test with var and key each 1000 bytes long:

$ python3 -m timeit \
  -s "import test_xor;a=b'abcdefghij'*100;b=b'0123456789'*100" \
  "test_xor.encrypt1(a, b)"
10000 loops, best of 3: 100 usec per loop

$ python3 -m timeit \
  -s "import test_xor;a=b'abcdefghij'*100;b=b'0123456789'*100" \
  "test_xor.encrypt2(a, b)"
100000 loops, best of 3: 5.1 usec per loop

The integer approach seems to be significantly faster.
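The same comparison can be run from within Python using the timeit module instead of the command line. This is a sketch under the same inputs as above; the loop count of 1000 is an arbitrary choice, and the timings it prints will vary by machine and Python version.

```python
import sys
import timeit


def encrypt1(var, key):
    return bytes(a ^ b for a, b in zip(var, key))


def encrypt2(var, key, byteorder=sys.byteorder):
    key, var = key[:len(var)], var[:len(key)]
    int_var = int.from_bytes(var, byteorder)
    int_key = int.from_bytes(key, byteorder)
    return (int_var ^ int_key).to_bytes(len(var), byteorder)


a = b"abcdefghij" * 100
b = b"0123456789" * 100

# Both approaches must agree before their speed is compared.
results_match = encrypt1(a, b) == encrypt2(a, b)

loops = 1000
for func in (encrypt1, encrypt2):
    # Take the best of three runs, as the timeit command line does.
    best = min(timeit.repeat(lambda: func(a, b), number=loops, repeat=3))
    print(f"{func.__name__}: {best / loops * 1e6:.1f} usec per loop")
```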