Python: Reading and Writing Binary Files

Question

Here is my restated question:

Reading the first 10 bytes of a binary file (operations to follow) -

infile = open('infile.jpg', 'rb')
outfile = open('outfile.jpg', 'wb')
x = infile.read(10)
for i in x:
    print(i, end=', ')
print(x)
outfile.write(bytes(x, "UTF-8"))

The first print statement displays -

255, 216, 255, 224, 0, 16, 74, 70, 73, 70,

The second print statement displays -

b'\xff\xd8\xff\xe0\x00\x10JFIF'

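which shows the values in x as hex escape sequences (bytes that are printable ASCII, such as JFIF, appear as characters instead).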
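As an aside, the difference between the two displays can be checked without any file. This is a sketch that types the 10 bytes above in as a literal: iterating over a bytes object yields ints, repr() uses \x escapes only for bytes that are not printable ASCII, and bytes.hex() (available since Python 3.5) shows every byte as two hex digits.

```python
# The first 10 bytes shown above, typed in as a literal (no file needed).
x = b'\xff\xd8\xff\xe0\x00\x10JFIF'

print(list(x))   # [255, 216, 255, 224, 0, 16, 74, 70, 73, 70]
print(repr(x))   # b'\xff\xd8\xff\xe0\x00\x10JFIF'  (J, F, I, F are printable)
print(x.hex())   # ffd8ffe000104a464946
```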

outfile.write(bytes(x, "UTF-8"))

raises -

TypeError: encoding or errors without a string argument

So x must not be a normal string but a byte string, which is still iterable?

If I want to write the contents of x to outfile.jpg unaltered, I can do -

outfile.write(x)

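Now I try to perform some operation on each x[i] (below it is simply multiplying by 1), assign the results to y, and write y to outfile.jpg so that it is identical to infile.jpg. So I try -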

infile = open('infile.jpg', 'rb')
outfile = open('outfile.jpg', 'wb')
x = infile.read(10)

yi = len(x)
y = [0 for i in range(yi)]

j = 0
for i in x:
    y [j] = i*1
    j += 1

for i in x:
    print(i, end=', ')

print(x)

for i in y:
    print(i, end=', ')

print(y)

print(repr(x))
print(repr(y))

outfile.write(y)

The first print statement (iterating through x) displays -

255, 216, 255, 224, 0, 16, 74, 70, 73, 70,

The second print statement displays -

b'\xff\xd8\xff\xe0\x00\x10JFIF'

The third print statement (iterating through y) displays -

255, 216, 255, 224, 0, 16, 74, 70, 73, 70,

The fourth print statement, print(y), displays -

[255, 216, 255, 224, 0, 16, 74, 70, 73, 70]

Finally, following Tim's suggestion, printing repr(x) and repr(y) gives, respectively -

b'\xff\xd8\xff\xe0\x00\x10JFIF'
[255, 216, 255, 224, 0, 16, 74, 70, 73, 70]

The file write statement raises an error -

TypeError: 'list' does not support the buffer interface

What I need is for y to be the same type as x, so that outfile.write(x) = outfile.write(y)

I stare into the eyes of Python and still cannot see its soul.

- brett

Take a look at this post: https://dev59.com/5m035IYBdhLWcg3wW-pa It looks like the String class changed between Python 2 and Python 3. - Hunter McMillen

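Hunter - I replaced outfile.write(s) with outfile.write(s.encode('UTF-8')) and didn't get any errors! But using infile.read() made outfile.jpg twice the size of infile.jpg, and it was corrupted. My goal is to read a binary file, perform an operation, then reverse that operation and write the output to a separate file, so that the two files are identical. - brett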

The answer given in the post I linked uses outfile.write(bytes(s, "UTF-8")). - Hunter McMillen
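The comments don't show how s was built, so the following is only an illustration under that assumption: any detour through str alters binary data. Two common ways this happens are calling str() on a bytes object (which yields its repr text, not the raw bytes) and decoding as Latin-1 then encoding as UTF-8 (which turns every byte >= 0x80 into two bytes). Both produce output that is longer than the input and no longer matches it.

```python
x = b'\xff\xd8\xff\xe0\x00\x10JFIF'

# Assumed path 1: str() of a bytes object is its repr text, e.g. "b'\\xff...'".
via_str = str(x).encode('UTF-8')

# Assumed path 2: Latin-1 maps each byte to one character, but UTF-8 then
# needs two bytes for every character >= 0x80.
via_latin1 = x.decode('latin-1').encode('UTF-8')

print(len(x), len(via_str), len(via_latin1))  # 10 31 14
print(via_str == x, via_latin1 == x)          # False False
```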

3 Answers

... This is how to read and write a file in binary mode in Python.

#open binary files infile and outfile
infile = open('infile.jpg', 'rb')
outfile = open('outfile.jpg', 'wb')

#n = bytes to read
n=5

#read bytes of infile to x
x = infile.read(n)

#print x type, x
print()
print('x =', repr(x), type(x))
print()

# Output: x = b'\xff\xd8\xff\xe0\x00' <class 'bytes'>

# define y as a list of xi zeros (the same length as x)
xi = len(x)
y = [0 for i in range(xi)]

#print y type, y
print('y =', repr(y), type(y))
print()

# Output: y = [0, 0, 0, 0, 0] <class 'list'>

# convert each byte of x to an 8-bit binary string and place it in y
# (in Python 3, iterating over bytes yields ints, so ord() is not needed)
j=0
for i in x:
    y[j] = '{:08b}'.format(i)
    j += 1

#print y type, and y
print('y =', repr(y), type(y))
print()

# Output: y = ['11111111', '11011000', '11111111', '11100000', '00000000'] <class 'list'>

#perform bit level operations on y [i], not done in this example.

#convert y [i] back to integer
j=0
for i in y:
    y [j] = int(i, 2)
    j += 1

#print y type, and y
print('y =', repr(y), type(y))
print()

# Output: y = [255, 216, 255, 224, 0] <class 'list'>

# convert the list y to a bytearray and place it in z
z = bytearray(y)

#print z type, and z
print('z =', repr(z), type(z))
print()

# Output: z = bytearray(b'\xff\xd8\xff\xe0\x00') <class 'bytearray'>

#output z to outfile
outfile.write(z)

infile.close()
outfile.close()
outfile = open('outfile.jpg', 'rb')

#read bytes of outfile to x
x = outfile.read(n)

#print x type, and x
print('x =', repr(x), type(x))
print()

# Output: x = b'\xff\xd8\xff\xe0\x00' <class 'bytes'>

#conclusion:  first n bytes of infile = n bytes of outfile (without bit level operations)

outfile.close()

- brett
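The loops in the answer above can be condensed. This is a sketch that assumes the same 5 bytes are typed in as a literal instead of read from infile.jpg; since iterating over bytes already yields ints in Python 3, format() is applied to each element directly, and bytes() packs the parsed ints back up.

```python
# Same 5 bytes as read from infile.jpg above (typed in as a literal here).
x = b'\xff\xd8\xff\xe0\x00'

# Each byte (already an int in Python 3) as an 8-character binary string.
bits = [format(b, '08b') for b in x]
print(bits)  # ['11111111', '11011000', '11111111', '11100000', '00000000']

# ... bit-level operations on the strings would go here ...

# Parse each string back to an int and pack the ints into a bytes object.
z = bytes(int(s, 2) for s in bits)
print(z == x)  # True
```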

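Thanks for the clarification! What you want is easy to do, but you really do need to read the docs on the bytes and bytearray types. You don't want anything to do with: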

Unicode
strings
encoding
decoding

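None of these matter at all here. You have binary data, and you need to work with bytes and/or bytearray objects. Both are sequences of bytes ("little integers" in range(256)); bytes is an immutable sequence, and bytearray is a mutable one.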
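A short demonstration of the immutable/mutable distinction, and of the range(256) restriction, using made-up values:

```python
b = bytes([1, 2, 3])       # immutable sequence of ints in range(256)
ba = bytearray([1, 2, 3])  # mutable counterpart

ba[0] = 255                # in-place assignment works on bytearray
print(ba)                  # bytearray(b'\xff\x02\x03')

try:
    b[0] = 255             # ...but not on bytes
except TypeError as e:
    print('TypeError:', e)

try:
    bytearray([256])       # values outside range(256) are rejected
except ValueError as e:
    print('ValueError:', e)
```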

So x must not be a normal string but a byte string, which is still iterable?

Read the docs ;-) x is not a "string". Do this to see its type:

print(type(x))

That will display:

<class 'bytes'>

It's a bytes object, as mentioned above. It's a sequence, so, like all sequences, it's iterable. You can also index it, slice it, and so on.
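For example, using the 10 bytes from the question typed in as a literal: indexing a bytes object gives an int, while slicing gives another bytes object.

```python
x = b'\xff\xd8\xff\xe0\x00\x10JFIF'  # the 10 bytes from the question

print(type(x))     # <class 'bytes'>
print(x[0])        # 255  (indexing gives an int)
print(x[6:])       # b'JFIF'  (slicing gives another bytes object)
for b in x[:2]:    # iteration also yields ints
    print(b)       # 255, then 216
```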

Your y is a list. Unfortunately, I can't figure out what you're trying to achieve.

What I need is for y to be the same type as x, so that outfile.write(x) = outfile.write(y)

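No, you don't need x and y to be the same type. What you do need is to be able to write y as binary data. For that you need to create a bytes or bytearray object, which is very easy; just do one of these: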

 y = bytes(y)

or

 y = bytearray(y)

Then

outfile.write(y)

will do what you want.
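A minimal check of that advice, assuming io.BytesIO as an in-memory stand-in for outfile.jpg (it accepts the same write() calls as a file opened with 'wb'), and using the list y printed in the question:

```python
import io

x = b'\xff\xd8\xff\xe0\x00\x10JFIF'
y = [255, 216, 255, 224, 0, 16, 74, 70, 73, 70]  # the list from the question

out = io.BytesIO()  # in-memory stand-in for open('outfile.jpg', 'wb')
try:
    out.write(y)    # a list is not a bytes-like object
except TypeError as e:
    print('TypeError:', e)

out.write(bytes(y))          # convert the list first; bytearray(y) works too
print(out.getvalue() == x)   # True: the same bytes outfile.write(x) would write
```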

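Although, as I said above, I don't know why you're creating a list here. A much simpler way to create the same list is to skip all the loops and just write: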

 y = list(x)
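To illustrate: list(x) reproduces the loop's result, and if the goal is only to apply an operation to every byte, a generator expression passed to bytes() skips the intermediate list altogether. The `* 1` operation is the placeholder from the question; any per-byte operation must keep results in range(256).

```python
x = b'\xff\xd8\xff\xe0\x00\x10JFIF'

# Same result as the question's index-and-assign loop.
y = list(x)
print(y)  # [255, 216, 255, 224, 0, 16, 74, 70, 73, 70]

# Apply the (placeholder) per-byte operation and pack straight into bytes.
z = bytes(b * 1 for b in x)
print(z == x)  # True
```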

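If I'm getting through, you should be starting to suspect that your mental model of what's going on here is too complicated, not too simple. You're imagining difficulties that don't exist :-) Reading from a binary file gives you a bytes object (or, if you want to read a binary file into a bytearray object, see the file's .readinto() method), and writing to a binary file requires giving it a bytes or bytearray object to write. That's all there is to it.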
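A sketch of the .readinto() route, using io.BytesIO in place of a real binary file for self-containment (a file opened with 'rb' supports the same call). readinto() fills a preallocated bytearray in place and returns the number of bytes read, after which the buffer can be modified directly:

```python
import io

infile = io.BytesIO(b'\xff\xd8\xff\xe0\x00\x10JFIF')  # stand-in for open('infile.jpg', 'rb')

buf = bytearray(4)         # preallocated, mutable buffer
n = infile.readinto(buf)   # fills buf in place
print(n, buf)              # 4 bytearray(b'\xff\xd8\xff\xe0')

buf[0] ^= 0xFF             # a bytearray can be modified in place
print(buf)                 # bytearray(b'\x00\xd8\xff\xe0')
```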

- Tim Peters


- Tim Peters · Accepted Answer

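They're not exactly the same; they only display the same after str() is applied (which print() does implicitly). Print their repr() and you'll see how they differ. For example: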

>>> x = b'ab'
>>> y = "b'ab'"
>>> print(x)
b'ab'
>>> print(y) # displays identically
b'ab'
>>> print(repr(x)) # but x is really a 2-byte bytes object
b'ab'
>>> print(repr(y)) # and y is really a 5-character string
"b'ab'"

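Mixing strings and bytes objects makes no sense (absent an explicit encoding, but you're not trying to encode or decode anything here, are you?). If you're working with binary files, you shouldn't use strings at all; you should use bytes or bytearray objects.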
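Continuing the example above: Python 3 never converts between bytes and str implicitly, so they never compare equal and cannot be concatenated; going from one to the other requires naming an encoding explicitly.

```python
x = b'ab'    # 2-byte bytes object
y = "b'ab'"  # 5-character str

print(x == y)            # False: bytes never compare equal to str
print(len(x), len(y))    # 2 5

try:
    x + 'c'              # no implicit conversion between bytes and str
except TypeError as e:
    print('TypeError:', e)

# Converting requires an explicit encoding, which raw binary data doesn't need.
print(x.decode('ascii') == 'ab')  # True
```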

So the problem isn't really in how you're writing: the logic is already muddled before that point.

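There's no way to guess what you want. Please edit the question to show a complete, executable example of what you're trying to accomplish. We don't need a JPG file; just write some short, arbitrary binary data. For example: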

dummy_jpg = b'\x01\x02\xff'
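One possible shape for such an example, a sketch only: it writes dummy_jpg to a temporary file, reads it back in 'rb' mode, applies a reversible operation and then undoes it (XOR with 0xFF, chosen arbitrarily for illustration because it is its own inverse), writes the result in 'wb' mode, and checks that the output file matches the input byte for byte. The file names are placeholders inside a temporary directory.

```python
import os
import tempfile

dummy_jpg = b'\x01\x02\xff'

with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, 'infile.jpg')
    out_path = os.path.join(tmp, 'outfile.jpg')

    # Create the input file from the dummy data.
    with open(in_path, 'wb') as f:
        f.write(dummy_jpg)

    # Read it back: binary mode gives a bytes object.
    with open(in_path, 'rb') as f:
        x = f.read()

    # Apply an operation and then its inverse (XOR with 0xFF twice is a no-op).
    scrambled = bytes(b ^ 0xFF for b in x)
    restored = bytes(b ^ 0xFF for b in scrambled)

    # Write the restored bytes, then read the output file back for comparison.
    with open(out_path, 'wb') as f:
        f.write(restored)
    with open(out_path, 'rb') as f:
        result = f.read()

print(scrambled)            # b'\xfe\xfd\x00'
print(result == dummy_jpg)  # True
```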