zlib.error: Error -3 while decompressing: incorrect header check

Question

73

I have a gzip file and I am trying to read it with the following Python code:

import zlib

do = zlib.decompressobj(16+zlib.MAX_WBITS)
fh = open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
data = do.decompress(cdata)

It throws this error:

zlib.error: Error -3 while decompressing: incorrect header check

How can I overcome it?

- VarunVyas

9 Answers

4

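Update: dnozay's answer explains the problem and should be the accepted answer.

Try the gzip module. The code below is taken directly from the Python documentation.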

import gzip
f = gzip.open('/home/joe/file.txt.gz', 'rb')
file_content = f.read()
f.close()

- Dave Bacher

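The same error occurs: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib/python2.6/gzip.py", line 212, in read self._read(readsize) File "/usr/lib/python2.6/gzip.py", line 271, in _read uncompress = self.decompress.decompress(buf) zlib.error: Error -3 while decompressing: invalid code lengths set - VarunVyas

@VarunVyas, sorry, I cannot reproduce your error. It may be related to your input data. Was your input file generated with gzip? Does running gunzip from the command line decompress it correctly? - Dave Bacher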
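
One way to act on the question above without leaving Python is to look at the first two bytes of the data. This is only a sketch: the helper name looks_like_gzip is made up for illustration, and it relies on the fact that every gzip member starts with the bytes 1f 8b (RFC 1952). A file that fails this check was probably not produced by gzip, whatever its extension says.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def looks_like_gzip(data: bytes) -> bool:
    """Hypothetical helper: True if the data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


gzip_bytes = gzip.compress(b"hello")  # a real gzip stream
zlib_bytes = zlib.compress(b"hello")  # a zlib stream, not gzip

print(looks_like_gzip(gzip_bytes))
print(looks_like_gzip(zlib_bytes))
```

For a file on disk, the same check can be applied to the result of open(path, 'rb').read(2).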

3

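I just solved the "incorrect header check" problem when decompressing gzipped data.

You need to set -WindowBits => WANT_GZIP in your call to inflateInit2 (use the 2 version).

Yes, this can be very frustrating. A cursory reading of the documentation tends to present zlib as an API for gzip compression, but by default (when not using the gz* methods) it neither creates nor decompresses the gzip format. You have to pass this rather inconspicuous flag.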

- user2475290

3

This does not answer the original question, but it may help others who land here with a similar problem. The zlib.error: Error -3 while decompressing: incorrect header check also occurs in the example below:

import base64
import zlib

b64_encoded_bytes = base64.b64encode(zlib.compress(b'abcde'))
encoded_bytes_representation = str(b64_encoded_bytes)  # this is the cause
zlib.decompress(base64.b64decode(encoded_bytes_representation))  # raises zlib.error

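The example is a minimal reproduction of something I ran into in some legacy Django code, where Base64-encoded bytes (from an HTTP POST) were stored in a Django CharField (instead of a BinaryField).

When the CharField value is read from the database, str() is called on the value without an explicit encoding, as can be seen in the Django source. The str() documentation says:

If neither encoding nor errors is given, str(object) returns object.__str__(), which is the "informal" or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).

So, in the example, we are inadvertently base64-decoding "b'eJxLTEpOSQUABcgB8A=='" instead of b'eJxLTEpOSQUABcgB8A=='.

If an explicit encoding is used, e.g. str(b64_encoded_bytes, 'utf-8'), the zlib decompression in the example succeeds.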
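
A minimal sketch of the difference, reusing the names from the example above (wrong_text, right_text and restored are illustrative names, not part of the original code):

```python
import base64
import zlib

b64_encoded_bytes = base64.b64encode(zlib.compress(b"abcde"))

# str() without an encoding returns the repr, i.e. text that starts with b'
wrong_text = str(b64_encoded_bytes)

# Decoding explicitly returns only the Base64 characters themselves.
right_text = b64_encoded_bytes.decode("ascii")

# The correctly decoded text round-trips through Base64 and zlib.
restored = zlib.decompress(base64.b64decode(right_text))
print(wrong_text, right_text, restored)
```

Calling b64_encoded_bytes.decode('ascii') is equivalent here to str(b64_encoded_bytes, 'ascii'), since Base64 output only contains ASCII characters.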

Note: this is a Django-specific problem that only arises when the value is retrieved from the database. For example, the test below passes (in Django 3.0.3).

class MyModelTests(TestCase):
    def test_bytes(self):
        my_model = MyModel.objects.create(data=b'abcde')
        self.assertIsInstance(my_model.data, bytes)  # issue does not arise
        my_model.refresh_from_db()
        self.assertIsInstance(my_model.data, str)  # issue does arise

where MyModel is

class MyModel(models.Model):
    data = models.CharField(max_length=100)

- djvg

2

To decompress incomplete gzipped bytes that are held in memory, dnozay's answer is useful, but it omits the call to zlib.decompressobj, which I found to be necessary:

incomplete_decompressed_content = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(incomplete_gzipped_content)

Note that zlib.MAX_WBITS | 16 is 15 | 16, which is 31. For some background on wbits, see zlib.decompress.

Credit: Yann Vernier's answer, which mentions the zlib.decompressobj call.
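
To make this concrete, here is a small sketch that simulates an incomplete download by cutting a gzip stream in half. The sample data and the variable names are assumptions made for illustration. As far as I can tell, the one-shot zlib.decompress raises an error on such truncated input, whereas the decompressobj returns whatever could be decoded so far.

```python
import gzip
import zlib

# Sample payload; the content is arbitrary and only needs to be reasonably large.
original = b"".join(b"line %d\n" % i for i in range(10000))
complete = gzip.compress(original)
truncated = complete[: len(complete) // 2]  # simulate an incomplete download

decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
partial = decompressor.decompress(truncated)

# eof stays False because the end of the gzip stream was never reached.
print(len(partial), "of", len(original), "bytes recovered; eof =", decompressor.eof)
```

The recovered bytes are a prefix of the original data, and decompressor.eof can be used to tell a complete stream from a truncated one.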

- Asclepius

1

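Interestingly, I ran into this error while trying to work with the Stack Overflow API from Python.

I managed to get it working by using the GzipFile object from the gzip module, roughly like this: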

import gzip

gzip_file = gzip.GzipFile(fileobj=open('abc.gz', 'rb'))

file_contents = gzip_file.read()

- Paul D. Waite

1

My task was to decompress email messages stored in a Bullhorn database. The code snippet is as follows:

import pyodbc
import zlib

cn = pyodbc.connect('connection string')
cursor = cn.cursor()
cursor.execute('SELECT TOP(1) userMessageID, commentsCompressed FROM BULLHORN1.BH_UserMessage WHERE DATALENGTH(commentsCompressed) > 0 ')



for msg in cursor.fetchall():
    # The magic is in the second parameter: a negative wbits value selects the raw deflate format
    decompressedMessageBody = zlib.decompress(bytes(msg.commentsCompressed), -zlib.MAX_WBITS)

- Yury Bondarau

0

If you are using Node.js, try the fflate package; it worked for me with gzip.

const fflate = require('fflate');

// Gunzip a Buffer or Uint8Array and return the result as a UTF-8 string.
async function gunzipToString(buffer) {
  const decompressedData = await new Promise((resolve, reject) => {
    fflate.gunzip(buffer, (error, result) => {
      if (error) {
        reject(error);
      } else {
        resolve(result);
      }
    });
  });
  return Buffer.from(decompressedData).toString('utf-8');
}

- Kar

-3

Just add the header 'Accept-Encoding': 'identity'

import requests

requests.get('http://gett.bike/', headers={'Accept-Encoding': 'identity'})

https://github.com/requests/requests/issues/3849

- Barny

4

Your answer to a question about decompression is: don't compress it in the first place? - Ryder Brooks

1

Servers do not always honor the declared header, so this approach will not work reliably. - Asclepius

- dnozay · Accepted Answer

You get this error:

zlib.error: Error -3 while decompressing: incorrect header check

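This most likely means that you are trying to check a header that is not there; for example, your data follows RFC 1951 (deflate compressed format) rather than RFC 1950 (zlib compressed format) or RFC 1952 (gzip compressed format).

Choosing windowBits

However, zlib can decompress all of these formats:

To (de-)compress the deflate format, use wbits = -zlib.MAX_WBITS
To (de-)compress the zlib format, use wbits = zlib.MAX_WBITS
To (de-)compress the gzip format, use wbits = zlib.MAX_WBITS | 16

See the documentation at http://www.zlib.net/manual.html#Advanced (section inflateInit2)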
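
As a compact Python 3 summary of the three settings above, here is a sketch that round-trips the same bytes through each format. The WBITS table and the helper names compress_as and decompress_as are made up for illustration; only the zlib calls themselves come from the standard library.

```python
import zlib

# wbits value for each container format, as listed above.
WBITS = {
    "deflate": -zlib.MAX_WBITS,   # raw stream, no header (RFC 1951)
    "zlib": zlib.MAX_WBITS,       # zlib header and Adler-32 trailer (RFC 1950)
    "gzip": zlib.MAX_WBITS | 16,  # gzip header and CRC-32 trailer (RFC 1952)
}


def compress_as(fmt: str, data: bytes) -> bytes:
    """Hypothetical helper: compress data into the named format."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, WBITS[fmt])
    return compressor.compress(data) + compressor.flush()


def decompress_as(fmt: str, data: bytes) -> bytes:
    """Hypothetical helper: decompress data that is known to be in the named format."""
    return zlib.decompress(data, WBITS[fmt])


for fmt in WBITS:
    packed = compress_as(fmt, b"test")
    # Show the first two bytes of each stream next to the recovered payload.
    print(fmt, packed[:2].hex(), decompress_as(fmt, packed))
```

Note that in Python 3 the input must be bytes, not str.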

Examples

Test data:

>>> deflate_compress = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
>>> zlib_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS)
>>> gzip_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
>>> 
>>> text = '''test'''
>>> deflate_data = deflate_compress.compress(text) + deflate_compress.flush()
>>> zlib_data = zlib_compress.compress(text) + zlib_compress.flush()
>>> gzip_data = gzip_compress.compress(text) + gzip_compress.flush()
>>>

Obvious test for zlib:

>>> zlib.decompress(zlib_data)
'test'

Test for deflate:

>>> zlib.decompress(deflate_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(deflate_data, -zlib.MAX_WBITS)
'test'

Test for gzip:

>>> zlib.decompress(gzip_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|16)
'test'

The data is also compatible with the gzip module:

>>> import gzip
>>> import StringIO
>>> fio = StringIO.StringIO(gzip_data)  # io.BytesIO for Python 3
>>> f = gzip.GzipFile(fileobj=fio)
>>> f.read()
'test'
>>> f.close()

Automatic header detection (zlib or gzip)

Adding 32 to windowBits will trigger header detection.

>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|32)
'test'
>>> zlib.decompress(zlib_data, zlib.MAX_WBITS|32)
'test'
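
Header detection covers gzip and zlib but not raw deflate, so one possible way to handle all three is to fall back to a negative wbits value when detection fails. This is only a sketch for Python 3: inflate_any is a hypothetical helper name, and the fallback assumes that any data without a recognizable header is a raw deflate stream.

```python
import zlib


def inflate_any(data: bytes) -> bytes:
    """Hypothetical helper: decompress gzip, zlib, or raw deflate data."""
    try:
        # | 32 asks zlib to detect a gzip or zlib header automatically.
        return zlib.decompress(data, zlib.MAX_WBITS | 32)
    except zlib.error:
        # No recognizable header: assume a raw deflate stream.
        return zlib.decompress(data, -zlib.MAX_WBITS)


def make_sample(wbits: int) -> bytes:
    """Build test input for the given wbits value."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return compressor.compress(b"test") + compressor.flush()


samples = [make_sample(w) for w in (-zlib.MAX_WBITS, zlib.MAX_WBITS, zlib.MAX_WBITS | 16)]
print([inflate_any(sample) for sample in samples])
```

If the format is known in advance, passing the matching wbits value directly is the safer choice, because the fallback also swallows errors caused by genuinely corrupt gzip or zlib data.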

Use `gzip` instead

Or you can ignore zlib and use the gzip module directly; but please remember that under the hood, gzip uses zlib.

fh = gzip.open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
