Decompressing (Gzip) chunks of a response from an http.client call

Question


I have the following code, which I am using to try to chunk the response from an http.client.HTTPSConnection GET request to an API (please note that the response is gzip encoded):

    connection = http.client.HTTPSConnection(api, context = ssl._create_unverified_context())
    connection.request('GET', api_url, headers = auth)
    response = connection.getresponse()
    while chunk := response.read(20):
        data = gzip.decompress(chunk)
        data = json.loads(chunk)
        print(data)

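This always fails with an error saying that it is not a gzipped file (b'\xe5\x9d'). I'm not sure how I can chunk the data and still achieve what I'm trying to do here. Basically, I am chunking so that I don't have to load the entire response into memory.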

Please note that I can't use any other libraries such as requests, urllib, etc.

- qwerty

2 Answers


- ye olde noobe · Answer 1

This is probably because the response you received really isn't a gzip file.

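I noticed that in your code you pass a variable named auth as the headers. Servers usually don't send a compressed response unless the request headers say that you accept one. If your headers contain only authentication-related keys, as the variable name suggests, you won't receive a gzipped response. First, make sure the headers include 'Accept-Encoding': 'gzip'.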
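One quick way to check this diagnosis is to look at the first two bytes of the body before trying to decompress it, since every gzip stream starts with the same magic number. The sketch below is only illustrative: `looks_like_gzip` and `GZIP_MAGIC` are hypothetical names, and the data is generated locally rather than fetched from your API.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(first_bytes: bytes) -> bool:
    """Return True if the data starts with the gzip magic number."""
    return first_bytes[:2] == GZIP_MAGIC


compressed = gzip.compress(b'{"status": "ok"}')  # stand-in for a gzipped body
plain = b'{"status": "ok"}'                      # stand-in for an uncompressed body

print(looks_like_gzip(compressed))  # True
print(looks_like_gzip(plain))       # False
```

With a live connection, `response.getheader('Content-Encoding')` also tells you what encoding the server claims to have used.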

After that, you will run into another problem:

"Basically, I am chunking so that I don't have to load the entire response into memory."

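gzip.decompress expects a complete file, so you would have to reassemble the whole response in memory before calling it, which defeats the purpose of chunking the response. Trying to decompress only part of a gzip stream with gzip.decompress will most likely raise an EOFError with a message like Compressed file ended before the end-of-stream marker was reached.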
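This behaviour can be reproduced without a network connection. The following is a minimal sketch that compresses some made-up data and hands only its first 20 bytes to gzip.decompress; the helper name `fails_on_partial_input` and the sample payload are assumptions for illustration.

```python
import gzip

# Made-up sample data, large enough that its compressed form is well over 20 bytes
payload = b"".join(f"This is line {i}.\n".encode() for i in range(1000))
compressed = gzip.compress(payload)


def fails_on_partial_input(data: bytes, size: int = 20) -> bool:
    """Return True if gzip.decompress rejects the first `size` bytes with EOFError."""
    try:
        gzip.decompress(data[:size])
    except EOFError:
        return True
    return False


print(fails_on_partial_input(compressed))      # True: a truncated stream is rejected
print(gzip.decompress(compressed) == payload)  # True: the complete stream works
```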

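I don't know whether you can manage this directly with the gzip library, but I do know how to do it with zlib. You will also need to convert each chunk into a file-like object, which you can do with io.BytesIO. I see that you have very strict limitations on libraries, but zlib and io are part of the Python standard library, so hopefully you have them available. Here is a reworked version of your code that should help you get going: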

    import http.client
    import ssl
    import zlib
    from io import BytesIO

    # Your variables here
    api = 'your_api_host'
    api_url = 'your_api_endpoint'
    auth = {'AuthKeys': 'auth_values'}

    # Add the header that asks the server for a gzip response
    auth['Accept-Encoding'] = 'gzip'

    # Prepare the decompression object; 16 + zlib.MAX_WBITS makes zlib expect a gzip header
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

    connection = http.client.HTTPSConnection(api, context=ssl._create_unverified_context())
    connection.request('GET', api_url, headers=auth)
    response = connection.getresponse()

    while chunk := response.read(20):
        # BytesIO(chunk).read() returns the same bytes as chunk, so passing chunk directly also works
        data = decompressor.decompress(BytesIO(chunk).read())
        print(data)

    # Emit anything the decompressor is still holding once the response is exhausted
    print(decompressor.flush())
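The same approach can be tried offline. This is a minimal sketch, assuming an in-memory `io.BytesIO` as a stand-in for the HTTP response and made-up sample data. It feeds 20-byte chunks to a zlib decompression object and checks that the output matches the original.

```python
import gzip
import io
import zlib

# Made-up sample data; the BytesIO object stands in for the HTTP response
original = b"".join(f"This is line {i}.\n".encode() for i in range(1000))
stream = io.BytesIO(gzip.compress(original))

# 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer
decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

# The pieces are collected here only so the result can be verified;
# in real use each piece would be processed and then discarded.
pieces = []
while chunk := stream.read(20):
    pieces.append(decompressor.decompress(chunk))
pieces.append(decompressor.flush())

result = b"".join(pieces)
print(result == original)  # True
```

Note that an individual piece can be empty or can end in the middle of a JSON value, so json.loads cannot be applied to each piece on its own.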

- asynts · Answer 2

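The problem is that gzip.decompress expects a complete file. You can't hand it a single chunk, because the deflate algorithm relies on previous data while decompressing. The whole point of the algorithm is that it can repeat something it has already seen, so it needs that earlier data.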

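However, deflate only cares about the last 32 KiB or so of data. It is therefore possible to stream-decompress such a file without needing much memory. This is not something you need to implement yourself, though: Python provides the gzip.GzipFile class, which can wrap a file handle and then be used like a normal file: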

    import io
    import gzip

    # Create a file for testing.
    # In your case you can just use the response object you get.
    file_uncompressed = ""
    for line_index in range(10000):
        file_uncompressed += f"This is line {line_index}.\n"
    file_compressed = gzip.compress(file_uncompressed.encode())
    file_handle = io.BytesIO(file_compressed)

    # This library does all the heavy lifting
    gzip_file = gzip.GzipFile(fileobj=file_handle)

    while chunk := gzip_file.read(1024):
        print(chunk)
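The loop above can be wrapped in a small generator so the caller only ever holds one decompressed chunk at a time. This is a sketch rather than a definitive implementation: `iter_decompressed` is a hypothetical helper name, and the assumption is that the object returned by `connection.getresponse()` can be passed as `fileobj` in the same way as the `io.BytesIO` used here.

```python
import gzip
import io
from typing import BinaryIO, Iterator


def iter_decompressed(fileobj: BinaryIO, chunk_size: int = 1024) -> Iterator[bytes]:
    """Yield decompressed chunks of at most chunk_size bytes from a gzip stream."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gzip_file:
        while chunk := gzip_file.read(chunk_size):
            yield chunk


# Made-up test data; BytesIO stands in for the HTTP response object
sample = b"".join(f"This is line {i}.\n".encode() for i in range(1000))
compressed_handle = io.BytesIO(gzip.compress(sample))

total_bytes = 0
for chunk in iter_decompressed(compressed_handle, chunk_size=64):
    total_bytes += len(chunk)  # process each chunk here instead of storing it

print(total_bytes == len(sample))  # True
```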