Saving a pandas DataFrame to a file-like object with gzip compression

Question


I am trying to store a pandas DataFrame in an in-memory json_buffer and then upload the file to S3 with the following code:

json_buffer = StringIO()
df.to_json(json_buffer, orient='records', date_format='iso', compression='gzip')
json_file_name = file_to_load.split(".")[0] + ".json"
s3_conn.put_object(Body=json_buffer.getvalue(), Bucket=s3_bucket, Key=f"{target_path}{json_file_name}")

When I try to apply compression, I get the following warning:

RuntimeWarning: compression has no effect when passing a non-binary object as input.

How can I still apply compression and save the JSON file to S3 in .gz format?

Thanks!

- Tomer Shalhon

1 Answer


- Tomer Shalhon · Accepted Answer

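Figured it out. The warning appears because StringIO is a text buffer, while gzip output is binary. Here is how I did it using BytesIO and gzip: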

import gzip
from io import BytesIO

json_buffer = BytesIO()

# Leaving the with block closes the GzipFile, which writes the gzip trailer into json_buffer
with gzip.GzipFile(mode='w', fileobj=json_buffer) as gz_file:
    df.to_json(gz_file, orient='records', date_format='iso')

json_file_name = file_to_load.split(".")[0] + ".json.gz"
s3_conn.put_object(Body=json_buffer.getvalue(), Bucket=s3_bucket, Key=f"{target_path}{json_file_name}")
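To check the compressed payload before uploading, the same BytesIO plus gzip idea can be tried without S3. This is only a minimal sketch using the standard library: the helper name gzip_json_text and the sample records are made up for illustration, and the literal JSON string stands in for what df.to_json(orient='records', date_format='iso') returns when it is called without a path or buffer.

```python
import gzip
import json
from io import BytesIO


def gzip_json_text(json_text: str) -> bytes:
    """Compress a JSON string into gzip bytes held entirely in memory.

    Hypothetical helper, not part of pandas or boto3.
    """
    buffer = BytesIO()
    # GzipFile needs a binary buffer; leaving the with block writes the gzip trailer.
    with gzip.GzipFile(mode="w", fileobj=buffer) as gz_file:
        gz_file.write(json_text.encode("utf-8"))
    return buffer.getvalue()


# Stand-in for the string that df.to_json(orient="records", date_format="iso") would return.
sample_json = json.dumps([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
payload = gzip_json_text(sample_json)

# Round trip: decompress and parse to confirm the records survive unchanged.
restored = json.loads(gzip.decompress(payload).decode("utf-8"))
```

The resulting payload is a bytes object, so it could be passed as Body to put_object in the same way as json_buffer.getvalue() above.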