Pandas to_csv to a GzipFile does not work in Python 3

6
In Python 2.7 (Pandas 0.22.0), a Pandas DataFrame can be saved as a gzipped CSV in memory as follows:
from io import BytesIO
import gzip
import pandas as pd
df = pd.DataFrame.from_dict({'a': ['a', 'b', 'c']})
s = BytesIO()
f = gzip.GzipFile(fileobj=s, mode='wb', filename='file.csv')
df.to_csv(f)
s.seek(0)
content = s.getvalue()

However, in Python 3.6 (Pandas 0.22.0), the call to to_csv raises an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lib/python3.6/site-packages/pandas/core/frame.py", line 1524, in to_csv
    formatter.save()
  File "lib/python3.6/site-packages/pandas/io/formats/format.py", line 1652, in save
    self._save()
  File "lib/python3.6/site-packages/pandas/io/formats/format.py", line 1740, in _save
    self._save_header()
  File "lib/python3.6/site-packages/pandas/io/formats/format.py", line 1708, in _save_header
    writer.writerow(encoded_labels)
  File "miniconda3/lib/python3.6/gzip.py", line 260, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'

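How can I fix this? Do I need to change the GzipFile object somehow before to_csv can handle it correctly?
To clarify, I want to create the gzipped file in memory (the content variable) so that I can later save it to Amazon S3 using Boto 3 put_object.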

@JohnZwinck Thanks for your comment, I have added more code and clarified the use case :) - Waiski
2 Answers

1
You can write the CSV to a StringIO first and then pass the encoded text to the GzipFile:
from io import StringIO
buf = StringIO()
df.to_csv(buf)
f = gzip.GzipFile(fileobj=s, mode='wb', filename='file.csv')
f.write(buf.getvalue().encode())
f.flush()

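Note the added f.flush(). In my experience, without this line the GzipFile may in some cases fail to flush its data, resulting in a corrupted archive.
Alternatively, based on your code, here is the complete example: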
from io import BytesIO
import gzip
import pandas as pd
from io import StringIO
df = pd.DataFrame.from_dict({'a': ['a', 'b', 'c']})
s = BytesIO()
buf = StringIO()
f = gzip.GzipFile(fileobj=s, mode='wb', filename='file.csv')
df.to_csv(buf)
f.write(buf.getvalue().encode())
f.flush()
s.seek(0)
content = s.getvalue()
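
The reason the intermediate StringIO helps can be reproduced without pandas. The sketch below uses the standard library's csv.writer as a stand-in for to_csv (an assumption made only to keep the example self-contained; the names rows, direct_write_error and text_buf are illustrative and not from the original post). A GzipFile opened with mode='wb' accepts only bytes-like objects, while a CSV writer on Python 3 emits str, so the text has to be encoded before it reaches the compressor.

```python
import csv
import gzip
from io import BytesIO, StringIO

# Stand-in for the DataFrame from the question (illustrative data).
rows = [["idx", "a"], ["0", "a"], ["1", "b"], ["2", "c"]]

# 1) Writing text straight into a binary GzipFile fails on Python 3.
raw = BytesIO()
gz = gzip.GzipFile(fileobj=raw, mode="wb")
try:
    csv.writer(gz).writerows(rows)
    direct_write_error = None
except TypeError as exc:
    direct_write_error = exc
finally:
    gz.close()

# 2) Rendering to text first and encoding explicitly works.
text_buf = StringIO()
csv.writer(text_buf).writerows(rows)

out = BytesIO()
with gzip.GzipFile(fileobj=out, mode="wb", filename="file.csv") as gz:
    gz.write(text_buf.getvalue().encode("utf-8"))

# Read the compressed bytes only after the GzipFile has been closed.
content = out.getvalue()
```

Here direct_write_error holds the same kind of TypeError shown in the question's traceback, and gzip.decompress(content) gives back the encoded CSV text.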

0

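Roland Pihlakas's answer is correct, but the resulting gzip file is incomplete (despite the flush). The GzipFile needs to be closed before calling getvalue() on the BytesIO. Modified code: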

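from gzip import GzipFile
from io import BytesIO, StringIO

with BytesIO() as b:
    with StringIO() as s, GzipFile(fileobj=b, mode='wb') as gz:
        df.to_csv(s, encoding="utf-8")
        gz.write(s.getvalue().encode())
        gz.flush()
    # gz is closed at this point, so the gzip trailer has been written to b
    b.seek(0)
    csv_bytes = b.getvalue()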
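
The difference between flushing and closing can be checked with the standard library alone. The following is a minimal sketch, not part of the original answers: payload is a hypothetical stand-in for the encoded CSV text, and is_complete_gzip is a helper invented for this illustration. It snapshots the buffer once after flush() and once after close(), then tries to decompress both.

```python
import gzip
from io import BytesIO

# Hypothetical stand-in for the encoded output of df.to_csv.
payload = b"idx,a\n0,a\n1,b\n2,c\n"

raw = BytesIO()
gz = gzip.GzipFile(fileobj=raw, mode="wb")
gz.write(payload)
gz.flush()
# After flush() the compressed data is in the buffer, but the stream
# has not been terminated and the CRC/size trailer is still missing.
flushed_only = raw.getvalue()

gz.close()  # finishes the stream; does not close the underlying BytesIO
closed = raw.getvalue()


def is_complete_gzip(data):
    """Return True if data decompresses as a complete gzip stream."""
    try:
        gzip.decompress(data)
        return True
    except EOFError:
        return False
```

With these definitions, is_complete_gzip(flushed_only) is False and is_complete_gzip(closed) is True, which matches the observation above that the archive is only valid once the GzipFile has been closed.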
