Python中如何压缩文件为gzip格式

Question

Python中如何压缩文件为gzip格式

57

我想在Python中对文件进行gzip压缩。我尝试使用subprocess.check_call()，但它始终失败并显示错误'OSError: [Errno 2] No such file or directory'。我这里的操作有问题吗？是否有比使用subprocess.check_call更好的gzip文件方法？

from subprocess import check_call

def gZipFile(fullFilePath)
    check_call('gzip ' + fullFilePath)

感谢！

- Rinks

11

为什么不使用 http://docs.python.org/library/gzip.html ？ - Ski

1

要从目录/dir/path创建一个gzipped tarball archive.tar.gz，可以使用shutil.make_archive('archive', 'gztar', '/dir/path')。 - jfs

这个回答解决了你的问题吗？使用gzip的Python子进程 - David Streuli

9个回答

48

以二进制 (rb) 模式读取原始文件，然后使用gzip.open创建gzip文件，就可以像普通文件一样使用 writelines 写入：

import gzip

with open("path/to/file", 'rb') as orig_file:
    with gzip.open("path/to/file.gz", 'wb') as zipped_file:
        zipped_file.writelines(orig_file)

更简短的写法，可以将 with 语句合并到一行中：

with open('path/to/file', 'rb') as src, gzip.open('path/to/file.gz', 'wb') as dst:
    dst.writelines(src)

- Jace Browning

在这种情况下，我们是否必须将文件写回相同的路径？我们不能将它们暂时存储在其他地方，以便稍后保存到S3吗？ - user13067694

1

迭代二进制数据的“行”似乎不是一个好主意。 - MarcH

25

来自 Python3文档

压缩已存在的文件

import gzip
import shutil
with open('file.txt', 'rb') as f_in:
    with gzip.open('file.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

或者如果您讨厌嵌套的with语句

import gzip
import shutil
from contextlib import ExitStack
with ExitStack() as stack:
    f_in = stack.enter_context(open('file.txt', 'rb'))
    f_out = stack.enter_context(gzip.open('file.txt.gz', 'wb'))
    shutil.copyfileobj(f_in, f_out)

创建一个新的gzip文件：

import gzip
content = b"Lots of content here"
with gzip.open("file.txt.gz", "wb") as f:
    f.write(content)

请注意，content被转换为字节

如果您不像上面的示例那样创建内容作为字符串/字节文字的方法是另一种方式

import gzip
# get content as a string from somewhere else in the code
with gzip.open("file.txt.gz", "wb") as f:
    f.write(content.encode("utf-8"))

请查看此SO问题以讨论其他的编码方法。

- Michael Hall

1

我不知道ExitStack...有趣！ - O.rka

18

试试这个：

check_call(['gzip', fullFilePath])

根据你对这些文件数据的处理方式，Skirmantas提供的http://docs.python.org/library/gzip.html链接可能也会有所帮助。请注意页面底部的示例。如果你不需要访问数据，或者你的Python代码中还没有这些数据，执行gzip可能是最简洁的方法，这样你就不必在Python中处理数据。

- retracile

1

嗯，我不知道“干净”是否是正确的词，但它肯定是最快的方式，也是需要你方代码最少的方式。 - flying sheep

4

使用 gzip 模块：

import gzip
import os

in_file = "somefile.data"
in_data = open(in_file, "rb").read()
out_gz = "foo.gz"
gzf = gzip.open(out_gz, "wb")
gzf.write(in_data)
gzf.close()

# If you want to delete the original file after the gzip is done:
os.unlink(in_file)

你的错误：OSError: [Errno 2] No such file or directory' 告诉你文件 fullFilePath 不存在。如果你仍需要这条路，请确保该文件在你的系统上存在，并且你使用的是绝对路径而不是相对路径。

- chown

感谢大家的快速回复。这里的每个人都建议使用gzip。我也尝试过了。这是更好的方法吗？我不使用它的原因是它会保留原始文件。所以我最终会得到两个版本——常规和gzip文件。但我正在访问文件的数据。@retracile，你的修复方法很有效，非常感谢。我仍在思考是否应该使用subprocess或gzip。 - Rinks

1

@Rinks 最简单的方法是：当gzip完成时，调用os.unlink(original_File_Name)来删除您从中制作gzip的原始文件。请参阅我的编辑。 - chown

1

@Rinks：我不使用它的原因是它会保留原始文件，那么为什么你不在之后删除文件呢？ - Grzegorz Rożniecki

再次感谢。我肯定以后可以删除这个文件。我打算测试gzip和check_call两种方法几天，然后最终确定一个使用。 - Rinks

4

实际上，这方面的文档非常简单明了。

读取压缩文件的示例：

import gzip
f = gzip.open('file.txt.gz', 'rb')
file_content = f.read()
f.close()

创建压缩的GZIP文件的示例：

import gzip
content = "Lots of content here"
f = gzip.open('file.txt.gz', 'wb')
f.write(content)
f.close()

如何对现有文件进行GZIP压缩的示例:

import gzip
f_in = open('file.txt', 'rb')
f_out = gzip.open('file.txt.gz', 'wb')
f_out.writelines(f_in)
f_out.close()
f_in.close()

这是整个文档…… 它与gzip有关。
您可以在这里的链接中查看更多详细信息：https://docs.python.org/2/library/gzip.html。

- O.rka

1

只有当我写 content = b"Lots of content here"（注意 b）时，中间的示例才会运行。 - Morten Grum

3

import gzip

def gzip_file(src_path, dst_path):
    with open(src_path, 'rb') as src, gzip.open(dst_path, 'wb') as dst:
        for chunk in iter(lambda: src.read(4096), b""):
            dst.write(chunk)

这种解决方案的优点是保证了高效利用内存：我们不会将整个输入文件存储在内存中，而是使用4K块进行读取和转换。

- mechatroner

0

仅仅是出于完整性的考虑。这些例子中没有一个实际压缩数据。要压缩数据，需要调用gzip.compress 。下面的代码片段从pg_dump 读取并实际压缩输出。

cmd = ['pg_dump', '-d', 'mydb']
sql = subprocess.check_output(cmd)

with open('backups/{}.gz'.format('mydb'), 'wb') as zfile:
   zfile.write(gzip.compress(sql))

- user657127

0

对于Windows，可以使用子进程来运行7za实用程序：从https://www.7-zip.org/download.html下载7-Zip Extra：独立控制台版本、7z DLL、Far Manager插件。compact命令将gzip目录中的所有csv文件压缩为gzip格式。原始文件将被删除。7z选项可以在https://sevenzip.osdn.jp/chm/cmdline/index.htm找到。

import os
from pathlib import Path
import subprocess


def compact(cspath, tec, extn, prgm):  # compress each extn file in tec dir to gzip format
    xlspath = cspath / tec  # tec location
    for baself in xlspath.glob('*.' + str(extn)):  # file iteration inside directory
        source = str(baself)
        target = str(baself) + '.gz'
        try:
            subprocess.call(prgm + " a -tgzip \"" + target + "\" \"" + source + "\" -mx=5")
            os.remove(baself)  # remove src xls file
        except:
            print("Error while deleting file : ", baself)
    return 


exe = "C:\\7za\\7za.exe"  # 7za.exe (a = alone) is a standalone version of 7-Zip
csvpath = Path('C:/xml/baseline/')  # working directory
compact(csvpath, 'gzip', 'csv', exe)  # xpress each csv file in gzip dir to gzip format

- GERMAN RODRIGUEZ

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Xaerxess · Accepted Answer

有一个模块gzip。用法：

如何创建压缩的GZIP文件的示例：

import gzip
content = b"Lots of content here"
f = gzip.open('/home/joe/file.txt.gz', 'wb')
f.write(content)
f.close()

如何对现有文件进行GZIP压缩的示例：

import gzip
f_in = open('/home/joe/file.txt')
f_out = gzip.open('/home/joe/file.txt.gz', 'wb')
f_out.writelines(f_in)
f_out.close()
f_in.close()

编辑：

Jace Browning的回答在Python >= 2.7中使用with显然更加简洁和易读，因此我的第二段代码片段应该像这样：

import gzip
with open('/home/joe/file.txt', 'rb') as f_in, gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
    f_out.writelines(f_in)