Python 3: gzip.open() and modes

13

https://docs.python.org/3/library/gzip.html

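I am looking at using gzip.open(), and I am a little confused about the mode argument:

The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x' or 'xb' for binary mode, or 'rt', 'at', 'wt', or 'xt' for text mode. The default is 'rb'.

So what is the difference between 'w' and 'wb'?

The documentation lists both of them as binary modes. Does that mean there is no difference between 'w' and 'wb'?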


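A small aside: besides the python-3.x tag, shouldn't this also carry the python tag? I'm asking the experts: it mentions Python 3, but it is still Python, and some people might miss it otherwise... I remember seeing a similar case, but I forget which one. - fedepad
2 Answers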

23

It means that r defaults to rb, and if you want text you have to ask for it explicitly with rt.

(This is the opposite of the built-in open, where r means rt, not rb.)
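
The contrast can be checked directly. The following is a minimal sketch, not part of the original answer; the file names and the sample text are made up for illustration. It reads the same content through gzip.open() and through the built-in open() and compares the types that come back.

```python
import gzip
import os
import tempfile

# Hypothetical sample text and file names, used only for this illustration.
payload = "hello gzip"

with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "sample.txt.gz")
    txt_path = os.path.join(tmp, "sample.txt")

    # Write the same text once gzip-compressed and once as a plain file.
    with gzip.open(gz_path, "wt", encoding="utf-8") as f:
        f.write(payload)
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(payload)

    # gzip.open(): "r" is binary, exactly like "rb".
    with gzip.open(gz_path, "r") as f:
        gz_r = f.read()
    with gzip.open(gz_path, "rb") as f:
        gz_rb = f.read()

    # gzip.open(): text has to be requested explicitly with "rt".
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        gz_rt = f.read()

    # Built-in open(): "r" already means text.
    with open(txt_path, "r", encoding="utf-8") as f:
        plain_r = f.read()

print(type(gz_r).__name__, type(gz_rb).__name__)     # bytes bytes
print(type(gz_rt).__name__, type(plain_r).__name__)  # str str
```

With gzip.open(), 'r' and 'rb' both return bytes, and only 'rt' returns str, whereas the built-in open() returns str for a bare 'r'.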


2
That is what I was hoping for. I was worried that 'r' was a binary read and that 'rb' was somehow more binary than 'r'. - jeff00seattle
1
Note that the rt mode only works with gzip.open, not with the gzip.GzipFile constructor, which confused me. - Czechnology
Yes, as the other answer shows, open wraps the real gzip object in a TextIOWrapper. - Jean-François Fabre

4

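It is as you said, and as @Jean-François Fabre's answer already covers.

I just want to show some code, because it is interesting.

Let's look at the gzip.py source in the Python library to see what actually happens.

gzip.open() can be found here: https://github.com/python/cpython/blob/master/Lib/gzip.py, and I reproduce it below.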

def open(filename, mode="rb", compresslevel=9,
         encoding=None, errors=None, newline=None):
    """Open a gzip-compressed file in binary or text mode.
    The filename argument can be an actual filename (a str or bytes object), or
    an existing file object to read from or write to.
    The mode argument can be "r", "rb", "w", "wb", "x", "xb", "a" or "ab" for
    binary mode, or "rt", "wt", "xt" or "at" for text mode. The default mode is
    "rb", and the default compresslevel is 9.
    For binary mode, this function is equivalent to the GzipFile constructor:
    GzipFile(filename, mode, compresslevel). In this case, the encoding, errors
    and newline arguments must not be provided.
    For text mode, a GzipFile object is created, and wrapped in an
    io.TextIOWrapper instance with the specified encoding, error handling
    behavior, and line ending(s).
    """
    if "t" in mode:
        if "b" in mode:
            raise ValueError("Invalid mode: %r" % (mode,))
    else:
        if encoding is not None:
            raise ValueError("Argument 'encoding' not supported in binary mode")
        if errors is not None:
            raise ValueError("Argument 'errors' not supported in binary mode")
        if newline is not None:
            raise ValueError("Argument 'newline' not supported in binary mode")

    gz_mode = mode.replace("t", "")
    if isinstance(filename, (str, bytes, os.PathLike)):
        binary_file = GzipFile(filename, gz_mode, compresslevel)
    elif hasattr(filename, "read") or hasattr(filename, "write"):
        binary_file = GzipFile(None, gz_mode, compresslevel, filename)
    else:
        raise TypeError("filename must be a str or bytes object, or a file")

    if "t" in mode:
        return io.TextIOWrapper(binary_file, encoding, errors, newline)
    else:
        return binary_file  

We notice a few things:

  • the default mode is rb, as the documentation you quote says
  • to open a binary file, it doesn't care whether it's "r", "rb", "w", "wb" for example.
    This we can see in the following lines:

    gz_mode = mode.replace("t", "")
    if isinstance(filename, (str, bytes, os.PathLike)):
        binary_file = GzipFile(filename, gz_mode, compresslevel)
    elif hasattr(filename, "read") or hasattr(filename, "write"):
        binary_file = GzipFile(None, gz_mode, compresslevel, filename)
    else:
        raise TypeError("filename must be a str or bytes object, or a file")
    
    if "t" in mode:
        return io.TextIOWrapper(binary_file, encoding, errors, newline)
    else:
        return binary_file
    

    Basically, binary_file gets built whether or not there is an additional b, since gz_mode may or may not contain the b at this point.
    Next, the class GzipFile(_compression.BaseStream) is instantiated to build binary_file.

In its constructor, the following lines are the important ones:
    if mode and ('t' in mode or 'U' in mode):
        raise ValueError("Invalid mode: {!r}".format(mode))
    if mode and 'b' not in mode:
        mode += 'b'
    if fileobj is None:
        fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
    if filename is None:
        filename = getattr(fileobj, 'name', '')
        if not isinstance(filename, (str, bytes)):
            filename = ''
    else:
        filename = os.fspath(filename)
    if mode is None:
        mode = getattr(fileobj, 'mode', 'rb')

You can clearly see that if 'b' is not present in the mode, it gets added:

    if mode and 'b' not in mode:
        mode += 'b'

So, as already discussed, there is no difference between the two modes.
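
To see this from the outside, here is a small sketch that is not part of the original answer. The file names and the data are made up, and reading f.fileobj.mode relies on an implementation detail of GzipFile (the attribute holding the underlying file it opened), so treat that part as an assumption rather than documented API.

```python
import gzip
import io
import os
import tempfile

# Hypothetical payload, used only for this illustration.
data = b"same bytes either way"

with tempfile.TemporaryDirectory() as tmp:
    results = {}
    for mode in ("w", "wb"):
        path = os.path.join(tmp, f"out_{mode}.gz")
        with gzip.open(path, mode) as f:
            # Record the object type and the mode of the underlying file.
            # fileobj is an implementation detail of GzipFile (assumption).
            results[mode] = (type(f), f.fileobj.mode)
            f.write(data)
        # Either way the file decompresses to the same bytes.
        with gzip.open(path, "rb") as f:
            assert f.read() == data

    # Text mode is handled by gzip.open(), which wraps the GzipFile.
    with gzip.open(os.path.join(tmp, "out_wt.gz"), "wt", encoding="utf-8") as f:
        text_type = type(f)

    # The GzipFile constructor itself rejects any mode containing "t".
    try:
        gzip.GzipFile(os.path.join(tmp, "never_created.gz"), "wt")
    except ValueError as exc:
        ctor_error = str(exc)
    else:
        ctor_error = None

print(results["w"] == results["wb"])   # True
print(text_type is io.TextIOWrapper)   # True
print(ctor_error)
```

Both 'w' and 'wb' give back a GzipFile whose underlying file was opened with 'wb', while 'wt' only works through gzip.open() and raises ValueError when passed straight to the GzipFile constructor.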

