How do I determine the result type of `file.read()` in Python?

Question

I have some code that operates on file objects in Python.

With the str/bytes changes in Python 3, if a file is opened in binary mode, file.read() returns bytes. Conversely, if it is opened in text mode, file.read() returns str.
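A minimal demonstration of this behavior, assuming a throwaway file created with `tempfile` (the file name and contents are illustrative only):

```python
import os
import tempfile

# Create a throwaway file so both modes can be opened on real data.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'w') as f:
    f.write('ciao')

with open(path, 'rb') as f:
    binary_data = f.read()  # binary mode: read() returns bytes
with open(path, 'r') as f:
    text_data = f.read()    # text mode: read() returns str

os.remove(path)
print(type(binary_data).__name__, type(text_data).__name__)
```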

My code calls file.read() many times, so checking the type of the result on every call is impractical, e.g.:

def foo(file_obj):
    while True:
        data = file_obj.read(1)
        if not data:
            break
        if isinstance(data, bytes):
            # do something for bytes
            ...
        else:  # isinstance(data, str)
            # do something for str
            ...

I would like a reliable way to determine up front what file.read() will return, e.g.:

def foo(file_obj):
    if is_binary_file(file_obj):
        # do something for bytes
        while True:
            data = file_obj.read(1)
            if not data:
                break
            ...
    else:
        # do something for str
        while True:
            data = file_obj.read(1)
            if not data:
                break
            ...

One possible approach is to check file_obj.mode, e.g.:

import io


def is_binary_file(file_obj):
    return 'b' in file_obj.mode


print(is_binary_file(open('test_file', 'w')))
# False
print(is_binary_file(open('test_file', 'wb')))
# True
print(is_binary_file(io.StringIO('ciao')))
# AttributeError: '_io.StringIO' object has no attribute 'mode'
print(is_binary_file(io.BytesIO(b'ciao')))
# AttributeError: '_io.BytesIO' object has no attribute 'mode'

This fails if the object passed in comes from the io module, such as io.StringIO() or io.BytesIO().

Another approach, which does work for io objects, is to check for the encoding attribute, e.g.:

import io


def is_binary_file(file_obj):
    return not hasattr(file_obj, 'encoding')


print(is_binary_file(open('test_file', 'w')))
# False
print(is_binary_file(open('test_file', 'wb')))
# True
print(is_binary_file(io.StringIO('ciao')))
# False 
print(is_binary_file(io.BytesIO(b'ciao')))
# True

Is there a cleaner way to perform this check?

- norok2

You might also try asking on Code Review Stack Exchange, which specializes in improving working code. - G. Anderson

2 Answers

After some more homework, I can probably answer my own question.

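First, a general remark: checking for the presence or absence of attributes/methods across the whole API is not a good idea, because it leads to more complex and still relatively unsafe code.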

Following EAFP/duck typing, checking for a specific method may be acceptable, but it should be the method that is actually used later in the code.

The problem with file.read() (and even more so file.write()) is that it has side effects, which makes simply trying it and seeing what happens impractical.

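For this specific case, while still following duck typing, one can exploit the fact that the first argument of read() can be set to 0. This does not actually read anything from the buffer (and does not change the result of file.tell()), but it does return an empty str or bytes. Hence, one can write something like: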

import io


def is_reading_bytes(file_obj):
    return isinstance(file_obj.read(0), bytes)


print(is_reading_bytes(open('test_file', 'r')))
# False
print(is_reading_bytes(open('test_file', 'rb')))
# True
print(is_reading_bytes(io.StringIO('ciao')))
# False 
print(is_reading_bytes(io.BytesIO(b'ciao')))
# True
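To illustrate the claim that a zero-length read has no side effect on the stream position, here is a small check using in-memory streams (chosen only because they need no file on disk):

```python
import io

buf = io.BytesIO(b'ciao')
buf.read(2)               # advance the position to 2
pos_before = buf.tell()
empty = buf.read(0)       # zero-length read: returns b'' and consumes nothing
pos_after = buf.tell()
print(repr(empty), pos_before, pos_after)  # b'' 2 2

text = io.StringIO('ciao')
text_empty = text.read(0)  # the text counterpart returns an empty str
print(repr(text_empty))    # ''
```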

Similarly, you can try calling write() with an empty bytes string b'':

import io


def is_writing_bytes(file_obj):
    try:
        file_obj.write(b'')
    except TypeError:
        return False
    else:
        return True


print(is_writing_bytes(open('test_file', 'w')))
# False
print(is_writing_bytes(open('test_file', 'wb')))
# True
print(is_writing_bytes(io.StringIO('ciao')))
# False 
print(is_writing_bytes(io.BytesIO(b'ciao')))
# True

Note that these approaches do not check readability/writability.
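If readability/writability matters, one possible sketch is to guard each probe with the standard readable()/writable() methods first, since read(0) on a write-only stream raises io.UnsupportedOperation rather than returning. The helper names below are hypothetical, and raising io.UnsupportedOperation is just one choice of how to report the problem:

```python
import io


def is_reading_bytes_checked(file_obj):
    # Hypothetical helper: refuse write-only streams up front, then
    # use the zero-length read(0) probe, which consumes nothing.
    if not file_obj.readable():
        raise io.UnsupportedOperation('stream is not readable')
    return isinstance(file_obj.read(0), bytes)


def is_writing_bytes_checked(file_obj):
    # Hypothetical helper: refuse read-only streams up front, then
    # probe with an empty bytes write; text streams raise TypeError.
    if not file_obj.writable():
        raise io.UnsupportedOperation('stream is not writable')
    try:
        file_obj.write(b'')
    except TypeError:
        return False
    return True


print(is_reading_bytes_checked(io.BytesIO(b'ciao')))  # True
print(is_writing_bytes_checked(io.StringIO()))        # False
```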

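Finally, a proper type check can be implemented by looking at the file-like object API. In Python, file-like objects are expected to support the API described in the io module. The documentation states that TextIOBase is used for files opened in text mode, whereas BufferedIOBase (or RawIOBase for unbuffered streams) is used for files opened in binary mode. The class hierarchy overview shows that they all derive from IOBase. Therefore, the following should do the trick (remember that isinstance() also checks subclasses):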

import io


def is_binary_file(file_obj):
    return isinstance(file_obj, io.IOBase) and not isinstance(file_obj, io.TextIOBase)


print(is_binary_file(open('test_file', 'w')))
# False
print(is_binary_file(open('test_file', 'wb')))
# True
print(is_binary_file(open('test_file', 'r')))
# False
print(is_binary_file(open('test_file', 'rb')))
# True
print(is_binary_file(io.StringIO('ciao')))
# False 
print(is_binary_file(io.BytesIO(b'ciao')))
# True
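To see where the objects returned by open() sit in that hierarchy, the sketch below opens one throwaway file in text mode, buffered binary mode, and unbuffered binary mode (buffering=0), and records the concrete class of each together with the relevant isinstance() check:

```python
import io
import os
import tempfile

# Throwaway file so that all three open() variants have something to open.
fd, path = tempfile.mkstemp()
os.close(fd)

results = {}
with open(path, 'r') as text_f, \
        open(path, 'rb') as buffered_f, \
        open(path, 'rb', buffering=0) as raw_f:
    # Text mode -> TextIOWrapper, registered as a TextIOBase subclass
    results['text'] = (type(text_f).__name__,
                       isinstance(text_f, io.TextIOBase))
    # Buffered binary mode -> BufferedReader, a BufferedIOBase subclass
    results['buffered'] = (type(buffered_f).__name__,
                           isinstance(buffered_f, io.BufferedIOBase))
    # Unbuffered binary mode -> FileIO, a RawIOBase subclass
    results['raw'] = (type(raw_f).__name__,
                      isinstance(raw_f, io.RawIOBase))
    # All three share IOBase as a common ancestor
    results['all_iobase'] = all(
        isinstance(f, io.IOBase) for f in (text_f, buffered_f, raw_f))

os.remove(path)
for key, value in results.items():
    print(key, value)
```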

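Note that the documentation explicitly states that TextIOBase has an encoding attribute, which binary file objects are not required to have (i.e. it is not there). Hence, under the current API, checking for the encoding attribute may be a convenient hack for telling whether a standard file object is binary, assuming the object being tested is file-like. Checking the mode attribute only works for FileIO objects, and mode is not part of the IOBase/RawIOBase interface, which is why it does not work on io.StringIO()/io.BytesIO() objects.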

- norok2

- Iguananaut · Accepted Answer

I have a version of this in Astropy (for Python 3, though a Python 2 version can be found in older versions of Astropy if needed).

It's not pretty, but it's reliable enough for most cases (I removed the part that checks for a .binary attribute, since that only applies to a class in Astropy):

import io


def fileobj_is_binary(f):
    """
    Returns True if the given file or file-like object has a file open in binary
    mode.  When in doubt, returns True by default.
    """

    if isinstance(f, io.TextIOBase):
        return False

    mode = fileobj_mode(f)
    if mode:
        return 'b' in mode
    else:
        return True

where fileobj_mode is:

import gzip


def fileobj_mode(f):
    """
    Returns the 'mode' string of a file-like object if such a thing exists.
    Otherwise returns None.
    """

    # Go from most to least specific--for example gzip objects have a 'mode'
    # attribute, but it's not analogous to the file.mode attribute

    # gzip.GzipFile -like
    if hasattr(f, 'fileobj') and hasattr(f.fileobj, 'mode'):
        fileobj = f.fileobj

    # astropy.io.fits._File -like, doesn't need additional checks because it's
    # already validated
    elif hasattr(f, 'fileobj_mode'):
        return f.fileobj_mode

    # PIL-Image -like investigate the fp (filebuffer)
    elif hasattr(f, 'fp') and hasattr(f.fp, 'mode'):
        fileobj = f.fp

    # FILEIO -like (normal open(...)), keep as is.
    elif hasattr(f, 'mode'):
        fileobj = f

    # Doesn't look like a file-like object, for example strings, urls or paths.
    else:
        return None

    return _fileobj_normalize_mode(fileobj)


def _fileobj_normalize_mode(f):
    """Takes care of some corner cases in Python where the mode string
    is either oddly formatted or does not truly represent the file mode.
    """
    mode = f.mode

    # Special case: Gzip modes:
    if isinstance(f, gzip.GzipFile):
        # GzipFiles can be either readonly or writeonly
        if mode == gzip.READ:
            return 'rb'
        elif mode == gzip.WRITE:
            return 'wb'
        else:
            return None  # This shouldn't happen?

    # Sometimes Python can produce modes like 'r+b' which will be normalized
    # here to 'rb+'
    if '+' in mode:
        mode = mode.replace('+', '')
        mode += '+'

    return mode

You might also want to add a special case for io.BytesIO. Again, it's not pretty, but it works for most cases. It would be nice if there were a simpler way.
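One way such a special case could look is sketched below. The function name is hypothetical and this is not Astropy's code: it checks the io class hierarchy first (covering io.BytesIO and, in Python 3, gzip.GzipFile, which derives from io.BufferedIOBase), and only falls back to a mode string for objects outside that hierarchy. It deliberately omits Astropy's PIL and gzip mode normalization.

```python
import io


def fileobj_is_binary_sketch(f):
    """Hypothetical variant: True if f looks like a binary stream."""
    # Text streams (open(..., 'r'), io.StringIO, ...) are never binary.
    if isinstance(f, io.TextIOBase):
        return False
    # Explicit special case for io.BytesIO, plus any other buffered or
    # raw binary stream from the io hierarchy.
    if isinstance(f, (io.BytesIO, io.BufferedIOBase, io.RawIOBase)):
        return True
    # Fall back to a mode string for objects outside the io hierarchy.
    mode = getattr(f, 'mode', None)
    if isinstance(mode, str):
        return 'b' in mode
    # When in doubt, assume binary, like the helper above.
    return True


print(fileobj_is_binary_sketch(io.BytesIO(b'ciao')))  # True
print(fileobj_is_binary_sketch(io.StringIO('ciao')))  # False
```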