Difference between Python read/write and shutil copy

Question

4

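I need to save files uploaded to my server (the maximum file size is 10 MB), and I found this answer, which works perfectly. However, I am wondering what the point of using the shutil module is, and what the difference is between this: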

file_location = f"files/{uploaded_file.filename}"
with open(file_location, "wb+") as file_object:
    file_object.write(uploaded_file.file.read())

and this:

import shutil

file_location = f"files/{uploaded_file.filename}"
with open(file_location, "wb+") as file_object:
    shutil.copyfileobj(uploaded_file.file, file_object)

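In my programming experience I have come across the shutil module many times, but I still can't figure out what its benefits are over the read() and write() methods.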

- salius

2 Answers

4

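Your method requires the whole file to be in memory. shutil copies in chunks, so you can copy files larger than memory. Also, shutil has routines to copy files by name, so you don't have to open them at all, and it can preserve the permissions, ownership, and creation/modification/access timestamps.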

- Tim Roberts

- Chris · Accepted Answer

I would like to highlight a few points regarding OP's question and the currently accepted answer by @Tim Roberts:

"shutil copies in chunks so you can copy files larger than memory". You can also copy a file in chunks using read() (please have a look at the short example below, as well as this and this answer for more details), just as you can load the whole file into memory with shutil.copyfileobj() by passing a negative length value.
```
with open(uploaded_file.filename, 'wb') as f:
    while contents := uploaded_file.file.read(1024 * 1024):  # adjust the chunk size as desired
        f.write(contents)
```
Under the hood, copyfileobj() uses a very similar approach to the above, relying on the read() and write() methods of file objects; hence, it makes little difference which one you use. The source code of copyfileobj() is shown below. The default buffer size, i.e., COPY_BUFSIZE below, is set to 1 MB (1024 * 1024 bytes) on Windows, or 64 KB (64 * 1024 bytes) on other platforms (see here).
```
def copyfileobj(fsrc, fdst, length=0):
    """copy data from file-like object fsrc to file-like object fdst"""
    if not length:
        length = COPY_BUFSIZE
    # Localize variable access to minimize overhead.
    fsrc_read = fsrc.read
    fdst_write = fdst.write
    while True:
        buf = fsrc_read(length)
        if not buf:
            break
        fdst_write(buf)
```
"shutil has routines to copy files by name so you don't have to open them at all..." Since OP seems to be using the FastAPI framework (which is actually Starlette underneath), UploadFile exposes an actual Python SpooledTemporaryFile (a file-like object) that you can get using the .file attribute (source code can be found here). When FastAPI/Starlette creates a new instance of UploadFile, it already creates the SpooledTemporaryFile behind the scenes, which remains open. You are therefore dealing with a temporary file that is already open and has no visible name in the file system, so shutil's copy-by-name routines (which would otherwise let you copy the contents without opening the file yourself) do not apply, and it makes no difference whether you use read() or copyfileobj().
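As a rough illustration of this point, the sketch below imitates the upload with a plain tempfile.SpooledTemporaryFile and saves it with copyfileobj(). The function name save_spooled_upload and the max_size value are assumptions made for the example; they are not taken from FastAPI or Starlette.

```python
import os
import shutil
import tempfile


def save_spooled_upload(data: bytes, dest_path: str) -> None:
    """Imitate an upload held in a SpooledTemporaryFile and save it to dest_path."""
    # max_size is an arbitrary value for this sketch; larger data spills to disk.
    with tempfile.SpooledTemporaryFile(max_size=1024 * 1024) as spooled:
        spooled.write(data)  # stands in for the framework receiving the upload
        spooled.seek(0)      # rewind first, otherwise there is nothing left to read
        with open(dest_path, "wb") as out:
            # The source is an open file object with no usable name,
            # so the copy goes through the file object, not a path.
            shutil.copyfileobj(spooled, out)
```

Replacing the copyfileobj() call with out.write(spooled.read()) would produce the same file; only the memory usage for large inputs differs.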
"it can preserve the permissions, ownership, and creation/modification/access timestamps." Even though this is about saving a file uploaded through a web framework (and hence most of this metadata wouldn't be transferred along with the file), as per the documentation, the above statement is not entirely true:

Warning: Even the higher-level file copying functions (shutil.copy(), shutil.copy2()) cannot copy all file metadata.

On POSIX platforms, this means that file owner and group are lost as well as ACLs. On Mac OS, the resource fork and other metadata are not used. This means that resources will be lost and file type and creator codes will not be correct. On Windows, file owners, ACLs and alternate data streams are not copied.
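The two functions named in the warning differ in how much metadata they attempt to carry over, which can be observed with a backdated file. The following is a small sketch; copied_mtimes is a hypothetical helper and the timestamp is an arbitrary value chosen for the example.

```python
import os
import shutil
import tempfile


def copied_mtimes(old_mtime: float = 1_000_000_000.0):
    """Return (mtime after shutil.copy, mtime after shutil.copy2) for a backdated file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.txt")
        with open(src, "w") as f:
            f.write("example")
        # Backdate the access and modification times of the source file.
        os.utime(src, (old_mtime, old_mtime))

        # copy() copies the data and the permission bits.
        plain = shutil.copy(src, os.path.join(tmp, "copy.txt"))
        # copy2() additionally attempts to preserve timestamps and other metadata.
        full = shutil.copy2(src, os.path.join(tmp, "copy2.txt"))
        return os.path.getmtime(plain), os.path.getmtime(full)
```

The file produced by copy() gets a fresh modification time, while the one produced by copy2() keeps the backdated value. Neither call changes the owner or group of the new file, in line with the warning quoted above.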

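That being said, there is nothing wrong with using copyfileobj(). On the contrary, if you are dealing with large files and want to avoid loading the entire file into memory (as there may not be enough RAM to hold all the data), and you would rather use copyfileobj() than a similar solution based on the read() method (as described in the first point above), it is perfectly fine to use shutil.copyfileobj(fsrc, fdst). Besides, copyfileobj() has been offered (since Python 3.8) as an alternative to the platform-dependent efficient copy operations. You can change the default buffer size by adjusting the length argument of copyfileobj().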

Note

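If copyfileobj() is used inside a synchronous FastAPI endpoint, it is perfectly fine, since a normal def endpoint in FastAPI is run in an external threadpool that is then awaited, instead of being called directly (as that would block the server). On the other hand, async def endpoints run on the main (single) thread, so calling a method that performs blocking I/O operations, such as copyfileobj() (as shown in the source code), would block the entire server (for more information on def vs async def, please have a look at this answer). Hence, if you are going to call copyfileobj() from within an async def endpoint, you should make sure to run this operation, as well as all other file operations such as open() and close(), in a separate thread, so that the main thread (where coroutines run) does not get blocked. You can do that using Starlette's run_in_threadpool(), which FastAPI also uses internally when you call the async methods of the UploadFile object, as shown here. For instance: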

await run_in_threadpool(shutil.copyfileobj, fsrc, fdst)

For more details and code examples, please have a look at this answer.
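The idea of moving the blocking copy off the event loop can also be tried without a web framework. The sketch below substitutes the standard library's asyncio.to_thread (Python 3.9+) for Starlette's run_in_threadpool, and uses in-memory BytesIO objects in place of the uploaded file and the destination file; save_without_blocking and demo are hypothetical names.

```python
import asyncio
import io
import shutil


async def save_without_blocking(fsrc, fdst) -> None:
    """Run the blocking copy in a worker thread so the event loop stays free."""
    # asyncio.to_thread is a stdlib stand-in chosen for this sketch;
    # inside FastAPI/Starlette you would call run_in_threadpool instead.
    await asyncio.to_thread(shutil.copyfileobj, fsrc, fdst)


async def demo() -> bytes:
    src = io.BytesIO(b"uploaded bytes")  # stands in for uploaded_file.file
    dst = io.BytesIO()                   # stands in for the opened destination file
    await save_without_blocking(src, dst)
    return dst.getvalue()
```

Running asyncio.run(demo()) returns the copied bytes. In a real endpoint, open() and close() on the destination file would need the same treatment, as noted above.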