How do I copy an entire directory of files into an existing directory using Python?

Question

369

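Run the following code from a directory that contains a directory named bar (containing one or more files) and a directory named baz (also containing one or more files). Make sure there is no directory named foo.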

import shutil
shutil.copytree('bar', 'foo')
shutil.copytree('baz', 'foo')

It will fail with:

$ python copytree_test.py 
Traceback (most recent call last):
  File "copytree_test.py", line 5, in <module>
    shutil.copytree('baz', 'foo')
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/shutil.py", line 110, in copytree
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/os.py", line 172, in makedirs
OSError: [Errno 17] File exists: 'foo'

I want this to work the same way as if I had typed:

$ mkdir foo
$ cp bar/* foo/
$ cp baz/* foo/

I have already copied the contents of 'bar' into 'foo' with shutil.copytree(). Do I now need to use shutil.copy() to copy each file in baz into foo, or is there an easier/better way?
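
For reference, the per-file approach asked about here can be sketched as follows. This is a minimal illustration, not part of the original question: the helper name copy_files_into is made up, and it only mirrors `cp src/* dst/` (top-level regular files, no dotfiles, no subdirectories).

```python
import glob
import os
import shutil


def copy_files_into(src_dir, dst_dir):
    """Copy every regular file directly inside src_dir into dst_dir."""
    # Unlike a bare `mkdir`, this does not fail if dst_dir already exists.
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    # The "*" pattern skips dotfiles, as the shell glob does.
    for path in sorted(glob.glob(os.path.join(src_dir, "*"))):
        if os.path.isfile(path):  # `cp` without -r skips directories
            copied.append(shutil.copy(path, dst_dir))
    return copied
```

Calling copy_files_into('bar', 'foo') and then copy_files_into('baz', 'foo') leaves the files of both directories in foo; files with the same name are overwritten by the later call.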

- Daryl Spitzer

3

FYI: here is the original copytree function; just copy and patch it :) - schlamar

4

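There is a Python issue about changing the behavior of shutil.copytree() (http://bugs.python.org/issue20849) to allow writing to an existing directory, but some details of the behavior still need to be agreed on. - Nick Chammas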

11

Note that the enhancement request mentioned above has been implemented in Python 3.8: https://docs.python.org/3.8/whatsnew/3.8.html#shutil - ncoghlan

15 Answers

257

This limitation of the standard shutil.copytree seems arbitrary and annoying. Workaround:

import os, shutil
def copytree(src, dst, symlinks=False, ignore=None):
    for item in os.listdir(src):
        s = os.path.join(src, item)
        d = os.path.join(dst, item)
        if os.path.isdir(s):
            shutil.copytree(s, d, symlinks, ignore)
        else:
            shutil.copy2(s, d)

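Note that it is not entirely consistent with the standard copytree:

it does not honor the symlinks and ignore parameters for the root directory of the src tree;
it does not raise shutil.Error for errors at the root level of src;
if an error occurs while copying a subtree, it raises shutil.Error for that subtree instead of trying to copy the other subtrees and raising a single combined shutil.Error.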

- atzz

60

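Thanks! Agreed that this seems totally arbitrary! shutil.copytree does an os.makedirs(dst) at the start. No part of the code would actually have a problem with a pre-existing directory. This needs to be changed. At least provide an exist_ok=False parameter to the call. - cfi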

6

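This is a good answer, but Mital Vora's answer below is worth a look as well. It calls copytree recursively rather than calling shutil.copytree(), because otherwise the same problem arises. Consider merging the answers or updating to Mital Vora's. - PJeffes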

6

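This fails if the source contains a directory that already exists, non-empty, in the destination. Maybe somebody could solve this with tail recursion, but here is a modification of the code that works: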

def copyTree(src, dst, symlinks=False, ignore=None):
    for item in os.listdir(src):
        s = os.path.join(src, item)
        d = os.path.join(dst, item)
        if os.path.isdir(s):
            if os.path.isdir(d):
                copyTree(s, d, symlinks, ignore)
            else:
                shutil.copytree(s, d, symlinks, ignore)
        else:
            shutil.copy2(s, d)

- Sojurn

9

Ugh, super annoying. It is 4 years later, and shutil.copytree still has this silly restriction. :-( - antred

6

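distutils.dir_util.copy_tree(), which also resides in the Python standard library, has no such restriction and actually behaves as expected. Given that, there is no compelling reason to roll your own (typically broken) implementation. Brendan Abel's answer should be the accepted solution now. - Cecil Curry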

150

Python 3.8 introduced the dirs_exist_ok argument to shutil.copytree:

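Recursively copy an entire directory tree rooted at src to a directory named dst and return the destination directory. dirs_exist_ok dictates whether to raise an exception in case dst or any missing parent directory already exists.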

Therefore, with Python 3.8+ the following should work:

import shutil

shutil.copytree('bar', 'foo')  # Will fail if `foo` exists
shutil.copytree('baz', 'foo', dirs_exist_ok=True)  # Fine
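
The difference between the two calls can be checked in isolation with a small, self-contained sketch. The function name merge_demo and the file names a.txt and b.txt are made up for illustration; everything happens inside a temporary directory.

```python
import os
import shutil
import tempfile


def merge_demo():
    """Build bar/ and baz/ in a scratch directory and merge both into foo/."""
    with tempfile.TemporaryDirectory() as root:
        for name, filename in (("bar", "a.txt"), ("baz", "b.txt")):
            os.makedirs(os.path.join(root, name))
            with open(os.path.join(root, name, filename), "w") as fh:
                fh.write(name)
        foo = os.path.join(root, "foo")
        shutil.copytree(os.path.join(root, "bar"), foo)  # foo does not exist yet
        try:
            # Default dirs_exist_ok=False: fails because foo now exists.
            shutil.copytree(os.path.join(root, "baz"), foo)
        except FileExistsError:
            # Python 3.8+: merge into the existing directory instead.
            shutil.copytree(os.path.join(root, "baz"), foo, dirs_exist_ok=True)
        return sorted(os.listdir(foo))
```

On Python 3.8 or later, merge_demo() returns ['a.txt', 'b.txt'].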

- Chris

dirs_exist_ok defaults to False in copytree, so won't the first copy attempt fail? - Jay

2

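@Jay, only if the directory already exists. I left dirs_exist_ok out of the first call to illustrate the difference (and because the directory does not yet exist in OP's example), but of course you can use it if you want. - Chris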

1

Thanks. If you add a comment next to the first copy, I think it would be clearer :) - Jay

This also works with pathlib.Path objects as the src and dst arguments :) distutils.dir_util's copy_tree, on the other hand, requires conversion to strings. - FObersteiner

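This does correctly copy the contents of bar into foo, as OP suggested. Is it possible to copy the whole bar directory into foo, so that you end up with a structure like bar/foo? - pcko1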

2

@pcko1, assuming you mean foo/bar/ and not bar/foo/, try shutil.copytree("bar", "foo/bar", dirs_exist_ok=True). You can also use pathlib.Path objects if you would rather not hard-code the / directory separator. - Chris
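
The pathlib variant mentioned in this comment might look like the sketch below. The helper name copy_dir_into is hypothetical; it assumes Python 3.8+ for dirs_exist_ok.

```python
import shutil
from pathlib import Path


def copy_dir_into(src, dst_parent):
    """Copy the directory src itself (not just its contents) into dst_parent."""
    src = Path(src)
    target = Path(dst_parent) / src.name  # e.g. foo/bar, without a hard-coded separator
    # copytree creates target and any missing parents; with dirs_exist_ok=True
    # it merges into target if it is already there.
    shutil.copytree(src, target, dirs_exist_ok=True)
    return target
```

For example, copy_dir_into("bar", "foo") produces foo/bar/ and can be repeated without raising FileExistsError.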

73

A slight improvement on atzz's function, which always tries to copy every file from the source to the destination.

def copytree(src, dst, symlinks=False, ignore=None):
    if not os.path.exists(dst):
        os.makedirs(dst)
    for item in os.listdir(src):
        s = os.path.join(src, item)
        d = os.path.join(dst, item)
        if os.path.isdir(s):
            copytree(s, d, symlinks, ignore)
        else:
            if not os.path.exists(d) or os.stat(s).st_mtime - os.stat(d).st_mtime > 1:
                shutil.copy2(s, d)

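In my implementation above:

The output directory is created if it does not already exist.
Directories are copied by calling the function recursively.
Before actually copying a file, I check whether it has been modified, and copy it only in that case.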

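I use this function together with an scons build. It helps a lot, because on each compile I do not need to copy the entire set of files, only the files that have changed.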

- Mital Vora

9

Nice, except that you have symlinks and ignore as arguments, but they are ignored. - Matthew Alpert

1

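It is worth noting that st_mtime granularity can be as coarse as 2 seconds on FAT filesystems. If you use this code where updates happen in rapid succession, you may find that overwrites do not take place. See http://docs.python.org/2/library/os.html. - dgh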
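
One way to make this trade-off explicit is to pull the timestamp check into a helper with an adjustable tolerance. This is only a sketch: the name needs_copy and the 2.0-second default are assumptions chosen to match the FAT granularity mentioned above, not values taken from the answer.

```python
import os


def needs_copy(src, dst, tolerance=2.0):
    """Return True if dst is missing or src is newer than dst by more than tolerance seconds."""
    if not os.path.exists(dst):
        return True
    # A larger tolerance avoids re-copying files whose destination timestamp
    # was rounded by the filesystem, at the cost of missing quick successive edits.
    return os.stat(src).st_mtime - os.stat(dst).st_mtime > tolerance
```

A smaller tolerance picks up rapid updates but may re-copy unchanged files on filesystems that round timestamps.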

There is a bug in the second-to-last line; it should be: if not os.path.exists(d) or os.stat(s).st_mtime - os.stat(d).st_mtime > 1: - mpderbec

1

What is the purpose of the third and fourth arguments to copytree()? These two arguments -- symlinks, ignore -- are never used, so they can be omitted, right? - Mr-IDE

Please see the shutil.copytree documentation at https://docs.python.org/3/library/shutil.html for more details on the `ignore` parameter. - Mital Vora

41

A merged approach inspired by atzz and Mital Vora:

#!/usr/bin/python
import os
import shutil
import stat
def copytree(src, dst, symlinks = False, ignore = None):
  if not os.path.exists(dst):
    os.makedirs(dst)
    shutil.copystat(src, dst)
  lst = os.listdir(src)
  if ignore:
    excl = ignore(src, lst)
    lst = [x for x in lst if x not in excl]
  for item in lst:
    s = os.path.join(src, item)
    d = os.path.join(dst, item)
    if symlinks and os.path.islink(s):
      if os.path.lexists(d):
        os.remove(d)
      os.symlink(os.readlink(s), d)
      try:
        st = os.lstat(s)
        mode = stat.S_IMODE(st.st_mode)
        os.lchmod(d, mode)
      except (AttributeError, OSError):
        pass # os.lchmod is missing on this platform, or the call failed
    elif os.path.isdir(s):
      copytree(s, d, symlinks, ignore)
    else:
      shutil.copy2(s, d)

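Same behavior as shutil.copytree, including the symlinks and ignore parameters
Creates the destination directory structure if it does not exist
Does not fail if dst already exists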

- Cyrille Pontvieux

This is much faster than the original solution, especially when the directories are deeply nested. Thanks. - Kashif

Did you also define a function named "ignore" elsewhere in your code? - KenV99

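You can define any function, with any name you like, before calling copytree. The function (which can also be a lambda expression) takes two arguments, a directory name and the list of files in it, and should return an iterable of the files to ignore. - Cyrille Pontvieux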
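
A minimal sketch of such a callable follows. The function name ignore_pyc_and_tmp and the patterns are made up for illustration; shutil.ignore_patterns is the standard-library factory that builds an equivalent callable from glob-style patterns.

```python
import shutil


def ignore_pyc_and_tmp(directory, names):
    """Return the subset of names that the copy should skip in this directory."""
    return {name for name in names if name.endswith(".pyc") or name == "tmp"}


# The same rule expressed as glob-style patterns; the result is also a
# callable taking (directory, names) and returning the names to ignore.
ignore_by_pattern = shutil.ignore_patterns("*.pyc", "tmp")
```

Either object can be passed as the ignore argument, for example copytree(src, dst, ignore=ignore_by_pattern). The glob matching happens inside the callable built by shutil.ignore_patterns, so a custom copytree only needs to drop the names the callable returns.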

[x for x in lst if x not in excl] does not do the same thing as copytree, which uses glob pattern matching. See: https://en.wikipedia.org/wiki/Glob_(programming) - Konstantin Schubert

2

This is great. ignore was not being used correctly in the answer above. - Keith Holliday

8

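The docs explicitly state that the destination directory should not exist:

The destination directory, named by dst, must not already exist; it will be created as well as missing parent directories.

I think your best bet is to os.walk the second and all subsequent directories, copy2 the directories and files, and do an additional copystat for the directories. After all, that is precisely what copytree does, as explained in the docs. Or you could copy and copystat each directory/file and use os.listdir instead of os.walk.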
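
The os.walk approach described above could be sketched roughly as follows. The function name merge_tree_walk is hypothetical, and the sketch assumes Python 3 (for exist_ok). Symlinked directories are not followed, because os.walk does not descend into them by default.

```python
import os
import shutil


def merge_tree_walk(src, dst):
    """Copy the tree under src into dst, creating directories as needed."""
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)  # "." for the top-level directory
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        # Carry over mode bits and times; the directory's modification time
        # may change again as files are written into it below.
        shutil.copystat(root, target_dir)
        for name in files:
            shutil.copy2(os.path.join(root, name), os.path.join(target_dir, name))
```

Because existing directories are tolerated, merge_tree_walk('bar', 'foo') followed by merge_tree_walk('baz', 'foo') merges both trees into foo.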

- SilentGhost

1

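This is inspired by the original best answer from atzz; I just added logic to replace files/folders. So it does not actually merge, but deletes the existing file/folder and copies the new one: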

import shutil
import os
def copytree(src, dst, symlinks=False, ignore=None):
    for item in os.listdir(src):
        s = os.path.join(src, item)
        d = os.path.join(dst, item)
        if os.path.exists(d):
            try:
                shutil.rmtree(d)
            except Exception as e:
                print(e)
                os.unlink(d)
        if os.path.isdir(s):
            shutil.copytree(s, d, symlinks, ignore)
        else:
            shutil.copy2(s, d)
    #shutil.rmtree(src)

Uncomment the rmtree call to turn this into a move function.

- radtek

1

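Here is my solution to the problem. I modified the source code of copytree to keep the original functionality, but now no error occurs when the directory already exists. I also changed it so that it does not overwrite existing files but keeps both copies, one under a modified name, since this was important for my application.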

import shutil
import os


def _copytree(src, dst, symlinks=False, ignore=None):
    """
    This is an improved version of shutil.copytree which allows writing to
    existing folders and does not overwrite existing files but instead appends
    a ~1 to the file name and adds it to the destination path.
    """

    names = os.listdir(src)
    if ignore is not None:
        ignored_names = ignore(src, names)
    else:
        ignored_names = set()

    if not os.path.exists(dst):
        os.makedirs(dst)
        shutil.copystat(src, dst)
    errors = []
    for name in names:
        if name in ignored_names:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        i = 1
        while os.path.exists(dstname) and not os.path.isdir(dstname):
            parts = name.split('.')
            file_name = ''
            file_extension = parts[-1]
            # make a new file name inserting ~1 between name and extension
            for j in range(len(parts)-1):
                file_name += parts[j]
                if j < len(parts)-2:
                    file_name += '.'
            suffix = file_name + '~' + str(i) + '.' + file_extension
            dstname = os.path.join(dst, suffix)
            i+=1
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                _copytree(srcname, dstname, symlinks, ignore)
            else:
                shutil.copy2(srcname, dstname)
        # catch the Error from the recursive _copytree so that we can
        # continue with other files
        except shutil.Error as err:
            errors.extend(err.args[0])
        except OSError as why:
            errors.append((srcname, dstname, str(why)))
    try:
        shutil.copystat(src, dst)
    except OSError as why:
        # can't copy file access times on Windows
        if getattr(why, 'winerror', None) is None:
            errors.append((src, dst, str(why)))
    if errors:
        raise shutil.Error(errors)

- James

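The recursive nature of this function could be a problem in Python, which is not well suited to deep recursion. It depends on how deep the directories are; I guess it could be a performance issue. - undefined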

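This is technically correct, although in practice I think you are unlikely to run into it. Python's maximum recursion depth defaults to 1000, so you would need a directory structure 1000 levels deep. (Note that you can have as many directories as you like at the first level; to Python, all of those calls count as recursion depth 1.) File paths are limited to 4k characters, so creating a meaningful file structure 1000 levels deep would be quite a challenge, since you could only use names of three characters or fewer. - undefined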

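This may not happen often in practice, and the 4k limit applies only to certain system functions, but many common file systems have no limit on file path length or on subdirectory nesting depth. See https://askubuntu.com/questions/859945/what-is-the-maximum-length-of-a-file-path-in-ubuntu and https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits. - undefined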
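
If recursion depth is a concern, the traversal can be written with an explicit stack instead. The following is a sketch under assumptions, not code from any of the answers: the name merge_tree_iterative is made up, it assumes Python 3.5+ for os.scandir, and it follows symlinks to directories.

```python
import os
import shutil


def merge_tree_iterative(src, dst):
    """Merge src into dst without recursion, using an explicit stack of directory pairs."""
    stack = [(src, dst)]
    while stack:
        cur_src, cur_dst = stack.pop()
        os.makedirs(cur_dst, exist_ok=True)
        for entry in os.scandir(cur_src):
            target = os.path.join(cur_dst, entry.name)
            if entry.is_dir():
                stack.append((entry.path, target))  # visit later instead of recursing
            else:
                shutil.copy2(entry.path, target)
```

The stack grows with the number of pending directories rather than with the Python call depth, so the interpreter's recursion limit does not apply.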

1

Here is a version that expects pathlib.Path objects as input.

import os
import shutil

# Recursively copies the content of the directory src to the directory dst.
# If dst doesn't exist, it is created, together with all missing parent directories.
# If a file from src already exists in dst, the file in dst is overwritten.
# Files already existing in dst which don't exist in src are preserved.
# Symlinks inside src are followed: is_dir() and shutil.copy2() resolve them by
# default, so linked files are copied as regular files and linked directories
# are copied recursively.
#
def copy_dir(src, dst):
    dst.mkdir(parents=True, exist_ok=True)
    for item in os.listdir(src):
        s = src / item
        d = dst / item
        if s.is_dir():
            copy_dir(s, d)
        else:
            shutil.copy2(str(s), str(d))

Note that this function requires Python 3.6 or later, the first version in which os.listdir() accepts path-like objects as input. If you need to support earlier versions of Python, replace listdir(src) with listdir(str(src)).

- Boris Dalstein

Since I cannot edit, I modified a version of @Boris-Dalstein's code and added it as an answer. - Musa Biralo

0

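Here is a version inspired by this thread that more closely mimics distutils.file_util.copy_file.

updateonly is a bool; if True, only files whose modification date is newer than that of the existing file in dst are copied, except for files listed in forceupdate, which are copied regardless.

ignore and forceupdate expect lists of file names or folder/file names relative to src, and accept Unix-style wildcards similar to glob or fnmatch.

The function returns a list of the files copied (or the files that would be copied, if dryrun is True).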

import os
import shutil
import fnmatch
import stat
import itertools

def copyToDir(src, dst, updateonly=True, symlinks=True, ignore=None, forceupdate=None, dryrun=False):

    def copySymLink(srclink, destlink):
        if os.path.lexists(destlink):
            os.remove(destlink)
        os.symlink(os.readlink(srclink), destlink)
        try:
            st = os.lstat(srclink)
            mode = stat.S_IMODE(st.st_mode)
            os.lchmod(destlink, mode)
        except (AttributeError, OSError):
            pass  # os.lchmod is missing on this platform, or the call failed
    fc = []
    if not os.path.exists(dst) and not dryrun:
        os.makedirs(dst)
        shutil.copystat(src, dst)
    if ignore is not None:
        ignorepatterns = [os.path.join(src, *x.split('/')) for x in ignore]
    else:
        ignorepatterns = []
    if forceupdate is not None:
        forceupdatepatterns = [os.path.join(src, *x.split('/')) for x in forceupdate]
    else:
        forceupdatepatterns = []
    srclen = len(src)
    for root, dirs, files in os.walk(src):
        fullsrcfiles = [os.path.join(root, x) for x in files]
        t = root[srclen+1:]
        dstroot = os.path.join(dst, t)
        fulldstfiles = [os.path.join(dstroot, x) for x in files]
        excludefiles = list(itertools.chain.from_iterable([fnmatch.filter(fullsrcfiles, pattern) for pattern in ignorepatterns]))
        forceupdatefiles = list(itertools.chain.from_iterable([fnmatch.filter(fullsrcfiles, pattern) for pattern in forceupdatepatterns]))
        for directory in dirs:
            # Build both paths from the directory currently being walked
            fullsrcdir = os.path.join(root, directory)
            fulldstdir = os.path.join(dstroot, directory)
            if os.path.islink(fullsrcdir):
                if symlinks and dryrun is False:
                    copySymLink(fullsrcdir, fulldstdir)
            else:
                if not os.path.exists(fulldstdir) and dryrun is False:
                    os.makedirs(fulldstdir)
                    shutil.copystat(fullsrcdir, fulldstdir)
        for s,d in zip(fullsrcfiles, fulldstfiles):
            if s not in excludefiles:
                if updateonly:
                    go = False
                    if os.path.isfile(d):
                        srcdate = os.stat(s).st_mtime
                        dstdate = os.stat(d).st_mtime
                        if srcdate > dstdate:
                            go = True
                    else:
                        go = True
                    if s in forceupdatefiles:
                        go = True
                    if go is True:
                        fc.append(d)
                        if not dryrun:
                            if os.path.islink(s) and symlinks is True:
                                copySymLink(s, d)
                            else:
                                shutil.copy2(s, d)
                else:
                    fc.append(d)
                    if not dryrun:
                        if os.path.islink(s) and symlinks is True:
                            copySymLink(s, d)
                        else:
                            shutil.copy2(s, d)
    return fc

- KenV99

- Brendan Abel · Accepted Answer

446

Here is a solution that is part of the standard library:

from distutils.dir_util import copy_tree
copy_tree("/a/b/c", "/x/y/z")

Take a look at this similar question:

Copy directory contents into a directory with python

Reference: https://docs.python.org/3/distutils/apiref.html#distutils.dir_util.copy_tree

- Brendan Abel

8

This is a good one because it uses the standard library. Symlinks, mode and times can be preserved as well. - itsafire

6

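I noticed a small drawback: distutils.errors.DistutilsInternalError: mkpath: 'name' must be a string, i.e. it does not accept a PosixPath. You need to convert with str(PosixPath). That goes on the wish list for improvements. Other than that, I like this answer. - Sun Bear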

1

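@SunBear, yes, I think that will be the case with most other libraries that take paths as strings. I suppose it is part of the downside of choosing not to make the Path object inherit from str, as most earlier implementations of object-oriented path objects did. - Brendan Abel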

7

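Note that while distutils.dir_util.copy_tree() is technically public, the distutils developers have made it clear (same link as @SunBear's, thanks!) that the function is considered an implementation detail of distutils and is not recommended for public use. The real solution should be to improve/extend shutil.copytree() so that it behaves more like distutils.dir_util.copy_tree(), but without its shortcomings. In the meantime, I will keep using custom helper functions similar to those provided in other answers. - Boris Dalstein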

31

distutils is deprecated as of Python 3.10 and is planned for removal in 3.12; see PEP 632. - Kevin
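
Given that deprecation, code that currently calls distutils.dir_util.copy_tree could be moved to a small wrapper around shutil. The sketch below is one possible shape, not an official replacement: the name copy_tree_compat is made up, and unlike copy_tree it returns the destination path rather than a list of copied files.

```python
import os
import shutil
import sys


def copy_tree_compat(src, dst):
    """Merge src into dst without relying on distutils."""
    if sys.version_info >= (3, 8):
        return shutil.copytree(src, dst, dirs_exist_ok=True)
    # Fallback for older Python 3 interpreters: a plain recursive merge.
    os.makedirs(dst, exist_ok=True)
    for name in os.listdir(src):
        s = os.path.join(src, name)
        d = os.path.join(dst, name)
        if os.path.isdir(s):
            copy_tree_compat(s, d)
        else:
            shutil.copy2(s, d)
    return dst
```

A call such as copy_tree_compat("/a/b/c", "/x/y/z") then covers the same use case as the copy_tree example in this answer.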