Question

252

I'm trying to write a script that lists all directories, subdirectories, and files in a given directory.

I tried this:

import sys, os

root = "/home/patate/directory/"
path = os.path.join(root, "targetdirectory")

for r, d, f in os.walk(path):
    for file in f:
        print(os.path.join(root, file))

Unfortunately, it doesn't work properly. I get all the files, but not their full paths.

For example, if the directory structure is:

/home/patate/directory/targetdirectory/123/456/789/file.txt

It would print:

/home/patate/directory/targetdirectory/file.txt

I need the first result.

- thomytheyon

12 Answers

80

Just in case... Getting all files in a directory and its subdirectories that match some pattern (*.py, for example):

import os
from fnmatch import fnmatch

root = '/some/directory'
pattern = "*.py"

for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            print(os.path.join(path, name))

- Ivan Pirog

In Python 3, call print as a function, with parentheses, e.g. print(os.path.join(path, name)). You can also use print(pathlib.PurePath(path, name)). - Ahmad Ismail

2

The same check can be done with the simple string .endswith() method ;) fnmatch uses Unix shell-style wildcards: https://docs.python.org/3/library/fnmatch.html - ash17

1

If you only need to check the file extension, I'd personally suggest using if name.endswith(".py"): rather than importing a module. - L0Lock
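
As a minimal sketch of the .endswith() suggestion in the comments above: str.endswith also accepts a tuple, so several extensions can be checked in one call. The helper name find_by_extension and its parameters are made up for illustration.

```python
import os


def find_by_extension(root, extensions):
    """Yield full paths of files under root whose names end with any of the given extensions."""
    # str.endswith accepts a tuple of suffixes, so one call covers all of them.
    suffixes = tuple(extensions)
    for path, subdirs, files in os.walk(root):
        for name in files:
            if name.endswith(suffixes):
                yield os.path.join(path, name)


if __name__ == "__main__":
    # Example: list Python sources below the current directory.
    for found in find_by_extension(".", [".py", ".pyw"]):
        print(found)
```

Unlike fnmatch, this is a plain suffix comparison, so it is always case-sensitive and cannot express wildcards in the middle of a name.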

34

I can't comment, so I'm writing an answer here. This is the clearest one-liner I've seen:

import os
[os.path.join(path, name) for path, subdirs, files in os.walk(root) for name in files]

- Mong H. Ng

5

This is the answer for everyone arriving from Google. - Matt

Indeed, that's exactly what I was looking for :-) - Antoine

13

Here is a one-liner:

import os

[val for sublist in [[os.path.join(i[0], j) for j in i[2]] for i in os.walk('./')] for val in sublist]
# Meta comment to ease selecting text

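The outermost val for sublist in ... loop flattens the list to one dimension. The j loop collects each file's basename and joins it to the current path. Finally, the i loop iterates over all directories and subdirectories.

This example uses the hard-coded path ./ in the os.walk(...) call; you can substitute any path string you like. Note: os.path.expanduser and/or os.path.expandvars can be used to handle path strings such as ~/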
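
A short sketch of the expanduser/expandvars note, assuming a hypothetical helper named list_files: os.walk does not expand ~ or environment variables itself, so the path string is expanded before walking.

```python
import os


def list_files(raw_path):
    """Expand ~ and environment variables in raw_path, then list every file below it."""
    # expanduser turns "~/x" into the home directory path;
    # expandvars substitutes $NAME / ${NAME} from the environment.
    expanded = os.path.expandvars(os.path.expanduser(raw_path))
    return [os.path.join(path, name)
            for path, subdirs, files in os.walk(expanded)
            for name in files]


if __name__ == "__main__":
    # os.walk yields nothing if the directory does not exist,
    # so this prints 0 rather than raising when ~/Documents is missing.
    print(len(list_files("~/Documents")))
```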

Extending this example:

It's easy to add file basename tests and directory name tests.

For example, testing for *.jpg files:

... for j in i[2] if j.endswith('.jpg')] ...

Additionally, excluding the .git directory:

... for i in os.walk('./') if '.git' not in i[0].split('/')]

- ThorSummoner

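It does work, but to exclude the .git directory you need to check that '.git' is not in the path. - Roman Rdgz

Right. It should be: if '.git' not in i[0].split('/'). - Roman Rdgz

I recommend using os.walk rather than looping over directories by hand. Generators are great, go try them. - ThorSummoner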
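
An alternative to filtering on the path afterwards, offered as a sketch rather than as the answer's own method: with the default topdown=True, os.walk lets the caller edit the subdirectory list in place, and it will not descend into names that were removed. The function name and the skipped parameter are illustrative.

```python
import os


def list_files_skipping(root, skipped=(".git",)):
    """List all files under root without descending into directories named in skipped."""
    result = []
    for path, subdirs, files in os.walk(root):
        # Slice assignment mutates the very list os.walk keeps using,
        # so pruned directories are never visited at all.
        subdirs[:] = [d for d in subdirs if d not in skipped]
        for name in files:
            result.append(os.path.join(path, name))
    return result


if __name__ == "__main__":
    for found in list_files_skipping("."):
        print(found)
```

Compared with the split('/') test, this avoids walking the excluded tree in the first place and does not depend on the path separator.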

12

Another option is the glob module from the standard library:

import glob

path = "/home/patate/directory/targetdirectory/**"

for path in glob.glob(path, recursive=True):
    print(path)

If you need an iterator, you can use iglob instead:

for file in glob.iglob(my_path, recursive=True):
    # ...
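
One caveat with the ** pattern: the results include directories (and the top-level directory itself), and by default names beginning with a dot are not matched. A minimal sketch that keeps only regular files, assuming a hypothetical helper named glob_files:

```python
import glob
import os


def glob_files(root):
    """Return every regular file below root using a recursive glob pattern."""
    # glob.escape protects characters such as [ or * that may occur in root itself.
    pattern = os.path.join(glob.escape(root), "**")
    # The ** pattern also yields directories, so filter them out.
    return [p for p in glob.iglob(pattern, recursive=True) if os.path.isfile(p)]


if __name__ == "__main__":
    for found in glob_files("."):
        print(found)
```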

- Rotareti

5

A simpler one-liner:

import os
from itertools import product, chain

chain.from_iterable([[os.sep.join(w) for w in product([i[0]], i[2])] for i in os.walk(dir)])

- Daniel

How do I list each file? - Aakash Gupta
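
On the question above: chain.from_iterable returns a lazy iterator, so nothing is displayed until it is looped over or passed to list(). A sketch of the same idea with os.path.join in place of product; the name iter_files is an assumption made for the example.

```python
import os
from itertools import chain


def iter_files(top):
    """Return a lazy iterator over the full path of every file below top."""
    return chain.from_iterable(
        [os.path.join(path, name) for name in files]
        for path, subdirs, files in os.walk(top)
    )


if __name__ == "__main__":
    # Consume the iterator to actually see each file.
    for found in iter_files("."):
        print(found)
```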

4

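You can take a look at this sample I made. Note that it uses the os.path.walk function, which is deprecated (and was removed in Python 3, so this only runs on Python 2). It uses a list to store all the file paths.

import fnmatch
import os

root = "Your root directory"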
ex = ".txt"
where_to = "Wherever you wanna write your file to"

def fileWalker(ext, dirname, names):
    '''
    checks files in names'''
    pat = "*" + ext[0]
    for f in names:
        if fnmatch.fnmatch(f, pat):
            ext[1].append(os.path.join(dirname, f))


def writeTo(fList):

    with open(where_to, "w") as f:
        for di_r in fList:
            f.write(di_r + "\n")


if __name__ == '__main__':
    li = []
    os.path.walk(root, fileWalker, [ex, li])

    writeTo(li)

- devsaw
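
For Python 3 readers, a rough equivalent of the script above can be sketched with os.walk. The function names here are illustrative, not from the answer. One behavioral difference: os.path.walk passed files and directories together in names, while the third item from os.walk contains files only.

```python
import fnmatch
import os


def collect_matching(root, extension):
    """Collect full paths of files under root whose names end with extension."""
    pattern = "*" + extension
    matches = []
    for dirname, subdirs, names in os.walk(root):
        # fnmatch.filter returns only the names matching the pattern.
        for name in fnmatch.filter(names, pattern):
            matches.append(os.path.join(dirname, name))
    return matches


def write_list(paths, destination):
    """Write one path per line to the destination file."""
    with open(destination, "w") as f:
        for p in paths:
            f.write(p + "\n")


if __name__ == "__main__":
    print(collect_matching(".", ".txt"))
```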

4

Since every example here just uses walk (with join), I'd like to show a nice example and compare it with listdir:

import os, time

def listFiles1(root): # listdir
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0)+"/"; items = os.listdir(folder) # items = folders + files
        for i in items: i=folder+i; (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles

def listFiles2(root): # listdir/join (takes ~1.4x as long) (and uses '\\' instead)
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0); items = os.listdir(folder) # items = folders + files
        for i in items: i=os.path.join(folder,i); (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles

def listFiles3(root): # walk (takes ~1.5x as long)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files: allFiles+=[folder.replace("\\","/")+"/"+file] # folder+"\\"+file still ~1.5x
    return allFiles

def listFiles4(root): # walk/join (takes ~1.6x as long) (and uses '\\' instead)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files: allFiles+=[os.path.join(folder,file)]
    return allFiles


for i in range(100): files = listFiles1("src") # warm up

start = time.time()
for i in range(100): files = listFiles1("src") # listdir
print("Time taken: %.2fs"%(time.time()-start)) # 0.28s

start = time.time()
for i in range(100): files = listFiles2("src") # listdir and join
print("Time taken: %.2fs"%(time.time()-start)) # 0.38s

start = time.time()
for i in range(100): files = listFiles3("src") # walk
print("Time taken: %.2fs"%(time.time()-start)) # 0.42s

start = time.time()
for i in range(100): files = listFiles4("src") # walk and join
print("Time taken: %.2fs"%(time.time()-start)) # 0.47s

As you can see for yourself, the listdir version is more efficient. (And join is slower.)

- Puddle
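
The timings above are the answer author's own measurements and will vary by machine and Python version. Since Python 3.5, os.walk is itself built on os.scandir, which reports whether an entry is a directory without a separate os.path.isdir call. A sketch of the listdir-style loop rewritten with os.scandir, with no performance claim attached; the function name is made up for the example:

```python
import os


def list_files_scandir(root):
    """Iteratively list every file below root using os.scandir."""
    all_files = []
    pending = [root]
    while pending:
        folder = pending.pop()
        # The context manager closes the directory handle promptly (Python 3.6+).
        with os.scandir(folder) as entries:
            for entry in entries:
                # Symlinks to directories are not followed; they end up in all_files.
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                else:
                    all_files.append(entry.path)
    return all_files


if __name__ == "__main__":
    print(len(list_files_scandir(".")))
```

To compare it against the four functions above, the same timing loop can be reused with this function substituted in.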

3

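With any supported Python version (3.4+), you should use Path.rglob from pathlib to recursively list the contents of the current directory and all subdirectories: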

from pathlib import Path


def generate_all_files(root: Path, only_files: bool = True):
    for p in root.rglob("*"):
        if only_files and not p.is_file():
            continue
        yield p


for p in generate_all_files(Path("."), only_files=False):
    print(p)

If you want something you can copy and paste:

Example:

Folder structure:

$ tree . -a
.
├── a.txt
├── bar
├── b.py
├── collect.py
├── empty
├── foo
│   └── bar.bz.gz2
├── .hidden
│   └── secrect-file
└── martin
    └── thoma
        └── cv.pdf

Gives:

$ python collect.py
bar
empty
.hidden
collect.py
a.txt
b.py
martin
foo
.hidden/secrect-file
martin/thoma
martin/thoma/cv.pdf
foo/bar.bz.gz2
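
rglob also takes a pattern, so the extension filtering discussed in other answers can be folded into the same call. A minimal sketch, assuming a hypothetical helper named find_python_files:

```python
from pathlib import Path


def find_python_files(root):
    """Return a sorted list of Path objects for every *.py file below root."""
    # is_file() drops any directory whose name happens to end in .py.
    return sorted(p for p in Path(root).rglob("*.py") if p.is_file())


if __name__ == "__main__":
    for found in find_python_files("."):
        print(found)
```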

- Martin Thoma

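Where was this tested? For example, in later versions of Ubuntu the executable is named python3. - Peter Mortensen

I don't know what you are referring to. - Martin Thoma

Which operating system was this tested on? For example, this does not work on some versions of Ubuntu. - Peter Mortensen

1

I use Ubuntu 20.04 and sometimes a Mac. Why do you think this wouldn't work on Ubuntu? - Martin Thoma

No, this isn't about me. I'm just the messenger. I see different behavior on my machine, possibly because of a different (installation) configuration, but the error "Command 'python' not found" is well known in the Ubuntu world (it is a deliberate decision by Canonical (the company), not an accident or a bug). I am complaining, on behalf of future readers, about the inaccuracy of your answer (in its current state). I have provided evidence that it does not work on some (default) Ubuntu installations. Whether you correct it is up to you. - undefined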


1

If you want to list files on SharePoint, this is how to do it. Your path will probably start from the "\teams\" part.

import os

root = r"\\mycompany.sharepoint.com@SSL\DavWWWRoot\teams\MyFolder\Policies and Procedures\Deal Docs\My Deals"
list = [os.path.join(path, name) for path, subdirs, files in os.walk(root) for name in files]
print(list)

- Chadee Fouad

What is special about SharePoint? Can you elaborate? - Peter Mortensen

Most companies use SharePoint to store files. - Chadee Fouad


- Eli Bendersky · Accepted Answer

Use os.path.join to concatenate the directory and file name:

import os

for path, subdirs, files in os.walk(root):
    for name in files:
        print(os.path.join(path, name))

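Note the use of path rather than root in the join: path is the directory os.walk is currently visiting, so joining with root would be incorrect.

Python 3.4 added the pathlib module for easier path manipulation. So the equivalent of os.path.join would be: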
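
To make the difference concrete, here is a small sketch that contrasts the two joins. Both function names are made up for the comparison; the first reproduces the behavior described in the question.

```python
import os


def join_with_root(root):
    """Reproduce the bug: every file name is glued onto the top-level directory."""
    return [os.path.join(root, name)
            for path, subdirs, files in os.walk(root)
            for name in files]


def join_with_path(root):
    """Join each name onto the directory os.walk is currently visiting."""
    return [os.path.join(path, name)
            for path, subdirs, files in os.walk(root)
            for name in files]


if __name__ == "__main__":
    # Paths from the buggy version usually do not exist on disk.
    for candidate in join_with_root("."):
        print(candidate, os.path.exists(candidate))
```

For a file stored at 123/456/file.txt below the root, the first function returns root/file.txt, which does not exist, while the second returns the real location.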

pathlib.PurePath(path, name)

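The advantage of pathlib is that you can use a variety of useful methods on paths. If you use the concrete Path variant, you can also perform actual OS calls through it, such as checking whether the path exists, deleting it, opening the file it points to, and more.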
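
As a sketch of what the concrete Path variant allows, the loop above can yield Path objects instead of strings, and each one can then be queried directly. The helper names walk_paths and summarize are assumptions for this example.

```python
import os
from pathlib import Path


def walk_paths(root):
    """Yield a concrete Path for every file below root."""
    for path, subdirs, files in os.walk(root):
        for name in files:
            yield Path(path, name)


def summarize(root):
    """Return (path string, suffix, size in bytes) for every file below root."""
    # suffix is pure string handling; stat() performs a real OS call.
    return [(str(p), p.suffix, p.stat().st_size) for p in walk_paths(root)]


if __name__ == "__main__":
    for p in walk_paths("."):
        print(p.name, p.suffix)
```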

Python: list directories, subdirectories, and files
