Find a file in Python

Question

python

164

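I have a file that may be in a different location on each user's machine. Is there a way to search for it? Can I pass the file name and the directory tree to search in?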

- directedition

2

See os.walk or os.listdir in the os module. See also this question, https://dev59.com/73VC5IYBdhLWcg3woStW, for sample code. - Martin Beckett

9 Answers

49

In Python 3.4 or newer you can use pathlib to do recursive globbing:

>>> import pathlib
>>> sorted(pathlib.Path('.').glob('**/*.py'))
[PosixPath('build/lib/pathlib.py'),
 PosixPath('docs/conf.py'),
 PosixPath('pathlib.py'),
 PosixPath('setup.py'),
 PosixPath('test_pathlib.py')]

Reference: https://docs.python.org/zh-cn/3/library/pathlib.html#pathlib.Path.glob

In Python 3.5 or newer you can also do recursive globbing like this:

>>> import glob
>>> glob.glob('**/*.txt', recursive=True)
['2.txt', 'sub/3.txt']

Reference: https://docs.python.org/3/library/glob.html#glob.glob
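
Applied to the original question (a known file name and a directory tree to search), the same idea can be sketched with Path.rglob. The function name find_by_name and its parameters are illustrative assumptions, not part of pathlib:

```python
from pathlib import Path


def find_by_name(name, root):
    """Return every regular file under root whose final component is name."""
    # rglob(name) behaves like glob('**/' + name): it searches root and all
    # of its subdirectories. Note that name is treated as a glob pattern,
    # so characters such as '*' or '[' in it act as wildcards.
    return sorted(p for p in Path(root).rglob(name) if p.is_file())
```

The result is a sorted list of Path objects; an empty list means the file was not found under root.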

- Kenyon

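glob works well. If we are searching in the current directory, there is no need to pass the directory as an argument. - Liker777

Excellent answer. You can also glob with a relative path, e.g.: glob.glob('**/path/to/data/*.txt', recursive=True) -> ['/pwd/path/to/data/2.txt', '/pwd/path/to/data/sub/3.txt'] By default, glob.glob() uses os.getcwd() as the root directory. Later versions of Python (3.10 and newer) allow the root directory to be overridden with the root_dir parameter. Otherwise, use os.chdir() to set the working directory, then glob! - kevinarpe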

27

I used a version of os.walk and, on a larger directory, got times of around 3.5 seconds. I tried two random solutions with no great improvement, then just did this:

import subprocess
paths = [line[2:] for line in subprocess.check_output("find . -iname '*.txt'", shell=True, text=True).splitlines()]

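While this is POSIX-only, I got it down to 0.25 seconds.

From this, I believe it is entirely possible to optimise the whole search considerably in a platform-independent way, but this is where I stopped my research.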
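
One platform-independent direction, offered only as a sketch and not benchmarked here, is to walk the tree with os.scandir (usable as a context manager from Python 3.6), which reuses the file-type information that comes back with each directory listing. The name iter_matches is hypothetical, and the case-insensitive matching mimics the -iname flag used above:

```python
import fnmatch
import os


def iter_matches(root, pattern):
    """Yield paths of non-directory entries under root whose name matches
    the shell-style pattern, ignoring case."""
    lowered = pattern.lower()
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # Symlinked directories are not followed, which avoids loops.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif fnmatch.fnmatchcase(entry.name.lower(), lowered):
                        yield entry.path
        except OSError:
            # Unreadable directories are skipped rather than aborting the search.
            continue
```

Because it is a generator, the caller can stop consuming results as soon as the wanted file turns up.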

- kgadek

8

If you are working with Python on Ubuntu and only need it to run on Ubuntu, a substantially faster way is to use the terminal's locate program, like this:

import subprocess

def find_files(file_name):
    command = ['locate', file_name]

    output = subprocess.Popen(command, stdout=subprocess.PIPE).communicate()[0]
    output = output.decode()

    search_results = output.splitlines()

    return search_results

search_results is a list of absolute file paths. This is tens of thousands of times faster than the methods above; for one search I ran, it was roughly 72,000 times faster.

- SARose

4

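If you are using Python 2, you will run into infinite recursion on Windows caused by self-referencing symlinks.

This script avoids following those symlinks. Note that it is Windows-specific!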

import os
from scandir import scandir
import ctypes

def is_sym_link(path):
    # https://dev59.com/WWUp5IYBdhLWcg3wdnd6#35915819
    FILE_ATTRIBUTE_REPARSE_POINT = 0x0400
    return os.path.isdir(path) and (ctypes.windll.kernel32.GetFileAttributesW(unicode(path)) & FILE_ATTRIBUTE_REPARSE_POINT)

def find(base, filenames):
    hits = []

    def find_in_dir_subdir(direc):
        content = scandir(direc)
        for entry in content:
            if entry.name in filenames:
                hits.append(os.path.join(direc, entry.name))

            elif entry.is_dir() and not is_sym_link(os.path.join(direc, entry.name)):
                try:
                    find_in_dir_subdir(os.path.join(direc, entry.name))
                except UnicodeDecodeError:
                    print "Could not resolve " + os.path.join(direc, entry.name)
                    continue

    if not os.path.exists(base):
        return
    else:
        find_in_dir_subdir(base)

    return hits

The function returns a list of all paths that point to files named in the filenames list.

Usage:

find("C:\\", ["file1.abc", "file2.abc", "file3.abc", "file4.abc", "file5.abc"])

- F.M.F.

2

Below we use a boolean "first" argument to switch between the first match and all matches (the default, which is equivalent to "find . -name file"):

import os

def find(root, file, first=False):
    for d, subD, f in os.walk(root):
        if file in f:
            print("{0} : {1}".format(file, d))
            if first:
                break
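
If the caller needs the paths rather than printed output, the same first/all switch might be expressed with a generator, leaving the choice to the caller. This is a variation on the function above, not the answerer's code; iter_found is a made-up name:

```python
import os


def iter_found(root, file):
    """Lazily yield the full path of each file named `file` under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        if file in filenames:
            yield os.path.join(dirpath, file)


# Usage sketch:
#   first_hit = next(iter_found(root, name), None)   # stops at the first match
#   all_hits = list(iter_found(root, name))          # collects every match
```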

- Leon Chang

2

This answer is very similar to the existing ones, but slightly optimized.

You can find any files or folders by pattern:

import os

def iter_all(pattern, path):
    # pattern is a compiled regular expression, e.g. re.compile(r'.*\.txt$')
    return (
        os.path.join(root, entry)
        for root, dirs, files in os.walk(path)
        for entry in dirs + files
        if pattern.match(entry)
    )

By substring:

def iter_all(substring, path):
    return (
        os.path.join(root, entry)
        for root, dirs, files in os.walk(path)
        for entry in dirs + files
        if substring in entry
    )

Or using a predicate:

def iter_all(predicate, path):
    return (
        os.path.join(root, entry)
        for root, dirs, files in os.walk(path)
        for entry in dirs + files
        if predicate(entry)
    )

To search only files or only folders, replace "dirs + files" with just "files" or just "dirs", depending on what you need.

Regards.
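
The predicate form is the most general of the three, since the other two can be built on top of it. A sketch of that relationship follows; the wrapper names find_by_regex, find_by_substring and find_by_wildcard are illustrative assumptions, and the fnmatch variant is an addition not present in the answer:

```python
import fnmatch
import os
import re


def iter_all(predicate, path):
    """Yield the joined path of every file or folder name accepted by predicate."""
    return (
        os.path.join(root, entry)
        for root, dirs, files in os.walk(path)
        for entry in dirs + files
        if predicate(entry)
    )


def find_by_regex(regex, path):
    # The bound match method of a compiled pattern is itself a predicate.
    return iter_all(re.compile(regex).match, path)


def find_by_substring(substring, path):
    return iter_all(lambda name: substring in name, path)


def find_by_wildcard(wildcard, path):
    # Shell-style wildcards such as '*.txt'.
    return iter_all(lambda name: fnmatch.fnmatch(name, wildcard), path)
```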

- Stanislav Kuzmich

1

@F.M.F's answer has a few problems under this version, so I made some adjustments to get it working.

import os
from os import scandir
import ctypes

def is_sym_link(path):
    # https://dev59.com/WWUp5IYBdhLWcg3wdnd6#35915819
    FILE_ATTRIBUTE_REPARSE_POINT = 0x0400
    return os.path.isdir(path) and (ctypes.windll.kernel32.GetFileAttributesW(str(path)) & FILE_ATTRIBUTE_REPARSE_POINT)

def find(base, filenames):
    hits = []

    def find_in_dir_subdir(direc):
        content = scandir(direc)
        for entry in content:
            if entry.name in filenames:
                hits.append(os.path.join(direc, entry.name))

            elif entry.is_dir() and not is_sym_link(os.path.join(direc, entry.name)):
                try:
                    find_in_dir_subdir(os.path.join(direc, entry.name))
                except UnicodeDecodeError:
                    print("Could not resolve " + os.path.join(direc, entry.name))
                    continue
                except PermissionError:
                    print("Skipped " + os.path.join(direc, entry.name) + ". I lacked permission to navigate")
                    continue

    if not os.path.exists(base):
        return
    else:
        find_in_dir_subdir(base)

    return hits

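In Python 3, unicode() was replaced by str(), so I changed that (line 8).

I also added exception handling for PermissionError (line 25). That way, the program does not stop when it finds a folder it cannot access.

Finally, a word of warning. When running the program, pass the names as a list even if you are looking for a single file or directory. Otherwise, you will get a lot of results that do not match your search.

Example usage:

find("C:\\", ["Python", "Homework"])

or

find("C:\\", ["Homework"])

But, for example, find("C:\\", "Homework") will give unwanted results.

I would be lying if I said I knew why this happens. Again, this is not my code; I only made the adjustments needed to get it running. All credit goes to @F.M.F.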
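
A likely explanation, based only on how the code above is written: the test entry.name in filenames is a membership test when filenames is a list, but a substring test when filenames is a single string, so any entry whose name occurs inside "Homework" (such as "Home" or "work") is reported. A small illustration, with a hypothetical helper as_name_list that normalises the argument:

```python
def as_name_list(filenames):
    """Wrap a single string in a list so that `in` compares whole names."""
    if isinstance(filenames, str):
        return [filenames]
    return list(filenames)


# With a list, `in` compares whole names.
print("work" in ["Homework"])               # False
# With a bare string, `in` checks for a substring.
print("work" in "Homework")                 # True
# After normalising, the string behaves like a one-element list.
print("work" in as_name_list("Homework"))   # False
```

Calling such a helper at the top of find would make both calling styles behave the same way.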

- Felipe Soriano

1

SARose's answer worked for me until I updated to Ubuntu 20.04 LTS. I made a slight change to his code so that it works on the latest Ubuntu release.

import subprocess

def find_files(file_name):
    command = ['locate'+ ' ' + file_name]
    output = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True).communicate()[0]
    output = output.decode()
    search_results = output.splitlines()
    return search_results

- Justin Turner

Python itself can find files without using a subprocess to run Unix commands. - OneCricketeer

- Nadia Alramli · Accepted Answer

os.walk is the answer. This will find the first match:

import os

def find(name, path):
    for root, dirs, files in os.walk(path):
        if name in files:
            return os.path.join(root, name)

And this will find all matches:

def find_all(name, path):
    result = []
    for root, dirs, files in os.walk(path):
        if name in files:
            result.append(os.path.join(root, name))
    return result

And this will match a pattern:

import os, fnmatch
def find(pattern, path):
    result = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                result.append(os.path.join(root, name))
    return result

find('*.txt', '/path/to/dir')
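
On large trees it can help to skip directories that cannot contain the file. os.walk supports this when walking top-down (its default): removing names from dirs in place stops the walk from descending into them. A sketch along the lines of the functions above; the name find_all_pruned and the skip parameter, including its default values, are illustrative assumptions:

```python
import os


def find_all_pruned(name, path, skip=(".git", "node_modules")):
    """Like find_all, but do not descend into directories named in skip."""
    result = []
    for root, dirs, files in os.walk(path):
        # Slice assignment edits the list os.walk is using, so the
        # removed directories are never visited.
        dirs[:] = [d for d in dirs if d not in skip]
        if name in files:
            result.append(os.path.join(root, name))
    return result
```

Rebinding the name (dirs = [...]) would not work here, because os.walk would still hold the original list.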