Browse files and subfolders in Python

Question

74

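I'd like to browse through the current folder and all of its subfolders and get every file with a .htm or .html extension. I have found out that it is possible to tell whether an object is a directory or a file like this: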

import os

dirList = os.listdir("./")  # current directory
for dir in dirList:
    if os.path.isdir(dir):
        # I don't know how to get into this dir and do the same thing here
        pass
    else:
        # I got file and i can regexp if it is .htm|html
        pass

In the end, I would like to have all the files and their paths stored in an array. Is something like that possible?

- Blackie123

Possible duplicate of: How to iterate through the files in a directory? - S.Lott

7 Answers

18

I had a similar task to work on, and this is how I did it.

import os

rootdir = os.getcwd()

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        # Build the full path from the directory being walked and the file name
        filepath = os.path.join(subdir, file)

        # The question asks for both .htm and .html
        if filepath.endswith((".html", ".htm")):
            print(filepath)

Hope this helps.

- Pragyaditya Das

1

@Pragyaditya_Das, awesome! - Mark K

8

In Python 3, you can use os.scandir():

import os

def dir_scan(path):
    for i in os.scandir(path):
        if i.is_file():
            print('File: ' + i.path)
        elif i.is_dir():
            print('Folder: ' + i.path)
            dir_scan(i.path)
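
To get a list of paths rather than printed output, which is what the question asks for, the same recursion can be written as a generator. This is only a minimal sketch: the name iter_html_files is hypothetical, and the case-insensitive match and the follow_symlinks=False guard are assumptions that are not part of the answer above.

```python
import os

def iter_html_files(path):
    """Yield paths of .htm/.html files under path, recursing with os.scandir()."""
    with os.scandir(path) as entries:
        for entry in entries:
            # Assumption: do not descend into symlinked directories,
            # which avoids endless recursion on symlink cycles
            if entry.is_dir(follow_symlinks=False):
                yield from iter_html_files(entry.path)
            elif entry.is_file() and entry.name.lower().endswith((".htm", ".html")):
                yield entry.path

# Usage: html_files = list(iter_html_files("."))
```

Wrapping the generator in list() gives the array of paths; iterating over it directly avoids holding every path in memory at once.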

- Spas

1

This answer isn't really suitable, because os.scandir() doesn't traverse all subfolders as the question asks. Even in Python 3, os.walk() is better, as in the accepted answer. - Dan Stowell

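@DanStowell, you're right, it didn't traverse subfolders. I've changed my answer so that it now visits every file in every subfolder. os.scandir() should be faster than os.walk() - https://peps.python.org/pep-0471/ - Spas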

5

Use newDirName = os.path.abspath(dir) to build the full path name of the subdirectory, then list its contents the same way you did for the parent folder (for example, newDirList = os.listdir(newDirName)).

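You can move your code snippet into a separate method and call it recursively to work through the subdirectory structure. Its first parameter is the directory path name, which changes for each subdirectory.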
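
The recursive approach described above could look roughly like the sketch below. The function name list_html_files and its found parameter are hypothetical. The sketch joins each name with its parent directory via os.path.join() instead of calling os.path.abspath(), because abspath() resolves a bare name against the current working directory, which is only correct at the top level.

```python
import os

def list_html_files(dir_path, found=None):
    """Recursively collect .htm/.html file paths below dir_path."""
    if found is None:
        found = []
    for name in os.listdir(dir_path):
        # Join with the parent directory so nested entries resolve correctly
        full_path = os.path.join(dir_path, name)
        if os.path.isdir(full_path):
            # Same routine, called with the subdirectory as the new path
            list_html_files(full_path, found)
        elif name.endswith((".htm", ".html")):
            found.append(full_path)
    return found

# Usage: html_files = list_html_files(".")
```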

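This answer is based on the documentation for version 3.1.1 of the Python library. There is a good model example on page 228 of the Python 3.1.1 Library Reference (Chapter 10 - File and Directory Access). Good luck!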

- NeonJack

0

Two approaches work for me.

1. Use the `os` module and `__file__` to build paths relative to the directory where the script is located

import os
script_dir = os.path.dirname(__file__)      

path = 'subdirectory/test.txt'
file = os.path.join(script_dir, path)
fileread = open(file,'r') 


2. Use '\\' as the separator to read or write a file in a subfolder (this form is Windows-specific)
fileread = open('subdirectory\\test.txt','r')
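
Since a hard-coded backslash only works as a separator on Windows, a portable variant of the second approach might build the path with os.path.join() instead. This is a small sketch, not the answerer's code: read_text is a hypothetical helper, and 'subdirectory' and 'test.txt' are just the placeholder names used above.

```python
import os

# os.path.join() inserts the separator that the current platform expects
relative_path = os.path.join("subdirectory", "test.txt")

def read_text(path):
    """Return the contents of a text file, closing it automatically."""
    with open(path, "r") as handle:
        return handle.read()

# Usage: contents = read_text(relative_path)
```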

- Yi2021

Please don't paste the same answer on multiple questions. This has been flagged for moderators. - Trenton McKinney

0

A slight modification of Sven Marnach's solution.


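import os

def create_file_list(path):
    return_list = []

    # Each item from os.walk() is a (root, dirs, files) tuple
    for filenames in os.walk(path):
        for file_list in filenames:
            for file_name in file_list:
                if file_name.endswith(".txt"):
                    return_list.append(file_name)

    return return_list

# Raw string so the backslash is not treated as an escape
folder_location = r'C:\SomeFolderName'
file_list = create_file_list(folder_location)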

- campervancoder

For some reason the paste above has extra whitespace and the for blocks aren't indented correctly. SO's markup doesn't like me. - campervancoder

3

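Reworking simple code like this is a bad idea - replacing tuple assignment with nested loops makes the code harder to read, and probably less efficient than the original. - volcano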

Thanks for your comment @volcano. The example above didn't seem to work for me, hence the extra for loops. - campervancoder

-1

from tkinter import Tk, filedialog  # filedialog is a submodule and must be imported explicitly
import os

root = Tk()
file = filedialog.askdirectory()
changed_dir = os.listdir(file)
print(changed_dir)
root.mainloop()

- Akshat Mishra

- Sven Marnach · Accepted Answer

You can use os.walk() to recursively iterate through a directory and all its subdirectories:

import os

path = "."  # placeholder: the directory to start from

for root, dirs, files in os.walk(path):
    for name in files:
        if name.endswith((".html", ".htm")):
            print(os.path.join(root, name))  # whatever

To build a list of these names, you can use a list comprehension:

htmlfiles = [os.path.join(root, name)
             for root, dirs, files in os.walk(path)
             for name in files
             if name.endswith((".html", ".htm"))]
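
As an alternative that is not part of the accepted answer, the same list can probably be built with pathlib. The sketch below assumes a hypothetical helper named find_html_files and, unlike the os.walk() version, compares the suffix case-insensitively, so it also picks up names such as INDEX.HTML.

```python
from pathlib import Path

# Extensions the question asks for
HTML_SUFFIXES = {".htm", ".html"}

def find_html_files(path="."):
    """Return a sorted list of .htm/.html file paths found under path."""
    return sorted(
        str(p)
        for p in Path(path).rglob("*")  # rglob("*") visits every entry recursively
        if p.is_file() and p.suffix.lower() in HTML_SUFFIXES
    )

# Usage: htmlfiles = find_html_files(".")
```

Filtering on p.suffix rather than a glob such as "*.htm*" keeps unrelated extensions that merely start with ".htm" out of the result.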