Question

252

我正在尝试编写一个脚本来列出给定目录中的所有目录、子目录和文件。

我尝试了这个：

import sys, os

root = "/home/patate/directory/"
path = os.path.join(root, "targetdirectory")

for r, d, f in os.walk(path):
    for file in f:
        print(os.path.join(root, file))

很遗憾，它不能正常工作。我可以获取所有文件，但无法获取它们的完整路径。

例如，如果目录结构如下：

/home/patate/directory/targetdirectory/123/456/789/file.txt

它会打印：

/home/patate/directory/targetdirectory/file.txt

我需要第一个结果。

- thomytheyon

12个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- kiran beethoju · Answer 1

这只是一个附加功能。有了它，您可以将数据以CSV格式获取：

import sys, os

try:
    import pandas as pd
except:
    os.system("pip3 install pandas")

root = "/home/kiran/Downloads/MainFolder" # It may have many subfolders and files inside
lst = []
from fnmatch import fnmatch
pattern = "*.csv"      # I want to get only csv files
pattern = "*.*"        # Note: Use this pattern to get all types of files and folders
for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            lst.append((os.path.join(path, name)))
df = pd.DataFrame({"filePaths":lst})
df.to_csv("filepaths.csv")

- CMSG · Answer 2

一个相当简单的解决方案是运行一些子进程调用，将文件导出为CSV格式：

import subprocess

# Global variables for directory being mapped

location = '.' # Enter the path here.
pattern = '*.py' # Use this if you want to only return certain filetypes
rootDir = location.rpartition('/')[-1]
outputFile = rootDir + '_directory_contents.csv'

# Find the requested data and export to CSV, specifying a pattern if needed.
find_cmd = 'find ' + location + ' -name ' + pattern +  ' -fprintf ' + outputFile + '  "%Y%M,%n,%u,%g,%s,%A+,%P\n"'
subprocess.call(find_cmd, shell=True)

该命令会生成逗号分隔的值，可在Excel中轻松分析。

f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py

生成的 CSV 文件没有标题行，但您可以使用第二个命令添加它们。

# Add headers to the CSV
headers_cmd = 'sed -i.bak 1i"Permissions,Links,Owner,Group,Size,ModifiedTime,FilePath" ' + outputFile
subprocess.call(headers_cmd, shell=True)

根据你接收到的数据量，你可以使用Pandas进一步处理它。以下是我发现的一些有用的东西，特别是如果你正在处理许多级别的目录来查找。

将这些添加到你的导入中：

import numpy as np
import pandas as pd

然后将此添加到您的代码中：

# Create DataFrame from the CSV file created above.
df = pd.read_csv(outputFile)

# Format columns
# Get the filename and file extension from the filepath
df['FileName'] = df['FilePath'].str.rsplit("/", 1).str[-1]
df['FileExt'] = df['FileName'].str.rsplit('.', 1).str[1]

# Get the full path to the files. If the path doesn't include a "/" it's the root directory
df['FullPath'] = df["FilePath"].str.rsplit("/", 1).str[0]
df['FullPath'] = np.where(df['FullPath'].str.contains("/"), df['FullPath'], rootDir)

# Split the path into columns for the parent directory and its children
df['ParentDir'] = df['FullPath'].str.split("/", 1).str[0]
df['SubDirs'] = df['FullPath'].str.split("/", 1).str[1]
# Account for NaN returns, indicates the path is the root directory
df['SubDirs'] = np.where(df.SubDirs.str.contains('NaN'), '', df.SubDirs)

# Determine if the item is a directory or file.
df['Type'] = np.where(df['Permissions'].str.startswith('d'), 'Dir', 'File')

# Split the time stamp into date and time columns
df[['ModifiedDate', 'Time']] = df.ModifiedTime.str.rsplit('+', 1, expand=True)
df['Time'] = df['Time'].str.split('.').str[0]

# Show only files, output includes paths so you don't necessarily need to display the individual directories.
df = df[df['Type'].str.contains('File')]

# Set columns to show and their order.
df = df[['FileName', 'ParentDir', 'SubDirs', 'FullPath', 'DocType', 'ModifiedDate', 'Time', 'Size']]

filesize = [] # Create an empty list to store file sizes to convert them to something more readable.

# Go through the items and convert the filesize from bytes to something more readable.
for items in df['Size'].items():
    filesize.append(convert_bytes(items[1]))
    df['Size'] = filesize

# Send the data to an Excel workbook with sheets by parent directory
with pd.ExcelWriter("scripts_directory_contents.xlsx") as writer:
    for directory, data in df.groupby('ParentDir'):
    data.to_excel(writer, sheet_name = directory, index=False)


# To convert sizes to be more human readable
def convert_bytes(size):
    for x in ['b', 'K', 'M', 'G', 'T']:
        if size < 1024:
            return "%3.1f %s" % (size, x)
        size /= 1024

    return size

Python列出目录、子目录和文件。