A fairly simple solution is to run a couple of subprocess calls that export the file listing to CSV:
import subprocess
location = '.'
pattern = '*.py'
rootDir = location.rpartition('/')[-1]
outputFile = rootDir + '_directory_contents.csv'
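# Quote the pattern so the shell does not expand *.py itself; %T+ is the modification time (-fprintf is GNU find only)
find_cmd = 'find ' + location + ' -name "' + pattern + '" -fprintf ' + outputFile + ' "%Y%M,%n,%u,%g,%s,%T+,%P\\n"'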
subprocess.call(find_cmd, shell=True)
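If you would rather not assemble a shell string, the same call can be sketched as an argument list, which sidesteps quoting problems with patterns such as *.py. Treat this as a sketch only: build_find_args and run_find are hypothetical helper names, not part of the code above, and -fprintf still requires GNU find.

```python
import subprocess


def build_find_args(location, pattern, output_file):
    """Return the find invocation as a list, so no shell quoting is needed."""
    # Fields: type + permissions, links, owner, group, size,
    # modification time (%T+), path relative to the starting point.
    # find itself turns the two-character sequence \n into a newline.
    fmt = "%Y%M,%n,%u,%g,%s,%T+,%P\\n"
    return ["find", location, "-name", pattern, "-fprintf", output_file, fmt]


def run_find(location, pattern, output_file):
    # check=True raises CalledProcessError if find exits with an error.
    subprocess.run(build_find_args(location, pattern, output_file), check=True)
```

Calling run_find('.', '*.py', outputFile) would then take the place of the subprocess.call line, with no shell=True involved.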
The find command produces comma-separated values that are easy to analyze in Excel. A sample line:
f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py
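To make the layout of that line concrete, here is a small sketch that maps it onto named fields. The field names are the ones used for the header row in this answer; parse_line is a hypothetical helper introduced only for illustration.

```python
import csv
import io

FIELDS = ["Permissions", "Links", "Owner", "Group", "Size", "ModifiedTime", "FilePath"]

sample = "f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py\n"


def parse_line(line):
    """Map one line of the find output onto named fields."""
    values = next(csv.reader(io.StringIO(line)))
    record = dict(zip(FIELDS, values))
    # The first character is the file type from %Y; the rest is the mode from %M.
    record["Type"] = record["Permissions"][0]
    record["Size"] = int(record["Size"])
    return record


record = parse_line(sample)
print(record["Type"], record["Owner"], record["Size"], record["FilePath"])
# prints: f cathy 2642 content-audit.py
```

One caveat of this format: a file name that itself contains a comma would spill into extra columns, because find does not quote the fields.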
The resulting CSV file has no header row, but you can add one with a second command.
# Add headers to the CSV
headers_cmd = 'sed -i.bak 1i"Permissions,Links,Owner,Group,Size,ModifiedTime,FilePath" ' + outputFile
subprocess.call(headers_cmd, shell=True)
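The sed one-liner above relies on GNU sed syntax. If that is a concern (BSD/macOS sed handles -i and the i command differently), the header can also be prepended in plain Python. This is a minimal sketch, not the approach used above; add_header is a hypothetical name, and it reads the whole file into memory, which is fine for listings of modest size.

```python
import os
import tempfile

HEADER = "Permissions,Links,Owner,Group,Size,ModifiedTime,FilePath"


def add_header(path, header=HEADER):
    """Prepend a header line to a text file by rewriting it in place."""
    with open(path, "r", encoding="utf-8") as src:
        body = src.read()
    with open(path, "w", encoding="utf-8") as dst:
        dst.write(header + "\n" + body)


# Small demonstration on a throwaway file, using the sample line from above.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.csv")
    with open(demo, "w", encoding="utf-8") as f:
        f.write("f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py\n")
    add_header(demo)
    with open(demo, encoding="utf-8") as f:
        first_line = f.readline().rstrip("\n")

print(first_line)
```

In the script above, add_header(outputFile) would replace the headers_cmd call. Unlike sed -i.bak, it leaves no backup copy behind.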
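Depending on how much data you get back, you can process it further with Pandas. Here are a few things I found useful, especially if you are searching through many levels of directories.
Add these to your imports: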
import numpy as np
import pandas as pd
Then add this to your code:
df = pd.read_csv(outputFile)
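# pandas 2.x only accepts the split limit as a keyword (n=1)
df['FileName'] = df['FilePath'].str.rsplit("/", n=1).str[-1]
df['FileExt'] = df['FileName'].str.rsplit('.', n=1).str[1]
df['FullPath'] = df["FilePath"].str.rsplit("/", n=1).str[0]
# Files directly under the search root have no "/" in FilePath; use rootDir as their directory
df['FullPath'] = np.where(df['FilePath'].str.contains("/"), df['FullPath'], rootDir)
df['ParentDir'] = df['FullPath'].str.split("/", n=1).str[0]
df['SubDirs'] = df['FullPath'].str.split("/", n=1).str[1]
# Paths with a single directory level have no sub-directories; replace the missing value with ''
df['SubDirs'] = df['SubDirs'].fillna('')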
df['Type'] = np.where(df['Permissions'].str.startswith('d'), 'Dir', 'File')
df[['ModifiedDate', 'Time']] = df.ModifiedTime.str.rsplit('+', n=1, expand=True)
# Drop the fractional seconds
df['Time'] = df['Time'].str.split('.').str[0]
# Keep regular files only; copy so later column assignments do not warn
df = df[df['Type'].str.contains('File')].copy()
# The extension column created above is called FileExt
df = df[['FileName', 'ParentDir', 'SubDirs', 'FullPath', 'FileExt', 'ModifiedDate', 'Time', 'Size']]
# convert_bytes is the helper shown at the end of this answer; define it before this line runs
df['Size'] = df['Size'].apply(convert_bytes)
# Write one worksheet per top-level directory
with pd.ExcelWriter("scripts_directory_contents.xlsx") as writer:
    for directory, data in df.groupby('ParentDir'):
        data.to_excel(writer, sheet_name=directory, index=False)
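Because directory names become worksheet names here, it may help to sanitize them first: Excel limits sheet names to 31 characters and rejects the characters [ ] : * ? / \. The helper below is a hedged sketch of one way to do that; safe_sheet_name and its fallback value are assumptions of mine, not part of the code above.

```python
import re


def safe_sheet_name(name, fallback="root"):
    """Make a directory name usable as an Excel worksheet title."""
    # Replace the characters Excel rejects in sheet names: [ ] : * ? / \
    cleaned = re.sub(r"[\[\]:*?/\\]", "_", str(name))
    # A sheet name may not start or end with an apostrophe either.
    cleaned = cleaned.strip("'")
    # Sheet names are limited to 31 characters and may not be empty.
    return cleaned[:31] or fallback


print(safe_sheet_name("docs/api"))  # prints: docs_api
print(len(safe_sheet_name("a" * 40)))  # prints: 31
```

You would then pass sheet_name=safe_sheet_name(directory) to to_excel. Note that two long directory names can collide once truncated, so very similar names may still need manual handling.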
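# Helper used for the Size column: formats a byte count with 1024-based units
def convert_bytes(size):
    for x in ['b', 'K', 'M', 'G', 'T']:
        if size < 1024:
            return "%3.1f %s" % (size, x)
        size /= 1024
    # Anything left after the loop is at least 1024 T, so report it in P
    return "%3.1f %s" % (size, 'P')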
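To see what the size formatting produces, here is a self-contained sketch with a few sample values. human_size is a hypothetical stand-in that mirrors convert_bytes so the snippet runs on its own.

```python
def human_size(size):
    """Format a byte count using 1024-based units, mirroring convert_bytes."""
    for unit in ["b", "K", "M", "G", "T"]:
        if size < 1024:
            return "%3.1f %s" % (size, unit)
        size /= 1024
    # Anything that survives the loop is at least 1024 T; report it in P.
    return "%3.1f %s" % (size, "P")


for n in (512, 2642, 1048576):
    print(n, "->", human_size(n))
# 512 -> 512.0 b
# 2642 -> 2.6 K
# 1048576 -> 1.0 M
```

The 2642-byte case is the content-audit.py file from the sample line earlier, which would show up as 2.6 K in the spreadsheet.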