Here is another approach.
import os
import re

import pandas as pd


def count_files(top, pattern, list_files):
    # Normalize the starting directory (expand '~', make it absolute).
    top = os.path.abspath(os.path.expanduser(top))
    res = []
    for root, dirs, files in os.walk(top):
        # Path of the current directory relative to top ('.' for top itself).
        name_space = os.path.relpath(root, top)
        # Depth below top: 0 for top, 1 for its children, and so on.
        level = os.path.normpath(name_space).count(os.sep) + 1 if name_space != '.' else 0
        # File names in this directory that the regex matches anywhere.
        matches = [file for file in files if re.search(pattern, file)]
        # Only directories with at least one match get a row.
        if matches:
            if list_files:
                res.append((pattern, level, name_space, len(matches), matches))
            else:
                res.append((pattern, level, name_space, len(matches)))
    if list_files:
        df = pd.DataFrame(res, columns=['pattern', 'level', 'name_space', 'count', 'files'])
    else:
        df = pd.DataFrame(res, columns=['pattern', 'level', 'name_space', 'count'])
    return df
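If pandas is not available, the same walk can be done with only the standard library. The following is a minimal sketch under that assumption: `count_files_plain` is a hypothetical name, and it returns plain tuples in the same column order instead of a DataFrame. It also compiles the regex once rather than calling `re.search` with a pattern string for every file.

```python
import os
import re


def count_files_plain(top, pattern, list_files=False):
    """Hypothetical pandas-free variant of count_files that returns a list of tuples."""
    top = os.path.abspath(os.path.expanduser(top))
    regex = re.compile(pattern)  # compile once instead of once per file name
    rows = []
    for root, dirs, files in os.walk(top):
        name_space = os.path.relpath(root, top)
        # '.' is top itself (level 0); otherwise count the path components below top.
        level = 0 if name_space == "." else name_space.count(os.sep) + 1
        matches = [name for name in files if regex.search(name)]
        if matches:  # directories with no matching file get no row, as in count_files
            row = (pattern, level, name_space, len(matches))
            rows.append((row + (matches,)) if list_files else row)
    return rows
```

The total then becomes `sum(r[3] for r in rows)` instead of `df['count'].sum()`.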
Consider the following directory structure:
rajulocal@hogwarts ~/x/x5 % tree -a
.
├── analysis.txt
├── count_files.ipynb
├── d1
│   ├── d2
│   │   ├── d3
│   │   │   └── f5.txt
│   │   ├── f3.txt
│   │   └── f4.txt
│   ├── f2.txt
│   └── f6.txt
├── f1.txt
├── f7.txt
└── .ipynb_checkpoints
    └── count_files-checkpoint.ipynb

4 directories, 10 files
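To reproduce the session without the original directory, the layout can be recreated in a temporary directory. This is a sketch under the assumption that only names matter (all files are created empty); `make_example_tree` and `EXAMPLE_FILES` are hypothetical names, and the paths are copied from the tree listing above.

```python
import os
import tempfile

# Paths copied from the tree listing; contents are assumed irrelevant (empty files).
EXAMPLE_FILES = [
    "analysis.txt",
    "count_files.ipynb",
    "d1/d2/d3/f5.txt",
    "d1/d2/f3.txt",
    "d1/d2/f4.txt",
    "d1/f2.txt",
    "d1/f6.txt",
    "f1.txt",
    "f7.txt",
    ".ipynb_checkpoints/count_files-checkpoint.ipynb",
]


def make_example_tree(top):
    """Hypothetical helper: recreate the example layout under `top` with empty files."""
    for rel in EXAMPLE_FILES:
        path = os.path.join(top, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()  # create an empty file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_example_tree(tmp)
        n_dirs = sum(len(dirs) for _, dirs, _ in os.walk(tmp))
        n_files = sum(len(files) for _, _, files in os.walk(tmp))
        print(f"{n_dirs} directories, {n_files} files")  # 4 directories, 10 files
```

Passing the temporary directory to `count_files` instead of `"~/x/x5"` should then give the same counts as below, although the order of rows and of file names may differ.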
Count the number of text files (i.e., files ending in .txt) in each directory:
rajulocal@hogwarts ~/x/x5 % ipython
Python 3.10.6 (main, Oct 24 2022, 16:07:47) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.6.0 -- An enhanced Interactive Python. Type '?' for help.
...
In [2]:
df = count_files("~/x/x5", r"\.txt", False)
df
Out[2]:
  pattern  level name_space  count
0   \.txt      0          .      3
1   \.txt      1         d1      2
2   \.txt      2      d1/d2      2
3   \.txt      3   d1/d2/d3      1
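Note that `re.search(r"\.txt", name)` matches `.txt` anywhere in the name, not only at the end, so a file such as `notes.txt.bak` would also be counted. Anchoring the pattern with `$` restricts it to names that really end in `.txt`. A quick check, using made-up file names:

```python
import re

names = ["f1.txt", "notes.txt.bak", "readme.md", "TXT.txt"]

# Unanchored: '.txt' may appear anywhere in the name.
loose = [n for n in names if re.search(r"\.txt", n)]
# Anchored: the name must end with '.txt'.
strict = [n for n in names if re.search(r"\.txt$", n)]

print(loose)   # ['f1.txt', 'notes.txt.bak', 'TXT.txt']
print(strict)  # ['f1.txt', 'TXT.txt']
```

For the directory above both patterns give the same result, because no file name contains `.txt` in the middle.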
To see which files these are:
In [3]:
df = count_files("~/x/x5", r"\.txt", True)
df
Out[3]:
  pattern  level name_space  count                           files
0   \.txt      0          .      3  [analysis.txt, f1.txt, f7.txt]
1   \.txt      1         d1      2                [f6.txt, f2.txt]
2   \.txt      2      d1/d2      2                [f4.txt, f3.txt]
3   \.txt      3   d1/d2/d3      1                        [f5.txt]
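The file lists come back in whatever order `os.walk` receives the names from the operating system, which is why `f6.txt` appears before `f2.txt`. If reproducible output matters, one option is to sort both the subdirectory list (in place, so that `os.walk` descends in that order) and the file names. A sketch; `walk_sorted` is a hypothetical wrapper, not part of the `os` module:

```python
import os


def walk_sorted(top):
    """Hypothetical wrapper: like os.walk, but visits subdirectories and lists files in sorted order."""
    for root, dirs, files in os.walk(top):
        # Sorting dirs in place changes the order in which os.walk
        # (with the default topdown=True) descends into them.
        dirs.sort()
        yield root, dirs, sorted(files)
```

Inside `count_files` the equivalent change would be a `dirs.sort()` at the top of the loop and iterating over `sorted(files)`.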
Get the total number of files:
In [4]:
df['count'].sum()
Out[4]:
8
Count the number of files ending in .ipynb (IPython notebook files):
In [5]:
df = count_files("~/x/x5", r"\.ipynb", True)
df
Out[5]:
   pattern  level          name_space  count                           files
0  \.ipynb      0                   .      1              [count_files.ipynb]
1  \.ipynb      1  .ipynb_checkpoints      1  [count_files-checkpoint.ipynb]
In [6]:
df['count'].sum()
Out[6]:
2
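`os.walk` also descends into hidden directories such as `.ipynb_checkpoints`, which is why the checkpoint copy is counted. If such directories should be skipped, the usual `os.walk` idiom is to prune the `dirs` list in place, which only works with the default `topdown=True`. A minimal sketch; `walk_visible` is a hypothetical name, and treating every directory whose name starts with `.` as hidden is an assumption that fits Unix-like systems. Hidden files in visible directories are still reported.

```python
import os


def walk_visible(top):
    """Hypothetical wrapper around os.walk that skips directories whose names start with '.'."""
    for root, dirs, files in os.walk(top):
        # Slice assignment mutates the very list os.walk recurses into,
        # so hidden directories are never visited.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        yield root, dirs, files
```

Using this in place of `os.walk` inside `count_files` would drop the `.ipynb_checkpoints` row, leaving a total of 1 for the `\.ipynb` pattern.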
Count all files:
In [7]:
df = count_files("~/x/x5", ".*", False)
df
Out[7]:
  pattern  level          name_space  count
0      .*      0                   .      4
1      .*      1  .ipynb_checkpoints      1
2      .*      1                  d1      2
3      .*      2               d1/d2      2
4      .*      3            d1/d2/d3      1
In [8]:
df['count'].sum()
Out[8]:
10
This matches the number of files reported by the tree command.
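The `.*` pattern works as a catch-all because it can match the empty string, so `re.search` finds a match in every file name. The same total can also be cross-checked without any regular expression by summing the file lists that `os.walk` returns. A sketch; `total_files` is a hypothetical helper:

```python
import os
import re


def total_files(top):
    """Hypothetical helper: count every file name os.walk reports below `top`, hidden ones included."""
    top = os.path.abspath(os.path.expanduser(top))
    return sum(len(files) for _, _, files in os.walk(top))


# '.*' can match the empty string, so re.search returns a match object
# (always truthy) for every name, which is why that pattern counts all files.
print(all(re.search(".*", name) for name in ["f1.txt", ".hidden", ""]))  # True
```

For the directory in the session, `total_files("~/x/x5")` would be expected to return the same 10 as `df['count'].sum()` and `tree -a`.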
os.walk. - Scott Hunter