How to run a script on all *.txt files in the current directory?

Question


4

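I am trying to run the following script on all *.txt files in the current directory. Currently it only processes test.txt and prints blocks of text matched by a regular expression. What is the fastest way to scan the current directory for all *.txt files and run the script below on every one it finds? Also, how can I include the lines containing 'word1' and 'word3'? The current script only prints what lies between those two lines, and I want to print the whole block.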

#!/usr/bin/env python3
import re

filename = 'test.txt'
with open(filename) as fp:
    for result in re.findall('word1(.*?)word3', fp.read(), re.S):
        print(result)

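I'd appreciate any suggestions on how to improve the code above, for example to make it faster when running it over a large number of text files. Thanks.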

- user3066287
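A minimal Python 3 sketch covering both parts of the question, assuming each block starts on a line containing 'word1' and ends on a line containing 'word3' (the names `BLOCK` and `find_blocks` are hypothetical). Anchoring the pattern with `^` and `$` under `re.M` makes each match extend to whole lines, so the two delimiter lines are returned along with the block:

```python
import glob
import os
import re

# Hypothetical pattern: from the start of the line containing 'word1'
# to the end of the line containing 'word3'. re.S lets '.' cross newlines,
# re.M makes ^ and $ match at the start and end of every line.
BLOCK = re.compile(r'^[^\n]*word1.*?word3[^\n]*$', re.S | re.M)


def find_blocks(directory='.'):
    """Yield (filename, block) for every block in every *.txt file in directory."""
    for filename in sorted(glob.glob(os.path.join(directory, '*.txt'))):
        with open(filename) as fp:
            for block in BLOCK.findall(fp.read()):
                yield filename, block
```

Usage: `for filename, block in find_blocks(): print(filename); print(block)` prints every block found in the current directory. Reading each file whole with `fp.read()` keeps the sketch simple; for very large files a line-by-line scan would use less memory.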

3

Very relevant: Find all files in a directory with extension .txt in Python. - apsillers

@apsillers, thanks for the suggestion. I saw that one, but I'm not sure which of the solutions is optimal...? - user3066287

2 Answers

0

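Inspired by falsetru's answer, I rewrote the code to make it more general.

Now the files to explore:

can be described either by a string, passed as the second argument and used as the pattern for glob(),
or by a function written specifically for this purpose, in case the desired set of files cannot be described by a glob-style pattern;
and can be in the current directory, if no third argument is passed,
or in a specified directory, if its path is passed as the third argument.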


import re, glob
from os import getcwd, listdir, path
from inspect import isfunction

# Whole lines: from the start of the line containing 'word1'
# to the end of the line containing 'word3'
regx = re.compile('^[^\n]*word1.*?word3.*?$', re.S|re.M)

G = '\n\n'\
    'MWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMW\n'\
    'MWMWMW  %s\n'\
    'MWMWMW  %s\n'\
    '%s%s'

def search(REGX, how_to_find_files, dirpath='',
           G=G, sepm='\n======================\n'):
    if dirpath == '':
        dirpath = getcwd()

    if isfunction(how_to_find_files):
        # listdir() returns bare names: join them with dirpath so that
        # isfile() and open() also work when dirpath is not the cwd
        gen = filter(how_to_find_files,
                     filter(path.isfile,
                            [path.join(dirpath, fn) for fn in listdir(dirpath)]))
    elif isinstance(how_to_find_files, str):
        gen = glob.glob(path.join(dirpath, how_to_find_files))
    else:
        raise TypeError('how_to_find_files must be a function or a glob pattern')

    for fn in gen:
        with open(fn) as fp:
            found = REGX.findall(fp.read())
            if found:
                yield G % (dirpath, path.basename(fn),
                           sepm, sepm.join(found))

# Example of searching in .txt files

#============ one use ===================
def select(fn):
    return fn.endswith('.txt')
print(''.join(search(regx, select)))

#============= another use ==============
print(''.join(search(regx, '*.txt')))

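The advantage of chaining the processing of the files through a succession of generators is that the final ''.join() builds a single string that can be written all at once, whereas printing many separate strings one after another takes longer, because each print interrupts the display (am I making myself clear?)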

- eyquem
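For comparison, a Python 3 sketch of the same two ways of selecting files (a glob pattern, or a predicate function applied to each entry), optionally in a given directory, using pathlib instead of glob/listdir. `select_files` is a hypothetical name, and unlike eyquem's `select` the predicate here receives a `Path` object rather than a string:

```python
from pathlib import Path


def select_files(how_to_find_files, dirpath='.'):
    """Return the regular files in dirpath, chosen either by a glob pattern
    (a str such as '*.txt') or by a predicate called with each Path."""
    directory = Path(dirpath)
    if isinstance(how_to_find_files, str):
        candidates = directory.glob(how_to_find_files)
    elif callable(how_to_find_files):
        candidates = (p for p in directory.iterdir() if how_to_find_files(p))
    else:
        raise TypeError('expected a glob pattern or a callable')
    # Keep only regular files (a directory named 'x.txt' is skipped)
    return sorted(p for p in candidates if p.is_file())
```

For example, `select_files('*.txt')` and `select_files(lambda p: p.suffix == '.txt', 'some/dir')` (where 'some/dir' is a placeholder path) both return the .txt files as a sorted list of `Path` objects.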

Page content provided by Stack Overflow.

- falsetru · Accepted Answer

6

Use glob.glob:

import re
import glob

pattern = re.compile('word1(.*?)word3', flags=re.S)
for filename in glob.glob('*.txt'):
    with open(filename) as fp:
        for result in pattern.findall(fp.read()):
            print(result)

- falsetru

1

@user3066287, the two versions are almost the same. - falsetru

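@user3066287, glob.glob('*.txt') only finds txt files in the current directory, whereas the os.walk version you mentioned in your comment also searches subdirectories recursively. - falsetru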
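A sketch of the recursive variant mentioned in that comment, assuming the same `word1(.*?)word3` pattern as the answer: `os.walk` descends into every subdirectory and only names ending in .txt are read. `search_tree` is a hypothetical name; on Python 3.5+, `glob.glob(os.path.join(root, '**', '*.txt'), recursive=True)` lists roughly the same files (glob skips names starting with a dot, os.walk does not).

```python
import os
import re

PATTERN = re.compile(r'word1(.*?)word3', re.S)


def search_tree(root='.'):
    """Yield (path, match) for every match in every *.txt file under root,
    descending into subdirectories with os.walk."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith('.txt'):
                full = os.path.join(dirpath, name)
                with open(full) as fp:
                    for result in PATTERN.findall(fp.read()):
                        yield full, result
```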

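Thanks for the answer. Do you have any suggestions for the second part of my question? - user3066287

@user3066287, compiling the regular expression with re.compile gives a slight speedup, but not a big improvement. I've updated the answer to use re.compile. - falsetru

Thanks. With a large number of text files it would make a big difference, am I right? - user3066287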
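On that last question: the `re` module already caches recently compiled patterns, so calling `re.findall` with the same pattern string does not recompile it on every call, and precompiling mainly saves that cache lookup. The saving is per call, so with many files it grows only in proportion to the number of calls, while reading the files and scanning them with the regex usually dominate. A small timeit sketch for checking this on one's own data (`compare` and the sample `TEXT` are made up for illustration):

```python
import re
import timeit

# Made-up sample text with one block between filler words
TEXT = 'noise ' * 1000 + 'word1 payload word3 ' + 'noise ' * 1000
PATTERN = r'word1(.*?)word3'


def compare(number=1000):
    """Return (seconds using re.findall, seconds using a precompiled pattern)."""
    compiled = re.compile(PATTERN, re.S)
    # Both forms must find the same matches before timing them
    assert re.findall(PATTERN, TEXT, re.S) == compiled.findall(TEXT)
    t_module = timeit.timeit(lambda: re.findall(PATTERN, TEXT, re.S), number=number)
    t_compiled = timeit.timeit(lambda: compiled.findall(TEXT), number=number)
    return t_module, t_compiled
```

Calling `print(compare())` shows both timings; the exact numbers depend on the machine and the text.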
