Replacing text in a file with new text in Python

Question

34

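I'm new to Python. I want to open a file through Python and replace every instance of certain words with a given replacement. For example, replace every word 'zero' with '0', 'temp' with 'bob', and 'garbage' with 'nothing'.

I first tried using this: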

for line in fileinput.input(fin):
        fout.write(line.replace('zero', '0'))
        fout.write(line.replace('temp','bob'))
        fout.write(line.replace('garbage','nothing'))

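But I don't think this is quite the right approach. I then thought about using if statements to check whether the line contains these items and, if so, replace whichever one it contains, but from what I know of Python this isn't a truly ideal solution either. I'd love to know the best way to do this. Thanks in advance!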

- shadonar

I'll be doing quite a bit more than this, but it will show me the best practice for this kind of thing. - shadonar

1

With your current approach, every input line is written to the output three times. Is that what you intended? - Junuxx

1

Also, you're missing an apostrophe after 'bob. - Junuxx

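Thanks for the note about the apostrophe. @Junuxx, I didn't intend to do that (silly me). As stated, I'm new to Python, and from my coding experience in other languages, reading line by line is the standard approach. Is that also the case in Python, or is there a better way to search a file and replace those specific words with others? - shadonar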

1

How to search and replace text in a file using Python? - jfs

7 Answers

9

If your file is not especially long, you can use the following snippet to replace text in place:

# Replace variables in file
with open('path/to/in-out-file', 'r+') as f:
    content = f.read()
    f.seek(0)
    f.truncate()
    f.write(content.replace('replace this', 'with this'))

- John Calcote

7

For this, I'd suggest using a dict and re.sub. Here's an example:

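import re
# Map each search string to its replacement.
repldict = {'zero':'0', 'one':'1' ,'temp':'bob','garbage':'nothing'}
def replfunc(match):
    # Look up the replacement for whatever text was matched.
    return repldict[match.group(0)]

# One alternation pattern covering every key, escaped so keys match literally.
regex = re.compile('|'.join(re.escape(x) for x in repldict))
with open('file.txt') as fin, open('fout.txt','w') as fout:
    for line in fin:
        fout.write(regex.sub(replfunc,line))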

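This has a slight advantage over replace in that it is more robust to overlapping matches: every match is replaced in a single pass, so text that has already been substituted is never scanned again.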

- mgilson

If the OP wants plain literal string replacement, re may be overkill... or am I missing something? - inspectorG4dget

3

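If there are overlapping matches, a regular expression is necessary. (line.replace('bob','robert').replace('robert','foo')) changes bob to foo, which is probably not the desired result, but a regex avoids that. Also, since everything is done in a single pass, it is likely more efficient (it doesn't matter much for small files, but it does for large ones). - mgilson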
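
To make the overlap problem from the comment above concrete, here is a minimal sketch. The replacement table and the sample line are made up for illustration; they are not from the original thread. It assumes Python 3.7+, where dicts keep insertion order.

```python
import re

# Hypothetical replacement table chosen to show the overlap problem.
replacements = {'bob': 'robert', 'robert': 'foo'}
line = 'bob met robert'

# Chained str.replace: the output of the first pass is rescanned by the second.
chained = line
for src, target in replacements.items():
    chained = chained.replace(src, target)

# Single-pass regex: each match in the original text is replaced exactly once.
pattern = re.compile('|'.join(re.escape(key) for key in replacements))
single_pass = pattern.sub(lambda m: replacements[m.group(0)], line)

print(chained)      # foo met foo
print(single_pass)  # robert met foo
```

If one key is a prefix of another, the order of alternatives in the pattern matters, so sorting the keys longest-first before joining them is probably a sensible precaution.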

5

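The essential way is

read(),
data = data.replace() as often as you need, and then
write().

Whether you read and write the whole data at once or in smaller parts is up to you. It should depend on the expected file size.

read() can be replaced by iterating over the file object.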

- glglgl
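
The steps above can be sketched as follows. The function names and the throwaway demo file are assumptions made for illustration, not part of the answer. The line-by-line variant only works when no search string spans a line break.

```python
import os
import tempfile

def replace_whole(src_path, dst_path, replacements):
    """read(), replace(), write(): fine for files that fit in memory."""
    with open(src_path) as src:
        data = src.read()
    for old, new in replacements.items():
        data = data.replace(old, new)
    with open(dst_path, 'w') as dst:
        dst.write(data)

def replace_by_line(src_path, dst_path, replacements):
    """Iterate over the file object so only one line is held at a time."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            for old, new in replacements.items():
                line = line.replace(old, new)
            dst.write(line)

# Demo on a throwaway file created in a temporary directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'in.txt')
with open(src_path, 'w') as f:
    f.write('zero temp\ngarbage\n')

table = {'zero': '0', 'temp': 'bob', 'garbage': 'nothing'}
whole_out = os.path.join(workdir, 'whole.txt')
line_out = os.path.join(workdir, 'lines.txt')
replace_whole(src_path, whole_out, table)
replace_by_line(src_path, line_out, table)
```

Both functions produce the same output here; the difference is only in how much of the file is held in memory at once.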

3

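A faster way of writing it would be...

# Read the whole file first, then reopen the same path for writing.
with open('path/to/input/file') as f:
    finput = f.read()
replacements = {'zero':'0', 'temp':'bob', 'garbage':'nothing'}
for old, new in replacements.items():
    finput = finput.replace(old, new)
# The with block closes the file, so nothing is left unflushed.
with open('path/to/input/file', 'w') as out:
    out.write(finput)

This avoids much of the iteration that the other answers suggest, and will speed up processing of longer files.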

- Matt Olan

1

But it reads the entire file (and actually copies it once for each replacement), which is a big downside for large files. - mgilson

0

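To read from standard input, write the following code to 'code.py':

import sys

rep = {'zero':'0', 'temp':'bob', 'garbage':'nothing'}

for line in sys.stdin:
    for k, v in rep.items():  # use rep.iteritems() on Python 2
        line = line.replace(k, v)
    # line already ends with a newline, so write it without adding another
    sys.stdout.write(line)

Then run the script using redirection or piping (http://en.wikipedia.org/wiki/Redirection_(computing)).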

python code.py < infile > outfile

- satomacoto

-1

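Here's a short and simple example I just used:

If:

fp = open("file.txt", "w")

Then:

fp.write(line.replace('is', 'now'))
# "This is me" becomes "Thnow now me" (the "is" inside "This" is replaced too)

Note:

line.replace('is', 'now')
fp.write(line)
# "This is me" is written unchanged, because replace() returns a new string and the result is discarded here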

- AmazingDayToday
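
Since the question asks about replacing words, a minimal sketch of whole-word replacement may be useful here. It uses the `\b` word-boundary anchor from the standard `re` module; the sample string is the one from the answer above.

```python
import re

line = "This is me"

# str.replace works on substrings, so the "is" inside "This" is replaced too.
substring_result = line.replace('is', 'now')
print(substring_result)   # Thnow now me

# str.replace returns a new string; the original is left untouched.
print(line)               # This is me

# \b matches a word boundary, so only the standalone word "is" is replaced.
whole_word = re.sub(r'\bis\b', 'now', line)
print(whole_word)         # This now me
```

When the search word comes from user input, wrapping it in re.escape() before building the pattern keeps special characters from being read as regex syntax.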

- inspectorG4dget · Accepted Answer

This should do the trick.

replacements = {'zero':'0', 'temp':'bob', 'garbage':'nothing'}

with open('path/to/input/file') as infile, open('path/to/output/file', 'w') as outfile:
    for line in infile:
        for src, target in replacements.items():
            line = line.replace(src, target)
        outfile.write(line)

EDIT: To address Eildosa's comment, if you want to do this without writing to another file, you'll end up having to read the entire source file into memory:

lines = []
with open('path/to/input/file') as infile:
    for line in infile:
        for src, target in replacements.items():
            line = line.replace(src, target)
        lines.append(line)
with open('path/to/input/file', 'w') as outfile:
    for line in lines:
        outfile.write(line)

EDIT: If you are using Python 2.x, use replacements.iteritems() instead of replacements.items()
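
As a possible middle ground for large files, here is a minimal sketch that updates the file without holding all of it in memory. It does write a second file, but only a temporary one that then replaces the original. The function name `replace_in_place` and the demo file are assumptions for illustration; note that the replaced file takes on the temporary file's permissions, which may differ from the original's.

```python
import os
import tempfile

def replace_in_place(path, replacements):
    """Rewrite `path` line by line via a temporary file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    # delete=False so the file survives the `with` block and can be renamed.
    with open(path) as infile, tempfile.NamedTemporaryFile(
            'w', dir=directory, delete=False) as tmp:
        for line in infile:
            for src, target in replacements.items():
                line = line.replace(src, target)
            tmp.write(line)
    # os.replace overwrites the original with the finished temporary file.
    os.replace(tmp.name, path)

# Demo on a throwaway file created in a temporary directory.
workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, 'sample.txt')
with open(sample, 'w') as f:
    f.write('zero temp garbage\n')

replace_in_place(sample, {'zero': '0', 'temp': 'bob', 'garbage': 'nothing'})

with open(sample) as f:
    result = f.read()
print(result)  # 0 bob nothing
```

Creating the temporary file in the same directory keeps the final rename on one filesystem, which is why `dir=directory` is passed.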