Replace a Python string in a file without touching the file if no replacement is made

Question

What does Python's string.replace return if no string substitution was made? Does Python's file.open(f, 'w') always touch the file, even if no changes were made?

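Using Python, I'm trying to replace occurrences of 'oldtext' with 'newtext' in a set of files. If a file contains 'oldtext', I want to make the replacement and save the file. Otherwise, I want to do nothing, so that the file keeps its old timestamp.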

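The following code works fine, except that all files get written, even when no string substitution was made, so every file ends up with a new timestamp.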

for match in all_files('*.html', '.'):  # all_files returns all html files in current directory     
  thefile = open(match)
  content = thefile.read()              # read entire file into memory
  thefile.close()
  thefile = open(match, 'w')             
  thefile.write(content.replace(oldtext, newtext))  # write the file with the text substitution
  thefile.close()
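The question never shows all_files(). A minimal hypothetical stand-in, assuming it should yield the paths of regular files directly inside directory whose names match a shell-style pattern such as '*.html' (non-recursive, as the comment in the code says):

```python
import fnmatch
import os


def all_files(pattern, directory):
    """Hypothetical stand-in for the question's all_files() helper.

    Yields the paths of regular files directly inside `directory`
    (no recursion) whose names match the shell-style `pattern`.
    """
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip subdirectories and anything else that is not a plain file.
        if os.path.isfile(path) and fnmatch.fnmatch(name, pattern):
            yield path
```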

In this code I tried to write the file only when a string substitution happened, but all the files still get a new timestamp:

count = 0
for match in all_files('*.html', '.'):       # all_files returns all html files in current directory
    thefile = open(match)
    content = thefile.read()                 # read entire file into memory
    thefile.close()
    thefile = open(match, 'w')
    replacedText = content.replace(oldtext, newtext) 
    if replacedText != '':
        count += 1
        thefile.write(replacedText)
    thefile.close()
print (count)        # print the number of files that we modified

At the end, count is the total number of files, not the number of files modified. Any suggestions? Thanks.

I'm using Python 3.1.2 on Windows.

- LandedGently

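Do you mean you want to replace every occurrence of the string 'oldtext' in the file with the string 'newtext'? Or do you want to replace every occurrence of the string held by the object named oldtext with the string held by the object named newtext? If it's the former, there is a very simple way to do it. - eyquem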

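@eyquem - The first case. I want to replace all the strings 'oldtext' with 'newtext', and I only want to rewrite the file if a replacement happened. So if 'oldtext' does not occur in a file, that file should not be updated. The solutions from @unutbu and @J.F. Sebastian both work. - LandedGently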

3 Answers

What does Python's string.replace return if no string substitution was made?

str.replace() returns the string itself, or a copy of it if the object is an instance of a str subclass.

Does Python's file.open(f, 'w') always touch the file, even if no changes were made?

open(f, 'w') opens the file f and truncates it.
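A quick sketch of both points, using a throwaway file in a temporary directory: replace() with no match gives back a string equal to the original, and merely opening a file in 'w' mode truncates it to zero bytes, even if nothing is written afterwards. Only equality is checked here, since whether replace() returns the very same object is a CPython detail.

```python
import os
import tempfile

content = "<p>nothing to change here</p>"
replaced = content.replace("oldtext", "newtext")

# No occurrence of "oldtext": the result compares equal to the original.
# (Returning the very same object is a CPython implementation detail.)
print(replaced == content)  # True

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")
    with open(path, "w") as f:
        f.write(content)
    # Opening in 'w' mode truncates the file right away,
    # even though nothing is written afterwards.
    with open(path, "w"):
        pass
    size_after_w = os.path.getsize(path)

print(size_after_w)  # 0
```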

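Note that the code below works only on CPython, because it relies on CPython-specific behavior (str.replace() returning the very same object when nothing was replaced, and files being closed as soon as they are no longer referenced); it may not work correctly on PyPy or Jython: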

count = 0
for match in all_files('*.html', '.'):
    content = open(match).read()
    replacedText = content.replace(oldtext, newtext)
    if replacedText is not content:
        count += 1
        open(match, 'w').write(replacedText)
print(count)

- jfs

You mean if replacedText is not content, right? - John Machin

Your case is a special one: 'newtext' has exactly the same number of characters as 'oldtext'.

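Therefore, one of the following pieces of code can be used to replace either each occurrence of the word 'oldtext' or each line containing it: the word is replaced by 'newtext', or the line by the same line with 'newtext' in place of 'oldtext'.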
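To see why equal lengths matter here, a small sketch that uses an in-memory io.BytesIO buffer as a stand-in for a file opened in 'rb+' mode (the helper overwrite_at is hypothetical): seeking to an offset and writing overwrites bytes in place without shifting the rest, so a same-length replacement is safe, while a longer one clobbers whatever follows.

```python
import io


def overwrite_at(data, offset, new):
    """Hypothetical helper: overwrite bytes in place starting at `offset`,
    the way seek() + write() does on a file opened in 'rb+' mode."""
    buf = io.BytesIO(data)
    buf.seek(offset)
    buf.write(new)
    return buf.getvalue()


# 'oldtext' -> 'newtext' (7 bytes each): the rest of the data is untouched.
same_len = overwrite_at(b"say oldtext now", 4, b"newtext")
print(same_len)  # b'say newtext now'

# 'old' (3 bytes) -> 'brandnew' (8 bytes): the 5 extra bytes overwrite
# what followed, so ' text' is lost.
longer = overwrite_at(b"say old text", 4, b"brandnew")
print(longer)  # b'say brandnew'
```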

If the files are not extremely large, the whole content of each file can be read into memory:

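from os import fsync      # code using find()

count = 0
for match in all_files('*.html', '.'):
    with open(match,'rb+') as thefile:
        diag = False
        fno = thefile.fileno()
        content = thefile.read()          # bytes, since the file is opened in binary mode
        thefile.seek(0,0)
        x = content.find(b'oldtext')      # search with a bytes literal
        while x>=0:
            diag = True
            thefile.seek(x,1)             # move forward x bytes from the current position
            thefile.write(b'newtext')     # overwrite in place (same length as 'oldtext')
            thefile.flush()
            fsync(fno)
            x = content[thefile.tell():].find(b'oldtext')   # offset relative to current position
    if diag:
        count += 1
print(count)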

或者

from os import fsync     # code using a regex
import re
pat = re.compile(b'oldtext')   # bytes pattern, to match the bytes read in 'rb+' mode

count = 0
for match in all_files('*.html', '.'):
    with open(match,'rb+') as thefile:
        diag = False
        fno = thefile.fileno()
        content = thefile.read()
        thefile.seek(0,0)
        prec = 0                 # end offset of the previous match
        for mat in pat.finditer(content):
            diag = True
            thefile.seek(mat.start()-prec,1)   # jump from the previous match's end to this match's start
            thefile.write(b'newtext')
            thefile.flush()
            fsync(fno)
            prec = mat.end()
    if diag:
        count += 1
print(count)

For big files, you can read and rewrite them line by line:

from os import fsync   # code for big files, using regex
import re
pat = re.compile(b'oldtext')   # bytes pattern for lines read in binary mode

count = 0
for match in all_files('*.html', '.'):
    with open(match,'rb+') as thefile:
        diag = False
        fno = thefile.fileno()
        line = thefile.readline()
        while line:
            if b'oldtext' in line:
                diag = True
                thefile.seek(-len(line),1)              # back to the start of this line
                thefile.write(pat.sub(b'newtext',line)) # same length, so the line is overwritten exactly
                thefile.flush()
                fsync(fno)
            line = thefile.readline()
    if diag:
        count += 1
print(count)

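After each write, the instructions thefile.flush() and fsync(fno) are needed so that the file handle thefile points to the exact position in the file at every moment. This is what makes the sequence of write() calls valid.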

flush() does not necessarily write the file's data to disk. Use flush() followed by os.fsync() to ensure this behavior. http://docs.python.org/library/stdtypes.html#file.flush
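A minimal sketch of the flush-then-fsync pattern described in the quoted documentation, using a hypothetical helper write_and_sync: flush() hands Python's internal buffer to the operating system, and os.fsync() asks the operating system to commit its own buffers for that file descriptor to the storage device.

```python
import os
import tempfile


def write_and_sync(f, data):
    """Hypothetical helper: write `data`, then flush and fsync the file."""
    f.write(data)
    f.flush()              # empty Python's internal buffer into the OS
    os.fsync(f.fileno())   # ask the OS to write its buffers for this file to disk


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")
    with open(path, "wb") as f:
        write_and_sync(f, b"<p>newtext</p>")
    with open(path, "rb") as f:
        read_back = f.read()
```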

These programs perform only the minimum number of operations, so I think they are fast.

Special note: a file opened in 'rb+' mode keeps its last-modification time if it is not modified.
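This can be checked with a short sketch: backdate a throwaway file with os.utime(), open it in 'rb+' mode and only read it, then compare its modification time with what happens after opening it in 'w' mode. The timestamp OLD_MTIME is an arbitrary value chosen for the demonstration.

```python
import os
import tempfile

OLD_MTIME = 1_000_000_000  # arbitrary past timestamp, in seconds since the epoch

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")
    with open(path, "wb") as f:
        f.write(b"<p>no match here</p>")
    os.utime(path, (OLD_MTIME, OLD_MTIME))  # backdate access and modification times

    # 'rb+' neither truncates nor writes when we only read: mtime is unchanged.
    with open(path, "rb+") as f:
        f.read()
    mtime_after_rb_plus = os.stat(path).st_mtime

    # 'w' truncates the file on open, which updates its mtime.
    with open(path, "w"):
        pass
    mtime_after_w = os.stat(path).st_mtime

print(mtime_after_rb_plus == OLD_MTIME)  # True
print(mtime_after_w != OLD_MTIME)        # True
```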

- eyquem

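Thanks for the detailed answer. Also, I didn't know about the 'rb+' file mode, so that's a good tip. My question is actually more general: I want to replace occurrences of a given string with a different string. - LandedGently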

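If your question is actually more general, why did you write "I'm trying to replace occurrences of 'oldtext' with 'newtext'" in your question? And yet the code in your question does contain instructions like "replace(oldtext, newtext)", where oldtext and newtext are not string values but names of objects. - eyquem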

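@LandedGently, it is precisely because those instructions contradict the phrase "replace all occurrences of 'oldtext' with 'newtext'" that I asked you: "Do you want to replace all the strings 'oldtext' present in the file with the string 'newtext'? Or do you want to replace the string held by the object named oldtext with the string held by the object named newtext?" - eyquem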

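@LandedGently, you answered: "I want to replace all the strings 'oldtext' with 'newtext'", and now it's the opposite. That's disappointing. I don't know how I should have phrased my question; wasn't it detailed enough? You should make clear from the start what you actually want, instead of misleading the people who try to help you. - eyquem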

You're right, my original comment answering the question in your first comment wasn't clear enough. Anyway, thanks for your help :-) - LandedGently

Content provided by Stack Overflow.

- unutbu · Accepted Answer

What does Python's string.replace return if no string substitution was made?

It returns the original string.

Does Python's file.open(f, 'w') always touch the file, even if no changes were made?

More than just touching the file, it destroys whatever content f used to have.

So you can test whether the file needs to be rewritten with if replacedText != content, and open the file in write mode only in that case:

count = 0
for match in all_files('*.html', '.'):       # all_files returns all html files in current directory
    with open(match) as thefile:
        content = thefile.read()                 # read entire file into memory
        replacedText = content.replace(oldtext, newtext)
    if replacedText!=content:
        with open(match, 'w') as thefile:
            count += 1
            thefile.write(replacedText)
print (count)        # print the number of files that we modified
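The same idea can be wrapped in a reusable function. This is a sketch: replace_in_file is a hypothetical name, and the utf-8 default encoding is an assumption about the files. It returns True only when it rewrote the file, so summing the results over all files gives the count the question asks for. Because the whole text is replaced in memory, oldtext and newtext may have different lengths.

```python
def replace_in_file(path, oldtext, newtext, encoding="utf-8"):
    """Hypothetical helper: replace every oldtext with newtext in the file at path.

    The file is reopened for writing only if the text actually changed,
    so unchanged files keep their timestamps. Returns True if rewritten.
    """
    # newline="" disables newline translation on read and write,
    # so the file's original line endings are preserved.
    with open(path, encoding=encoding, newline="") as f:
        content = f.read()
    replaced = content.replace(oldtext, newtext)
    if replaced == content:
        return False  # nothing to do: never open the file in 'w' mode
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(replaced)
    return True


# Usage with the question's all_files() helper:
# count = sum(replace_in_file(p, oldtext, newtext) for p in all_files('*.html', '.'))
```

Without newline="", reading would turn '\r\n' into '\n' and writing would turn '\n' into the platform default, so a rewritten file's line endings could change.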