re.sub in Python 3.3

Question

5

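I am trying to change text strings of the form file1 into the form file01. I am very new to Python and can't figure out what should go in the 'repl' position when using a pattern. Can anyone help?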

text = 'file1 file2 file3'

x = re.sub(r'file[1-9]',r'file\0\w',text) #I'm not sure what should go in repl.

- user2243215

5 Answers

2

To match only file names that end in a single digit, use the word boundary \b:

>>> text = ' '.join('file{}'.format(i) for i in range(12))
>>> text
'file0 file1 file2 file3 file4 file5 file6 file7 file8 file9 file10 file11'
>>> import re
>>> re.sub(r'file(\d)\b',r'file0\1',text)
'file00 file01 file02 file03 file04 file05 file06 file07 file08 file09 file10 file11'

- Mark Tolonen

1

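To check whether the file name has two digits, you can also use \D|$. It decides whether file gets replaced with file0: the replacement happens only when the digit is followed by a non-digit or by the end of the string.

The following code also achieves the desired result.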

import re

text = 'file1 file2 file3 file4 file11 file22 file33 file1'

# The digit must be followed directly by a non-digit (\D) or the end of the string ($).
x = re.sub(r'file([0-9](\D|$))',r'file0\1',text)

print(x)
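
A possible variation, not part of the answer above: a negative lookahead (?!\d) checks the next character without consuming it, so the replacement only has to put back the digit. This is a minimal sketch that reuses the same sample text; the variable name `result` is just an illustrative choice.

```python
import re

text = 'file1 file2 file3 file4 file11 file22 file33 file1'

# (?!\d) asserts that no digit follows, without making it part of the match.
result = re.sub(r'file(\d)(?!\d)', r'file0\1', text)

print(result)
# file01 file02 file03 file04 file11 file22 file33 file01
```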

- Jagadanna

0

You can use groups to capture the part you want to keep, and then use those groups in the replacement text.

x = re.sub(r'file([1-9])',r'file0\1',text)

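A group is created by wrapping part of the regular expression in ( ). You can then refer to it in the replacement by its number, written \1, \2 and so on. Here we want to insert the first group, so we use \1.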
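
As a small sketch of the group references described above: besides \1, Python's re module also accepts \g<1> and, for named groups, \g<name> in the replacement. The group name `digit` and the variable names below are illustrative choices, not something from the question.

```python
import re

text = 'file1 file2 file3'

# \1 and \g<1> both refer to the first captured group.
by_number = re.sub(r'file([1-9])', r'file0\1', text)
by_g_number = re.sub(r'file([1-9])', r'file0\g<1>', text)

# A named group, referenced as \g<digit> in the replacement.
by_name = re.sub(r'file(?P<digit>[1-9])', r'file0\g<digit>', text)

print(by_number)    # file01 file02 file03
print(by_g_number)  # file01 file02 file03
print(by_name)      # file01 file02 file03
```

The \g<1> form is handy when the group reference is immediately followed by a literal digit, where \1 followed by 0 would be read as group 10.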

- melwil

0

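I believe the following will help you. This approach is useful because it inserts a '0' only when 'file' is followed by exactly one digit (enforced with the word-boundary special character '\b'):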

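import re

text = 'file1 file2 file3'

# Find only the names that end in exactly one digit.
findallfile = re.findall(r'file\d\b', text)

textwithzeros = text
for instance in findallfile:
    # Pad this name only; \b keeps 'file1' from matching inside 'file10'.
    textwithzeros = re.sub(instance + r'\b', instance.replace('file', 'file0'), textwithzeros)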

Now 'textwithzeros' should be a new version of the 'text' string with a '0' in front of each single digit. Give it a try!

- KAG1224

- Jerry · Accepted Answer

You can try this:

>>> import re
>>> text = 'file1 file2 file3'
>>> x = re.sub(r'file([1-9])',r'file0\1',text)
>>> x
'file01 file02 file03'

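The parentheses around [1-9] capture what it matches, and this is the first capture group. You'll see I used it in the replacement as \1, which refers to the first captured group in the match.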

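Also, if you don't want to add a zero to files with 2 or more digits, you can require the digit to be followed by something that is not a digit, for example whitespace or the end of the string, written (\s|$):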

x = re.sub(r'file([1-9](\s|$))',r'file0\1',text)
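
To illustrate why the extra check matters, here is a minimal sketch comparing the two patterns on a sample string that includes a two-digit name. The sample text and the variable names `unguarded` and `guarded` are assumptions made for illustration.

```python
import re

text = 'file1 file2 file10 file3'

# Without the check, the first digit of 'file10' is padded as well.
unguarded = re.sub(r'file([1-9])', r'file0\1', text)

# With (\s|$), the digit must be followed by whitespace or the end of the string.
# The whitespace is part of group 1, so \1 puts it back.
guarded = re.sub(r'file([1-9](\s|$))', r'file0\1', text)

print(unguarded)  # file01 file02 file010 file03
print(guarded)    # file01 file02 file10 file03
```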

Revisiting this answer now, here is a more general solution using str.format() and a lambda expression:

import re
fmt = '{:03d}'                 # Let's say we want 3 digits with leading zeroes
s = 'file1 file2 file3 text40'
result = re.sub(r"([A-Za-z_]+)([0-9]+)",
                lambda x: x.group(1) + fmt.format(int(x.group(2))),
                s)
print(result)
# file001 file002 file003 text040

Some details about the lambda expression:

lambda x: x.group(1) + fmt.format(int(x.group(2)))
#         ^--------^   ^-^        ^-------------^
#          filename   format     file number ([0-9]+) converted to int
#        ([A-Za-z_]+)            so format() can work with our format

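I'm using the expression [A-Za-z_]+ on the assumption that, apart from the trailing digits, the file name contains only letters and underscores. Pick a more suitable expression if you need one.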
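
If the padding width needs to vary, the same idea can be wrapped in a small helper. This is only a sketch: the function name `pad_numbers` and its `width` parameter are hypothetical, and it uses str.zfill() instead of the format string above, so existing leading zeros are kept rather than re-computed from the integer value.

```python
import re

def pad_numbers(s, width=3):
    """Zero-pad the digits that follow each run of letters/underscores to `width` characters."""
    return re.sub(r'([A-Za-z_]+)([0-9]+)',
                  lambda m: m.group(1) + m.group(2).zfill(width),
                  s)

print(pad_numbers('file1 file2 file3 text40'))
# file001 file002 file003 text040
print(pad_numbers('file1 file10 file100', width=2))
# file01 file10 file100
```

Numbers that are already at least `width` digits long are left unchanged.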