Python regex - How to substitute multiple capture groups with items from a list

Question

4

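There are plenty of regex answers here, but none that fit my need: looping over multiple matches and substituting the captured text with successive items from a list. I've looked at the official documentation, but to be honest, some of the explanations and examples are too advanced and complex for me. So far I've worked out how to capture multiple groups and name them, but I'm stuck on how to insert a different list item into each one.

Pseudocode example...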

for first_match group:
    insert list_item 1

for second_match group:
    insert list_item 2

for third_match group :
    insert list_item 3

Simplified code example (the actual script has a dozen or more matches)

String:

"Substitute a **list_item** here, Substitute a **list_item** here, Substitute a **list_item** here"

Regex:

\w.*(?P<first_match>list_item)\W.*\W.*(?P<second_match>list_item)\W.*\W.*(?P<third_match>list_item)

List:

["first_item", "second_item", "third_item"]

What I'm hoping to achieve looks like this:

"Substitute a **first_item** here, Substitute a **second_item** here, Substitute a **third_item** here"

I could also do this with unnamed groups, but naming them makes the pattern more readable.

- Hazy

Why use a regex? Is there a specific reason? There are plenty of other ways to do this, so I'm just curious. - Chrispresso

If you use a regex, the substitution has to capture the data in between. I don't know Python, but in Perl it might be done like this: find: (?<Before_item1>\w.)(?<Before_item2>\W.\W.)(?<Before_item3>\W.\W.*) replace: $+{Before_item1}$list[1]$+{Before_item2}$list[2]$+{Before_item3}$list[3] - user557597
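
A rough Python counterpart of the "capture the data in between" idea from the comment above might look like the sketch below. The pattern and the group names before1 to before3 are made up for illustration and are not the commenter's exact regex; a function is used as the replacement so the captured text and the list items can be interleaved.

```python
import re

text = ("Substitute a **list_item** here, Substitute a **list_item** here, "
        "Substitute a **list_item** here")
items = ["first_item", "second_item", "third_item"]

# Hypothetical pattern: each named group captures the text *before* a placeholder.
pattern = re.compile(
    r"(?P<before1>.*?)list_item"
    r"(?P<before2>.*?)list_item"
    r"(?P<before3>.*?)list_item"
)

def rebuild(match):
    # Interleave the captured in-between text with the list items.
    return (match.group("before1") + items[0]
            + match.group("before2") + items[1]
            + match.group("before3") + items[2])

# count=1: the pattern already spans all three placeholders in one match.
result = pattern.sub(rebuild, text, count=1)
print(result)
```

The text after the third placeholder is not part of the match, so re.sub leaves it in place unchanged.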

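There are many ways to skin a cat in Python. As a scripting newbie, it's hard to know which tool is best for which job. Until now I've steered clear of regexes because they sound complicated and mathematical and look like a pile of hieroglyphics. However, now that I've rolled up my sleeves and had some success, I started wondering whether regexes are compatible with the kind of looping I described above. From Rawing's answer I now know that they are, but I'd love to hear your suggestions for a better/faster/easier way to skin this cat. - Hazy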

2 Answers

1

Why not use a mapping for the replacement?

import re

def mapping_replace(s):
    # map each placeholder to the text that should replace it
    mapping = {
        'first_item': '"Hi there"',
        'second_item': '"waddup"',
        'third_item': '"potato"',
    }

    # for each key/value pair in the map
    for key, value in mapping.items():
        # replace any whole-word 'key' found with the 'value' that corresponds with it
        s = re.sub(r'\b%s\b' % re.escape(key), value, s)

    return s

print(mapping_replace('first_item substitute a first_item here, a second_item here and a third_item here... first_item'))
# prints: "Hi there" substitute a "Hi there" here, a "waddup" here and a "potato" here... "Hi there"

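\b checks for word boundaries. If you don't care about that and just want to match the key wherever it appears, drop the \b; then first_itemyaa, for example, has its first_item replaced and becomes "Hi there"yaa.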
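
To make the effect of \b concrete, here is a small sketch comparing the two patterns on the same input. The sample string is invented for illustration.

```python
import re

text = "first_item and first_itemyaa"

# With \b, only the standalone word is replaced.
with_boundary = re.sub(r"\bfirst_item\b", '"Hi there"', text)

# Without \b, the key is replaced even inside a longer word.
without_boundary = re.sub(r"first_item", '"Hi there"', text)

print(with_boundary)     # "Hi there" and first_itemyaa
print(without_boundary)  # "Hi there" and "Hi there"yaa
```

Note that the underscore counts as a word character, so first_item is treated as a single word by \b.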

- Chrispresso

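I appreciate the neat example, and I'm sure it will be useful in the future, but it doesn't work for my current task because my "keys" are all identical. I need to replace them one by one, in order, with items from the list. - Hazy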
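
When every placeholder is identical and only the order matters, one possible approach (a sketch, not something proposed by either answerer) is to pass a function to re.sub and pull the next list item from an iterator on each match. The variable names here are illustrative.

```python
import re

text = ("Substitute a **list_item** here, Substitute a **list_item** here, "
        "Substitute a **list_item** here")
items = ["first_item", "second_item", "third_item"]

replacements = iter(items)

# re.sub calls the function once per match, left to right.
# If the list runs out, the default leaves the remaining matches untouched.
result = re.sub(
    r"list_item",
    lambda match: next(replacements, match.group(0)),
    text,
)
print(result)
```

This avoids writing one capture group per occurrence, so it should scale to a dozen or more matches without changing the pattern.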


- Aran-Fey · Accepted Answer

This is easy to do with the match object's start() and end() methods.

import re

string = "Substitute a **list_item** here, Substitute a **list_item** here, Substitute a **list_item** here"
pattern = r'\w.*(?P<first_match>list_item)\W.*\W.*(?P<second_match>list_item)\W.*\W.*(?P<third_match>list_item)'

items = ["first_item", "second_item", "third_item"]


result = ''
last_match = 0
match = re.match(pattern, string)
for i in range(len(match.groups())):  # for each group...
    result += string[last_match:match.start(i + 1)]  # add all text up to the start of the group
    result += items[i]  # add the next list item
    last_match = match.end(i + 1)
result += string[last_match:]  # finally, add all text after the last group

print(result)
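
Since the question prefers named groups for readability, the same start()/end() idea can be sketched with a dictionary that maps group names to replacement text. The helper name replace_groups is hypothetical, and the sketch assumes the groups do not overlap or nest.

```python
import re

def replace_groups(pattern, text, replacements):
    """Replace each named group of the first match with its mapped text."""
    match = re.match(pattern, text)
    if match is None:
        return text  # no match, nothing to replace
    # Sort by span so the pieces are stitched together left to right.
    spans = sorted((match.span(name), new) for name, new in replacements.items())
    pieces = []
    last_end = 0
    for (start, end), new in spans:
        if start == -1:
            continue  # group did not take part in the match
        pieces.append(text[last_end:start])  # text before the group
        pieces.append(new)                   # replacement for the group
        last_end = end
    pieces.append(text[last_end:])  # text after the last group
    return "".join(pieces)

text = ("Substitute a **list_item** here, Substitute a **list_item** here, "
        "Substitute a **list_item** here")
pattern = (r"\w.*(?P<first_match>list_item)\W.*\W.*"
           r"(?P<second_match>list_item)\W.*\W.*(?P<third_match>list_item)")

result = replace_groups(pattern, text, {
    "first_match": "first_item",
    "second_match": "second_item",
    "third_match": "third_item",
})
print(result)
```

Collecting the pieces in a list and joining once keeps the loop simple when the number of groups grows.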