Python regex: searching for multiple words

Question

I need to search a string for multiple words.

import re

words = [{'word':'test1', 'case':False}, {'word':'test2', 'case':False}]

status = "test1 test2"

for w in words:
    # Pattern: one whitespace character, an optional '#', then the word
    if w['case']:
        r = re.compile(r"\s#?%s" % w['word'], re.IGNORECASE | re.MULTILINE)
    else:
        r = re.compile(r"\s#?%s" % w['word'], re.MULTILINE)
    if r.search(status):
        print("Found word %s" % w['word'])

For some reason it only ever finds "test2" and never finds "test1". Why is that?

I know I could combine the searches with |, but there may be hundreds of words, which is why I'm using a for loop.

- Hanpan

2 Answers

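As Martijn pointed out, there is no space before test1. But your code also doesn't handle longer words correctly: it would treat test2blabla as an occurrence of test2, and I'm not sure that's what you want.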

I'd suggest using the word-boundary assertion \b:

for w in words:
    # \b matches only where a word character meets a non-word character (or the string edge)
    if w['case']:
        r = re.compile(r"\b%s\b" % w['word'], re.IGNORECASE | re.MULTILINE)
    else:
        r = re.compile(r"\b%s\b" % w['word'], re.MULTILINE)
    if r.search(status):
        print("Found word %s" % w['word'])
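A quick check of the test2blabla case described above, comparing a whitespace-prefixed pattern with the \b version. The status string is made up for illustration; this is a sketch, not code from the thread.

```python
import re

# Hypothetical status in which "test2" only appears as the start of a longer word
status = "test1 test2blabla"

# Prefix-only pattern: line start or whitespace, optional '#', then the word
loose = re.compile(r"(?:^|\s)#?test2", re.MULTILINE)
# Word-boundary pattern from this answer: the word must stand on its own
strict = re.compile(r"\btest2\b", re.MULTILINE)

print(bool(loose.search(status)))   # True: "test2blabla" counts as test2
print(bool(strict.search(status)))  # False: no boundary between "2" and "b"
```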

Edit:

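I should point out that if you really want to allow only the (space)word or (space)#word forms, you can't use \b: a space and a # are both non-word characters, so there is no word boundary between them.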
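To illustrate the edit: \b only matches between a word character and a non-word character (or a string edge), so \b# can never match right after a space. One possible workaround, not suggested in the thread, is a negative lookbehind (?<!\S), meaning "not preceded by a non-whitespace character" (i.e. at the start of the string or after whitespace), combined with a trailing \b. This sketch assumes the words consist of word characters; the helper make_pattern is hypothetical.

```python
import re

def make_pattern(word, ignore_case=False):
    # (?<!\S): at the start of the string or right after whitespace (newlines included)
    # #?: optional leading hash; trailing \b: the word must end at a word boundary
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(r"(?<!\S)#?%s\b" % re.escape(word), flags)

# \b between a space and '#' never matches, since both are non-word characters
print(re.search(r"\b#test1\b", "foo #test1"))  # None

p = make_pattern("test1")
for s in ["test1 test2", "foo #test1", "test1blabla", "foo#test1"]:
    print(repr(s), bool(p.search(s)))  # True, True, False, False in this order
```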

- Norbert P.

You're missing the #? from the original test. - Martijn Pieters

Good catch. Since it makes no difference for the test string, I left it out. Of course, it would interfere with the word boundary. - Norbert P.

Isn't that a rather important detail? - Martijn Pieters

- Martijn Pieters · Accepted Answer

In status there is no space before test1, but the regex you build requires one.
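A minimal reproduction of the cause, using the question's own pattern and string:

```python
import re

status = "test1 test2"

# "test1" sits at index 0, so there is no whitespace character before it for \s to match
print(re.search(r"\s#?test1", status))  # None
# "test2" is preceded by a space, so the match includes that space
print(repr(re.search(r"\s#?test2", status).group()))  # ' test2'
```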

You can change the test so that it matches after whitespace or at the start of a line:

for w in words:
    # (^|\s): start of a line (with re.MULTILINE) or any whitespace character
    if w['case']:
        r = re.compile(r"(^|\s)#?%s" % w['word'], re.IGNORECASE | re.MULTILINE)
    else:
        r = re.compile(r"(^|\s)#?%s" % w['word'], re.MULTILINE)
    if r.search(status):
        print("Found word %s" % w['word'])
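Since the question mentions possibly hundreds of words, here is a minimal sketch that keeps this answer's (^|\s)#? prefix but builds one alternation per case setting instead of compiling one regex per word. It assumes the question's convention that a true 'case' flag means case-insensitive matching; the function name find_words is hypothetical, and re.escape is an addition in case a word contains regex metacharacters.

```python
import re

def find_words(words, status):
    """Return every match as it appears in status, case-insensitive group first."""
    found = []
    # Group the words by the question's flag: 'case' True -> re.IGNORECASE
    groups = {True: [], False: []}
    for w in words:
        groups[bool(w['case'])].append(w['word'])
    for ignore_case, group in groups.items():
        if not group:
            continue
        # Longest first, so "test" cannot shadow "test1" inside the alternation
        alts = "|".join(re.escape(g) for g in sorted(group, key=len, reverse=True))
        flags = re.MULTILINE | (re.IGNORECASE if ignore_case else 0)
        pattern = re.compile(r"(?:^|\s)#?(%s)" % alts, flags)
        for m in pattern.finditer(status):
            found.append(m.group(1))  # group 1 is the word without the prefix
    return found

words = [{'word': 'test1', 'case': False}, {'word': 'test2', 'case': False}]
print(find_words(words, "test1 test2"))  # ['test1', 'test2']
```

Like the loop above, this has no trailing boundary, so test2blabla still counts as test2.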