Python - string matching that only matches whole words

Question

3

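I have two lists, query and line. My code checks whether a query such as:

["president", "publicly"]

is contained in a line (order matters) such as:

["president", "publicly", "told"]

This is the code I am currently using:

if ' '.join(query) in ' '.join(line):

The problem is that I only want to match whole words. So the following query should fail the condition:

["president", "pub"]

How can I do this?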

- Tom
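
To make the problem concrete, here is a small sketch using the lists from the question. The variable name substring_match is only illustrative; it is not from the original post.

```python
query = ["president", "pub"]
line = ["president", "publicly", "told"]

# Substring test on the joined strings: "president pub" is the start of
# "president publicly told", so the test passes even though "pub" is
# not a whole word in the line.
substring_match = " ".join(query) in " ".join(line)
print(substring_match)
```

The test succeeds here, which is exactly the false positive the question wants to avoid.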

8 Answers

1

Here is one way:

re.search(r'\b' + re.escape(' '.join(query)) + r'\b', ' '.join(line)) is not None

- NPE

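It might be worth applying re.escape to the query string. - Jon Clements

I get "SyntaxError: invalid syntax" right after the None. - Tom

@Tom: What comes after the None (and before the re.search)? - NPE

1

@Tom: You forgot the colon at the end of that if statement. - NPE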
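
Putting the answer and the comments above together, a minimal sketch might look like the following. The helper name contains_phrase is hypothetical and not part of the original answer; the regular expression is the one from the answer, and the if statement ends with the colon mentioned in the last comment.

```python
import re

def contains_phrase(query, line):
    """Return True if the words of query occur in line, in order, as whole words."""
    # \b on both sides stops a partial word such as "pub" from matching "publicly".
    pattern = r"\b" + re.escape(" ".join(query)) + r"\b"
    return re.search(pattern, " ".join(line)) is not None

line = ["president", "publicly", "told"]

if contains_phrase(["president", "publicly"], line):  # note the trailing colon
    print("matching")
```

This sketch assumes the words themselves contain no spaces, since both lists are joined with a single space before matching.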

1

Just for fun, you could also do this:

a = ["president" ,"publicly", "told"]
b = ["president" ,"publicly"]
c = ["president" ,"pub"]
d = ["publicly", "president"]
e = ["publicly", "told"]

# zip stops at the shorter list, so each test compares the query
# against the start of the line only
not [l for l, n in zip(a, b) if l != n]  # True
not [l for l, n in zip(a, c) if l != n]  # False
not [l for l, n in zip(a, d) if l != n]  # False
# to support a query in the middle of the line:
try:
    query_list = a[a.index(e[0]):]
    not [l for l, n in zip(query_list, e) if l != n]  # True
except ValueError:
    # the first query word does not occur in the line at all
    pass

- fredrik

1

Just use the "in" operator:

mylist = ['foo', 'bar', 'baz']

'foo' in mylist  # -> True
'bar' in mylist  # -> True
'fo' in mylist   # -> False
'ba' in mylist   # -> False

- daveoncode

'foo bar' in mylist doesn't match. - Tom

Of course... of course: 'foo' in mylist and 'bar' in mylist -> True... so what? :) - daveoncode

0

You can use the issubset method for this. Just do the following:

a = ["president" ,"publicly"]
b = ["president" ,"publicly", "told"]

if set(a).issubset(b):
    pass  # bla bla

The condition is True when every item in a also appears in b.

- Amyth

1

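Reading the question, I think order matters, so this will give wrong results in many cases. - Bakuriu

Order matters, so the query could be a subset without being in the exact order. - Tom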
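
The point raised in these comments can be seen in a short sketch. The variable names below are illustrative only and do not come from the original answer.

```python
line = ["president", "publicly", "told"]
in_order = ["president", "publicly"]
reversed_query = ["publicly", "president"]

# A set has no order, so both queries count as subsets of the line.
subset_in_order = set(in_order).issubset(line)
subset_reversed = set(reversed_query).issubset(line)
print(subset_in_order, subset_reversed)
```

Both tests pass, although only the first query appears in the line in that order, so a subset test alone cannot answer the question as asked.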

0

You can use the built-in all quantifier function:

if all(word in b for word in a):
    """ all words in list"""

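Note that for long lists this may not be efficient at run time, because each "word in b" test scans the whole list. It is better to make b (the collection being searched) a set rather than a list.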

- Ber

0

Here is a non-regex way to do it. I'm sure a regex is much faster than this:

>>> query = ['president', 'publicly']
>>> line = ['president', 'publicly', 'told']
>>> # the + 1 lets the last window reach the end of the line
>>> any(query == line[i:i+len(query)] for i in range(len(line) - len(query) + 1))
True
>>> query = ["president", "pub"]
>>> any(query == line[i:i+len(query)] for i in range(len(line) - len(query) + 1))
False

- jamylak

0

Explicit is better than implicit. Since order matters, I would write it down like this:

query = ['president','publicly']
query_false = ['president','pub']
line = ['president','publicly','told']

query_len = len(query)
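blocks = [line[i:i+query_len] for i in range(len(line)-query_len+1)]

blocks contains all the relevant combinations that need to be checked:

[['president', 'publicly'], ['publicly', 'told']]

Now you can simply check whether your query is in that list:

print(query in blocks)  # -> True
print(query_false in blocks)  # -> False

The code reads the way you would describe a straightforward solution in words, which is usually a good sign to me. If you have very long lines and performance becomes an issue, you can replace the generated list with a generator.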

- Achim
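
The generator variant mentioned at the end of the answer above could be sketched as follows. The function name contains_block is hypothetical; the answer itself only describes the idea.

```python
def contains_block(query, line):
    """Return True if query occurs in line as a contiguous, ordered run of words."""
    n = len(query)
    # Generator expression: slices are produced one at a time and
    # any() stops at the first match, so no list of blocks is built.
    return any(line[i:i + n] == query for i in range(len(line) - n + 1))

line = ['president', 'publicly', 'told']
print(contains_block(['president', 'publicly'], line))
print(contains_block(['president', 'pub'], line))
```

If the query is longer than the line, the range is empty and the function returns False.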


- Bakuriu · Accepted Answer

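You can do this with a regular expression and the \b word boundary:

import re
# \b is zero-width, so the words are joined with \s+ to consume the space between them
the_regex = re.compile(r'\b' + r'\s+'.join(map(re.escape, ['president', 'pub'])) + r'\b')
if the_regex.search(' '.join(line)):
    print('matching')
else:
    print('not matching')

As an alternative, you can write a function that checks whether a given list is a sublist of the line. Something like this: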

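def find_sublist(sub, lst):
    """Return the index where sub first occurs in lst as a contiguous run, or -1."""
    if not sub:
        return 0
    cur_index = 0
    while cur_index < len(lst):
        try:
            # jump to the next occurrence of the first word of sub
            cur_index = lst.index(sub[0], cur_index)
        except ValueError:
            return -1

        if lst[cur_index:cur_index + len(sub)] == sub:
            return cur_index
        # no match here, continue searching after this position
        cur_index += 1
    return -1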

You can use it as:

if find_sublist(query, line) >= 0:
    print('matching')
else:
    print('not matching')