How to retrieve partial matches from a list of strings

Question


31

For approaches to retrieving partial matches in a numeric list, see:

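However, if you're looking for how to retrieve partial matches from a list of strings, the best approaches are concisely explained in the answers below.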

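SO: Python list lookup with partial match shows how to return a bool if a list contains an element that partially matches (e.g. begins with, ends with, or contains) a certain string. But how can you return the element itself, instead of True or False?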

Example:

l = ['ones', 'twos', 'threes']
wanted = 'three'

Here, the approach in the linked question will return True using:

any(s.startswith(wanted) for s in l)

So how can the element 'threes' be returned instead?

- vestland
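The question lists three kinds of partial match (begins with, ends with, contains). A minimal sketch of returning the elements themselves for each kind, assuming you want every match rather than only the first; the suffix `'os'` is chosen purely for illustration and is not from the question:

```python
l = ['ones', 'twos', 'threes']
wanted = 'three'
suffix = 'os'  # illustrative suffix, not from the original question

# Elements that begin with the wanted text
starts = [s for s in l if s.startswith(wanted)]
# Elements that end with the suffix
ends = [s for s in l if s.endswith(suffix)]
# Elements that contain the wanted text anywhere
contains = [s for s in l if wanted in s]

print(starts)    # ['threes']
print(ends)      # ['twos']
print(contains)  # ['threes']
```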

5 Answers

8

Instead of returning the result of the any() function, you can use a for loop to look for the string:

def find_match(string_list, wanted):
    for string in string_list:
        if string.startswith(wanted):
            return string
    return None

>>> find_match(['ones', 'twos', 'threes'], "three")
'threes'
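A possible generalization of this loop, sketched under the assumption that you want one helper for all kinds of partial match: pass the test in as a function. `find_first` is a hypothetical name, not something from the answer.

```python
from typing import Callable, Iterable, Optional


def find_first(strings: Iterable[str], predicate: Callable[[str], bool]) -> Optional[str]:
    """Hypothetical helper: return the first string satisfying predicate, else None."""
    for s in strings:
        if predicate(s):
            return s  # stop scanning at the first match
    return None


words = ['ones', 'twos', 'threes']
print(find_first(words, lambda s: s.startswith('three')))  # threes
print(find_first(words, lambda s: 'wo' in s))              # twos
print(find_first(words, lambda s: s.endswith('xyz')))      # None
```

Since a miss yields `None`, compare with `is None` rather than relying on truthiness if the list may contain empty strings.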

- damon

8

A simple, direct answer:

test_list = ['one', 'two','threefour']
r = [s for s in test_list if s.startswith('three')]
print(r[0] if r else 'nomatch')

Result:

threefour

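Not sure what you want to do in the non-matching case. If there is a match, r[0] is exactly what you asked for, but if there is no match, r is empty and r[0] is undefined (indexing an empty list raises IndexError). The print handles this, but you may want to handle it differently.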
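One way to make the no-match case explicit, assuming you would rather get an error than a placeholder string: catch the `IndexError` from `r[0]` and raise a clearer `LookupError`. `first_match` is a hypothetical helper name.

```python
test_list = ['one', 'two', 'threefour']


def first_match(items, prefix):
    """Hypothetical helper: first item starting with prefix, else LookupError."""
    r = [s for s in items if s.startswith(prefix)]
    try:
        return r[0]
    except IndexError:  # r is empty, so nothing matched
        raise LookupError(f'no element starts with {prefix!r}') from None


found = first_match(test_list, 'three')
print(found)  # threefour

try:
    first_match(test_list, 'five')
    outcome = 'matched'
except LookupError as err:
    outcome = str(err)
print(outcome)  # no element starts with 'five'
```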

- CryptoFool

6

I'd say the most closely related solution would be to use next instead of any:

>>> next((s for s in l if s.startswith(wanted)), 'mydefault')
'threes'
>>> next((s for s in l if s.startswith('blarg')), 'mydefault')
'mydefault'

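Just like `any`, it stops the search as soon as it finds a match, and it only takes O(1) space. This differs from the list comprehension solutions, which always process the whole list and take O(n) space.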
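To see the short-circuiting difference, one can count how often the predicate runs. This sketch assumes a slightly longer list than `l` and a hypothetical instrumented predicate `starts_with_wanted`; the counts, not the timings, are the point.

```python
l = ['ones', 'twos', 'threes', 'fours', 'fives']
wanted = 'three'
checked = []  # records every element the predicate inspects


def starts_with_wanted(s):
    """Hypothetical predicate that logs each call before testing the prefix."""
    checked.append(s)
    return s.startswith(wanted)


# next() over a generator stops at the first match
first = next((s for s in l if starts_with_wanted(s)), 'mydefault')
checks_by_next = len(checked)

checked.clear()
# A list comprehension evaluates the predicate for every element
all_matches = [s for s in l if starts_with_wanted(s)]
checks_by_listcomp = len(checked)

print(first, checks_by_next)            # threes 3
print(all_matches, checks_by_listcomp)  # ['threes'] 5
```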

Ooh, or just use `any` as is, but remember the last checked element:

>>> if any((match := s).startswith(wanted) for s in l):
        print(match)

threes
>>> if any((match := s).startswith('blarg') for s in l):
        print(match)

>>>

Another variation, only assigning the matching element:

>>> if any(s.startswith(wanted) and (match := s) for s in l):
        print(match)

threes

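(You might want to include something like `or True` if a matching s could be the empty string, since `(match := s)` would then evaluate to the falsy '' and any() would keep searching.)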
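A small sketch of the empty-string case from the note above, assuming `wanted = ''` (every string starts with the empty string) and a list whose first element is `''`. Requires Python 3.8+ for `:=`.

```python
l = ['', 'x']
wanted = ''  # every string starts with '', so '' is the first real match

# Without `or True`: (match := '') is falsy, so any() keeps scanning past it
if any(s.startswith(wanted) and (match := s) for s in l):
    without_fix = match

# With `or True`: the assignment always counts as a hit
if any(s.startswith(wanted) and ((match := s) or True) for s in l):
    with_fix = match

print(repr(without_fix))  # 'x'
print(repr(with_fix))     # ''
```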

- superb rain

5

I think it's that simple, so I might be misreading, but you could just run it through a for loop with an if statement:

l = ['ones', 'twos', 'threes']
wanted = 'three'

def run():
    for s in l:
        if s.startswith(wanted):
            return s

print(run())

Output: threes

- Ironkey

Content provided by Stack Overflow.

- Trenton McKinney · Accepted Answer

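- `startswith` and `in` return a Boolean.
- The `in` operator is a test of membership.
- This can be performed with a list comprehension or `filter`.
- Using a list comprehension with `in` is the fastest implementation tested.
- If case is not an issue, consider mapping all the words to lowercase (and lowercase `wanted` as well).
  - `l = list(map(str.lower, l))`
- Tested with Python 3.11.0.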
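Lowercasing `l` alone is not enough: `wanted` must be normalized too, and `list(map(str.lower, l))` replaces the original spellings. A sketch that compares case-folded copies but returns the original elements; it swaps in `str.casefold`, a more aggressive form of lowercasing intended for caseless matching. The mixed-case list is an assumption for illustration.

```python
l = ['Ones', 'TWOS', 'Threes']
wanted = 'THREE'

key = wanted.casefold()  # normalize the search term as well as the candidates
# Compare case-folded copies, but keep each element's original spelling
contains = [v for v in l if key in v.casefold()]
prefix = [v for v in l if v.casefold().startswith(key)]

print(contains)  # ['Threes']
print(prefix)    # ['Threes']
```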

`filter`:

Using filter creates a filter object, so list() is used to show all the matching values in a list.

l = ['ones', 'twos', 'threes']
wanted = 'three'

# using startswith
result = list(filter(lambda x: x.startswith(wanted), l))

# using in
result = list(filter(lambda x: wanted in x, l))

print(result)
[out]:
['threes']
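Because `filter` returns a lazy iterator, `list()` is only needed when you want every match. If you only need the first one, a sketch pairing it with `next` and a default (here `None`, an arbitrary choice):

```python
l = ['ones', 'twos', 'threes']
wanted = 'three'

matches = filter(lambda x: x.startswith(wanted), l)  # lazy: nothing checked yet
first = next(matches, None)  # consume only up to the first match
print(first)  # threes

# With no match, next() returns the default instead of raising StopIteration
no_match = next(filter(lambda x: x.startswith('blarg'), l), None)
print(no_match)  # None
```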

`list-comprehension`

l = ['ones', 'twos', 'threes']
wanted = 'three'

# using startswith
result = [v for v in l if v.startswith(wanted)]

# using in
result = [v for v in l if wanted in v]

print(result)
[out]:
['threes']

Which implementation is faster?

- Tested in Jupyter Lab using the `words` corpus from nltk v3.7, which has 236736 words
- Words with `'three'`
  - ['three', 'threefold', 'threefolded', 'threefoldedness', 'threefoldly', 'threefoldness', 'threeling', 'threeness', 'threepence', 'threepenny', 'threepennyworth', 'threescore', 'threesome']

from nltk.corpus import words

%timeit list(filter(lambda x: x.startswith(wanted), words.words()))
%timeit list(filter(lambda x: wanted in x, words.words()))
%timeit [v for v in words.words() if v.startswith(wanted)]
%timeit [v for v in words.words() if wanted in v]

`%timeit` results

62.8 ms ± 816 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
53.8 ms ± 982 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
56.9 ms ± 1.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
47.5 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
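The timings above rely on IPython's `%timeit` magic and the nltk corpus. To reproduce the comparison in plain Python, here is a sketch with the standard `timeit` module; the word list is synthetic (an assumption, so absolute numbers and possibly the ranking will differ from the results above). It also builds the list once, whereas the cells above call `words.words()` inside every timed statement, so those measurements include whatever work that call does.

```python
import timeit

# Synthetic stand-in for the nltk words corpus (assumption: real data will differ)
words = [f'word{i}' for i in range(50_000)] + ['three', 'threefold', 'threesome']
wanted = 'three'

# The four variants timed above, wrapped as zero-argument callables
candidates = {
    'filter + startswith': lambda: list(filter(lambda x: x.startswith(wanted), words)),
    'filter + in': lambda: list(filter(lambda x: wanted in x, words)),
    'listcomp + startswith': lambda: [v for v in words if v.startswith(wanted)],
    'listcomp + in': lambda: [v for v in words if wanted in v],
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 10 calls each, converted to seconds per call
    per_call = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    results[name] = per_call
    print(f'{name:>22}: {per_call * 1e3:.2f} ms per call')
```

The printed figures depend on the machine and Python version, so no expected output is shown.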
