Python "regex" module: fuzziness value

Question

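I'm using the "fuzzy matching" feature of the regex module. How can I get the "fuzziness value" of a match, i.e. a number indicating how much the pattern differs from the string, like the "edit distance" in Levenshtein? I expected to find this value on the Match object, but it isn't there, and the official documentation doesn't mention it either. For example: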

import regex
a = regex.match('(?:foo){e}', 'for')

a.captures() tells me that the word "for" was matched, but I want to know the fuzziness value, which in this case should be 1.

Is there any way to get it?

- tslmy

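This is certainly not an ideal solution, but if all else fails, you could retry the match with (?:foo){e<=i}, looping over an integer i. The first i for which the match succeeds is the Levenshtein distance. - Martin Ender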
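A minimal sketch of that retry loop, assuming the third-party `regex` package is installed. `fuzzy_distance` and `max_errors` are hypothetical names. The sketch uses `regex.fullmatch` rather than `regex.match` so that the whole string has to be consumed; with `match`, only a prefix needs to match, which can understate the distance (e.g. "foo" matches the start of "foobar" with zero errors).

```python
import regex


def fuzzy_distance(word, text, max_errors=10):
    """Smallest i such that (?:word){e<=i} fully matches text, else None.

    Hypothetical helper sketching the "retry with a growing error limit" idea.
    """
    literal = regex.escape(word)  # treat `word` as literal text, not a pattern
    for i in range(max_errors + 1):
        # i == 0 is simply an exact match of the escaped literal
        pattern = literal if i == 0 else '(?:%s){e<=%d}' % (literal, i)
        if regex.fullmatch(pattern, text):
            return i
    return None  # no match within max_errors edits


print(fuzzy_distance('foo', 'for'))  # expected: 1
```

Each failed attempt is a full matching run, so this costs up to `max_errors + 1` matches per string. That is acceptable for short words but slow for long texts.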

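Alternatively, if you only need to handle a limited number of errors, you could use something like (foo)|((?:foo){e=1})|((?:foo){e=2}) and check which group matched: if the first, e = 0; if the second, e = 1; and so on. - Qtax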
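A sketch of the alternation idea, again assuming the third-party `regex` package. Instead of an exact `{e=k}` constraint, this version uses `{e<=k}` and relies on alternatives being tried left to right: alternative k is only reached when all alternatives with fewer allowed errors have failed, so it matches with exactly k errors. `bounded_error_count` is a hypothetical name.

```python
import regex


def bounded_error_count(word, text, max_errors=2):
    """Number of edits (0..max_errors) needed for text to match word, else None.

    Hypothetical helper: group 1 is the exact match, and group k + 1
    allows at most k errors.
    """
    literal = regex.escape(word)
    alternatives = ['(%s)' % literal]  # group 1: zero errors
    for k in range(1, max_errors + 1):
        alternatives.append('((?:%s){e<=%d})' % (literal, k))
    m = regex.fullmatch('|'.join(alternatives), text)
    if m is None:
        return None
    # m.lastindex is the number of the capturing group that matched
    return m.lastindex - 1


print(bounded_error_count('foo', 'for'))  # expected: 1
```

Unlike the retry loop, this performs a single match call, but the pattern grows with `max_errors`.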

2 Answers

import regex
a = regex.match('(?:foo){e}', 'for')
a.fuzzy_counts

This returns a tuple (x, y, z), where:

x = the number of substitutions

y = the number of insertions

z = the number of deletions
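To turn `fuzzy_counts` into a single fuzziness value, the three counts can be summed. The sketch below assumes the third-party `regex` package and uses its `BESTMATCH` flag so that regex searches for the match with the fewest errors rather than returning the first one that satisfies the unbounded `{e}`. `fuzzy_edit_count` is a hypothetical name, and the caveat in the next paragraph still applies to the counts it reports.

```python
import regex


def fuzzy_edit_count(word, text):
    """Total edits reported by fuzzy_counts for a full-string fuzzy match.

    Hypothetical helper; BESTMATCH asks for the match with the fewest errors.
    """
    pattern = '(?:%s){e}' % regex.escape(word)
    m = regex.fullmatch(pattern, text, flags=regex.BESTMATCH)
    if m is None:
        return None
    substitutions, insertions, deletions = m.fuzzy_counts
    return substitutions + insertions + deletions


print(fuzzy_edit_count('foo', 'for'))
```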

However, these counts are not always reliable: in some cases, the regex module's fuzzy matching does not produce the true Levenshtein distance.
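When the counts look suspicious, one option is to cross-check them against a direct Levenshtein computation. This is a standard-library-only sketch of the classic Wagner-Fischer dynamic program with unit costs for insertion, deletion, and substitution. It is not part of the regex module, and the name `levenshtein` is illustrative.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (unit-cost insert/delete/substitute)."""
    # prev[j] is the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein('foo', 'for'))  # expected: 1
```

This takes O(len(a) * len(b)) time, which is fine for checking individual words.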

Related question: Python regex module fuzzy match: substitution count not as expected

- Colin Anthony

- falsetru · Accepted Answer

>>> import difflib
>>> matcher = difflib.SequenceMatcher(None, 'foo', 'for')
>>> sum(size for start, end, size in matcher.get_matching_blocks())
2
>>> max(map(len, ('foo', 'for'))) - _
1
>>> matcher = difflib.SequenceMatcher(None, 'foo', 'food')
>>> sum(size for start, end, size in matcher.get_matching_blocks())
3
>>> max(map(len, ('foo', 'food'))) - _
1

http://docs.python.org/2/library/difflib.html#difflib.SequenceMatcher.get_matching_blocks
http://docs.python.org/2/library/difflib.html#difflib.SequenceMatcher.get_opcodes
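The interactive session in this answer can be wrapped in a small function. `difflib_distance` is a hypothetical name; the computation is the same one shown in the session: the length of the longer string minus the number of characters covered by `SequenceMatcher`'s matching blocks. This is a heuristic, not an edit-distance algorithm. For example, for "abc" vs "cab" it returns 1, because `SequenceMatcher` finds the common block "ab", while the Levenshtein distance is 2. It is best treated as an approximation.

```python
import difflib


def difflib_distance(a, b):
    """Heuristic: longer length minus characters in SequenceMatcher's matching blocks."""
    matcher = difflib.SequenceMatcher(None, a, b)
    # the final block returned by get_matching_blocks() is a dummy with size 0
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return max(len(a), len(b)) - matched


print(difflib_distance('foo', 'for'))   # expected: 1
print(difflib_distance('foo', 'food'))  # expected: 1
```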