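Question:
I am looking for a way to match specific identifiers on a line that starts with certain words. An ID consists of letters, possibly followed by digits, then a dash, then more digits. An ID should be matched only if the line starts with one of the following words: Closes, Fixes, Resolves. If a line contains multiple IDs, they are separated by the string " and ". Any number of IDs may be present on a line.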
Example test strings:
Closes PD-1 # Match: PD-1
Related to PD-2 # No match, line doesn't start with an allowed word
Closes
NPD-1 # No match, as the identifier is in a new line
Fixes PD-21 and PD-22 # Match: PD-21, PD-22
Closes PD-31, also PD-32 and PD-33 # Match: PD-31 - the rest is not captured because of ", also"
Resolves PD4-41 and PD4-42 and PD4-43 and PD4-44 # Match: PD4-41, PD4-42, PD4-43, PD4-44
Resolves something related to N-2 # No match, the identifier is not directly after 'Resolves'
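What I tried:
I tried to get all matches with a regular expression, but every attempt fell short in one respect or another. For example, I tried the following regex: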
^(?:Closes|Fixes|Resolves) (\w+-\d+)(?:(?: and )(\w+-\d+))*
- I use a non-capturing group to match lines that start with one of the allowed words, followed by a space:
^(?:Closes|Fixes|Resolves)
- Then at least one ID must follow the starting word, and I want to capture that ID:
(\w+-\d+)
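- Finally, zero or more further IDs may follow the first one, separated by the string " and ", but I only want to capture the IDs here, not the separators:
(?:(?: and )(\w+-\d+))*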
The result of this regex in Python is:
import re

test_string = """
Closes PD-1 # Match: PD-1
Related to PD-2 # No match, line doesn't start with an allowed word
Closes
NPD-1 # No match, as the identifier is in a new line
Fixes PD-21 and PD-22 # Match: PD-21, PD-22
Closes PD-31, also PD-32 and PD-33 # Match: PD-31 - the rest is not captured because of ", also"
Resolves PD4-41 and PD4-42 and PD4-43 and PD4-44 # Match: PD4-41, PD4-42, PD4-43, PD4-44
Resolves something related to N-2 # No match, the identifier is not directly after 'Resolves'
"""
ids = []
for match in re.findall(r"^(?:Closes|Fixes|Resolves) (\w+-\d+)(?:(?: and )(\w+-\d+))*", test_string, re.M):
    # Each match is a tuple holding both capture groups; skip the empty ones.
    for group in match:
        if group:
            ids.append(group)
print(ids)
['PD-1', 'PD-21', 'PD-22', 'PD-31', 'PD4-41', 'PD4-44']
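The result is also available on regex101.com. When several IDs follow the first one, the regex unfortunately captures only the last of them rather than all of them. As far as I understand, a repeated capturing group keeps only its last iteration, and I should put a capturing group around the repeated group to capture all iterations, but I could not get that to work.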
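Summary:
Is there a way to solve this with a regex, similar to what I tried, but capturing all occurrences of the IDs? Or is there a better way to parse the IDs out of this string in Python?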
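One possible approach with only the built-in re module, along the lines of "put a capturing group around the repeated group": capture the whole run of IDs as a single group and split it on " and " afterwards. This is just a sketch; the names LINE_RE and extract_ids are made up for illustration, and it assumes the separator is always exactly " and ".

```python
import re

# One capturing group wraps the first ID plus every " and ID" repetition,
# so the whole run survives instead of only the last iteration.
LINE_RE = re.compile(
    r"^(?:Closes|Fixes|Resolves) (\w+-\d+(?: and \w+-\d+)*)",
    re.M,
)


def extract_ids(text):
    """Return every ID listed directly after an allowed starting word."""
    ids = []
    for id_run in LINE_RE.findall(text):
        # id_run looks like "PD-21 and PD-22"; split it into single IDs.
        ids.extend(id_run.split(" and "))
    return ids


print(extract_ids("Fixes PD-21 and PD-22\nRelated to PD-2"))
# ['PD-21', 'PD-22']
```

The splitting happens outside the regex, so no feature beyond what re already offers is needed.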
(?:^(?:Closes|Fixes|Resolves)(?= \w+-\d+)|\G(?!^) (\w+-\d+)(?: and)?)
https://regex101.com/r/DeVNCZ/1 - The fourth bird
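Note that the \G anchor used in the pattern above is not supported by Python's built-in re module (it is a PCRE feature, and as far as I know the third-party regex package supports it too). A rough stdlib-only emulation of the same idea is to find the starting word first and then keep matching one ID at a time from where the previous match ended, using the pos argument of Pattern.match. The names START_RE, NEXT_ID_RE and extract_ids_chained are hypothetical, and this is a sketch rather than an exact translation of the pattern.

```python
import re

# Starting word at the beginning of a line, followed by " ID" (lookahead only).
START_RE = re.compile(r"^(?:Closes|Fixes|Resolves)(?= \w+-\d+)", re.M)
# One " ID", optionally followed by " and", like the \G branch above.
NEXT_ID_RE = re.compile(r" (\w+-\d+)(?: and)?")


def extract_ids_chained(text):
    """Collect IDs by chaining matches, emulating the \\G anchor."""
    ids = []
    for start in START_RE.finditer(text):
        pos = start.end()
        while True:
            # match() only succeeds exactly at pos, which is what \G provides.
            m = NEXT_ID_RE.match(text, pos)
            if m is None:
                break
            ids.append(m.group(1))
            pos = m.end()
    return ids


print(extract_ids_chained("Closes PD-31, also PD-32 and PD-33"))
# ['PD-31']
```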