Match everything until the whole regex matches again

Question


This is the string I'm trying to match (the real string is much longer).

VC1000 Venture Capital 4 cr.
This is a class about venture capital
and more description, that could mention a future course like
VC2000 but might not
VC2000 venture capital II 4 cr.
Another description about blah
VC 3000 venture capital III 4-6 cr.
back again

I'm trying to get groups like:

[VC1000]
[Venture Capital]
[4]
[This is a class about venture capital and more description, that could mention a future course like VC2000 but might not]

I'm almost there, but I'm not sure how to capture the description between the course listings. Right now I have:

(^\*?[A-Z]{2}\s?[0-9]{4}) (.*?)([0-9]|[0-9]-[0-9]+)\s?cr\.

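But I'm not sure where to go from here. Adding .* matches too much, and adding .* followed by the first group from above means the first group only gets captured on every other match.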
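A minimal sketch of the "every other match" effect, assuming a simplified, hypothetical header pattern (HEADER below, which drops the optional leading asterisk from the question's regex): when the description group is followed by a pattern that consumes the next header, that header is used up and cannot begin the next match, whereas a lookahead, (?=...), only checks that the header is there.

```python
import re

text = """VC1000 Venture Capital 4 cr.
This is a class about venture capital
and more description, that could mention a future course like
VC2000 but might not
VC2000 venture capital II 4 cr.
Another description about blah
VC 3000 venture capital III 4-6 cr.
back again"""

# Hypothetical simplified header: one whole line ending in "cr."
# (needs re.M so that ^ and $ match at line boundaries).
HEADER = r"^[A-Z]{2}\s?\d{4} [^\n]*?\d+(?:-\d+)?\s?cr\.$"

# Consuming version: the description is followed by the next header itself,
# so that header is used up and cannot start the following match.
consuming = re.compile(rf"({HEADER})(.*?)(?:{HEADER}|\Z)", re.M | re.S)

# Lookahead version: (?=...) only checks that a header (or the end) comes next,
# without consuming it, so every header can start its own match.
lookahead = re.compile(rf"({HEADER})(.*?)(?={HEADER}|\Z)", re.M | re.S)

print([m[0] for m in consuming.findall(text)])  # VC2000 is skipped
print([m[0] for m in lookahead.findall(text)])  # all three headers
```

On the sample text the consuming version returns the VC1000 and VC 3000 headers but skips VC2000, while the lookahead version returns all three.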

What trick am I missing?

- pseudodev

Do the course lines always end with cr.? - Nick

Try putting the start anchor "^" at the beginning of the regex and the end anchor "$" at the end. - lemon
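One way to follow this suggestion, sketched under the assumption that every course header is a single line ending in cr.: with re.MULTILINE, ^ and $ match at each line boundary, so the question's own regex (with $ appended) matches only the real header lines and skips "VC2000 but might not". This swaps in a two-step technique (find the headers with finditer, then slice out the text between consecutive headers) instead of a single regex; the variable names are illustrative.

```python
import re

text = """VC1000 Venture Capital 4 cr.
This is a class about venture capital
and more description, that could mention a future course like
VC2000 but might not
VC2000 venture capital II 4 cr.
Another description about blah
VC 3000 venture capital III 4-6 cr.
back again"""

# The question's regex with "$" added at the end. With re.M, ^ and $ match at
# every line boundary; without re.S, .*? cannot run past the end of a line,
# so only complete header lines match.
header = re.compile(r"(^\*?[A-Z]{2}\s?[0-9]{4}) (.*?)([0-9]|[0-9]-[0-9]+)\s?cr\.$", re.M)

matches = list(header.finditer(text))
courses = []
for i, m in enumerate(matches):
    # Description: the text between the end of this header line and the start
    # of the next header (or the end of the text for the last course).
    end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
    description = " ".join(text[m.end():end].split())  # collapse newlines/spaces
    courses.append([m.group(1), m.group(2).strip(), m.group(3), description])

for course in courses:
    print(course)
```

On the sample text, courses[0] comes out as ['VC1000', 'Venture Capital', '4', 'This is a class about venture capital and more description, that could mention a future course like VC2000 but might not'], which is the shape the question asks for.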

1 Answer


- Andrej Kesely · Accepted Answer

Try (regex101):

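import re

# Groups: (1) course code, (2) title, (3) credits with "cr.", (4) description.
# The lookahead stops group 4 before the next course header line or at \Z, without consuming it.
pat = r'^([A-Z]{2}\s*\d{4})\s+([^\n]+?)(\d+-?\d*\s+cr\.)$(.*?)(?=^[A-Z]{2}\s*\d{4}\s+[^\n]+?\d+-?\d*\s+cr\.$|\Z)'
# re.S lets . match newlines; re.M makes ^ and $ match at every line boundary.
pat = re.compile(pat, flags=re.S|re.M)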

text = '''\
VC1000 Venture Capital 4 cr.
This is a class about venture capital
and more description, that could mention a future course like
VC2000 but might not
VC2000 venture capital II 4 cr.
Another description about blah
VC 3000 venture capital III 4-6 cr.
back again'''

for a, b, c, d in pat.findall(text):
    print(a)
    print(b)
    print(c)
    print(d)
    print('-' * 80)

Output:

VC1000
Venture Capital 
4 cr.

This is a class about venture capital
and more description, that could mention a future course like
VC2000 but might not

--------------------------------------------------------------------------------
VC2000
venture capital II 
4 cr.

Another description about blah

--------------------------------------------------------------------------------
VC 3000
venture capital III 
4-6 cr.

back again
--------------------------------------------------------------------------------
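For readability, the accepted pattern can be laid out with re.VERBOSE (re.X), which ignores unescaped whitespace and # comments inside the pattern. The sketch below is an editorial annotation of the answer's regex, not part of the original answer; the last line checks that both forms give the same findall result on the sample text.

```python
import re

text = """VC1000 Venture Capital 4 cr.
This is a class about venture capital
and more description, that could mention a future course like
VC2000 but might not
VC2000 venture capital II 4 cr.
Another description about blah
VC 3000 venture capital III 4-6 cr.
back again"""

# The accepted answer's pattern, unchanged.
pat = re.compile(
    r'^([A-Z]{2}\s*\d{4})\s+([^\n]+?)(\d+-?\d*\s+cr\.)$(.*?)(?=^[A-Z]{2}\s*\d{4}\s+[^\n]+?\d+-?\d*\s+cr\.$|\Z)',
    flags=re.S | re.M,
)

# The same pattern laid out with re.X: unescaped whitespace and "#" comments
# inside the pattern are ignored, so it matches exactly like the one-liner.
pat_verbose = re.compile(
    r"""
    ^([A-Z]{2}\s*\d{4})        # 1: course code, e.g. VC1000 or VC 3000
    \s+([^\n]+?)               # 2: title, lazy, cannot leave the header line
    (\d+-?\d*\s+cr\.)$         # 3: credits such as 4 cr. or 4-6 cr.
    (.*?)                      # 4: description; re.S lets . match newlines
    (?=                        # stop before, but do not consume:
        ^[A-Z]{2}\s*\d{4}\s+[^\n]+?\d+-?\d*\s+cr\.$   # the next header line
      | \Z                     # or the end of the text
    )
    """,
    flags=re.S | re.M | re.X,
)

print(pat_verbose.findall(text) == pat.findall(text))  # True
```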