这里有一种方法,使用零宽度回顾来隔离每个名称:
string = "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555 -6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
result = re.findall(r'(?:(?<=^)|(?<=[^A-Za-z.,]))[A-Za-z.,]+(?: [A-Za-z.,]+)*(?:(?=[^A-Za-z.,])|(?=$))', string)
print(result)
['Moe Szyslak', 'Burns, C. Montgomery', 'Rev. Timothy Lovejoy', 'Ned Flanders',
'Simpson, Homer', 'Dr. Julius Hibbert']
实际匹配的模式是这样的:
[A-Za-z.,]+(?: [A-Za-z.,]+)*
这段文字意思是匹配任何大写或小写字母、点号或句号,后跟一个空格和一个或多个相同字符,零次或多次。
此外,我们在此模式的左侧和右侧使用以下环视:
(?:(?<=^)|(?<=[^A-Za-z.,]))
Lookbehind and assert either the start of the string, or a non matching character
(?:(?=[^A-Za-z.,])|(?=$))
Lookahead and asser either the end of the string or a non matching character