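I am trying to extract sentences from a paragraph using regular expressions in Python. The code usually splits the text I test it on correctly, but for the following paragraph the sentence is not extracted properly.
Paragraph: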
“But in the case of malaria infections and sepsis, dendritic cells throughout the body are concentrated on alerting the immune system, which prevents them from detecting and responding to any new infections.” A new type of vaccine?
Code:
    import re

    def splitParagraphIntoSentences(paragraph):
        # Split after '.', '!' or '?' when it is followed by one or two
        # whitespace characters and then a capital letter.
        sentenceEnders = re.compile(r'[.!?][\s]{1,2}(?=[A-Z])')
        sentenceList = sentenceEnders.split(paragraph)
        return sentenceList

    if __name__ == '__main__':
        f = open("bs.txt", 'r')
        text = f.read()
        f.close()
        mylist = []
        sentences = splitParagraphIntoSentences(text)
        for s in sentences:
            mylist.append(s.strip())
        for i in mylist:
            print(i)
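When I test it with the paragraph above, the output is exactly the same as the input paragraph, but the output should look like this:
But in the case of malaria infections and sepsis, dendritic cells throughout the body are concentrated on alerting the immune system, which prevents them from detecting and responding to any new infections
A new type of vaccine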
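For reference, here is a minimal sketch that reproduces the behaviour without the file. It assumes Python 3 and that the paragraph is pasted inline with its curly quotation marks intact; the names `SENTENCE_ENDERS` and `PARAGRAPH` are only for illustration. It also prints the characters around the final period, in case that turns out to be relevant.

```python
import re

# Same pattern as in the question.
SENTENCE_ENDERS = re.compile(r'[.!?][\s]{1,2}(?=[A-Z])')

# The paragraph pasted inline instead of being read from bs.txt.
# \u201c and \u201d are the opening and closing curly quotation marks.
PARAGRAPH = (
    "\u201cBut in the case of malaria infections and sepsis, dendritic cells "
    "throughout the body are concentrated on alerting the immune system, "
    "which prevents them from detecting and responding to any new "
    "infections.\u201d A new type of vaccine?"
)

parts = SENTENCE_ENDERS.split(PARAGRAPH)
print(len(parts))  # 1: the paragraph comes back as a single piece

# Position of the period that ends the quoted sentence.
period = PARAGRAPH.index("infections.") + len("infections")
# Show the period and the two characters that follow it.
print(repr(PARAGRAPH[period:period + 3]))
```

So the split returns the whole paragraph unchanged, which matches what I see when reading from the file.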
Is there something wrong with the regular expression?