I have a long string that is a paragraph, but there is no space after some of the periods. For example:
para = "I saw this film about 20 years ago and remember it as being particularly nasty. I believe it is based on a true incident: a young man breaks into a nurses\' home and rapes, tortures and kills various women.It is in black and white but saves the colour for one shocking shot.At the end the film seems to be trying to make some political statement but it just comes across as confused and obscene.Avoid."
I am trying to use re.sub to fix this, but the output is not what I expected. This is what I did:
re.sub("(?<=\.).", " \1", para)
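I am matching the first character of each sentence and want to put a space in front of it. My match pattern is (?<=\.). which (supposedly) checks for any character that appears after a period. I learned from other Stack Overflow questions that \1 refers to the last matched pattern, so I wrote my replacement pattern as " \1", a space followed by the previously matched string.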
Here is the output:
"I saw this film about 20 years ago and remember it as being particularly nasty. \x01I believe it is based on a true incident: a young man breaks into a nurses\' home and rapes, tortures and kills various women. \x01t is in black and white but saves the colour for one shocking shot. \x01t the end the film seems to be trying to make some political statement but it just comes across as confused and obscene. \x01void. \x01
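Instead of matching any character preceded by a period and adding a space before it, re.sub replaced the matched character with \x01. Why? How do I add a character before the matched string?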
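One likely explanation, sketched here assuming Python 3's standard re module: in a normal (non-raw) string literal, "\1" is an octal escape for the character chr(1), so re.sub receives a space plus "\x01" and never sees a backreference. The pattern also has no capture group, so group 1 would not exist anyway. The shortened para and the names naive and fixed below are illustrative, not from the original post.

```python
import re

# Shortened stand-in for the paragraph in the question (an assumption for brevity).
para = "It kills various women.It is in black and white. Avoid."

# In a non-raw string literal, "\1" is the octal escape for chr(1),
# so re.sub receives a space plus "\x01" rather than a backreference.
assert " \1" == " \x01"

# Raw strings keep the backslashes; \g<0> refers to the whole match,
# which is needed here because the pattern has no capture group.
naive = re.sub(r"(?<=\.).", r" \g<0>", para)

# The naive version also pads periods that already have a space after them,
# so capture only a non-whitespace character and put a space before it.
fixed = re.sub(r"(?<=\.)(\S)", r" \1", para)

print(repr(naive))
print(repr(fixed))
```

Note that the \S variant would also split things like "3.14" into "3. 14", so it is only a reasonable fit for text where a period always ends a sentence.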
text = text.replace(".", ". ").replace(". " + " ", ". ")
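(The string concatenation is there because Stack Exchange eats double spaces.) Basically, replace every period with period + space, then replace every period + space + space with period + single space. No regex needed, and nothing to import. - Fake Name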
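A minimal runnable sketch of the comment's approach, written with the double space spelled out directly. The function name space_after_periods and the sample string are hypothetical, chosen only for illustration.

```python
def space_after_periods(text: str) -> str:
    # Put a space after every period, then collapse the doubled space
    # this creates wherever a space was already present.
    return text.replace(".", ". ").replace(".  ", ". ")


sample = "various women.It is in black and white. Avoid."
result = space_after_periods(sample)

# A period at the very end of the text also gains a space,
# so strip it if trailing whitespace matters.
trimmed = result.rstrip()

print(repr(result))
print(repr(trimmed))
```

One thing to watch for: this only collapses a single extra space, so text that already had two spaces after a period ends up with two spaces again.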