Get all double-quote characters outside the special characters « and »

3

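I want to find every double quote in the substrings that lie outside the brackets « and », and replace each one with an escape character followed by the double quote (i.e. \"). For example:

Input string: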

'The first generally recognized "wiki" application,«"WikiWikiWeb"», was created by American computer programmer "Ward Cunningham" in 1994'

Expected output:

'The first generally recognized \"wiki\" application,«"WikiWikiWeb"», was created by American computer programmer \"Ward Cunningham\" in 1994'

I tried the following code:
string = '''The first generally recognized "wiki" application,«"WikiWikiWeb"», was created by American computer programmer "Ward Cunningham" in 1994'''

import re
arr = re.findall(r'(.*?)\«.*?\»', string)
for tag in arr :
 new_tag = tag.replace('"','\\"')
 string = string.replace(tag, new_tag)

Output: The first generally recognized \"wiki\" application,«"WikiWikiWeb"», was created by American computer programmer "Ward Cunningham" in 1994

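The problem with this code is that the regex does not give me all of the substrings: the second one is missing. The expected result should be: ['The first generally recognized "wiki" application,', ', was created by American computer programmer "Ward Cunningham" in 1994']. I want the regex to return all the text that contains the quotes, not the substrings enclosed by the special characters themselves.
2 Answers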
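
The trailing piece is missing because the pattern (.*?)«.*?» only captures text that is followed by a «...» span, so anything after the last » never matches. As a minimal sketch (the variable names `text` and `outside` are just for illustration), splitting on the bracketed spans with `re.split` yields the list described above:

```python
import re

text = ('The first generally recognized "wiki" application,«"WikiWikiWeb"», '
        'was created by American computer programmer "Ward Cunningham" in 1994')

# Split on every «...» span; the pieces that remain are the text outside the brackets.
outside = re.split(r'«.*?»', text)

for piece in outside:
    print(repr(piece))
```

This only recovers the outside pieces; putting the string back together after escaping would also need the bracketed spans, which a capturing group in the split pattern would keep.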

2
string = '''The first generally recognized "wiki" application,«blah"WikiWikiWeb"blah», was created by American computer programmer "Ward Cunningham" in 1994'''

import re
arr = re.findall(r'«.*?»|(".+?")', string)
for tag in arr :
  new_tag = tag.replace('"','\\"')
  string = string.replace(tag, new_tag)

print(string)

Output:

The first generally recognized \"wiki\" application,«blah"WikiWikiWeb"blah», was created by American computer programmer \"Ward Cunningham\" in 1994
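
The same alternation idea can also be applied in a single pass with `re.sub` and a replacement function. This is only a sketch: the helper name `escape_quotes_outside` is made up for illustration, and it assumes the « and » brackets are balanced and not nested. It escapes each quote individually, so it does not depend on quotes coming in pairs, and it never touches text inside a «...» span even when the same quoted text also appears outside.

```python
import re

def escape_quotes_outside(text):
    """Escape every double quote that is not inside a «...» span."""
    def repl(match):
        # group(1) is set only when the bare-quote alternative matched;
        # a whole «...» span is returned unchanged.
        if match.group(1):
            return '\\"'
        return match.group(0)
    # The «...» alternative comes first, so bracketed spans are consumed
    # as a unit before any quote inside them can match on its own.
    return re.sub(r'«.*?»|(")', repl, text)

sample = 'say "hi" to «blah"hi"blah» and "bye"'
result = escape_quotes_outside(sample)
print(result)
```

Because the replacement is a function, its return value is inserted literally and the backslash needs no extra escaping for the replacement template.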

1
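You can use this regex pattern:
string = re.sub(r'(?<!\«)"(?!\»)','\\"',string)

(?<!«)" is a negative lookbehind: it matches a " that does not come immediately after «. (?!») is a negative lookahead, which works the same way in the other direction: the " must not be immediately followed by ».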

@SheetalJagtap: If there are characters between « and ", for example «blah"text"blah», it does not work correctly. - Toto
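
The limitation described in this comment can be checked with a short sketch. The sample strings `simple` and `nested` are invented for illustration; the pattern is the one from the answer, minus the unnecessary backslashes before « and ».

```python
import re

lookaround = r'(?<!«)"(?!»)'

def escape(text):
    # A function replacement inserts the backslash and quote literally.
    return re.sub(lookaround, lambda m: '\\"', text)

simple = '"a" «"b"»'          # quotes touch the brackets directly
nested = '"a" «blah"b"blah»'  # other characters sit between bracket and quote

simple_out = escape(simple)
nested_out = escape(nested)

print(simple_out)  # \"a\" «"b"»
print(nested_out)  # \"a\" «blah\"b\"blah»
```

The lookarounds inspect only the single character on each side of the quote, so they cannot tell whether a quote further inside a «...» span is enclosed by the brackets.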
