Matching all complete quotes with a regular expression

Question


Matching quoted text is fairly easy when you don't know whether the quotes are single or double:

>>> import re
>>> s ="""this is a "test" that I am "testing" today"""
>>> re.findall('[\'"].*?[\'"]',s)
['"test"', '"testing"']

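This searches the string for either a single or a double quote and grabs whatever lies between them. There is a problem, though:

If the string contains the other type of quote, that quote closes the match! Here are two examples to illustrate what I mean: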

>>> s ="""this is a "test" and this "won't work right" at all"""
>>> re.findall('[\'"].*?[\'"]',s)
['"test"', '"won\'']
>>> s ="""something is "test" and this is "an 'inner' string" too"""
>>> re.findall('[\'"].*?[\'"]',s)
['"test"', '"an \'', '\' string"']

The regular expression '[\'"].*?[\'"]' will pair a single quote with a double quote, which is clearly wrong.

So what regular expression matches both kinds of quotes, but only matches a string when it ends with the same kind of quote it started with?

In case that is confusing, here is the output I would like to get:

s ="""this is a "test" and this "won't work right" at all"""
re.findall(expression,s)
#prints ['"test"','"won\'t work right"']

s ="""something is "test" and this is "an 'inner' string" too"""
re.findall(expression,s)
['"test"', '"an \'inner\' string"',"'inner'"]

- Ryan Saxe

1 Answer

Content provided by Stack Overflow.

- Blender · Accepted Answer


Put your first character class in a capturing group, then refer back to it on the other side with \1:

>>> re.findall(r'([\'"])(.*?)\1',s)
[('"', 'test'), ('"', "won't work right")]
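As a minimal sketch of how the backreference pattern can return whole quoted strings rather than (quote, content) tuples, one option is to iterate over match objects and take the full match. The names `QUOTED` and `find_quoted` are hypothetical and used only for illustration; the pattern is the one shown above.

```python
import re

# Hypothetical names for illustration; the pattern is the one from the answer.
# Group 1 captures the opening quote, and \1 requires the same character to close.
QUOTED = re.compile(r'([\'"])(.*?)\1')

def find_quoted(text):
    """Return each quoted substring, including its surrounding quotes."""
    # group(0) is the entire match, so the tuple does not need reassembling.
    return [m.group(0) for m in QUOTED.finditer(text)]

s1 = """this is a "test" and this "won't work right" at all"""
s2 = """something is "test" and this is "an 'inner' string" too"""

print(find_quoted(s1))
print(find_quoted(s2))
```

Matches do not overlap, so in the second string the nested 'inner' is returned only as part of "an 'inner' string", not as a separate item.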

- Blender

Great! Then I can use a list comprehension to turn it into the right kind of list. - Ryan Saxe

Wait, in my actual case it returns an empty list. What's wrong? re.findall('\s+(.+?)=(["\'])(.*?)\2',s), where s is a string like stuff name="content" name2='more content'. - Ryan Saxe

Never mind... it only works with an r in front... why is that? - Ryan Saxe

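@RyanSaxe: That wouldn't happen to be XML/HTML, would it? As for your question, you have to use a raw string (notice the little r in front of the string literal?). It treats backslashes as literal backslashes, so r'\n' == '\\n'. Without it, you'd have to write '\\s+(.+?)=(["\'])(.*?)\\2'. - Blender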