Python regex to match only partial quotes

Question

3

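I have some poorly formatted text that I need to filter. In many places a quote starts on one line, gets cut off, and ends on the next line. In those cases my preference is to remove the partial quote entirely, while keeping regular, complete quotes. I know this can be done iteratively with a counter, but I would really prefer to solve it with a regex.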

Take the following as an example:

"This is a quote"
This is an end "partial-
quote" Here is more text.
This is an end "partial-
quote w/o more text"
This is an "embedded" quote

Here is my current attempt:

(\"[^\"\n]+?|^[^\"\n]+?\")(\n|$)

Note that it fails in two cases:

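1. Line three: the partial quote comes before the rest of a sentence (a very rare case, so it is not the end of the world if we can't solve it).
2. Line six: the embedded quote. This is a major problem and the main reason I am asking. The pattern grabs everything from the last quote mark of the embedded quote to the end of the line.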
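The two failures can be reproduced with a short script. This is only a sketch: the sample string is copied from the demo in the accepted answer, and `re.MULTILINE` is an assumption (the question does not say which flags were used) so that `^` and `$` apply per line.

```python
import re

# Sample text, one entry per line of the example
text = (
    '"This is a quote"\n'
    'This is an end "partial-\n'
    'quote" Here is more text.\n'
    'This is an end "partial-\n'
    'quote w/o more text"\n'
    'This is an "embedded" quote'
)

# The attempt from the question; MULTILINE is assumed, not stated in the post
attempt = re.compile(r'(\"[^\"\n]+?|^[^\"\n]+?\")(\n|$)', re.MULTILINE)

# Collect every full match so the unwanted ones are visible
matches = [m.group(0) for m in attempt.finditer(text)]
for match in matches:
    print(repr(match))
```

Among the matches, `'" Here is more text.\n'` corresponds to the line-three case and `'" quote'` to the line-six case: in both, the pattern starts at a closing quote mark and runs to the end of the line.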

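I thought about setting up an if statement and going through the text line by line, checking whether each line has fewer than two quote marks and only then parsing for partial quotes, but I figured the experts on SO would have a cleaner solution.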
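For comparison, a non-regex version of this idea might look like the sketch below. It is not from the original post: the function name `drop_partial_quotes` is hypothetical, and it pairs quote marks by scanning the whole text rather than counting them per line. A quoted span that contains a newline is treated as partial and replaced by a single line break.

```python
def drop_partial_quotes(text: str) -> str:
    """Remove quoted spans that contain a newline; keep all other text."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch != '"':
            out.append(ch)
            i += 1
            continue
        close = text.find('"', i + 1)
        if close == -1:
            # Unpaired quote mark: keep the remainder unchanged
            out.append(text[i:])
            break
        quoted = text[i:close + 1]
        i = close + 1
        if '\n' in quoted:
            # Partial quote: drop it but keep one line break in its place
            out.append('\n')
            if i < n and text[i] == '\n':
                i += 1  # avoid leaving an empty line behind
        else:
            out.append(quoted)
    return ''.join(out)


sample = (
    '"This is a quote"\n'
    'This is an end "partial-\n'
    'quote" Here is more text.\n'
    'This is an end "partial-\n'
    'quote w/o more text"\n'
    'This is an "embedded" quote'
)
print(drop_partial_quotes(sample))
```

On the sample text this keeps the complete and embedded quotes and removes both partial ones, leaving the stray spaces still to be cleaned up.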

Note: the expected output is as follows:

"This is a quote"
This is an end
Here is more text.
This is an end
This is an "embedded" quote

(I will deal with the whitespace later.)

- andoni

Maybe you could take a look at the regex I proposed? - Jerry

3 Answers

1

("[^"\n]*")|"[^"]*(\n)[^"]*"(?![^\n]*")|"[^"]*\n.*?(?=\n[^"]*"[^\n"]*")

You can try this. It also handles the case of an odd number of quote marks. See the demo.

https://regex101.com/r/dL7oF8/6

- vks

@andoni You can also try this with an odd number of " characters. - vks

1

You can try this regex:

"[^"\n]+?\n[^"\n]+?(?:"|$)\s*

Replace the matched text with \n.

regex101 demo

"[^"\n]+?\n[^"\n]+? matches only partial quotes (it makes sure there is a newline between the quote marks).

ideone demo
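A minimal sketch of how this pattern might be applied with `re.sub`, assuming no flags and using the sample text from the accepted answer's demo (the variable names are illustrative and do not come from the linked demos):

```python
import re

text = (
    '"This is a quote"\n'
    'This is an end "partial-\n'
    'quote" Here is more text.\n'
    'This is an end "partial-\n'
    'quote w/o more text"\n'
    'This is an "embedded" quote'
)

# A quote mark, text, a newline, more text, then a closing quote mark
# (or end of input), plus any trailing whitespace
partial_quote = re.compile(r'"[^"\n]+?\n[^"\n]+?(?:"|$)\s*')

# Each partial quote collapses to a single line break
cleaned = partial_quote.sub('\n', text)
print(cleaned)
```

Because the trailing `\s*` also consumes the space after a closing quote mark, `Here is more text.` starts at the beginning of its line in the result.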

- Jerry

- Avinash Raj · Accepted Answer

Here you go:

^((?:[^"\n]*"[^"\n]*")*[^"\n]*)"[^"\n]*\n[^"\n]*"(\n|)

Replace the matched characters with \1\n

DEMO

>>> import re
>>> s = '''"This is a quote"
This is an end "partial-
quote" Here is more text.
This is an end "partial-
quote w/o more text"
This is an "embedded" quote'''
>>> m = re.sub(r'(?m)^((?:[^"\n]*"[^"\n]*")*[^"\n]*)"[^"\n]*\n[^"\n]*"(\n|)', r'\1\n', s)
>>> print(m)
"This is a quote"
This is an end 
 Here is more text.
This is an end 
This is an "embedded" quote

If you need to handle a quote that spans more than two lines, you can use this regex instead.

^((?:[^"\n]*"[^"\n]*")*[^"\n]*)"(?:[^"\n]*\n)+[^"\n]*"(\n|)

DEMO
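As a rough illustration of the multi-line variant, the sketch below runs it on a made-up sample in which one quote spans three lines. The sample string and variable names are assumptions for demonstration only; the pattern and the `\1\n` replacement are the ones given above.

```python
import re

# Hypothetical input: one complete quote, one quote broken across three lines
text = (
    'Keep "this" one\n'
    'Drop "a quote\n'
    'that spans\n'
    'three lines" here\n'
    'Last line'
)

# Group 1 captures the part of the line before the partial quote;
# (?:[^"\n]*\n)+ allows the quote to cross one or more line breaks
pattern = r'(?m)^((?:[^"\n]*"[^"\n]*")*[^"\n]*)"(?:[^"\n]*\n)+[^"\n]*"(\n|)'

result = re.sub(pattern, r'\1\n', text)
print(result)
```

The complete quote on the first line is left alone, while the three-line quote is removed and only the text before it (`Drop `) and after it (` here`) remains, on separate lines.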