How can I allow for an unknown number of words in a negative lookbehind using regex?

Question

4

I'm trying to exclude records in which the word "owner" appears before the word "dog".

the owner has a dog (exclude)
the owner has a black and brown dog (exclude)
John has a dog (include)
John has a black and brown dog (include)

Here is my current regex:

\b(?<!owner\s)\w+\sdog\b

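This works for a single unknown word (for example, 'owner has dog' is excluded, but 'owner has a dog' is included), but I can't work out how to keep the negative lookbehind while allowing for any number of words between "owner" and "dog".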
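To make the failure concrete, here is a minimal sketch that runs the regex above with Python's `re` module (the name `ORIGINAL` and the sample list are only for illustration). The lookbehind inspects just the six characters immediately before the word that precedes "dog", so any extra word in between lets the match through.

```python
import re

# The pattern from the question: the lookbehind only checks the text
# directly in front of the single word that precedes "dog".
ORIGINAL = re.compile(r"\b(?<!owner\s)\w+\sdog\b")

samples = [
    "owner has dog",        # "has" is directly preceded by "owner "
    "the owner has a dog",  # "a" is preceded by "has ", so the check passes
    "John has a dog",
]

for text in samples:
    match = ORIGINAL.search(text)
    print(f"{text!r}: {match.group(0) if match else None}")
```

Python's `re` only accepts fixed-width lookbehinds, so widening the lookbehind to something like `owner.*` is not an option there.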

Many thanks.

- Sean Farrell

2 Answers

1

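Another option could be to start by matching any character except o or a newline.

Then, when you encounter an o, assert that it is not the start of the word owner, match the o followed by any characters except o or a newline, and optionally repeat that process until you reach the word dog.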

^[^o\r\n]*(?:(?!\bowner\b)o[^o\r\n]*)*\bdog\b

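Explanation

^ Start of the string
[^o\r\n]* Match 0+ times any character except o or a newline
(?: Non-capture group
- (?!\bowner\b) Negative lookahead, assert that what is directly to the right is not the word owner
- o[^o\r\n]* Match o followed by 0+ times any character except o or a newline
)* Close the non-capture group and repeat it 0+ times
\bdog\b Match the word dog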
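
As a quick check of the breakdown above, this sketch applies the pattern to the four sample records from the question. The names `UNROLLED` and `is_included` are made up for this example.

```python
import re

# Pattern from this answer: consume non-"o" characters freely, and only
# step over an "o" when it does not start the whole word "owner".
UNROLLED = re.compile(r"^[^o\r\n]*(?:(?!\bowner\b)o[^o\r\n]*)*\bdog\b")


def is_included(record: str) -> bool:
    """Return True when "dog" is reached without passing the word "owner"."""
    return UNROLLED.search(record) is not None


records = [
    "the owner has a dog",
    "the owner has a black and brown dog",
    "John has a dog",
    "John has a black and brown dog",
]

for record in records:
    print(f"{record!r}: {'include' if is_included(record) else 'exclude'}")
```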

Regex demo | Python demo

- The fourth bird

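Thank you very much for your reply. I was wondering whether it is possible to apply this regex to a single sentence containing the word "dog", taken from a whole paragraph? For example: "Hello, my name is Sean. John has a dog." The regex would apply only between the period after "Sean" and the end of the sentence at "dog.". - Sean Farrell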

1

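@SeanFarrell You could use positive lookbehinds to assert either the start of the string or a dot, and exclude the dot from the character classes so the match cannot cross a sentence boundary. (?:(?<=\.)|(?<=^))[^o\r\n.]*(?:(?!\bowner\b)o[^o\r\n.]*)*\bdog\b https://regex101.com/r/E5pKl2/1 - The fourth bird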
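
A minimal sketch of how the sentence-scoped pattern from this comment behaves in Python, using the paragraph from the follow-up question. The helper name `find_dog_sentence` is hypothetical. Because the match starts right after the dot, it keeps the leading space.

```python
import re

# Pattern from the comment: a match may only start at the beginning of the
# string or right after a dot, and it can never run across a dot.
SENTENCE_PATTERN = re.compile(
    r"(?:(?<=\.)|(?<=^))[^o\r\n.]*(?:(?!\bowner\b)o[^o\r\n.]*)*\bdog\b"
)


def find_dog_sentence(paragraph):
    """Return the first sentence fragment ending in "dog" with no "owner" before it."""
    match = SENTENCE_PATTERN.search(paragraph)
    return match.group(0) if match else None


print(repr(find_dog_sentence("Hello, my name is Sean. John has a dog.")))
print(repr(find_dog_sentence("Hello, my name is Sean. The owner has a dog.")))
```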


- Cary Swoveland · Accepted Answer

You can use the following regular expression to verify that a string contains the word "dog" and that "dog" is not preceded by the word "owner".

^(?:(?!\bowner\b).)*\bdog\b

Start your engine! | Python code

Python's regex engine performs the following operations.

^                : anchor match to beginning of string
(?:              : begin a non-capture group
  (?!\bowner\b)  : use a negative lookahead to assert that the current
                   position in the string is not followed by "owner"
  .              : match a character
)                : end non-capture group
*                : execute non-capture group 0+ times
\bdog\b          : match 'dog' surrounded by word boundaries

This technique of matching a run of individual characters, none of which begins the forbidden word, is called the Tempered Greedy Token Solution.
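
For completeness, a small sketch of using this pattern as a filter over the records from the question. The names `TEMPERED`, `records` and `kept` are illustrative only.

```python
import re

# Tempered greedy token: each "." is consumed only if the word "owner"
# does not start at that position, so "dog" cannot be reached past "owner".
TEMPERED = re.compile(r"^(?:(?!\bowner\b).)*\bdog\b")

records = [
    "the owner has a dog",
    "the owner has a black and brown dog",
    "John has a dog",
    "John has a black and brown dog",
]

# Keep only the records where "dog" appears with no "owner" before it.
kept = [record for record in records if TEMPERED.search(record)]
print(kept)
```

Note that only an "owner" appearing before "dog" blocks the match; a string such as "his dog has an owner" still matches, because the engine reaches "dog" first.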