Find all regex matches - greedy and non-greedy!

Question


regex, regex-greedy

4


Suppose I have the string "Marketing and Cricket on the Internet".

I want to use a regular expression to find every possible match of "Ma", followed by any text, followed by "et". So:

Market
Marketing and Cricket
Marketing and Cricket on the Internet

The regex Ma.*et returns "Marketing and Cricket on the Internet". The regex Ma.*?et returns "Market". But I want a regex that returns all 3 matches. Is this possible?
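To make the question concrete, here is a small illustration using Python's standard `re` module (Python is only an example; the question is not tied to a language). A single search returns one match, and `findall` returns only non-overlapping matches, so neither the greedy nor the lazy quantifier on its own yields all three strings.

```python
import re

text = "Marketing and Cricket on the Internet"

# Greedy: .* takes as much as possible, then backtracks to the LAST "et".
greedy = re.search(r"Ma.*et", text).group(0)

# Lazy: .*? takes as little as possible, stopping at the FIRST "et".
lazy = re.search(r"Ma.*?et", text).group(0)

print(greedy)  # Marketing and Cricket on the Internet
print(lazy)    # Market

# findall does not help: its matches never overlap, and there is only one "Ma".
print(re.findall(r"Ma.*?et", text))  # ['Market']
```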

Thanks.

- Rastaboy

Do you really need a regular expression? - Gumbo

LEPL is a parsing library for Python whose regular expressions can "yield" all possible matches. - user395760

4 Answers

1

Thanks, everyone, that was really helpful. Here's what I came up with for PHP:

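function preg_match_ubergreedy($regex, $text) {
    $matched = array(); // initialise so an empty array is returned when nothing matches

    // Replace the "*" quantifier with an exact count {0}, {1}, {2}, ...
    // so each pass looks for a match with exactly $i characters in the middle.
    for ($i = 0; $i < strlen($text); $i++) {
        $exp = str_replace("*", "{" . $i . "}", $regex);
        // preg_match returns 1 on a match, so no undefined-index notice on failure
        if (preg_match($exp, $text, $matches)) {
            $matched[] = $matches[0];
        }
    }

    return $matched;
}

$text = "Marketing and Cricket on the Internet";
$matches = preg_match_ubergreedy("@Ma.*?et@is", $text);
// $matches: "Market", "Marketing and Cricket", "Marketing and Cricket on the Internet"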
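The same idea can be sketched in Python (an illustrative port, not the poster's code): replace the open-ended quantifier with an exact count {n} and try every n. The function name `matches_for_each_length` and its `head`/`tail` parameters are hypothetical; like the PHP version, each pass reports only the leftmost match of a given length.

```python
import re
from typing import List


def matches_for_each_length(head: str, tail: str, text: str) -> List[str]:
    """Try head + exactly n characters + tail for n = 0, 1, 2, ..."""
    found = []
    for n in range(len(text) + 1):
        # Exact-count quantifier .{n} in place of .* ; head and tail are
        # treated as regex fragments and are NOT escaped.
        m = re.search("%s.{%d}%s" % (head, n, tail), text, re.DOTALL)
        if m:
            found.append(m.group(0))  # leftmost match of this length only
    return found


print(matches_for_each_length("Ma", "et", "Marketing and Cricket on the Internet"))
# ['Market', 'Marketing and Cricket', 'Marketing and Cricket on the Internet']
```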

- Rastaboy

0

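For a more general regex, another option is to recursively apply the greedy regex to each previous match, dropping first the last character and then the first character, so that only substrings of the previous match can be matched. After matching "Marketing and Cricket on the Internet", we test "arketing and Cricket on the Internet" and "Marketing and Cricket on the Interne" for sub-matches.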

In C#, this could be implemented as follows:

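using System.Collections.Generic;
using System.Text.RegularExpressions;

// Container class name is arbitrary.
public static class RegexSubMatches
{
    // Returns every match of r in input plus, recursively, every match found
    // in each match with its last or first character removed.
    // Note: Match.Index of nested results is relative to the substring searched.
    public static IEnumerable<Match> SubMatches(Regex r, string input)
    {
        var result = new List<Match>();

        foreach (Match m in r.Matches(input))
        {
            result.Add(m);

            if (m.Value.Length > 1)
            {
                // Drop the last character and search again.
                string prefix = m.Value.Substring(0, m.Value.Length - 1);
                result.AddRange(SubMatches(r, prefix));

                // Drop the first character and search again.
                string suffix = m.Value.Substring(1);
                result.AddRange(SubMatches(r, suffix));
            }
        }

        return result;
    }
}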

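However, this version can return the same sub-match more than once. For example, in "Marketing and Marmosets on the Internet" it finds "Marmoset" twice: once as a sub-match of "Marketing and Marmosets on the Internet" and again as a sub-match of "Marmosets on the Internet".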
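One possible way around the duplicates, sketched in Python rather than C# (an assumption, not the poster's code): identify each match by its (start, end) span in the original string and collect the spans in a set, and skip windows that were already searched. Passing `pos`/`endpos` to `Pattern.finditer` keeps the offsets absolute instead of relative to a substring. The name `sub_matches` is just a Python rendering of `SubMatches`.

```python
import re


def sub_matches(r, text):
    """Recursive sub-match search that reports each (start, end) span once."""
    spans = set()    # (start, end) offsets into text already reported
    visited = set()  # (lo, hi) windows already searched

    def search(lo, hi):
        if (lo, hi) in visited:
            return
        visited.add((lo, hi))
        # pos/endpos limit the search to text[lo:hi] but keep offsets
        # absolute. (Unlike slicing, pos does not make ^ match at lo.)
        for m in r.finditer(text, lo, hi):
            s, e = m.span()
            spans.add((s, e))
            if e - s > 1:
                search(s, e - 1)  # drop the last character
                search(s + 1, e)  # drop the first character

    search(0, len(text))
    return [text[s:e] for s, e in sorted(spans)]


print(sub_matches(re.compile(r"Ma.*et"), "Marketing and Marmosets on the Internet"))
# ['Market', 'Marketing and Marmoset', 'Marketing and Marmosets on the Internet',
#  'Marmoset', 'Marmosets on the Internet']
```

Each span appears once in the output, so "Marmoset" is reported a single time even though two recursion paths reach it.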

- stevemegson

0

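Unfortunately, this can't be done with standard POSIX regular expressions, which return only a single match (the best one according to the regex matching rules). You would have to rely on extensions, which may be available in the particular programming language you are using the regex from, assuming you are using it in a program.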
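When the regex is used from a program, one language-level workaround (not a regex extension, and not taken from the answers here) is to stop asking for a single match and instead test the whole pattern against every substring. A minimal Python sketch, assuming the input is short enough for the quadratic number of `fullmatch` calls; the function name `all_substring_matches` is hypothetical.

```python
import re


def all_substring_matches(pattern, text):
    """Return every substring of text that pattern matches in full."""
    rx = re.compile(pattern)
    results = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            # fullmatch with pos/endpos tests exactly text[start:end].
            if rx.fullmatch(text, start, end):
                results.append(text[start:end])
    return results


print(all_substring_matches(r"Ma.*et", "Marketing and Cricket on the Internet"))
# ['Market', 'Marketing and Cricket', 'Marketing and Cricket on the Internet']
```

This finds matches at every start position, at the cost of O(n²) match attempts.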

- Michael Goldshteyn

Page content provided by Stack Overflow.

- thejh · Accepted Answer

As far as I know: no.

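But you can do the non-greedy match first and then generate a new regex whose quantifier requires a longer middle part to get the next match. "Market" has a 2-character middle ("rk"), so the next regex asks for at least 3, like this: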

Ma.*?et
Ma.{3,}?et

...and so on...
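The "and so on" loop can be sketched in Python as follows (an illustration of this answer's idea; the function name `matches_by_increasing_min` and the `head`/`tail` parameters are hypothetical). Each round uses a lazy quantifier with a minimum one larger than the previous match's middle part. In this form it only follows matches from the leftmost possible start, so it can miss matches that begin at a later "Ma".

```python
import re


def matches_by_increasing_min(head, tail, text):
    """Repeat a lazy search, raising the minimum middle length each time."""
    results = []
    min_len = 0
    while True:
        # Ma(.{0,}?)et first, then Ma(.{3,}?)et after "Market" (middle "rk"), ...
        m = re.search("%s(.{%d,}?)%s" % (head, min_len, tail), text, re.DOTALL)
        if m is None:
            return results
        results.append(m.group(0))
        min_len = len(m.group(1)) + 1  # next match must have a longer middle


print(matches_by_increasing_min("Ma", "et", "Marketing and Cricket on the Internet"))
# ['Market', 'Marketing and Cricket', 'Marketing and Cricket on the Internet']
```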