Confusion about regular expressions in Java

Question

4

I am confused by these two regular expressions:

.*? and .+?

I do know what each of these quantifiers means on its own, i.e.:

'.' -> Any character
'*' -> 0 or more times
'+' -> 1 or more times
'?' -> 0 or 1

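What really confuses me is how .*? and .+? are used. Can anyone provide proper examples for these cases?

Links to good material with useful worked examples are also welcome. Thanks in advance.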

- puru

2

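Have you tried reading the documentation? If so and it is still unclear, please explain exactly what confuses you. - NPE

Basically: * means 0 or more characters and + means 1 or more characters. Is that what confuses you? - Ahmed Hamdy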

2

To be fair to the asker, the "?" is the tricky part. - Bathsheba

3 Answers

3

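.*? and .+? are reluctant quantifiers.

They start at the beginning of the input string and then reluctantly consume one character at a time, looking for a match. The last thing they try is the entire input string.

Consider the following code: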

        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class ReluctantDemo {
            public static void main(String[] args) {
                String lines = "some";

                // .+? must consume at least one character per match.
                Matcher matcher = Pattern.compile(".+?").matcher(lines);
                while (matcher.find()) {
                    String result = matcher.group();
                    System.out.println("RESULT of .+? : " + result);
                    System.out.println("RESULT LENGTH : " + result.length());
                }
                System.out.println(lines);

                // .*? may match the empty string, so it consumes nothing.
                Matcher matcher1 = Pattern.compile(".*?").matcher(lines);
                while (matcher1.find()) {
                    int start = matcher1.start();
                    int end = matcher1.end();
                    String result = matcher1.group();
                    System.out.println("RESULT of .*? : " + result);
                    System.out.println("RESULT LENGTH : " + result.length() + " ,  start " + start + " end :" + end);
                }
            }
        }

Output:

RESULT of .+? : s
RESULT LENGTH : 1
RESULT of .+? : o
RESULT LENGTH : 1
RESULT of .+? : m
RESULT LENGTH : 1
RESULT of .+? : e
RESULT LENGTH : 1
some
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 0 end :0
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 1 end :1
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 2 end :2
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 3 end :3
RESULT of .*? : 
RESULT LENGTH : 0 ,  start 4 end :4

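.+? has to consume at least one character, and because it is reluctant it stops after exactly one. So it matches each character separately, and every match has length 1.

.*? is allowed to match nothing, so it never consumes a character. It matches the empty string at every position, from index 0 through index 4.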
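A reluctant quantifier only grows when something later in the pattern forces it to. The following is a minimal sketch, not part of the original answer: the class name `ReluctantWithSuffix` and the helper `findAll` are made up for illustration. It reuses the input "some" and appends a literal character after the quantifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantWithSuffix {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "some";
        // The quantifier is forced to grow until the literal "m" can match.
        System.out.println(findAll(".+?m", input)); // [som]
        // .*? may consume nothing, so the literal "s" matches immediately.
        System.out.println(findAll(".*?s", input)); // [s]
        // .+? needs at least one character before an "s", and none exists.
        System.out.println(findAll(".+?s", input)); // []
    }
}
```

With a literal after it, the quantifier consumes just enough characters to let the rest of the pattern succeed. The only difference between the two forms is whether "just enough" may be zero.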

- Sujith PS

Thank you very much. But I guess (.) brings in grouping. In my case it is just .*? and .+?. Could you give examples for those two? - puru

1

Excellent example. Thank you very much... - puru

2

To illustrate, consider the input string xfooxxxxxxfoo.

Enter your regex: .*foo  // greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Enter your regex: .*?foo  // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.

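The first example uses the greedy quantifier .* to find "anything", zero or more times, followed by the letters "f" "o" "o". Because the quantifier is greedy, the .* portion of the expression first eats the entire input string. At this point the overall expression cannot succeed, because the last three letters ("f" "o" "o") have already been consumed. So the matcher backs off one letter at a time until the rightmost occurrence of "foo" has been given back, at which point the match succeeds and the search ends.

The second example, however, is reluctant, so it starts by consuming "nothing". Because "foo" does not appear at the beginning of the string, it is forced to swallow the first letter (an "x"), which triggers the first match, from index 0 to index 4. The test harness continues the process until the input string is exhausted, and finds another match from index 4 to index 13.

The third example fails to find a match because the quantifier is possessive. In this case the entire input string is consumed by .*+, leaving nothing left over to satisfy the "foo" at the end of the expression. Use a possessive quantifier when you want to seize all of something without ever backing off; it will outperform the equivalent greedy quantifier in cases where the match is not immediately found.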

This is taken from: http://docs.oracle.com/javase/tutorial/essential/regex/quant.html
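The three runs above can be reproduced without the tutorial's interactive harness. This is a rough sketch only: the class name `QuantifierKinds` and the helper `describeMatches` are hypothetical and do not come from the tutorial. Each match is reported as text[start,end).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierKinds {
    // Hypothetical helper: returns each match formatted as "text[start,end)".
    static List<String> describeMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group() + "[" + m.start() + "," + m.end() + ")");
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: takes everything, then backtracks to the last "foo".
        System.out.println(describeMatches(".*foo", input));  // [xfooxxxxxxfoo[0,13)]
        // Reluctant: grows one character at a time until "foo" fits.
        System.out.println(describeMatches(".*?foo", input)); // [xfoo[0,4), xxxxxxfoo[4,13)]
        // Possessive: takes everything and never gives anything back.
        System.out.println(describeMatches(".*+foo", input)); // []
    }
}
```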

- Lakshmi


- Rich O'Kelly · Accepted Answer

We can break these down as follows:

. - Any character
* - Any number of times
? - That is consumed reluctantly

. - Any character
+ - At least once
? - That is consumed reluctantly

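A reluctant or "non-greedy" quantifier (marked by the trailing "?") matches as few characters as possible while still letting the overall pattern match. For a more in-depth treatment of quantifiers (greedy, reluctant, and possessive), see here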
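As a small illustration of that breakdown, the sketch below runs a greedy pattern and the two reluctant patterns over one input. The class name `StarVsPlus`, the helper `findAll`, and the tag-like input "<><a>" are assumptions chosen for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StarVsPlus {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<><a>";
        // Greedy .* runs to the last ">" in the input.
        System.out.println(findAll("<.*>", input));  // [<><a>]
        // Reluctant .*? stops at the first ">" and may consume nothing.
        System.out.println(findAll("<.*?>", input)); // [<>, <a>]
        // Reluctant .+? must consume at least one character, so it swallows
        // the first ">" and stops at the next one.
        System.out.println(findAll("<.+?>", input)); // [<><a>]
    }
}
```

Here .+? ends up with the same result as the greedy pattern, but for a different reason: it is still taking the minimum it can, and that minimum is one character rather than zero.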