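Pattern: a(?(?<! ) )b (c)
Input: a b c
Description: if the preceding character is not a space (that is, the lookbehind (?<! ) succeeds), the conditional should match a space.
It matches correctly, but capture group $1 is empty (instead of containing c).
Is this a problem with the .NET regex engine, or am I missing something?
Example: http://regexstorm.net/tester?p=a(%3f(%3f%3C!+)+)b+(c)&i=a+b+c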
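I'm not sure whether this behavior is documented (if it is, I couldn't find it), but using a conditional construct whose expression is an explicit zero-width assertion, (?(?=expression)yes|no), overwrites the immediately following numbered capture group (empties it). You can confirm this by running the following regex: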
a(?(?<! ) )b (c)()
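Python's re module cannot run this confirmation directly: it only accepts a group number or name as the condition of (?(...)yes|no) and rejects a lookbehind condition when the pattern is compiled. As a point of comparison only, the minimal sketch below uses a group-reference condition to show the numbering one would expect from any conditional: the construct itself takes no group number, so the groups after it keep their positions. Python is a stand-in here; this does not reproduce the .NET behavior.

```python
import re

# Python's re accepts only a group number or name as the condition in
# (?(...)yes|no); a lookbehind condition such as (?<! ) is rejected when
# the pattern is compiled.
try:
    re.compile(r"(a)(?(?<! ) )(b) (c)")
    lookaround_condition_supported = True
except re.error:
    lookaround_condition_supported = False

# A group-reference condition: group 1 matched, so the yes-branch " " is used.
# The conditional takes no group number of its own, so (b) and (c) stay 2 and 3.
m = re.fullmatch(r"(a)(?(1) )(b) (c)", "a b c")
groups = m.groups()

print(lookaround_condition_supported)  # False
print(groups)  # ('a', 'b', 'c')
```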
Four ways to work around this:
Enclose the expression in parentheses, as noted by @DmitryEgorov. This also keeps the second capturing group intact, and the added parentheses do not appear in the result. This is the right way:
a(?((?<! )) )b (c)
Since this behavior only affects unnamed (default, numbered) capturing groups, you can get the expected result by using a named capturing group:
a(?(?<! ) )b (?<first>c)
Add an extra capturing group anywhere between the conditional and (c):
a(?(?<! ) )(b) (c)
Avoid such an expression if possible, e.g.:
a(?( ) )b (c)
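One way to avoid the lookaround conditional while keeping its meaning is to rewrite it as an alternation of two complementary lookbehinds. This rewrite is an assumption, not something proposed in the thread. The sketch below checks it with Python's re as a stand-in (Python has no lookaround conditionals at all). The pattern uses only lookbehind, alternation and a non-capturing group, which .NET also supports, so it should carry over, but it has not been tested against the .NET engine here.

```python
import re

# Hypothetical rewrite of a(?(?<! ) )b (c) without a conditional construct.
# The two lookbehinds are mutually exclusive, so exactly one branch applies:
#   (?<! )   -> not preceded by a space: match a space (the "yes" branch)
#   (?<= )   -> preceded by a space: match nothing (the empty "no" branch)
# The alternation sits in a non-capturing group, so (c) remains group 1.
PATTERN = re.compile(r"a(?:(?<! ) |(?<= ))b (c)")

match = PATTERN.fullmatch("a b c")
print(match.group(1))  # c

# Without a space after "a", neither branch can match, so there is no match.
print(PATTERN.fullmatch("ab c"))  # None
```

Because the second branch can only apply when the condition is false, the alternation never falls back to a branch that the conditional would have ruled out.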
The capture group array gets mangled (as the OP points out): the capture group immediately following the conditional construct is lost, and the remaining groups shift left, leaving the last capture group undefined.
In the following examples, the expected capture allocation is
$1="a", $2="b", $3="c"
while the actual result is
$1="a", $2="c", $3="" (an empty string)
Applies to:
(a)(?(?=.) )(b) (c) - positive lookahead
(a)(?(?!z) )(b) (c) - negative lookahead
(a)(?(?<=.) )(b) (c) - positive lookbehind
(a)(?(?<! ) )(b) (c) - negative lookbehind
(a)(?(?: ) )(b) (c) - noncapturing group
(a)(?(?i:.) )(b) (c) - group options
(a)(?(?>.) )(b) (c) - nonbacktracking subexpression
(a)(?(?(1).) )(b) (c) - nested condition on a capture group by number
((?<n>a))(?(?(n).) )(b)(c) - nested condition on a capture group by name
(a)(?(?(?:.).) )(b) (c) - nested condition with implicitly parenthesized regex
Throws ArgumentException at run time when the regex is parsed. This actually makes sense, since it explicitly warns of a potential regex error rather than silently playing tricks with the captures as in the previous case.
Applies to:
(a)(?(?<n>.) )(b) (c), (a)(?(?'n'.) )(b) (c) - named groups - exception message: "Alternation conditions do not capture and cannot be named"
(a)(?(?'-n' .) )(b) (c), (?<a>a)(?(?<a-n>.) )(b) (c) - balancing groups - exception message: "Alternation conditions do not capture and cannot be named"
(a)(?(?# comment) )(b) (c) - inline comment - exception message: "Alternation conditions cannot be comments"
Throws OutOfMemoryException during pattern matching.
In my opinion, this is clearly a bug.
Applies to:
(a)(?(?i) )(b) (c) - inline options (not to be confused with group options)
Surprisingly, this one works as expected, though it is a rather artificial example:
(a)(?(?(.).) )(b) (c) - nested condition with explicitly parenthesized regex
The same patterns with the condition expression explicitly parenthesized:
(a)(?((?=.)) )(b) (c)
(a)(?((?!z)) )(b) (c)
(a)(?((?<=.)) )(b) (c)
(a)(?((?<! )) )(b) (c)
(a)(?((?: )) )(b) (c)
(a)(?((?i:.)) )(b) (c)
(a)(?((?>.)) )(b) (c)
(a)(?((?(1).)) )(b) (c)
((?<n>a))(?((?(n).)) )(b)(c)
(a)(?((?(?:.).)) )(b) (c)
(a)(?((?<n>.)) )(b) (c)
(a)(?((?'n'.)) )(b) (c)
(a)(?((?'-n' .)) )(b) (c)
(?<a>a)(?((?<a-n>.)) )(b) (c)
(a)(?((?# comment)) )(b) (c)
(a)(?((?i)) )(b) (c)
(a)(?((?(.).)) )(b) (c)
a(?( ) )b (c): note that (?( ) ) is equivalent to (?(?= ) ), not to (?(?<! ) ) (see Expression conditional matching). - Wiktor Stribiżew
a(?((?<! )) )b (c) - Dmitry Egorov