.NET regular expressions with conditionals, backtracking and capture groups

6
2 Answers

4

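I'm not sure whether this behavior is documented (if it is, I haven't found it), but using a conditional construct whose expression is an explicit zero-width assertion, (?(?=expression)yes|no), overrides the very next numbered capturing group (it ends up empty). You can confirm this by running the following regex: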

a(?(?<! ) )b (c)()
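
One way to observe this is to run the pattern and print every group the engine reports. The following is a minimal sketch, not the answerer's code: the class name, the helper `Run` and the input string "a b c" are assumptions made for illustration. It prints whatever the runtime returns and does not assume any particular group contents.

```csharp
using System;
using System.Text.RegularExpressions;

class ConditionalCaptureDemo
{
    // Pattern copied from the answer above.
    const string Pattern = @"a(?(?<! ) )b (c)()";

    // Hypothetical helper: matches the pattern against the given input.
    public static Match Run(string input)
    {
        return Regex.Match(input, Pattern);
    }

    static void Main()
    {
        // "a b c" is an assumed input that the pattern should match.
        Match m = Run("a b c");
        Console.WriteLine("Success: " + m.Success);

        // Print every group slot the engine reports, including group 0.
        for (int i = 0; i < m.Groups.Count; i++)
        {
            Group g = m.Groups[i];
            Console.WriteLine("Group " + i + ": Success=" + g.Success + ", Value=\"" + g.Value + "\"");
        }
    }
}
```

If the group that should hold "c" comes back empty, the runtime in use shows the behavior described here.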

Four ways to work around this issue:

  1. Enclose the expression in parentheses, as noted by @DmitryEgorov. This keeps the following capturing group intact, and the extra group is not included in the result. This is the right way:

    a(?((?<! )) )b (c)
    
  2. Since this behavior only applies to unnamed (default) capturing groups, you can get the expected result by using a named capturing group:

    a(?(?<! ) )b (?<first>c)
    
  3. Add an extra capturing group anywhere between the conditional and (c):

    a(?(?<! ) )(b) (c)
    
  4. Avoid such an expression if possible. E.g.:

    a(?( ) )b (c)
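
The workarounds above can be compared side by side by listing the groups each pattern yields. This is a rough sketch under assumptions: the class name, the helper `Describe` and the input "a b c" are hypothetical, and the first entry is the pattern from the top of this answer without its trailing empty group. Results may differ between .NET versions, so the program only prints what it gets.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class WorkaroundComparison
{
    // Patterns taken from the list above, preceded by the affected form.
    public static readonly string[] Patterns =
    {
        @"a(?(?<! ) )b (c)",          // affected form
        @"a(?((?<! )) )b (c)",        // 1. condition wrapped in extra parentheses
        @"a(?(?<! ) )b (?<first>c)",  // 2. named capturing group
        @"a(?(?<! ) )(b) (c)",        // 3. extra capturing group in between
        @"a(?( ) )b (c)",             // 4. plain expression as the condition
    };

    // Hypothetical helper: lists every group except group 0 as name="value".
    public static string Describe(string pattern, string input)
    {
        var regex = new Regex(pattern);
        Match m = regex.Match(input);
        if (!m.Success) return "no match";

        var parts = new List<string>();
        foreach (int number in regex.GetGroupNumbers())
        {
            if (number == 0) continue; // skip the whole-match group
            string name = regex.GroupNameFromNumber(number);
            parts.Add(name + "=\"" + m.Groups[number].Value + "\"");
        }
        return string.Join(", ", parts);
    }

    static void Main()
    {
        // "a b c" is an assumed input that every pattern above should match.
        foreach (string pattern in Patterns)
        {
            Console.WriteLine(pattern + "  ->  " + Describe(pattern, "a b c"));
        }
    }
}
```

`GetGroupNumbers` and `GroupNameFromNumber` are used so that numbered and named groups are reported the same way.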
    

Regarding a(?( ) )b (c), note that (?( ) ) is equivalent to (?(?= ) ), not (?(?<! ) ) (see Conditional Matching with an Expression). - Wiktor Stribiżew
Yes, I've added an appropriate remark. @WiktorStribiżew - revo
2
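Another way to solve this is to enclose the condition in an extra capturing group: a(?((?<! )) )b (c) - Dmitry Egorov
I think that is the correct syntax for a conditional test in .NET regex. I'll add it. @DmitryEgorov - revo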
1
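Interesting! It looks like the problem only occurs when the parentheses do double duty: enclosing the condition expression and also forming part of a group construct within that expression. That looks like a bug to me. - Alan Moore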

2
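In addition to @revo's answer:
In fact, almost every conditional construct is affected when the condition expression is itself a parenthesized regex construct (grouping, conditional, other special cases) and is not wrapped in extra parentheses.
In that case there are four kinds of (mis)behavior: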
  1. The capture group array gets mangled (as pointed out by the OP): the capture group immediately following the conditional construct is lost, and the other groups are shifted left, leaving the last capture group undefined.

    In the following examples the expected capture allocation is

    $1="a", $2="b", $3="c"
    

    while the actual result is

    $1="a", $2="c", $3="" (the latter is an empty string)
    

    Applies to:

  2. Throws ArgumentException at run time when the regex is parsed. This actually makes sense since this explicitly warns us of some potential regex error rather than playing funny tricks with captures as in the previous case.

    Applies to:

    • (a)(?(?<n>.) )(b) (c), (a)(?(?'n'.) )(b) (c) - named groups - exception message: "Alternation conditions do not capture and cannot be named"
    • (a)(?(?'-n' .) )(b) (c), (?<a>a)(?(?<a-n>.) )(b) (c) - balancing groups - exception message: "Alternation conditions do not capture and cannot be named"
    • (a)(?(?# comment) )(b) (c) - inline comment - exception message: "Alternation conditions cannot be comments"
  3. Throws OutOfMemoryException during pattern matching. In my opinion, this is clearly a bug.

    Applies to:

    • (a)(?(?i) )(b) (c) - inline options (not to be confused with group options)
  4. [Surprisingly] works as expected, but this is a rather artificial example:

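All of these regexes can be fixed by enclosing the condition expression in explicit parentheses (i.e., if the expression itself already contains parentheses, an extra pair is needed). Here are the fixed versions (in order of appearance):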
(a)(?((?=.)) )(b) (c)
(a)(?((?!z)) )(b) (c)
(a)(?((?<=.)) )(b) (c)
(a)(?((?<! )) )(b) (c)
(a)(?((?: )) )(b) (c)
(a)(?((?i:.)) )(b) (c)
(a)(?((?>.)) )(b) (c)
(a)(?((?(1).)) )(b) (c)
((?<n>a))(?((?(n).)) )(b)(c)
(a)(?((?(?:.).)) )(b) (c)
(a)(?((?<n>.)) )(b) (c)
(a)(?((?'n'.)) )(b) (c)
(a)(?((?'-n' .)) )(b) (c)
(?<a>a)(?((?<a-n>.)) )(b) (c)
(a)(?((?# comment)) )(b) (c)
(a)(?((?i)) )(b) (c)
(a)(?((?(.).)) )(b) (c)
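
A few of these pairs can be checked with a small harness that reports either the captures or the exception. This is only a sketch and not the linked sample code: the class name, the helper `Try` and the input "a b c" are assumptions, and the lookbehind form without extra parentheses is derived here by removing the added pair from its fixed version. The inline-options case is deliberately left out because it is reported above to exhaust memory.

```csharp
using System;
using System.Text.RegularExpressions;

class ConditionCheck
{
    // Hypothetical helper: builds the regex, matches the input, and returns
    // either the first three captures or the parse-time exception message.
    public static string Try(string pattern, string input)
    {
        try
        {
            Match m = new Regex(pattern).Match(input);
            if (!m.Success) return "no match";
            return "$1=\"" + m.Groups[1].Value + "\", "
                 + "$2=\"" + m.Groups[2].Value + "\", "
                 + "$3=\"" + m.Groups[3].Value + "\"";
        }
        catch (ArgumentException e)
        {
            // Pattern errors surface as ArgumentException (or a subclass).
            return "ArgumentException: " + e.Message;
        }
    }

    static void Main()
    {
        // "a b c" is an assumed input that the fixed patterns should match.
        string input = "a b c";
        string[] patterns =
        {
            @"(a)(?(?<! ) )(b) (c)",      // lookbehind, no extra parentheses (derived)
            @"(a)(?((?<! )) )(b) (c)",    // lookbehind, fixed
            @"(a)(?(?<n>.) )(b) (c)",     // named group, no extra parentheses
            @"(a)(?((?<n>.)) )(b) (c)",   // named group, fixed
            @"(a)(?(?# comment) )(b) (c)" // inline comment, no extra parentheses
        };

        foreach (string pattern in patterns)
        {
            Console.WriteLine(pattern + "  ->  " + Try(pattern, input));
        }
    }
}
```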

Sample code to check all of these expressions: https://ideone.com/KHbqMI

Nice exploration. - revo
