Trying to understand JavaScript regular expression results

Question



I'd like to parse a string in JavaScript that comes in one of two formats:

id#state#{font name, font size, "text"}  
// e.g. button1#hover#{arial.ttf, 20, "Ok"}

or

id#state#text                            
// e.g. button1#hover#Ok

In the second variant, a default font and size are assumed.

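Before you read any further, I should point out that I control the format, so I'd love to hear about other formats that are more RegExp Friendly™. That said, the second option is required for historical reasons, and so is the id#state# part. In other words, the flexibility lies in the {font name, font size, "text"} part.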

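Also, I'd like to use a regular expression as far as possible. Yes, I realize the RegExp below is complex, but in my case this is not only about finding a possible solution to the problem, it's also about learning more about RegExp itself.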

Currently, I'm trying to capture the three or five pieces of information in the two formats as follows.

var pat = /^(\w*)#(\w*)#(?:(?:\{([\w\.]*),\s*([0-9\.]*),\s*"([\w\s]*)"\})|([\w\s]*))$/;

var source1 = "button1#hover#{arial.ttf, 20, \"Ok\"}";
var source2 = "button1#hover#Ok";

var result1 = source1.match ( pat );
var result2 = source2.match ( pat );

alert ( "Source1: " + result1.length + " Source2: " + result2.length );

When I test this expression at http://www.regular-expressions.info/javascriptexample.html, I get:

result1 = [ button1#hover#{arial.ttf, 20, "Ok"}, button1, hover, arial.ttf, 
            20, Ok, undefined ]

and

result2 = [ button1#hover#Ok, button1, hover, undefined, 
            undefined, undefined, Ok ]
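For reference, here is a minimal sketch (assuming Node.js or a browser console) that reproduces these two results, with the pattern written on a single line, since a JavaScript regex literal cannot contain a line break:

```javascript
// The pattern from the question, on one line.
const pat = /^(\w*)#(\w*)#(?:(?:\{([\w\.]*),\s*([0-9\.]*),\s*"([\w\s]*)"\})|([\w\s]*))$/;

const result1 = 'button1#hover#{arial.ttf, 20, "Ok"}'.match(pat);
const result2 = "button1#hover#Ok".match(pat);

// Both arrays have 7 slots: the full match plus 6 capturing groups.
console.log(result1.length, result2.length);

// result1: groups 1-5 are filled, group 6 is undefined.
// result2: groups 1, 2 and 6 are filled, groups 3-5 are undefined.
console.log(result1.slice(1));
console.log(result2.slice(1));
```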

Here's how I break down the RegExp:

^(\w*)#(\w*)#(?:(?:\{([\w\.]*),\s*([0-9\.]*),\s*"([\w\s]*)"\})|([\w\s]*))$

^                 # anchor to beginning of string
(\w*)             # capture required id
#                 # match hash sign separator
(\w*)             # capture required state
#                 # match hash sign separator
                  # capture text structure with optional part:
(?:(?:\{([\w\.]*),\s*([0-9\.]*),\s*"([\w\s]*)"\})|([\w\s]*))  
$                 # anchor to end of string

I think capturing the text structure is the trickiest part. I break it down as follows:

(?:                  # match all of what follows but don't capture
    (?:\{            # match left curly bracket but don't capture (non-capturing group)
          ([\w\.]*)  # capture font name (with possible punctuation in font file name)
          ,\s*       # match comma and zero or more whitespaces
          ([0-9\.]*) # capture font size (with possible decimal part)
          ,\s*"      # match comma, zero or more whitespaces, and a quotation char
          ([\w\s]*)  # capture text including whitespaces
    "\})             # match quotation char and right curly bracket (and close non-capturing group)
    |                # alternation operator
    ([\w\s]*)        # capture optional group to match the second format variant
)                    # close outer non-capturing group
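If it helps to keep an annotated breakdown like this next to the actual code, one option (a sketch, not the only way) is to assemble the pattern from commented string pieces with the RegExp constructor. Two small simplifications are assumptions of this sketch, and neither changes what matches: the inner (?:...) around the braced form is dropped, because the outer group already bounds the alternation, and "." is left unescaped inside [...], where it is already literal.

```javascript
// Each piece is a plain string, so regex backslashes are doubled.
const parts = [
  "^",            // anchor to start of string
  "(\\w*)",       // group 1: id
  "#",            // separator
  "(\\w*)",       // group 2: state
  "#",            // separator
  "(?:",          // non-capturing group holding the two alternatives
  "\\{",          // literal "{" (long form)
  "([\\w.]*)",    // group 3: font name, e.g. arial.ttf
  ",\\s*",        // comma, optional whitespace
  "([0-9.]*)",    // group 4: font size, e.g. 20
  ",\\s*\"",      // comma, optional whitespace, opening quote
  "([\\w\\s]*)",  // group 5: quoted text
  "\"\\}",        // closing quote and literal "}"
  "|",            // otherwise: the short form
  "([\\w\\s]*)",  // group 6: bare text
  ")",            // close non-capturing group
  "$",            // anchor to end of string
];
const pat = new RegExp(parts.join(""));
```

Counting the capturing parentheses in the list gives six groups, which is why every successful match returns an array of seven entries no matter which alternative was taken.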

My question is twofold:

1) For result1, how can I avoid the trailing undefined match?

2) For result2, how can I avoid the three undefined matches?

Bonus question:

Is my breakdown correct? (I'm guessing something is off, since the RegExp doesn't work quite as expected.)

Thanks! :)

- conciliator


I don't want to complain about this, but a downvote after only 2 minutes? Come on... :) - conciliator


I don't see why this was downvoted, and the voter didn't leave a comment. It seems like a reasonable question to me. - Peter Wooster

Thanks @PeterWooster, it's reassuring to know that someone else thinks this is a reasonable question. - conciliator


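You're welcome. I upvoted to get you out of negative territory, and now someone who knows regular expressions better has posted an answer; otherwise this question might have stayed unanswered. There has been a lot of discussion on Meta about requiring a comment for downvotes. - Peter Wooster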

Thanks again, Peter! I really appreciate it. :) And, unsurprisingly, I strongly support requiring a comment for downvotes. - conciliator

1 Answer


- Pointy · Accepted Answer

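The groups in your regular expression are numbered from left to right, regardless of any operators (in particular the | operator). When you have (x)|(y), either the x group or the y group will be undefined.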
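A minimal sketch of that numbering rule, using the (x)|(y) case:

```javascript
// Groups are numbered by the position of their opening parenthesis,
// left to right; the | between them does not affect the numbering.
const m1 = "x".match(/(x)|(y)/);
const m2 = "y".match(/(x)|(y)/);

console.log(m1[1], m1[2]); // x undefined
console.log(m2[1], m2[2]); // undefined y
```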

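So you can't avoid the empty slots in the result. In fact, I think you need them; otherwise you wouldn't know which form of the input was matched.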
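A sketch of how those undefined slots can be put to work, reusing the question's pattern unchanged. parseLabel, DEFAULT_FONT and DEFAULT_SIZE are hypothetical names, and the default values are placeholders, since the question doesn't say what the defaults actually are:

```javascript
const pat = /^(\w*)#(\w*)#(?:(?:\{([\w\.]*),\s*([0-9\.]*),\s*"([\w\s]*)"\})|([\w\s]*))$/;

// Hypothetical defaults for the short form (placeholders only).
const DEFAULT_FONT = "default.ttf";
const DEFAULT_SIZE = 12;

function parseLabel(source) {
  const m = source.match(pat);
  if (m === null) return null; // neither format matched

  const [, id, state, font, size, longText, shortText] = m;

  // Group 6 is defined (possibly as "") only when the short form matched.
  if (shortText !== undefined) {
    return { id, state, font: DEFAULT_FONT, size: DEFAULT_SIZE, text: shortText };
  }
  // Otherwise the braced long form matched, so groups 3-5 are defined.
  return { id, state, font, size: parseFloat(size), text: longText };
}

console.log(parseLabel('button1#hover#{arial.ttf, 20, "Ok"}'));
console.log(parseLabel("button1#hover#Ok"));
```

If you really only want the defined values, result.filter(v => v !== undefined) strips the empty slots, but then the array's length is the only remaining clue to which format was matched.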