Parsing text between quotes with .NET regular expressions

Question

I have the following input text:

@"This is some text @foo=bar @name=""John \""The Anonymous One\"" Doe"" @age=38"

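I want to parse the values written in the @name=value syntax as name/value pairs. Parsing the string above should produce the following named captures: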

name:"foo"
value:"bar"

name:"name"
value:"John \""The Anonymous One\"" Doe"

name:"age"
value:"38"

I tried the following regular expression, which gets me almost all the way there:

@"(?:(?<=\s)|^)@(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>[A-Za-z0-9_-]+|(?="").+?(?=(?<!\\)""))"

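The main problem is that it captures the opening quote in "John \""The Anonymous One\"" Doe". I feel this should be a lookbehind rather than a lookahead, but that does not seem to work at all.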

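Here are some rules for the expression:

- A name must start with a letter and may contain any letters, digits, underscores, or hyphens.
- An unquoted value must have at least one character and may contain any letters, digits, underscores, or hyphens.
- A quoted value may contain any characters, including whitespace and escaped quotes.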

Edit:

Here is the result from regex101.com:

(?:(?<=\s)|^)@(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)"))

(?:(?<=\s)|^) Non-capturing group
@ matches the character @ literally
(?<name>\w+[A-Za-z0-9_-]+?) Named capturing group name
\s* match any white space character [\r\n\t\f ]
= matches the character = literally
\s* match any white space character [\r\n\t\f ]
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)")) Named capturing group value
    1st Alternative: [A-Za-z0-9_-]+
        [A-Za-z0-9_-]+ match a single character present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            A-Z a single character in the range between A and Z (case sensitive)
            a-z a single character in the range between a and z (case sensitive)
            0-9 a single character in the range between 0 and 9
            _- a single character in the list _- literally
    2nd Alternative: (?=").+?(?=(?<!\\)")
        (?=") Positive Lookahead - Assert that the regex below can be matched
            " matches the characters " literally
        .+? matches any character (except newline)
            Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
        (?=(?<!\\)") Positive Lookahead - Assert that the regex below can be matched
            (?<!\\) Negative Lookbehind - Assert that it is impossible to match the regex below
                \\ matches the character \ literally
            " matches the characters " literally

- Anthony Grescavage

Have you considered using JSON? - yazanpro

JSON is not an option. This is not for SQL or any existing technology that already has a parser. This is a very specific use case. - Anthony Grescavage

2 Answers

Use the string methods, starting with Split.

string myLongString = @"This is some text @foo=bar @name=""John \""The Anonymous One\"" Doe"" @age=38";

string[] nameValues = myLongString.Split('@');

From there, split each element on "=" with Split, or locate the separator with IndexOf("=").
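A minimal sketch of how the Split idea could be fleshed out is shown here. The class name SplitNameValueParser and its Parse method are hypothetical, introduced only for illustration. It assumes that '@' never appears inside a value and that nothing but whitespace follows an unquoted value before the next '@'; neither assumption is guaranteed by the rules in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answer.
public static class SplitNameValueParser
{
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        string[] segments = input.Split('@');

        // segments[0] is whatever precedes the first '@', so start at 1.
        for (int i = 1; i < segments.Length; i++)
        {
            string segment = segments[i];
            int eq = segment.IndexOf('=');
            if (eq <= 0) continue; // no '=' found, or nothing before it

            string name = segment.Substring(0, eq).Trim();
            string value = segment.Substring(eq + 1).Trim();

            // Drop one pair of surrounding quotes, if present.
            // Escaped quotes inside the value are left untouched.
            if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
            {
                value = value.Substring(1, value.Length - 2);
            }

            pairs.Add(new KeyValuePair<string, string>(name, value));
        }

        return pairs;
    }
}
```

Under those assumptions, SplitNameValueParser.Parse(myLongString) yields the three pairs foo/bar, name/John \"The Anonymous One\" Doe, and age/38. A quoted value that contains '@' would be cut in two, which is the main weakness of this approach compared with a regular expression.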

- Mukus

- Wiktor Stribiżew · Accepted Answer

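You can use a very handy .NET regex feature that allows multiple capture groups with the same name. Also, there is a problem with your (?<name>) capture group: it allows a digit in the first position, which does not meet your first requirement.

So, I suggest: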

(?si)(?:(?<=\s)|^)@(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))

See the demo.

Note that you cannot debug .NET-specific regular expressions on regex101.com; you need to test them in a .NET-compatible environment.
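To try the suggested pattern in an actual .NET environment, something along these lines could be used. This is only a sketch: the class name NameValueParser and its Parse method are hypothetical, and the pattern is copied verbatim from the suggestion above, split across two verbatim strings for readability.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the suggested pattern.
public static class NameValueParser
{
    // In a verbatim string, "" stands for one literal double quote.
    // Both alternatives capture into the same group name "value".
    private const string Pattern =
        @"(?si)(?:(?<=\s)|^)@(?<name>\w+[a-z0-9_-]+?)\s*=\s*" +
        @"(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))";

    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();

        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // Whichever alternative matched, its text is read from "value".
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups["name"].Value,
                m.Groups["value"].Value));
        }

        return pairs;
    }
}
```

Run against the input from the question, Parse returns foo/bar, name/John \"The Anonymous One\" Doe (without the surrounding quotes), and age/38. As written, the name group still begins with \w+, so it accepts a leading digit or underscore and needs at least two characters; replacing it with something like [a-z][a-z0-9_-]* may be one way to enforce the letter-first rule, though that variant is an untested assumption here.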