Escaping special characters in Java regular expressions

Question

4

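I'm trying to create a regular expression that accepts nearly every character on a US keyboard, except for a few specific ones. This is what I have so far (not yet complete):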

^[a-zA-Z0-9!~`@#$%\\^]

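I know that ^ is the first character I've run into that needs to be escaped with a preceding backslash. When I type a single \, I get a compile error (invalid escape sequence). When I run the pattern against a string, it ignores the ^ rule completely. Does anyone know what I'm doing wrong?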

- Josh Levine

2

You don't need to escape ^ - BoDidely

2 Answers

1

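You only need to escape the caret "^" when you want to match it as a literal character in the text.

If you want "^" to keep its special meaning (the start of the line or string), you do not escape it. Just type it as is.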

"^[a-zA-Z0-9!~`@#$%\\^]"

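The backslash near the end of this regular expression makes no difference. In your source code you have to type 2 backslashes because the backslash has a special meaning in Java string literals, but that has nothing to do with how the regular expression treats it. The regular expression engine receives a single backslash, which tells it to read the following character as a literal, but ^ is already a literal inside the brackets as long as it is not the first character there.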
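
To make the two layers of escaping concrete, here is a minimal sketch. The class name `CaretEscapeDemo` and the helper `matchesAtStart` are made up for illustration. It compares the pattern with and without the backslash and shows that the Java literal `"\\"` is a single character by the time the regex engine sees it.

```java
import java.util.regex.Pattern;

// Illustrative demo class; the names are assumptions, not part of the answer.
class CaretEscapeDemo {
    // True when the pattern matches at the very start of the input.
    static boolean matchesAtStart(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        String escaped = "^[a-zA-Z0-9!~`@#$%\\^]"; // regex engine sees \^
        String unescaped = "^[a-zA-Z0-9!~`@#$%^]"; // regex engine sees ^

        // "\\" in Java source is one backslash at runtime, so the
        // escaped pattern is exactly one character longer.
        System.out.println(escaped.length() - unescaped.length()); // prints 1

        // Both patterns should give the same answer for every input.
        for (String input : new String[] {"^abc", "abc", "-abc"}) {
            System.out.println(input + " -> "
                    + matchesAtStart(escaped, input) + " / "
                    + matchesAtStart(unescaped, input));
        }
    }
}
```

Note that the backslash only escapes the caret here; it does not add the backslash character itself to the set of accepted characters.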

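Explanation of [ and ]:

Square brackets have a special meaning in regular expressions: they mark the boundaries of a list of characters given in the pattern (those characters form a so-called character class). Let's break down the regular expression above to make things clearer.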

^ Matches the start of the text
[ Opening boundary of your character class
a-z Lowercase letters a to z
A-Z Uppercase letters A to Z
0-9 Digits 0 to 9
! Exclamation mark, literally
~ Tilde, literally
` Backtick, literally
@ The @ character, literally
# Hash, literally
$ Dollar, literally
% Percent sign, literally
\\ Backslash. The regular expression engine receives only a single backslash, because the other one is consumed by Java's syntax for string literals. It would mark the following character as a literal, but ^ is already a literal at this position in a character class, so the backslash is redundant.
^ Caret, literally
] Closing boundary of your character class

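The order of the items inside a character class definition does not matter (with the exception of a leading ^, which negates the class). The expression above matches if the first character of the text being checked belongs to your character class. Whether the remaining characters of the text are checked as well depends on how the regular expression is applied.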
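
As a rough illustration of how the result depends on the way the pattern is applied, the sketch below runs the same single-character pattern through `Matcher.find()` and `Matcher.matches()`. The class and method names are invented for this example.

```java
import java.util.regex.Pattern;

// Illustrative class; the names here are not from the original answer.
class MatchModeDemo {
    static final Pattern FIRST_CHAR = Pattern.compile("^[a-zA-Z0-9!~`@#$%^]");

    // find(): succeeds as soon as the first character is in the class.
    static boolean firstCharAllowed(String input) {
        return FIRST_CHAR.matcher(input).find();
    }

    // matches(): the pattern must cover the entire input, so only
    // one-character inputs can succeed with this pattern.
    static boolean wholeInputMatches(String input) {
        return FIRST_CHAR.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String input : new String[] {"a", "a*", "*a"}) {
            System.out.println(input + ": find=" + firstCharAllowed(input)
                    + ", matches=" + wholeInputMatches(input));
        }
    }
}
```

With `find()`, the input `a*` is accepted even though `*` is not in the class, because only the first character is examined. `String.matches` behaves like `Matcher.matches()` and rejects it.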

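When you start working with regular expressions, always run them against several test texts and verify the behavior. It is also a good idea to keep those test cases as unit tests, so you can be confident the program behaves correctly.

A simple code sample for testing an expression looks like this: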

public class Test {
    public static void main(String[] args) {
        String regexp = "^[ a-zA-Z0-9!~`@#$%\\\\^\\[\\]]+$";
        String[] testdata = new String[] {
                "abc",
                "2332",
                "some@test",
                "test [ and ] test end",
                // Following sample will not match the pattern.
                "äöüßµøł"
        };
        for (String toExamine : testdata) {
            if (toExamine.matches(regexp)) {
                System.out.println("Match: " + toExamine);
            } else {
                System.out.println("No match: " + toExamine);
            }
        }
    }
}

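Note that I used a modified pattern here. It makes sure that every character of the string being checked belongs to your character class. I also extended the character class to allow \, the space character, [ and ]. The breakdown is as follows: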

^ Matches the start of the text
[ Opening boundary of your character class
a-z Lowercase letters a to z
A-Z Uppercase letters A to Z
0-9 Digits 0 to 9
! Exclamation mark, literally
~ Tilde, literally
` Backtick, literally
@ The @ character, literally
# Hash, literally
$ Dollar, literally
% Percent sign, literally
\\\\ Backslash, literally. The regular expression engine receives only 2 backslashes, because every other backslash is consumed by Java's syntax for string literals. The first backslash marks the second one as occurring literally in the text.
^ Caret, literally
\\[ Opening bracket, literally. The backslash makes the bracket lose its meaning as the start of a character class definition.
\\] Closing bracket, literally. The backslash makes the bracket lose its meaning as the end of a character class definition.
] Closing boundary of your character class
+ Any number of characters matching your character class definition may occur, but at least 1 such character must be present for a match
$ Matches the end of the text
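
The breakdown above can be checked with a small table of inputs and expected results. This is only a sketch in plain Java rather than a real unit-test framework; the class name `AllowedCharsCheck`, the helper `isAllowed`, and the sample inputs are assumptions chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative table-driven check; names and inputs are assumptions.
class AllowedCharsCheck {
    // Same pattern as in the test program above.
    static final String REGEXP = "^[ a-zA-Z0-9!~`@#$%\\\\^\\[\\]]+$";

    static boolean isAllowed(String text) {
        return text.matches(REGEXP);
    }

    public static void main(String[] args) {
        Map<String, Boolean> expected = new LinkedHashMap<>();
        expected.put("abc", true);
        expected.put("back\\slash", true);          // contains one backslash
        expected.put("[x]", true);                  // escaped brackets are allowed
        expected.put("", false);                    // + requires at least one character
        expected.put("a-b", false);                 // '-' only forms ranges, it is not in the class
        expected.put("\u00e4\u00f6\u00fc", false);  // non-ASCII letters are not listed

        for (Map.Entry<String, Boolean> e : expected.entrySet()) {
            boolean actual = isAllowed(e.getKey());
            String status = (actual == e.getValue()) ? "ok   " : "FAIL ";
            System.out.println(status + "\"" + e.getKey() + "\" -> " + actual);
        }
    }
}
```

Because `String.matches` already requires the whole input to match, the `^` and `$` anchors are redundant in this particular usage; they matter when the same pattern is applied with `Matcher.find()`.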

One thing I don't quite understand, though, is why you would use the characters of a US keyboard as the validation criterion.

- Augustus Kling

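Thanks for your help. The [ and ] seem to be giving me some trouble now. - Josh Levine

I added a description of the brackets and fixed a mistake in my earlier explanation. - Augustus Kling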

- Federico Piazza · Accepted Answer

Since you are using a character class, you don't need to escape the ^. Just use:

^[a-zA-Z0-9!~`@#$%^]

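A character class, written between [ and ], lets you list the characters you want, including special characters, inside the square brackets, where most of them lose their special meaning. You do still need a backslash for shorthand classes such as \d or \w (and for a literal \, [ or ]), and because the backslash is itself special in a Java string literal you have to write it as \\d or \\w (the doubling is for Java only, not for the regular expression engine).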
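
A short sketch of the shorthand case, assuming hypothetical helper names (`isSingleDigit`, `isWordOrSymbol`) that are not part of the answer. It shows `\\d` and `\\w` written in Java source inside a character class next to symbols that need no escaping.

```java
// Illustrative class; the helper names are assumptions.
class ShorthandEscapeDemo {
    // "\\d" in Java source is the two-character regex \d (any digit).
    static boolean isSingleDigit(String s) {
        return s.matches("[\\d]");
    }

    // \w (word characters) combined with literal symbols in one class.
    static boolean isWordOrSymbol(String s) {
        return s.matches("[\\w!@#$%^]+");
    }

    public static void main(String[] args) {
        System.out.println("\\d".length());           // prints 2
        System.out.println(isSingleDigit("7"));       // prints true
        System.out.println(isSingleDigit("x"));       // prints false
        System.out.println(isWordOrSymbol("ab_1!^")); // prints true
        System.out.println(isWordOrSymbol("a b"));    // prints false
    }
}
```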

For example, with the character class from the question:

"a".matches("^[a-zA-Z0-9!~`@#$%^]");
"asdf".matches("^[a-zA-Z0-9!~`@#$%^]+"); // for multiple characters