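I'm trying to create a regular expression that accepts nearly every character on a US keyboard except for a few specific ones. Here is what I have so far (it is not all-inclusive yet):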
^[a-zA-Z0-9!~`@#$%\\^]
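Now, I know that ^ is the first character I've run into that needs to be escaped. When I put a single \ in front of it, I get a compile error (invalid escape sequence). When I run this against a string, it completely ignores the ^ rule. Does anyone know what I'm doing wrong?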
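Since you are using a character class, there is no need to escape ^ (as long as it is not the first character inside the brackets). Just use: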
^[a-zA-Z0-9!~`@#$%^]
"a".matches("^[a-zA-Z0-9!~`@#$%^]");
"asdf".matches("^[a-zA-Z0-9!~`@#$%^]+"); // for multiple characters
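Whether ^ is literal depends only on where it sits inside the brackets. The sketch below (class and method names are hypothetical) contrasts a caret in a later position, where it is an ordinary character, with a caret in first position, where it negates the class:

```java
class CaretInClass {
    // ^ anywhere but first inside [...] is an ordinary character.
    static boolean literalCaret(String s) {
        return s.matches("[a^]"); // exactly one char: 'a' or '^'
    }

    // ^ as the very first character inside [...] negates the class.
    static boolean negatedClass(String s) {
        return s.matches("[^a]"); // exactly one char that is not 'a'
    }

    public static void main(String[] args) {
        System.out.println(literalCaret("^"));  // true
        System.out.println(literalCaret("b"));  // false
        System.out.println(negatedClass("^"));  // true, because '^' is not 'a'
        System.out.println(negatedClass("a"));  // false
        // The unescaped caret at the end of the suggested class is literal too.
        System.out.println("^".matches("^[a-zA-Z0-9!~`@#$%^]")); // true
    }
}
```

So the ^ outside the brackets anchors the match at the start of the text, while the ^ just before ] is simply one more allowed character.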
"^[a-zA-Z0-9!~`@#$%\\^]"
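In your source code, the backslashes before the ^ near the end of this regex don't matter. You have to type two backslashes because the backslash has a special meaning in Java string literals, but that is unrelated to how the regex treats it. The regex engine receives a single backslash, which tells it to read the following character literally. However, ^ inside brackets is already a literal (unless it is the first character), so the escape changes nothing.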
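The two layers of escaping (Java string literal first, regex second) can be checked separately. In this sketch the class, constant, and method names are hypothetical; it shows that the Java literal "\\^" is only two characters at runtime and that the regex engine treats [a\^] and [a^] the same way:

```java
class BackslashLayers {
    // Java source "\\^" is the two-character runtime string \^ .
    // Writing "\^" instead fails to compile: \^ is not a Java escape.
    static final String ESCAPED_CARET = "\\^";

    static boolean matchesCaret(String regex) {
        return "^".matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(ESCAPED_CARET.length()); // 2
        // The regex engine sees [a\^] and [a^]; both accept a literal caret.
        System.out.println(matchesCaret("[a\\^]")); // true
        System.out.println(matchesCaret("[a^]"));   // true
        // A literal backslash needs \\ in the regex, i.e. "\\\\" in Java source.
        System.out.println("\\".matches("[\\\\]")); // true
    }
}
```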
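About [ and ]:
Square brackets have a special meaning in regular expressions: they delimit a list of characters given by the pattern (together these characters form a so-called character class). Let's break down the regex above to make things clearer.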
^ Matches the start of the text
[ Opening boundary of your character class
a-z Lowercase letters a to z
A-Z Uppercase letters A to Z
0-9 Numbers from 0 to 9
! Exclamation mark, literally
~ Tilde, literally
` Backtick, literally
@ The @ character, literally
# Hash, literally
$ Dollar, literally
% Percent sign, literally
\\ Backslash. The regex engine receives only a single backslash, because the other one is consumed by Java's string-literal syntax. It would mark the following character as a literal, but ^ is already a literal inside a character class (unless it comes first), so this backslash has no effect.
^ Caret, literally
] Closing boundary of your character class
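Because a character class is just a set of characters, it can be instructive to list exactly which keyboard characters it accepts. This sketch (class and method names are hypothetical) runs the pattern from the breakdown above against every printable ASCII character from ! (0x21) to ~ (0x7E), which makes the missing ones such as & * ( ) - + easy to spot:

```java
import java.util.regex.Pattern;

class ClassMembers {
    // The pattern from the breakdown above, exactly as written in Java source.
    static final Pattern CLASS = Pattern.compile("^[a-zA-Z0-9!~`@#$%\\^]");

    // Returns every printable ASCII character (0x21..0x7E) the class accepts.
    static String accepted() {
        StringBuilder sb = new StringBuilder();
        for (char c = 0x21; c <= 0x7E; c++) {
            if (CLASS.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints !#$%0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ^`abcdefghijklmnopqrstuvwxyz~
        System.out.println(accepted());
    }
}
```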
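The order of the entries within a character class definition is irrelevant. The expression above matches if the first character of the examined text belongs to your character class. Whether the remaining characters of the text are checked as well depends on how the regex is used: String.matches, for example, requires the whole text to match, so this pattern only succeeds on one-character texts.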
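The difference between "how the regex is used" comes down to which java.util.regex.Matcher method is called. A minimal sketch (class and method names are hypothetical) applying the single-character pattern from the answer with matches(), lookingAt(), and find():

```java
import java.util.regex.Pattern;

class MatchModes {
    // The single-character pattern suggested in the answer.
    static final Pattern P = Pattern.compile("^[a-zA-Z0-9!~`@#$%^]");

    // Whole input must match (this is what String.matches does).
    static boolean whole(String s) { return P.matcher(s).matches(); }

    // Only a prefix of the input must match.
    static boolean prefix(String s) { return P.matcher(s).lookingAt(); }

    // Searches for a match; the ^ anchor still pins it to the start.
    static boolean search(String s) { return P.matcher(s).find(); }

    public static void main(String[] args) {
        System.out.println(whole("a"));    // true
        System.out.println(whole("a&&"));  // false: three characters
        System.out.println(prefix("a&&")); // true: first character fits
        System.out.println(search("&&a")); // false: '&' at the start does not fit
    }
}
```

To validate every character of the input, use matches() together with a quantifier such as + after the class.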
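When you start working with regular expressions, always test them against several sample texts and verify their behavior. It is also a good idea to turn these samples into unit tests so that you can be highly confident your program behaves correctly.
A simple code example for testing the expression: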
public class Test {
    public static void main(String[] args) {
        // Class: space, letters, digits, !~`@#$%, backslash, caret, [ and ]
        String regexp = "^[ a-zA-Z0-9!~`@#$%\\\\^\\[\\]]+$";
        String[] testdata = new String[] {
            "abc",
            "2332",
            "some@test",
            "test [ and ] test end",
            // The following sample will not match the pattern.
            "äöüßµøł"
        };
        for (String toExamine : testdata) {
            // String.matches requires the entire string to match the regex.
            if (toExamine.matches(regexp)) {
                System.out.println("Match: " + toExamine);
            } else {
                System.out.println("No match: " + toExamine);
            }
        }
    }
}
^ Matches the start of the text
[ Opening boundary of your character class
(space) A space character, literally
a-z Lowercase letters a to z
A-Z Uppercase letters A to Z
0-9 Numbers from 0 to 9
! Exclamation mark, literally
~ Tilde, literally
` Backtick, literally
@ The @ character, literally
# Hash, literally
$ Dollar, literally
% Percent sign, literally
\\\\ Backslash, literally. The regex engine receives only 2 backslashes, because every other backslash is consumed by Java's string-literal syntax. The first backslash marks the second one as occurring literally.
^ Caret, literally
\\[ Opening bracket, literally. The backslash makes the bracket lose its meaning as the start of a character class definition.
\\] Closing bracket, literally. The backslash makes the bracket lose its meaning as the end of a character class definition.
] Closing boundary of your character class
+ One or more characters matching your character class definition; at least one such character must be present for a match
$ Matches the end of the text
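Following the advice to turn samples into unit tests, here is a plain-Java stand-in that needs no test framework; the class and method names are hypothetical. It compiles the pattern from the test program once and fails loudly when a sample behaves unexpectedly. With JUnit, each check call would become an assertTrue or assertFalse:

```java
import java.util.regex.Pattern;

class RegexSelfCheck {
    // The pattern from the test program, compiled once and reused.
    static final Pattern ALLOWED =
            Pattern.compile("^[ a-zA-Z0-9!~`@#$%\\\\^\\[\\]]+$");

    // Throws AssertionError when a sample does not behave as expected.
    static void check(String text, boolean expected) {
        boolean actual = ALLOWED.matcher(text).matches();
        if (actual != expected) {
            throw new AssertionError(
                    "\"" + text + "\": expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        check("abc", true);
        check("2332", true);
        check("some@test", true);
        check("test [ and ] test end", true);
        check("back\\slash", true);            // one literal backslash
        check("\u00e4\u00f6\u00fc", false);    // German umlauts are not in the class
        check("", false);                      // + requires at least one character
        check("a&b", false);                   // & is not listed
        System.out.println("all checks passed");
    }
}
```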
One thing I don't quite understand, though, is why you would use the characters on a US keyboard as your validation criterion.
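If the real requirement is "printable ASCII except a few characters", Java's predefined classes may be simpler than listing every key by hand. In this sketch, \p{Print} is printable US-ASCII (space through ~) in Java's default mode, and the class intersection && [^<>] removes the unwanted characters. The excluded characters < and > and all names here are hypothetical placeholders:

```java
import java.util.regex.Pattern;

class PrintableAscii {
    // Printable ASCII minus a hypothetical exclusion list (< and >).
    // Note: Pattern.UNICODE_CHARACTER_CLASS would widen \p{Print} to Unicode.
    static final Pattern ALLOWED = Pattern.compile("[\\p{Print}&&[^<>]]+");

    static boolean isAllowed(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Hello, World! (1+1=2)")); // true
        System.out.println(isAllowed("a<b"));                   // false: excluded
        System.out.println(isAllowed("tab\there"));             // false: tab is not printable
    }
}
```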