我一直在尝试在我的Java应用程序中检索“unicode用户输入”以获得一个小型实用程序片段。问题是,在Ubuntu上似乎“开箱即用”,这个操作系统似乎使用UTF-8进行编码,但当从Windows的“cmd”运行时却无法正常工作。需要考虑的代码如下:
public class SerTest {
public static void main(String[] args) throws Exception {
testUnicode();
}
public static void testUnicode() throws Exception {
System.out.println("Default charset: " +
Charset.defaultCharset().name());
BufferedReader in =
new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
System.out.printf("Enter 'абвгд эюя': ");
String line = in.readLine();
String s = "абвгд эюя";
byte[] sBytes = s.getBytes();
System.out.println("strg bytes: " + Arrays.toString(sBytes));
byte[] lineBytes = line.getBytes();
System.out.println("line bytes: " + Arrays.toString(lineBytes));
PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.print("--->" + s + "<----\n");
out.print("--->" + line + "<----\n");
}
}
在Ubuntu上的输出(无需更改配置):
me@host> javac SerTest.java && java SerTest
Default charset: UTF-8
Enter 'абвгд эюя': абвгд эюя
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
line bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
--->абвгд эюя<----
--->абвгд эюя<----
在Windows CMD提示符上的输出(不受JAVA_TOOL_OPTIONS影响):
E:\>chcp 65001
Active code page: 65001
E:\>java -Dfile.encoding=utf8 SerTest
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=utf8
Default charset: UTF-8
Enter 'абвгд эюя': юя': ': абвгд эюя
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
Exception in thread "main" java.lang.NullPointerException
at SerTest.testUnicode(SerTest.java:26) # byte[] lineBytes = line.getBytes();
at SerTest.main(SerTest.java:15)
在使用JAVA_TOOL_OPTIONS后,在Eclipse控制台中的输出:
Default charset: UTF-8
Enter 'абвгд эюя': абвгд эюя
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=utf8
line bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
--->абвгд эюя<----
--->абвгд эюя<----
在Eclipse控制台中,它可以工作,因为我添加了一个系统范围的环境变量(JAVA_TOOL_OPTIONS),如果可能的话,我想避免使用该变量。
Eclipse控制台输出(删除JAVA_TOOL_OPTIONS之后):
Default charset: UTF-8
Enter 'абвгд эюя': абвгд эюя
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
line bytes: [-61, -112, -62, -80, -61, -112, -62, -79, -61, -112, -62, -78, -61, -112, -62, -77, -61, -112, -62, -76, 32, -61, -111, -17, -65, -67, -61, -111, -59, -67, -61, -111, -17, -65, -67]
--->абвгд эюя<----
--->абвгд �ю�<----
所以我的问题是:这里到底发生了什么?为了确保此代码片段适用于各种“Unicode”输入,需要进行哪些代码更改?
抱歉问题有点冗长,提前感谢您的帮助。
Sasuke