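I'm trying to get Eclipse to read Chinese characters correctly, and I'm not sure where I'm going wrong.
Specifically, somewhere between reading a string of Chinese (Simplified or Traditional) from the console and outputting it, the text gets garbled. Even when outputting a long string of mixed text (English and Chinese characters), only the Chinese characters appear to be altered.
I've cut it down to the following test example and explicitly annotated what I think is happening at each stage. Note that I am a student and would very much like to confirm whether my understanding is correct :)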
public static void main(String[] args) {
    try
    {
        boolean isRunning = true;

        //Raw flow of input data from the console
        InputStream inputStream = System.in;
        //Allows you to read the stream, using either the default character encoding, else the specified encoding;
        InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");
        //Adds functionality for converting the stream being read in, into Strings(?)
        BufferedReader input_BufferedReader = new BufferedReader(inputStreamReader);

        //Raw flow of output data to the console
        OutputStream outputStream = System.out;
        //Write a stream, from a given bit of text
        OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream, "UTF-8");
        //Adds functionality to the base ability to write to a stream
        BufferedWriter output_BufferedWriter = new BufferedWriter(outputStreamWriter);

        while(isRunning) {
            System.out.println();//force extra newline
            System.out.print("> ");

            //To read in a line of text (as a String):
            String userInput_asString = input_BufferedReader.readLine();

            //To output a line of text:
            String outputToUser_fromString_englishFromCode = "foo"; //outputs correctly
            output_BufferedWriter.write(outputToUser_fromString_englishFromCode);
            output_BufferedWriter.flush();

            System.out.println();//force extra newline

            String outputToUser_fromString_ChineseFromCode = "之謂甚"; //outputs correctly
            output_BufferedWriter.write(outputToUser_fromString_ChineseFromCode);
            output_BufferedWriter.flush();

            System.out.println();//force extra newline

            String outputToUser_fromString_userSupplied = userInput_asString; //outputs correctly when given English text, garbled when given Chinese text
            output_BufferedWriter.write(outputToUser_fromString_userSupplied);
            output_BufferedWriter.flush();

            System.out.println();//force extra newline
        }
    }
    catch (Exception e) {
        // TODO: handle exception
    }
}
Sample output:
> 之謂甚
foo
之謂甚
之謂甚
> oaea
foo
之謂甚
oaea
> mixed input - English: fubar; Chinese: 之謂甚;
foo
之謂甚
mixed input - English: fubar; Chinese: 之謂甚;
>
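What you see in this Stack Overflow post is exactly what I see in the Eclipse console and when viewing/editing variable values. Manually changing the variable's value through the Eclipse debugger makes the code that depends on it behave as I would normally expect, which suggests the problem lies in how the text is read in.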
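A minimal sketch of the kind of read-side mismatch suspected here: the same bytes are encoded with one charset and decoded with another. The class name `MojibakeDemo`, the helper `misdecode`, and the choice of ISO-8859-1 as the "wrong" charset are illustrative assumptions; nothing in the post establishes which encoding the console actually uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: encode text with one charset, then decode
    // the very same bytes while assuming a different charset.
    public static String misdecode(String text, Charset actual, Charset assumed) {
        byte[] bytes = text.getBytes(actual);
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        // Unicode escapes for the three test characters, so this file
        // does not depend on the encoding of the source file itself.
        String chinese = "\u4E4B\u8B02\u751A";

        String garbled = misdecode(chinese,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Each character takes 3 bytes in UTF-8, and ISO-8859-1 turns
        // every byte into one char, so 3 chars become 9.
        System.out.println("original chars: " + chinese.length());
        System.out.println("garbled chars:  " + garbled.length());

        // ASCII bytes are identical in both charsets, which would explain
        // why only the Chinese part of mixed input changes.
        String ascii = misdecode("foo",
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("ASCII unchanged: " + ascii.equals("foo"));
    }
}
```

If this is what happens between the console and `InputStreamReader`, it would be consistent with English input surviving while Chinese input does not, but it is only one possible explanation.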
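I have tried many different Scanner / buffered stream [reader | writer] combinations for reading in and outputting, both with and without an explicit character encoding, but this wasn't done particularly systematically and I could easily have missed something.
I have tried to set the Eclipse environment to use UTF-8 wherever possible, but I may have missed a place or two. Note that the console does output hard-coded Chinese characters correctly.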
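One way to check whether a UTF-8 setting was missed is to ask the running JVM what it sees. The sketch below prints the default charset and the `file.encoding` system property, then dumps the UTF-8 bytes of the test string as hex. The class name `EncodingCheck` and the helper `toHex` are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Hypothetical helper: render the bytes of a string in the given
    // charset as space-separated, two-digit uppercase hex.
    public static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // What the JVM launched by Eclipse believes its defaults are.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // The three test characters, written as escapes so the result
        // does not depend on the source-file encoding.
        String sample = "\u4E4B\u8B02\u751A";
        System.out.println("UTF-8 bytes:     " + toHex(sample, StandardCharsets.UTF_8));
    }
}
```

Calling `toHex` on the line returned by `readLine()` and comparing it with the dump of the hard-coded string should show whether the input was already damaged before any output code ran. If the defaults printed here are not UTF-8, the console encoding in the launch configuration may be one of the places still to check.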
Many thanks for any assistance/guidance! :)
System.out is a PrintStream, which works on bytes. You need to wrap it in a PrintWriter or OutputStreamWriter to output characters, which is why userInput isn't output correctly. - Powerlord
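A minimal sketch of what this comment suggests: route all output through a single character-oriented writer with an explicit encoding, instead of mixing `System.out.println` with a separate writer. The class name `Utf8Output` and the helper `utf8Writer` are hypothetical, and whether the characters display correctly still depends on the console being set to UTF-8.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Utf8Output {
    // Hypothetical helper: wrap a byte stream in a PrintWriter that
    // encodes characters as UTF-8 and auto-flushes on println.
    public static PrintWriter utf8Writer(OutputStream out) {
        return new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) {
        PrintWriter console = utf8Writer(System.out);

        // Every line goes through the same writer, so there is only one
        // place where characters are turned into bytes.
        console.println("foo");
        console.println("\u4E4B\u8B02\u751A");

        // print() does not auto-flush, so flush explicitly before exit.
        console.print("> ");
        console.flush();
    }
}
```

This only addresses the output side. If the string returned by `readLine()` is already garbled, the writer will faithfully encode the garbled characters.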