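I am trying to encode a UTF-8 string to Base64 and decode it again.
In theory this should not be a problem, but when decoding I never seem to get the correct characters back; the output is question marks (?) instead.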
String original = "خهعسيبنتا";
B64encoder benco = new B64encoder();
String enc = benco.encode(original);
try
{
    String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
    PrintStream out = new PrintStream(System.out, true, "UTF-8");
    out.println("Original: " + original);
    prtHx("ara", original.getBytes());
    out.println("Encoded: " + enc);
    prtHx("enc", enc.getBytes());
    out.println("Decoded: " + dec);
    prtHx("dec", dec.getBytes());
} catch (UnsupportedEncodingException e)
{
    e.printStackTrace();
}
The console output is as follows:
Original: خهعسيبنتا
ara = 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F
Encoded: Pz8/Pz8/Pz8/
enc = 50, 7A, 38, 2F, 50, 7A, 38, 2F, 50, 7A, 38, 2F
Decoded: ?????????
dec = 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F
The prtHx function simply writes the hex values of the bytes to the output. Am I making an obvious mistake here?
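Andreas pointed to the correct solution: getBytes() uses the platform default encoding (Cp1252 on my machine), even though the source file itself is UTF-8 encoded. That is why every Arabic character showed up as 3F (a question mark) in the hex dumps above. Once I switched to getBytes("UTF-8"), I could see that the bytes going into the encoder and the bytes coming out of the decoder were actually different. Further investigation showed that the encode method also called getBytes() without a charset; changing that fixed the problem.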
try
{
    String enc = benco.encode(original);
    String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
    PrintStream out = new PrintStream(System.out, true, "UTF-8");
    out.println("Original: " + original);
    prtHx("ori", original.getBytes("UTF-8"));
    out.println("Encoded: " + enc);
    prtHx("enc", enc.getBytes("UTF-8"));
    out.println("Decoded: " + dec);
    prtHx("dec", dec.getBytes("UTF-8"));
} catch (UnsupportedEncodingException e)
{
    e.printStackTrace();
}
System encoding: Cp1252
Original: خهعسيبنتا
ori = D8, AE, D9, 87, D8, B9, D8, B3, D9, 8A, D8, A8, D9, 86, D8, AA, D8, A7
Encoded: 2K7Zh9i52LPZitio2YbYqtin
enc = 32, 4B, 37, 5A, 68, 39, 69, 35, 32, 4C, 50, 5A, 69, 74, 69, 6F, 32, 59, 62, 59, 71, 74, 69, 6E
Decoded: خهعسيبنتا
dec = D8, AE, D9, 87, D8, B9, D8, B3, D9, 8A, D8, A8, D9, 86, D8, AA, D8, A7
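For anyone who wants to reproduce the difference without my B64encoder class, here is a minimal sketch. It makes several assumptions that are not part of my original code: it uses java.util.Base64 (Java 8 or later) instead of B64encoder, the class name Utf8Base64Demo and the toHex helper are hypothetical stand-ins (toHex plays the role of prtHx), and ISO-8859-1 stands in for Cp1252 as a charset that cannot represent Arabic letters. The string is written with Unicode escapes so the result does not depend on how the source file is compiled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Utf8Base64Demo {

    // Hypothetical stand-in for prtHx: formats bytes as "D8, AE, D9, ...".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    // Encode with an explicit charset so the platform default is never consulted.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decode back to a String, again naming the charset explicitly.
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same text as in the question, spelled with Unicode escapes.
        String original = "\u062E\u0647\u0639\u0633\u064A\u0628\u0646\u062A\u0627";

        // A single-byte Latin charset cannot map these letters,
        // so each one is replaced by '?' (0x3F), as in my first output.
        byte[] lossy = original.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("lossy = " + toHex(lossy));

        // UTF-8 keeps every character (two bytes each for these letters).
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        System.out.println("utf8  = " + toHex(utf8));

        String enc = encode(original);
        System.out.println("Encoded: " + enc);
        System.out.println("Round trip ok: " + decode(enc).equals(original));
    }
}
```

Passing a Charset constant such as StandardCharsets.UTF_8 also avoids the checked UnsupportedEncodingException that the getBytes("UTF-8") overload forces you to catch.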
Thanks.