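While working on my original question, "How do I check in Java whether a byte array contains a Unicode string?", I found that "Unicode" in Java really means UTF-16 code units: a Java char is one UTF-16 code unit, and a String is a sequence of them. I solved the problem myself and wrote some code that may help anyone who runs into the same issue.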
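I wrote two main methods: one prints the UTF-8 code units (the bytes) and the other prints the UTF-16 code units (the chars). UTF-16 code units are what you run into in Java and JavaScript, usually written as escapes such as "\ud83d".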
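To make the difference concrete, here is a minimal sketch that prints both views of one character outside the Basic Multilingual Plane. The class name CodeUnitDemo and its two helper methods are my own illustrative names, not part of the code further down; U+1F600 is just a sample character whose first UTF-16 code unit happens to be d83d.

```java
import java.nio.charset.StandardCharsets;

public class CodeUnitDemo {
    // Formats every UTF-16 code unit (Java char) of s as four hex digits.
    public static String utf16Units(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04x", (int) c));
        }
        return sb.toString();
    }

    // Formats every UTF-8 code unit (byte) of s as two hex digits.
    public static String utf8Units(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\ud83d\ude00"; // U+1F600 written as a surrogate pair
        System.out.println(utf16Units(s));                   // d83d de00
        System.out.println(utf8Units(s));                    // f0 9f 98 80
        System.out.println(s.length());                      // 2 (code units)
        System.out.println(s.codePointCount(0, s.length())); // 1 (code point)
    }
}
```

One character therefore takes two UTF-16 code units but four UTF-8 code units, which is why String.length() and the byte array length disagree.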
For more help with code units and conversions, try this site:
https://r12a.github.io/apps/conversion/
Here is the code:
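// text is defined elsewhere (a String, StringBuilder, or similar).
// Pass the charset explicitly: getBytes() without an argument uses the
// platform default encoding, which is not guaranteed to be UTF-8.
byte[] array_bytes = text.toString().getBytes(java.nio.charset.StandardCharsets.UTF_8);
char[] array_chars = text.toString().toCharArray();
System.out.println();
byteArrayToUtf8CodeUnits(array_bytes);
System.out.println();
charArrayToUtf16CodeUnits(array_chars);
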
// Prints each byte of a UTF-8 encoded array as two hex digits.
public static void byteArrayToUtf8CodeUnits(byte[] byte_array)
{
    System.out.println("array.length = " + byte_array.length);
    for (int k = 0; k < byte_array.length; k++)
    {
        System.out.println("array byte: [" + k + "] converted to hex = " + byteToHex(byte_array[k]));
    }
}

// Prints each char (one UTF-16 code unit) as four hex digits.
public static void charArrayToUtf16CodeUnits(char[] char_array)
{
    System.out.println("array.length = " + char_array.length);
    for (int i = 0; i < char_array.length; i++)
    {
        System.out.println("array char: [" + i + "] converted to hex = " + charToHex(char_array[i]));
    }
}

// Converts one byte to two lowercase hex digits.
public static String byteToHex(byte b)
{
    char[] hexDigit =
    {
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
    };
    // The 0x0f mask drops the sign-extended bits of negative bytes.
    char[] array = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
    return new String(array);
}

// Converts one char (16 bits) to four hex digits, high byte first.
public static String charToHex(char c)
{
    byte hi = (byte) (c >>> 8);
    byte lo = (byte) (c & 0xff);
    return byteToHex(hi) + byteToHex(lo);
}
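
The output of charArrayToUtf16CodeUnits lists the two halves of a surrogate pair such as d83d de00 on separate lines. If you also want to see which code point a pair stands for, something along these lines could work. This is only a sketch: the class SurrogatePairs and the method describe are hypothetical names of mine, built on the standard Character.isHighSurrogate, Character.isLowSurrogate and Character.toCodePoint methods.

```java
import java.util.ArrayList;
import java.util.List;

public class SurrogatePairs {
    // Walks the UTF-16 code units and labels each one. A high surrogate
    // followed by a low surrogate is reported together as one code point;
    // any other unit (including an unpaired surrogate) is reported alone.
    public static List<String> describe(char[] units) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < units.length; i++) {
            char c = units[i];
            if (Character.isHighSurrogate(c) && i + 1 < units.length
                    && Character.isLowSurrogate(units[i + 1])) {
                int cp = Character.toCodePoint(c, units[i + 1]);
                out.add(String.format("%04x %04x -> U+%04X", (int) c, (int) units[i + 1], cp));
                i++; // skip the low surrogate that was just consumed
            } else {
                out.add(String.format("%04x -> U+%04X", (int) c, (int) c));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints "0041 -> U+0041" and then "d83d de00 -> U+1F600".
        for (String line : describe("A\ud83d\ude00".toCharArray())) {
            System.out.println(line);
        }
    }
}
```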