Why does '(int)(char)(byte)-2' produce 65534 in Java?


I came across this question in a technical test for a job. Given the following code sample:

public class Manager {
    public static void main (String args[]) {
        System.out.println((int) (char) (byte) -2);
    }
}

It prints 65534.

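This behavior only occurs with negative values; 0 and positive numbers yield the same value, i.e. the value passed to System.out.println (SOP). The byte cast is irrelevant here; I have tried it without.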

So my question is: what exactly is going on here?
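To reproduce the behavior described in the question, here is a minimal sketch. The class name `CastObservation`, its helper methods and the sample values are illustrative choices, not part of the original test. For the values chosen here (all inside the byte range), the results with and without the byte cast agree.

```java
public class CastObservation {
    // The same chain of casts as in the question: int -> byte -> char -> int.
    static int viaByte(int n) {
        return (int) (char) (byte) n;
    }

    // The same chain without the byte cast: int -> char -> int.
    static int withoutByte(int n) {
        return (int) (char) n;
    }

    public static void main(String[] args) {
        int[] samples = {-2, -1, 0, 1, 127};
        for (int n : samples) {
            System.out.println(n + " -> with byte: " + viaByte(n)
                    + ", without byte: " + withoutByte(n));
        }
    }
}
```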


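The fact that the byte cast doesn't change the result doesn't mean it does nothing... - Narmer
The char cast is doing all the work here; I don't know what the byte cast is doing... Can you tell me what it does here? - mangoCar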
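Try running System.out.println((int)(char)(byte)-130) and see whether the result is just 65536-130. Then read @Chris K's answer and work out the result! :) - Narmer
Oh, and run it again without the byte cast! - Narmer
@Narmer Here (byte) does change the result, so that's a different case. - glglgl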
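A small sketch of the experiment suggested in the comments, using only the standard library; the class and method names are made up for illustration. It shows that for -130, which does not fit into a byte, the byte cast does change the result.

```java
public class CastMinus130 {
    static int viaByte(int n) {
        return (int) (char) (byte) n;   // int -> byte -> char -> int
    }

    static int withoutByte(int n) {
        return (int) (char) n;          // int -> char -> int
    }

    public static void main(String[] args) {
        // -130 does not fit into a byte, so the byte cast keeps only the low 8 bits (0x7E).
        System.out.println(viaByte(-130));      // prints 126
        // Without the byte cast, the low 16 bits (0xFF7E) survive: 65536 - 130.
        System.out.println(withoutByte(-130));  // prints 65406
    }
}
```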
4 Answers


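Before you can understand what happens below, we need to agree on a few premises. Once you understand the following points, the rest is simple deduction: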

  1. All primitive types within the JVM are represented as a sequence of bits. The int type is represented by 32 bits, the char and short types by 16 bits and the byte type is represented by 8 bits.

  2. All JVM numbers are signed, with char being the only unsigned "number". When a number is signed, its highest bit represents the sign: 0 means non-negative (positive or zero) and 1 means negative. Signed numbers use two's complement notation, in which the negative values run in the inverted order of the positive ones, counting down from a pattern of all ones. For example, the non-negative byte values are represented in bits as follows:

    00 00 00 00 => (byte) 0
    00 00 00 01 => (byte) 1
    00 00 00 10 => (byte) 2
    ...
    01 11 11 11 => (byte) Byte.MAX_VALUE
    

    while the bit order for negative numbers is inverted:

    11 11 11 11 => (byte) -1
    11 11 11 10 => (byte) -2
    11 11 11 01 => (byte) -3
    ...
    10 00 00 00 => (byte) Byte.MIN_VALUE
    

    This inverted notation also explains why the negative range can hold one more value than the positive range: the patterns with a leading 0 must also represent the number 0. Remember, all of this is only a matter of interpreting a bit pattern. Negative numbers could be encoded differently, but this inverted notation is quite handy because it allows some rather fast transformations, as a small example later on will show.

    As mentioned, this does not apply to the char type. The char type represents a Unicode character with a non-negative "numeric range" of 0 to 65535. Each of these numbers refers to a 16-bit Unicode (UTF-16) code unit.

  3. When converting between the int, byte, short and char types, the JVM needs to either add or truncate bits. (boolean cannot be cast to or from these types at all.)

    If the target type is represented by more bits than the type from which it is converted, the JVM simply fills the additional slots with the value of the highest bit of the given value (which represents the sign):

    |     short   |     byte    |
    |             | 00 00 00 01 | => (byte) 1
    | 00 00 00 00 | 00 00 00 01 | => (short) 1
    

    Thanks to the inverted notation, this strategy also works for negative numbers:

    |     short   |     byte    |
    |             | 11 11 11 11 | => (byte) -1
    | 11 11 11 11 | 11 11 11 11 | => (short) -1
    

    This way, the value's sign is retained. Without going into the details of implementing this in a JVM, note that this model allows a cast to be performed by a cheap shift operation, which is obviously advantageous.

    An exception to this rule is widening a char, which, as we said before, is unsigned. A char is always widened by filling the additional bits with 0, because there is no sign and thus no need for the inverted notation. A conversion of a char to an int is therefore performed as:

    |            int            |    char     |     byte    |
    |                           | 11 11 11 11 | 11 11 11 11 | => (char) \uFFFF
    | 00 00 00 00 | 00 00 00 00 | 11 11 11 11 | 11 11 11 11 | => (int) 65535
    

    When the original type has more bits than the target type, the additional bits are simply cut off. As long as the original value fits into the target type, this works fine, as for example in the following conversion of a short to a byte:

    |     short   |     byte    |
    | 00 00 00 00 | 00 00 00 01 | => (short) 1
    |             | 00 00 00 01 | => (byte) 1
    | 11 11 11 11 | 11 11 11 11 | => (short) -1
    |             | 11 11 11 11 | => (byte) -1
    

    However, if the value is too big or too small, this no longer works:

    |     short   |     byte    |
    | 00 00 00 01 | 00 00 00 01 | => (short) 257
    |             | 00 00 00 01 | => (byte) 1
    | 11 11 11 11 | 00 00 00 00 | => (short) -256
    |             | 00 00 00 00 | => (byte) 0
    

    This is why narrowing casts sometimes lead to strange results. You might wonder why narrowing is implemented this way. You could argue that it would be more intuitive if the JVM checked a number's range and instead cast an incompatible number to the largest representable value of the same sign. However, this would require branching, which is a costly operation. This matters especially because two's complement notation allows for cheap arithmetic operations.
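The widening and narrowing rules from point 3 can be checked with a short sketch. The helper `bits` is a hypothetical formatting function written for this example, not a JDK method; it prints the low bits of a value with leading zeros. Note that the 16-bit pattern 11111111 00000000 is the short value -256.

```java
public class WideningNarrowing {
    // Formats the low `width` bits of a value as a binary string with leading zeros.
    static String bits(int value, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = width - 1; i >= 0; i--) {
            sb.append((value >>> i) & 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte minusOne = -1;
        short widened = minusOne;          // sign extension: the new high bits copy the sign bit
        System.out.println(bits(minusOne, 8) + " -> " + bits(widened, 16) + " = " + widened);

        char maxChar = '\uFFFF';
        int fromChar = maxChar;            // zero extension: char is unsigned
        System.out.println(bits(maxChar, 16) + " -> " + bits(fromChar, 32) + " = " + fromChar);

        short s257 = 257;                  // 00000001 00000001
        short sMinus256 = -256;            // 11111111 00000000
        System.out.println((byte) s257);      // narrowing keeps the low 8 bits: prints 1
        System.out.println((byte) sMinus256); // prints 0
    }
}
```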

With this information, we can see what happens to the number -2 in your example:

|           int           |    char     |     byte    |
| 11 11 11 11 11 11 11 11 | 11 11 11 11 | 11 11 11 10 | => (int) -2
|                         |             | 11 11 11 10 | => (byte) -2
|                         | 11 11 11 11 | 11 11 11 10 | => (char) \uFFFE
| 00 00 00 00 00 00 00 00 | 11 11 11 11 | 11 11 11 10 | => (int) 65534

As you can see, the byte cast is redundant here, because the cast to char cuts off the same bits.
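To check the claim that the byte cast is redundant here, this sketch (class and method names are illustrative) compares `(char) (byte) n` with `(char) n` for every int in the byte range. The claim only holds inside that range; 200 is used as an arbitrary counterexample outside it.

```java
public class ByteCastRedundancy {
    // Counts the ints in [-128, 127] for which the byte cast changes the char result.
    static int countMismatchesInByteRange() {
        int mismatches = 0;
        for (int n = Byte.MIN_VALUE; n <= Byte.MAX_VALUE; n++) {
            if ((char) (byte) n != (char) n) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches in byte range: " + countMismatchesInByteRange()); // 0
        // Outside the byte range, the byte cast keeps only the low 8 bits first.
        System.out.println((int) (char) (byte) 200 + " vs " + (int) (char) 200); // 65480 vs 200
    }
}
```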

If you prefer a formal definition, all of these rules are also spelled out in the JVMS.

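A final note: a type's bit size does not necessarily reflect the number of bits the JVM reserves for representing that type in memory. In fact, the JVM does not distinguish between the boolean, byte, short, char and int types. They are all represented by the same JVM type, and the virtual machine merely emulates these casts. On a method's operand stack and in its local variables, values of all of these types take up 32 bits. For arrays and object fields, however, every JVM implementer is free to choose a representation.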


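You could link to two's complement (see also SO). In my view, the biggest advantage is that you can perform subtraction via addition (a - b = a + (-b)). Addition then works exactly as it does for unsigned integers. - Palec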
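The comment's point about subtraction via addition can be sketched in a few lines. `negate` and `subtractViaAddition` are hypothetical helpers; the negation uses the two's complement identity -b = ~b + 1.

```java
public class TwosComplementSubtraction {
    // Negation in two's complement: invert all bits and add one.
    static int negate(int b) {
        return ~b + 1;
    }

    // Subtraction expressed as addition of the negated operand.
    static int subtractViaAddition(int a, int b) {
        return a + negate(b);
    }

    public static void main(String[] args) {
        System.out.println(subtractViaAddition(5, 7));   // prints -2
        System.out.println(subtractViaAddition(-3, -10)); // prints 7
        // The same 32 bits read as an unsigned number: 2^32 - 2.
        System.out.println(Integer.toUnsignedString(subtractViaAddition(5, 7))); // prints 4294967294
    }
}
```

The last line illustrates the comment's remark that the addition itself is the same as for unsigned integers; only the interpretation of the resulting bits differs.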
In the last table, you should not write (char) 0x65534, but rather (char) 65534 or (char) 0xFFFE. - FrankPl
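This line may be wrong: 00 00 00 00 | => (byte) -1 - Ben Voigt
A great summary of how casting works. In this age of cheap memory, people forget what type sizes really mean. - Michael Shopsin
In the specification, characters in Java are defined as UTF-16 values. (https://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.1) Please be more specific in your statement. - Rafael Winterhalter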


There are two important things to note here:

  1. char is unsigned and cannot be negative.
  2. According to the Java Language Specification, casting a byte to a char first involves a hidden conversion to int.
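A minimal sketch of the conversion described in point 2, based on the rule in JLS §5.1.4 (byte to char is a widening to int followed by a narrowing to char). The method `explicitTwoStep` is a hypothetical helper that spells out both steps, and the loop checks that it agrees with a plain `(char)` cast for every byte value.

```java
public class ByteToChar {
    // byte -> char written out as the two conversions the JLS describes.
    static char explicitTwoStep(byte b) {
        int widened = b;          // widening primitive conversion: sign-extends
        return (char) widened;    // narrowing primitive conversion: keeps the low 16 bits
    }

    public static void main(String[] args) {
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte b = (byte) i;
            if ((char) b != explicitTwoStep(b)) {
                throw new AssertionError("mismatch for " + b);
            }
        }
        System.out.println((int) explicitTwoStep((byte) -2)); // prints 65534
    }
}
```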

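So converting the byte -2 to an int gives 11111111111111111111111111111110. Note how the binary value is extended with ones, i.e. copies of the sign bit; a non-negative value would be extended with zeros instead. When we narrow it to a char, the int is truncated to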

1111111111111110

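Finally, when 1111111111111110 is converted to an int, it is zero-extended instead of one-extended, because the value is now treated as positive (a char can only be non-negative). Zero extension leaves this value unchanged, just as one extension preserved the negative value earlier. Printed in decimal, this binary value is 65534.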
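The bit patterns at each step can be printed with `Integer.toBinaryString`, which omits leading zeros; the `padded` helper below is a hypothetical addition that restores them to a fixed width. This is a sketch, not part of the original answer.

```java
public class BitExtension {
    // Binary string of the low `width` bits of a value, padded with leading zeros.
    static String padded(int value, int width) {
        String s = Integer.toBinaryString(value);
        if (s.length() > width) {
            s = s.substring(s.length() - width); // keep only the low `width` bits
        }
        return String.format("%" + width + "s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte b = -2;
        int widened = b;                 // byte -> int: sign extension with ones
        char narrowed = (char) widened;  // int -> char: the upper 16 bits are cut off
        int result = narrowed;           // char -> int: zero extension, char is unsigned
        System.out.println(padded(widened, 32));
        System.out.println(padded(narrowed, 16));
        System.out.println(padded(result, 32) + " = " + result);
    }
}
```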


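Why does casting an 8-bit byte to a 16-bit char produce the 16-bit two's complement of -2, which finally resolves to the int 65534? Does this have to do with two's complement? I mean, how does the padding with 1s happen in the char type? - Narmer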
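Thanks @Narmer, very good point. I've updated the answer with a reference to the Java Language Specification that explains how the byte-to-char conversion works: it goes through int. - Chris K
Yes, your answer is the most informative and explanatory one; it should be the accepted answer to this question. - Narmer
In this case, sign extension happens for all numbers. It just so happens that the sign bit is 0 for positive numbers. There is no special rule for negative numbers. - indiv
@indiv, I've adjusted the answer to make the extension with zero and one bits clearer. - Chris K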


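A char ranges from 0 to 65535. If you cast a negative number to char, the value wraps around modulo 65536: for -2 you get 65536 - 2 = 65534. If you print it as a char, Java tries to display whatever Unicode character 65534 represents, but when you cast it to int, you actually get 65534. If you start with a number above 65535, you'll see similarly "confusing" results: a large number (say 65538) ends up small (2).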
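The wrap-around described above amounts to taking the value modulo 65536. This sketch (the helper name is illustrative) compares the char cast with `Math.floorMod(n, 65536)`, which always returns a result in [0, 65535] for a positive divisor.

```java
public class CharWrapAround {
    // An int -> char cast keeps the low 16 bits, so the result lies in [0, 65535].
    static int asCharValue(int n) {
        return (int) (char) n;
    }

    public static void main(String[] args) {
        int[] samples = {-2, 65534, 65536, 65538, -65538};
        for (int n : samples) {
            System.out.println(n + " -> " + asCharValue(n)
                    + " (floorMod: " + Math.floorMod(n, 65536) + ")");
        }
    }
}
```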


Isn't the range of a char 0-65535? - JamesB
You're right, changed. The subtraction is from the total range, which is 65536 values, so the upper end is 65535. - Jacob Mattison

I think the easiest way to explain this is to break it down into the sequence of operations you are performing.
Instance | #          int            |     char    | #   byte    |    result   |
Source   | 11 11 11 11 | 11 11 11 11 | 11 11 11 11 | 11 11 11 10 | -2          |
byte     |(11 11 11 11)|(11 11 11 11)|(11 11 11 11)| 11 11 11 10 | -2          |
int      | 11 11 11 11 | 11 11 11 11 | 11 11 11 11 | 11 11 11 10 | -2          |
char     |(00 00 00 00)|(00 00 00 00)| 11 11 11 11 | 11 11 11 10 | 65534       |
int      | 00 00 00 00 | 00 00 00 00 | 11 11 11 11 | 11 11 11 10 | 65534       |
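  1. You start with a 32-bit signed value.
  2. You then convert it to an 8-bit signed value.
  3. When you try to convert that to a 16-bit unsigned value, the compiler quietly performs a quick conversion to a 32-bit signed value,
  4. and then converts it to 16 bits without keeping the sign.
  5. When it is finally converted to 32 bits, there is no sign, so the value is padded with zero bits, which preserves the value.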
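Steps 4 and 5 depend on char being unsigned. As an illustrative contrast, not taken from the answer, this sketch routes the same value through `short`, the signed 16-bit type: the intermediate bits are identical, but only the short is sign-extended when widened back to int.

```java
public class SignedVsUnsigned16 {
    // int -> byte -> char -> int: the 16-bit intermediate is unsigned.
    static int throughChar(int n) {
        return (int) (char) (byte) n;
    }

    // int -> byte -> short -> int: the 16-bit intermediate is signed.
    static int throughShort(int n) {
        return (int) (short) (byte) n;
    }

    public static void main(String[] args) {
        // Both intermediates hold the same 16 bits (0xFFFE) for -2 ...
        System.out.println(Integer.toHexString((char) (byte) -2));           // fffe
        System.out.println(Integer.toHexString((short) (byte) -2 & 0xFFFF)); // fffe
        // ... but only the signed short is sign-extended when widened back to int.
        System.out.println(throughChar(-2));   // prints 65534
        System.out.println(throughShort(-2));  // prints -2
    }
}
```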

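So yes, from this point of view the byte cast matters (academically speaking), even though its effect on the result is trivial (an operation that matters to the program can still have a trivial effect). What counts is how narrowing and widening treat the sign: the cast to char narrows, but its later widening does not extend the sign.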

(Note that I use # to mark signed types; as pointed out in the comments, char has no sign bit because it is an unsigned type.)

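I use parentheses to show what actually happens internally. The data types are really truncated to their logical blocks, but viewed as an int, the result would be what the parentheses show.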

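Signed values are always widened with the value of the sign bit, while unsigned values are always widened with the bits switched off (zeros).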

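So the key to this trick (or trap) is that the widening from byte to int preserves the signed value. But as soon as the value reaches char, it is narrowed, and that turns off the sign bit.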

If the conversion to int did not happen, the value would be 254. But the conversion does happen, so that is not the case.
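The hypothetical 254 can be reproduced by zero-extending the byte instead of sign-extending it. `Byte.toUnsignedInt` (available since Java 8) performs that zero extension; using it here only illustrates the alternative, it is not what Java does for a `(char)` cast.

```java
public class ZeroVsSignExtension {
    // What Java actually does for byte -> char: sign-extend to int, then keep 16 bits.
    static int actual(byte b) {
        return (int) (char) b;
    }

    // Hypothetical alternative (not Java's rule): zero-extend the byte first.
    static int hypotheticalZeroExtended(byte b) {
        return (int) (char) Byte.toUnsignedInt(b);
    }

    public static void main(String[] args) {
        byte b = -2;
        System.out.println(actual(b));                   // prints 65534
        System.out.println(hypotheticalZeroExtended(b)); // prints 254
    }
}
```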


Web content provided by Stack Overflow.