All JVM integer types are signed, except for the char type, which is the only unsigned "number". When a number is signed, the highest bit represents its sign: 0 stands for a non-negative number (positive or zero) and 1 stands for a negative number. In addition, negative values are stored inverted relative to the counting order of the positive numbers (technically known as two's complement notation): as the magnitude of a negative number grows, its bit pattern counts down from all ones. For example, positive byte values are represented in bits as follows:
00 00 00 00 => (byte) 0
00 00 00 01 => (byte) 1
00 00 00 10 => (byte) 2
...
01 11 11 11 => (byte) Byte.MAX_VALUE
while the bit order for negative numbers is inverted:
11 11 11 11 => (byte) -1
11 11 11 10 => (byte) -2
11 11 11 01 => (byte) -3
...
10 00 00 00 => (byte) Byte.MIN_VALUE
This inverted notation also explains why the negative range can hold one more number than the positive range: the patterns with a leading 0 also have to represent the number 0. Remember, all this is only a matter of interpreting a bit pattern. Negative numbers could be encoded differently, but this inverted notation is quite handy because it allows for some rather fast transformations, as a small example later on shows.
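To make the bit patterns above tangible, here is a minimal sketch that prints them from Java. The class name TwosComplementDemo and the helpers bits and negate are made up for this illustration; only Integer.toBinaryString and String.format come from the JDK.

```java
public class TwosComplementDemo {
    // Renders the 8 bits of a byte. The mask 0xFF discards the upper 24 bits
    // that appear when the byte is promoted to int.
    static String bits(byte b) {
        String s = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    // Negation in two's complement: invert every bit, then add one.
    static byte negate(byte b) {
        return (byte) (~b + 1);
    }

    public static void main(String[] args) {
        System.out.println(bits((byte) 1));        // 00000001
        System.out.println(bits(Byte.MAX_VALUE));  // 01111111
        System.out.println(bits((byte) -1));       // 11111111
        System.out.println(bits(Byte.MIN_VALUE));  // 10000000
        System.out.println(negate((byte) 3));      // -3
    }
}
```

Note that negate(Byte.MIN_VALUE) yields Byte.MIN_VALUE again, which is the "additional number" of the negative range showing up in practice.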
As mentioned, this does not apply to the char type. The char type represents a Unicode character with a non-negative "numeric range" of 0 to 65535. Each of these numbers refers to a 16-bit Unicode value.
When converting between the int, byte, short, char and boolean types, the JVM needs to either add or truncate bits.
If the target type is represented by more bits than the type from which it is converted, then the JVM simply fills the additional slots with the value of the highest bit of the given value (which represents the sign):
|    short    |    byte     |
|             | 00 00 00 01 | => (byte) 1
| 00 00 00 00 | 00 00 00 01 | => (short) 1
Thanks to the inverted notation, this strategy also works for negative numbers:
|    short    |    byte     |
|             | 11 11 11 11 | => (byte) -1
| 11 11 11 11 | 11 11 11 11 | => (short) -1
This way, the value's sign is retained. Without going into the details of how a JVM implements this, note that this model allows a cast to be performed by a cheap shift operation, which is obviously advantageous.
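The sign-filling rule and the shift trick can be tried out with a small sketch. It is only an illustration in plain Java, not a description of how any particular JVM implements the conversion; the class SignExtensionDemo and its helpers bits16 and signExtendByte are hypothetical names.

```java
public class SignExtensionDemo {
    // Formats the low 16 bits of a value as a binary string.
    static String bits16(int v) {
        String s = Integer.toBinaryString(v & 0xFFFF);
        return String.format("%16s", s).replace(' ', '0');
    }

    // Sign-extends the low 8 bits of an int by hand: move the byte to the top
    // of the 32-bit word, then shift it back with the arithmetic shift >>,
    // which copies the sign bit into the vacated positions.
    static int signExtendByte(int low8) {
        return (low8 << 24) >> 24;
    }

    public static void main(String[] args) {
        byte b = -1;
        short s = b;                               // implicit widening conversion
        System.out.println(s);                     // -1
        System.out.println(bits16(s));             // 1111111111111111
        System.out.println(signExtendByte(0xFF));  // -1
        System.out.println(signExtendByte(0x01));  // 1
    }
}
```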
An exception to this rule is widening a char, which is, as said before, unsigned. A conversion from a char always fills the additional bits with 0, because there is no sign and thus no need for the inverted notation. A conversion of a char to an int is therefore performed as:
|            int            |    char     |    byte     |
|                           | 11 11 11 11 | 11 11 11 11 | => (char) \uFFFF
| 00 00 00 00 | 00 00 00 00 | 11 11 11 11 | 11 11 11 11 | => (int) 65535
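The difference between the two widening rules is easiest to see by feeding the same 16-bit pattern through both. The following sketch does that; the class CharWideningDemo and its overloaded widen helpers are names invented for this example.

```java
public class CharWideningDemo {
    // Widening a char fills the upper 16 bits with 0.
    static int widen(char c) {
        return c;
    }

    // Widening a short copies the sign bit into the upper 16 bits.
    static int widen(short s) {
        return s;
    }

    public static void main(String[] args) {
        char c = Character.MAX_VALUE;  // all 16 bits set
        short s = (short) 0xFFFF;      // same bit pattern, read as signed

        System.out.println(widen(c));                       // 65535
        System.out.println(widen(s));                       // -1
        System.out.println(Integer.toHexString(widen(c)));  // ffff
        System.out.println(Integer.toHexString(widen(s)));  // ffffffff
    }
}
```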
When the original type has more bits than the target type, the additional bits are merely cut off. As long as the original value fits into the target type, this works fine, as for example in the following conversion of a short to a byte:
|    short    |    byte     |
| 00 00 00 00 | 00 00 00 01 | => (short) 1
|             | 00 00 00 01 | => (byte) 1
| 11 11 11 11 | 11 11 11 11 | => (short) -1
|             | 11 11 11 11 | => (byte) -1
However, if the value is too big or too small, this no longer works:
|    short    |    byte     |
| 00 00 00 01 | 00 00 00 01 | => (short) 257
|             | 00 00 00 01 | => (byte) 1
| 11 11 11 11 | 00 00 00 00 | => (short) -256
|             | 00 00 00 00 | => (byte) 0
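The truncation can be reproduced directly with explicit casts. This is a small sketch; NarrowingDemo and narrow are hypothetical names, and the values 128 and -129 are extra examples chosen here to show the sign flipping when only the low byte survives.

```java
public class NarrowingDemo {
    // A narrowing cast keeps only the low 8 bits of the short.
    static byte narrow(short s) {
        return (byte) s;
    }

    public static void main(String[] args) {
        System.out.println(narrow((short) 1));     // 1    (fits, unchanged)
        System.out.println(narrow((short) -1));    // -1   (fits, unchanged)
        System.out.println(narrow((short) 257));   // 1    (high byte dropped)
        System.out.println(narrow((short) 128));   // -128 (low byte 10000000)
        System.out.println(narrow((short) -129));  // 127  (low byte 01111111)
    }
}
```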
This is why narrowing casts sometimes lead to strange results. You might wonder why narrowing is implemented this way. You could argue that it would be more intuitive if the JVM checked a number's range and instead cast an incompatible number to the largest representable value of the same sign. However, this would require branching, which is a costly operation. This matters in particular because two's complement notation otherwise allows for cheap arithmetic operations.
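For comparison, the "more intuitive" range-checked alternative could look roughly like the sketch below. The method saturatingToByte is a hypothetical helper written for this answer, not something the JDK or the JVM provides; it shows the two comparisons that a plain truncating cast avoids.

```java
public class SaturatingCastDemo {
    // Hypothetical helper (not part of the JDK): clamps out-of-range values
    // to the nearest representable byte instead of truncating bits.
    static byte saturatingToByte(short s) {
        if (s > Byte.MAX_VALUE) {
            return Byte.MAX_VALUE;
        }
        if (s < Byte.MIN_VALUE) {
            return Byte.MIN_VALUE;
        }
        return (byte) s;  // in range, so the plain cast is lossless
    }

    public static void main(String[] args) {
        System.out.println(saturatingToByte((short) 257));   // 127
        System.out.println(saturatingToByte((short) -300));  // -128
        System.out.println(saturatingToByte((short) 42));    // 42
        System.out.println((byte) (short) 257);              // 1 (plain cast)
    }
}
```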
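The byte cast not changing the result does not mean that it does nothing... - Narmer
Try System.out.println((int)(char)(byte)-130) and see whether the result is just 65536-130. Then read @Chris K's answer and work out the result! :) - Narmer
byte cast! - Narmer
(byte) does change the result, so this is a different case. - glglgl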