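This depends on the JVM implementation as well as the underlying hardware. Most modern hardware does not fetch single bytes from memory (or even from the first-level cache), so using smaller primitive types generally does not reduce memory bandwidth consumption. Likewise, modern CPUs have a word size of 64 bits. They can perform operations on fewer bits, but that works by discarding the extra bits, which is not faster either.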
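The only benefit is that smaller primitive types can result in a more compact memory layout, most notably when using arrays. This saves memory, which can improve locality of reference (and thus reduce the number of cache misses) and reduce garbage collection overhead.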
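To put rough numbers on the layout argument, here is a minimal sketch that computes the payload size of a primitive array and how many elements fit into one cache line. The class name `ArrayFootprint` and its helpers are made up for illustration, the 64-byte cache line is an assumption (typical for current x86 CPUs, but not guaranteed), and array headers and alignment padding are ignored because they are JVM-specific.

```java
class ArrayFootprint {
    // Assumption: 64-byte cache lines, which is common but hardware-dependent.
    static final int ASSUMED_CACHE_LINE_BYTES = 64;

    // Payload of a primitive array: element count times element width.
    // The array header and padding are deliberately left out.
    static long payloadBytes(int length, int bytesPerElement) {
        return (long) length * bytesPerElement;
    }

    // Number of elements that share one (assumed) cache line.
    static int elementsPerCacheLine(int bytesPerElement) {
        return ASSUMED_CACHE_LINE_BYTES / bytesPerElement;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        String[] names = {"long", "int", "short", "byte"};
        int[] widths = {Long.BYTES, Integer.BYTES, Short.BYTES, Byte.BYTES};
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + "[]: "
                    + payloadBytes(n, widths[i]) + " bytes payload, "
                    + elementsPerCacheLine(widths[i]) + " elements per cache line");
        }
    }
}
```

Under these assumptions a `byte[]` packs eight times as many elements into each cache line as a `long[]`, which is where any locality benefit would come from.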
Generally speaking, however, using smaller primitive types is not faster.
To demonstrate that, consider the following benchmark:
import java.math.BigDecimal;

public class Benchmark {

    // Runs the code with a doubling iteration count until a run is long enough
    // to report, then prints time and memory per iteration.
    public static void benchmark(String label, Code code) {
        print(25, label);

        try {
            for (int iterations = 1; ; iterations *= 2) {
                System.gc(); // request a collection so memory figures are less noisy
                long previouslyUsedMemory = usedMemory();
                long start = System.nanoTime();
                code.execute(iterations);
                long duration = System.nanoTime() - start;
                long memoryUsed = usedMemory() - previouslyUsedMemory;

                if (iterations > 1E8 || duration > 1E9) {
                    print(25, new BigDecimal(duration * 1000 / iterations).movePointLeft(3) + " ns / iteration");
                    print(30, new BigDecimal(memoryUsed * 1000 / iterations).movePointLeft(3) + " bytes / iteration\n");
                    return;
                }
            }
        } catch (Throwable e) {
            throw new RuntimeException(e);
        }
    }

    // Prints the message right-aligned in a column of the desired width.
    private static void print(int desiredLength, String message) {
        System.out.print(" ".repeat(Math.max(1, desiredLength - message.length())) + message);
    }

    private static long usedMemory() {
        return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    }

    @FunctionalInterface
    interface Code {
        // Returns its result so the work cannot be optimized away entirely.
        Object execute(int iterations);
    }

    public static void main(String[] args) {
        benchmark("long[] traversal", (iterations) -> {
            long[] array = new long[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = i;
            }
            return array;
        });
        benchmark("int[] traversal", (iterations) -> {
            int[] array = new int[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = i;
            }
            return array;
        });
        benchmark("short[] traversal", (iterations) -> {
            short[] array = new short[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = (short) i;
            }
            return array;
        });
        benchmark("byte[] traversal", (iterations) -> {
            byte[] array = new byte[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = (byte) i;
            }
            return array;
        });

        benchmark("long fields", (iterations) -> {
            class C {
                long a = 1;
                long b = 2;
            }

            C[] array = new C[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = new C();
            }
            return array;
        });
        benchmark("int fields", (iterations) -> {
            class C {
                int a = 1;
                int b = 2;
            }

            C[] array = new C[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = new C();
            }
            return array;
        });
        benchmark("short fields", (iterations) -> {
            class C {
                short a = 1;
                short b = 2;
            }

            C[] array = new C[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = new C();
            }
            return array;
        });
        benchmark("byte fields", (iterations) -> {
            class C {
                byte a = 1;
                byte b = 2;
            }

            C[] array = new C[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = new C();
            }
            return array;
        });

        benchmark("long multiplication", (iterations) -> {
            long result = 1;
            for (int i = 0; i < iterations; i++) {
                result *= 3;
            }
            return result;
        });
        benchmark("int multiplication", (iterations) -> {
            int result = 1;
            for (int i = 0; i < iterations; i++) {
                result *= 3;
            }
            return result;
        });
        benchmark("short multiplication", (iterations) -> {
            short result = 1;
            for (int i = 0; i < iterations; i++) {
                result *= 3;
            }
            return result;
        });
        benchmark("byte multiplication", (iterations) -> {
            byte result = 1;
            for (int i = 0; i < iterations; i++) {
                result *= 3;
            }
            return result;
        });
    }
}
Run with OpenJDK 14 on my Intel Core i7 CPU @ 3.5 GHz, this prints:
         long[] traversal     3.206 ns / iteration      8.007 bytes / iteration
          int[] traversal     1.557 ns / iteration      4.007 bytes / iteration
        short[] traversal     0.881 ns / iteration      2.007 bytes / iteration
         byte[] traversal     0.584 ns / iteration      1.007 bytes / iteration
              long fields    25.485 ns / iteration     36.359 bytes / iteration
               int fields    23.126 ns / iteration     28.304 bytes / iteration
             short fields    21.717 ns / iteration     20.296 bytes / iteration
              byte fields    21.767 ns / iteration     20.273 bytes / iteration
      long multiplication     0.538 ns / iteration      0.000 bytes / iteration
       int multiplication     0.526 ns / iteration      0.000 bytes / iteration
     short multiplication     0.786 ns / iteration      0.000 bytes / iteration
      byte multiplication     0.784 ns / iteration      0.000 bytes / iteration
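As you can see, the only significant speedup occurs when traversing large arrays; using smaller object fields yields negligible benefit, and computations are actually slightly slower on the small data types.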
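One plausible contributor to the slower `short` and `byte` multiplication (my assumption; the benchmark does not isolate it) is that Java never multiplies in those types directly. The operands are promoted to `int`, and a compound assignment such as `result *= 3` implicitly narrows the `int` result back. The sketch below, with the hypothetical class name `NarrowArithmetic`, spells out that extra step:

```java
class NarrowArithmetic {
    // What `value *= 3` means for a short: multiply as int, then narrow.
    static short timesThree(short value) {
        return (short) (value * 3);
    }

    public static void main(String[] args) {
        short s = 20_000;

        int widened = s * 3;            // the multiplication itself yields an int
        short narrowed = timesThree(s); // narrowing keeps only the low 16 bits

        short compound = s;
        compound *= 3;                  // compiles only because of the implicit cast

        System.out.println(widened);    // 60000
        System.out.println(narrowed);   // -5536
        System.out.println(compound);   // -5536
    }
}
```

Whether the JIT emits an extra instruction for the narrowing depends on the JVM and the CPU, so treat this as an explanation of the language semantics rather than of the measured timings.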
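Overall, the performance differences are quite minor. Optimizing algorithms is far more important than the choice of primitive type.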
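shorts[i] = (short)(bytes[i] & 0xFF) is about 10% faster than shorts[i] = bytes[i]. Following your suggestion I switched to int[], but ints[i] = bytes[i] & 0xFF is still about 12% faster than ints[i] = bytes[i]. Any ideas? Could this be related to sign extension, which on x86 should be a single instruction, MOVSX r32, r/m8? - Mark Jeronimus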
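Note that the two assignments compared in this comment are not interchangeable: a plain widening sign-extends the byte, while masking with `0xFF` zero-extends it, so they differ for every negative byte. The sketch below (class and method names are hypothetical) only shows that semantic difference; it makes no claim about which form the JIT compiles to faster code.

```java
class ByteWidening {
    // Plain widening conversion: the sign bit is copied into the upper bits.
    static int signExtended(byte b) {
        return b;
    }

    // Masking clears the upper 24 bits, treating the byte as unsigned.
    static int zeroExtended(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;

        System.out.println(signExtended(b));        // -16
        System.out.println(zeroExtended(b));        // 240
        System.out.println(Byte.toUnsignedInt(b));  // 240, the standard library equivalent
    }
}
```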