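I just read about branch prediction and wanted to see how it works with Java 8 streams.
However, the performance with streams always turns out to be worse than with traditional loops.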
int totalSize = 32768;
int filterValue = 1280;
int[] array = new int[totalSize];
Random rnd = new Random(0);
int loopCount = 10000;

for (int i = 0; i < totalSize; i++) {
    // array[i] = rnd.nextInt() % 2560; // Unsorted Data
    array[i] = i; // Sorted Data
}

long start = System.nanoTime();
long sum = 0;
for (int j = 0; j < loopCount; j++) {
    for (int c = 0; c < totalSize; ++c) {
        sum += array[c] >= filterValue ? array[c] : 0;
    }
}
long total = System.nanoTime() - start;
System.out.printf("Conditional Operator Time : %d ns, (%f sec) %n", total, total / Math.pow(10, 9));

start = System.nanoTime();
sum = 0;
for (int j = 0; j < loopCount; j++) {
    for (int c = 0; c < totalSize; ++c) {
        if (array[c] >= filterValue) {
            sum += array[c];
        }
    }
}
total = System.nanoTime() - start;
System.out.printf("Branch Statement Time : %d ns, (%f sec) %n", total, total / Math.pow(10, 9));

start = System.nanoTime();
sum = 0;
for (int j = 0; j < loopCount; j++) {
    sum += Arrays.stream(array).filter(value -> value >= filterValue).sum();
}
total = System.nanoTime() - start;
System.out.printf("Streams Time : %d ns, (%f sec) %n", total, total / Math.pow(10, 9));

start = System.nanoTime();
sum = 0;
for (int j = 0; j < loopCount; j++) {
    sum += Arrays.stream(array).parallel().filter(value -> value >= filterValue).sum();
}
total = System.nanoTime() - start;
System.out.printf("Parallel Streams Time : %d ns, (%f sec) %n", total, total / Math.pow(10, 9));
Output:
For the sorted array:
Conditional Operator Time : 294062652 ns, (0.294063 sec)
Branch Statement Time : 272992442 ns, (0.272992 sec)
Streams Time : 806579913 ns, (0.806580 sec)
Parallel Streams Time : 2316150852 ns, (2.316151 sec)
For the unsorted array:
Conditional Operator Time : 367304250 ns, (0.367304 sec)
Branch Statement Time : 906073542 ns, (0.906074 sec)
Streams Time : 1268648265 ns, (1.268648 sec)
Parallel Streams Time : 2420482313 ns, (2.420482 sec)
I tried the same code using a List: list.stream() instead of Arrays.stream(array), and list.get(c) instead of array[c].
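The List variant described above could look roughly like the following. This is only a sketch under assumptions: the original post does not show how the list was built or how the stream was summed, so the class name `ListVariantSketch`, the boxing loop, and the `mapToLong` step are illustrative choices rather than the exact code that produced the timings.

```java
import java.util.ArrayList;
import java.util.List;

class ListVariantSketch {
    // Loop version: list.get(c) takes the place of array[c].
    static long sumWithConditional(List<Integer> list, int filterValue) {
        long sum = 0;
        for (int c = 0; c < list.size(); ++c) {
            sum += list.get(c) >= filterValue ? list.get(c) : 0;
        }
        return sum;
    }

    // Stream version: list.stream() takes the place of Arrays.stream(array).
    // Stream<Integer> has no sum(), so a mapToLong step is assumed here.
    static long sumWithStream(List<Integer> list, int filterValue) {
        return list.stream()
                .filter(value -> value >= filterValue)
                .mapToLong(Integer::longValue)
                .sum();
    }

    public static void main(String[] args) {
        int totalSize = 32768;
        int filterValue = 1280;

        // Assumed construction: box the same sorted values used for the array.
        List<Integer> list = new ArrayList<>(totalSize);
        for (int i = 0; i < totalSize; i++) {
            list.add(i);
        }

        System.out.println("Loop sum   : " + sumWithConditional(list, filterValue));
        System.out.println("Stream sum : " + sumWithStream(list, filterValue));
    }
}
```

Note that every element access in this variant unboxes an Integer, which the array version does not have to do.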
Output:
For the sorted list:
Conditional Operator Time : 860514446 ns, (0.860514 sec)
Branch Statement Time : 663458668 ns, (0.663459 sec)
Streams Time : 2085657481 ns, (2.085657 sec)
Parallel Streams Time : 5026680680 ns, (5.026681 sec)
For the unsorted list:
Conditional Operator Time : 704120976 ns, (0.704121 sec)
Branch Statement Time : 1327838248 ns, (1.327838 sec)
Streams Time : 1857880764 ns, (1.857881 sec)
Parallel Streams Time : 2504468688 ns, (2.504469 sec)
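I referred to a few blogs (here) and (here), which point to the same performance issue with streams.
- I agree that programming with streams is nicer and easier in some scenarios, but if we lose performance, why should we use them? Is there something I'm missing?
- In which scenario do streams perform on par with loops? Is it only when the function applied to each element takes a long time, so that the loop overhead becomes negligible?
- In none of the scenarios could I see streams taking advantage of branch prediction. I tried both sorted and unordered streams, but to no avail; they were more than twice as slow as the normal streams. Why?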
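To make the second question concrete, here is a minimal sketch of the scenario it describes: the per-element work is made deliberately expensive, so any fixed per-element overhead of the stream pipeline should matter less in relative terms. The `expensive` helper and the class name `ExpensiveFunctionSketch` are hypothetical and not part of the benchmark above, and no timings are claimed for it.

```java
import java.util.Arrays;

class ExpensiveFunctionSketch {
    // Hypothetical stand-in for a costly per-element function.
    static long expensive(int value) {
        long acc = value;
        for (int k = 0; k < 1_000; k++) {
            acc = (acc * 31 + k) % 1_000_003;
        }
        return acc;
    }

    // Plain loop applying the expensive function to every element.
    static long sumWithLoop(int[] array) {
        long sum = 0;
        for (int c = 0; c < array.length; ++c) {
            sum += expensive(array[c]);
        }
        return sum;
    }

    // Stream pipeline applying the same function.
    static long sumWithStream(int[] array) {
        return Arrays.stream(array)
                .mapToLong(ExpensiveFunctionSketch::expensive)
                .sum();
    }

    public static void main(String[] args) {
        int[] array = new int[32768];
        for (int i = 0; i < array.length; i++) {
            array[i] = i;
        }

        long start = System.nanoTime();
        long loopSum = sumWithLoop(array);
        long loopTime = System.nanoTime() - start;

        start = System.nanoTime();
        long streamSum = sumWithStream(array);
        long streamTime = System.nanoTime() - start;

        System.out.printf("Loop   : sum=%d, %d ns%n", loopSum, loopTime);
        System.out.printf("Stream : sum=%d, %d ns%n", streamSum, streamTime);
    }
}
```

A single timed run like this is only indicative; the two sums should match, while the timings will vary between runs and machines.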