我刚刚偶然发现了些事情。起初,我以为这可能是一个分支误预测的案例,就像这个案例一样,但是我不能解释为什么分支误预测会导致这种行为。
我在Java中实现了两个版本的Bubble Sort,并进行了一些性能测试:
import java.util.Random;
public class BubbleSortAnnomaly {
public static void main(String... args) {
final int ARRAY_SIZE = Integer.parseInt(args[0]);
final int LIMIT = Integer.parseInt(args[1]);
final int RUNS = Integer.parseInt(args[2]);
int[] a = new int[ARRAY_SIZE];
int[] b = new int[ARRAY_SIZE];
Random r = new Random();
for (int run = 0; RUNS > run; ++run) {
for (int i = 0; i < ARRAY_SIZE; i++) {
a[i] = r.nextInt(LIMIT);
b[i] = a[i];
}
System.out.print("Sorting with sortA: ");
long start = System.nanoTime();
int swaps = bubbleSortA(a);
System.out.println( (System.nanoTime() - start) + " ns. "
+ "It used " + swaps + " swaps.");
System.out.print("Sorting with sortB: ");
start = System.nanoTime();
swaps = bubbleSortB(b);
System.out.println( (System.nanoTime() - start) + " ns. "
+ "It used " + swaps + " swaps.");
}
}
public static int bubbleSortA(int[] a) {
int counter = 0;
for (int i = a.length - 1; i >= 0; --i) {
for (int j = 0; j < i; ++j) {
if (a[j] > a[j + 1]) {
swap(a, j, j + 1);
++counter;
}
}
}
return (counter);
}
public static int bubbleSortB(int[] a) {
int counter = 0;
for (int i = a.length - 1; i >= 0; --i) {
for (int j = 0; j < i; ++j) {
if (a[j] >= a[j + 1]) {
swap(a, j, j + 1);
++counter;
}
}
}
return (counter);
}
private static void swap(int[] a, int j, int i) {
int h = a[i];
a[i] = a[j];
a[j] = h;
}
}
正如我们所看到的,这两种排序方法之间唯一的区别在于>
和>=
。当使用java BubbleSortAnnomaly 50000 10 10
运行程序时,人们显然会预期sortB
比sortA
慢,因为它需要执行更多的swap(...)
。但是我在三台不同的机器上得到了以下(或类似的)输出:
Sorting with sortA: 4.214 seconds. It used 564960211 swaps.
Sorting with sortB: 2.278 seconds. It used 1249750569 swaps.
Sorting with sortA: 4.199 seconds. It used 563355818 swaps.
Sorting with sortB: 2.254 seconds. It used 1249750348 swaps.
Sorting with sortA: 4.189 seconds. It used 560825110 swaps.
Sorting with sortB: 2.264 seconds. It used 1249749572 swaps.
Sorting with sortA: 4.17 seconds. It used 561924561 swaps.
Sorting with sortB: 2.256 seconds. It used 1249749766 swaps.
Sorting with sortA: 4.198 seconds. It used 562613693 swaps.
Sorting with sortB: 2.266 seconds. It used 1249749880 swaps.
Sorting with sortA: 4.19 seconds. It used 561658723 swaps.
Sorting with sortB: 2.281 seconds. It used 1249751070 swaps.
Sorting with sortA: 4.193 seconds. It used 564986461 swaps.
Sorting with sortB: 2.266 seconds. It used 1249749681 swaps.
Sorting with sortA: 4.203 seconds. It used 562526980 swaps.
Sorting with sortB: 2.27 seconds. It used 1249749609 swaps.
Sorting with sortA: 4.176 seconds. It used 561070571 swaps.
Sorting with sortB: 2.241 seconds. It used 1249749831 swaps.
Sorting with sortA: 4.191 seconds. It used 559883210 swaps.
Sorting with sortB: 2.257 seconds. It used 1249749371 swaps.
当我把参数LIMIT
设置为,例如,50000
(java BubbleSortAnnomaly 50000 50000 10
),我得到了预期的结果:
Sorting with sortA: 3.983 seconds. It used 625941897 swaps.
Sorting with sortB: 4.658 seconds. It used 789391382 swaps.
我将程序移植到C++上,以确定这个问题是否只存在于Java中。以下是C++代码。
#include <cstdlib>
#include <iostream>
#include <omp.h>
#ifndef ARRAY_SIZE
#define ARRAY_SIZE 50000
#endif
#ifndef LIMIT
#define LIMIT 10
#endif
#ifndef RUNS
#define RUNS 10
#endif
void swap(int * a, int i, int j)
{
int h = a[i];
a[i] = a[j];
a[j] = h;
}
int bubbleSortA(int * a)
{
const int LAST = ARRAY_SIZE - 1;
int counter = 0;
for (int i = LAST; 0 < i; --i)
{
for (int j = 0; j < i; ++j)
{
int next = j + 1;
if (a[j] > a[next])
{
swap(a, j, next);
++counter;
}
}
}
return (counter);
}
int bubbleSortB(int * a)
{
const int LAST = ARRAY_SIZE - 1;
int counter = 0;
for (int i = LAST; 0 < i; --i)
{
for (int j = 0; j < i; ++j)
{
int next = j + 1;
if (a[j] >= a[next])
{
swap(a, j, next);
++counter;
}
}
}
return (counter);
}
int main()
{
int * a = (int *) malloc(ARRAY_SIZE * sizeof(int));
int * b = (int *) malloc(ARRAY_SIZE * sizeof(int));
for (int run = 0; RUNS > run; ++run)
{
for (int idx = 0; ARRAY_SIZE > idx; ++idx)
{
a[idx] = std::rand() % LIMIT;
b[idx] = a[idx];
}
std::cout << "Sorting with sortA: ";
double start = omp_get_wtime();
int swaps = bubbleSortA(a);
std::cout << (omp_get_wtime() - start) << " seconds. It used " << swaps
<< " swaps." << std::endl;
std::cout << "Sorting with sortB: ";
start = omp_get_wtime();
swaps = bubbleSortB(b);
std::cout << (omp_get_wtime() - start) << " seconds. It used " << swaps
<< " swaps." << std::endl;
}
free(a);
free(b);
return (0);
}
这个程序表现出相同的行为。有人可以解释一下到底发生了什么吗?
先执行 sortB
,然后再执行 sortA
并不会改变结果。
>
与>=
的影响将会非常小。为了得到真正有意义的时间数字,你需要测量许多不同的序列并取平均值。 - 463035818_is_not_a_number-DRUNS=XXX
(C++,编译器指令)在循环中运行它。而且结果是可重复的。 - Turing85bubbleSortB()
然后再调用bubbleSortA()
时,看结果是否仍然成立会很有趣。在Java中,我经常怀疑内存分配和垃圾回收会导致意外的结果。尽管在C++中获得相同的结果表明这里发生了更普遍的事情。 - Kevin Condon