如何在最快的时间内对几乎有序的数组进行排序？（Java）

Question

如何在最快的时间内对几乎有序的数组进行排序？（Java）

javaperformancealgorithmsortingsmoothsort

13

我有一个值得数组，几乎排序好了，但是有几个值被错放了（比如在10万个值中有50个位置不对）。如何最有效地对它进行排序？（性能非常关键，应该比O(N)更快）。

我知道smoothsort，但是找不到Java实现。是否有人知道它已经实现了吗？或者我可以使用什么替代品来完成这个任务？

- Alexander Temerev

36

你无法将其排序速度超过O(N)，因为这就是你需要确定数组是否已排序的时间。 - Botz3000

5

可能会有有关数组的额外信息。例如，被移动的成员都可能在末尾，那么您可以对它们进行排序（O(m log m)），然后将它们视为已插入（O(m log log n) 到 O(m log n)）以查找插入位置。 - Tom Hawtin - tackline

1

可能需要不同的硬件，但网络排序可以在O((log n)^2)的时间复杂度内进行排序。请参考链接网络排序。 - Himanshu Pandey

10个回答

10

鸡尾酒排序

如果你想要一个简单易实现的算法，可以使用鸡尾酒排序。它在几乎有序的输入上能够很好地工作。

- DigitalRoss

1

@Henk：我也是。在高中时，我将其作为冒泡排序的改进实现了，但我从未知道它有一个名字。 - R. Martinho Fernandes

7

正如Botz3000所指出的，你不能比O(N)更快地执行这样的操作。任何算法的最基本元素都是查找数组中顺序不正确的条目。即使在弄清楚要做什么之前，这也需要O(N)。

如果确实顺序不正确的元素数量远低于总元素数量，则可以使用以下算法（假设使用链表）：

1. 找到所有顺序不正确的项，并将其从原始列表中提取到单独的列表中，O(N) 2. 结果是两个列表：排序后的列表和一个短的提取列表 3. 对于每个提取的元素，将它们插入排序后的列表中。每个元素的时间复杂度为O(log(N))，总时间复杂度为O(Xlog(N))，其中X是提取元素的数量。如果X相对于N非常小，则最终总时间复杂度为O(N)。

- Roee Adler

2

其实证明并不难。当 x/n -> 0 时，你可以通过计算 xlog(n) 的极限来证明它。 - R. Martinho Fernandes

1

在第三步中，您如何推导出O(log N)？当使用链表时，二分查找可能很难实现... - Dirk

@Pavel：我不理解你的评论。如果你遍历一个已排序的列表，并测试所有项目是否满足a[i-1] <= a[i] <= a[i+1]，那么你可以很容易地在O(N)时间内找到所有逆序元素。 - Roee Adler

2

@Rax：是的，你可以找到它们，但所谓的“乱序”元素数量也可能是O（N），即使在你看它时其实少得多。考虑[1,5,10,2,3,4,5,6,7,8,9,10,11]，暴力算法将调用元素2到10并把它们看作是乱序的。这在后面的步骤中是没有任何意义的。 - P Shved

1

正如在这个问题中所清楚表明的：https://dev59.com/REnSa4cB1Zd3GeqPMT8Q，第一步需要 O(N log N) 的时间，超过了算法的其他部分，因此它不能在 O(N) 的时间内运行。 - R. Martinho Fernandes

显示剩余6条评论

6

这方面有许多优秀的算法。

我的个人最爱是Smoothsort... 如果你想知道为什么它效果如此好，我已经在这里详细说明了所有数学过程。

对于已排序数据而言，一个相当不错的算法是“自然合并排序”，它是归并排序的自底向上版本，通过将输入视为排序子范围序列，然后在范围中多次合并相邻的排序范围进行工作。如果数据已经排序（因为它可以检测到只有一个排序范围），它以O(n)时间运行，在最坏情况下为O(n lg n)。如果数据是“块排序”的，那么该算法的性能非常出色；也就是说，它由许多排序块紧密排列而成。

直接插入排序确实对大部分已排序数据有效，但在许多情况下会严重退化。一些真正好的排序算法（如introsort）实际上利用插入排序的这个属性对输入进行“清理步骤”。

- templatetypedef

你的有关Smoothsort的文章非常棒，我会支付1000000美元作为报酬。正如你所说，它非常有趣但是文档资料却非常匮乏（EWD对此的原始注释也令人沮丧），因此你的文章既是一顿滋补的大餐，又是一种舒缓的安慰。 - Tom Anderson

4

[Sun] JDK7有（或将有）一个Tim sort的实现（来自Python）。它是一种合并排序，利用已经存在于数组中的顺序。

- Tom Hawtin - tackline

请问您能否提供一个链接或者解释一下 tim sort 是如何工作的？ - R. Martinho Fernandes

好的，找到了并添加了链接。 - R. Martinho Fernandes

Tim sort最初是为Python实现的，当一些Java程序员（我想是Josh Bloch）看到它时，他说：“我们需要将其引入Java”。http://svn.python.org/projects/python/trunk/Objects/listsort.txt - Peter Recore

我说了同样的话！谢谢你指出这个相当好但不太为人知的算法。 - R. Martinho Fernandes

3

Smoothsort或Timsort是很好的算法，使用它们是明智的选择。

我想补充一点，你可能没有意识到的是，不起眼的插入排序是自适应的。确实，对于几乎已经排序好的列表，就像你似乎有的那样，我的理解（我无法用参考资料支持）是它比更复杂的算法更快。问题在于，如果输入不几乎排序好，它会迅速退化为O(n^2)。但是，它非常容易正确地实现，因此，如果你确定你的输入总是几乎排序好的，它将是一个不错的选择。

- Tom Anderson

2

只是简单地说，一个实现良好的冒泡排序肯定是这里最简单的算法。最坏情况下复杂度为O(n*m)，其中m是置换次数。m部分严重依赖于置换模式，通常总复杂度为O(n)。

- H H

0

这是原始的Java实现Smoothsort，曾经可以通过维基百科文章获得。

// by keeping these constants, we can avoid the tiresome business
// of keeping track of Dijkstra's b and c. Instead of keeping
// b and c, I will keep an index into this array.

static final int LP[] = { 1, 1, 3, 5, 9, 15, 25, 41, 67, 109,
    177, 287, 465, 753, 1219, 1973, 3193, 5167, 8361, 13529, 21891,
    35421, 57313, 92735, 150049, 242785, 392835, 635621, 1028457,
    1664079, 2692537, 4356617, 7049155, 11405773, 18454929, 29860703,
    48315633, 78176337, 126491971, 204668309, 331160281, 535828591,
    866988873 // the next number is > 31 bits.
};

public static <C extends Comparable<? super C>> void sort(C[] m,
    int lo, int hi) {
  int head = lo; // the offset of the first element of the prefix into m

  // These variables need a little explaining. If our string of heaps
  // is of length 38, then the heaps will be of size 25+9+3+1, which are
  // Leonardo numbers 6, 4, 2, 1. 
  // Turning this into a binary number, we get b01010110 = 0x56. We represent
  // this number as a pair of numbers by right-shifting all the zeros and 
  // storing the mantissa and exponent as "p" and "pshift".
  // This is handy, because the exponent is the index into L[] giving the
  // size of the rightmost heap, and because we can instantly find out if
  // the rightmost two heaps are consecutive Leonardo numbers by checking
  // (p&3)==3

  int p = 1; // the bitmap of the current standard concatenation >> pshift
  int pshift = 1;

  while (head < hi) {
    if ((p & 3) == 3) {
      // Add 1 by merging the first two blocks into a larger one.
      // The next Leonardo number is one bigger.
      sift(m, pshift, head);
      p >>>= 2;
      pshift += 2;
    } else {
      // adding a new block of length 1
      if (LP[pshift - 1] >= hi - head) {
        // this block is its final size.
        trinkle(m, p, pshift, head, false);
      } else {
        // this block will get merged. Just make it trusty.
        sift(m, pshift, head);
      }

      if (pshift == 1) {
        // LP[1] is being used, so we add use LP[0]
        p <<= 1;
        pshift--;
      } else {
        // shift out to position 1, add LP[1]
        p <<= (pshift - 1);
        pshift = 1;
      }
    }
    p |= 1;
    head++;
  }

  trinkle(m, p, pshift, head, false);

  while (pshift != 1 || p != 1) {
    if (pshift <= 1) {
      // block of length 1. No fiddling needed
      int trail = Integer.numberOfTrailingZeros(p & ~1);
      p >>>= trail;
      pshift += trail;
    } else {
      p <<= 2;
      p ^= 7;
      pshift -= 2;

      // This block gets broken into three bits. The rightmost
      // bit is a block of length 1. The left hand part is split into
      // two, a block of length LP[pshift+1] and one of LP[pshift].
      // Both these two are appropriately heapified, but the root
      // nodes are not necessarily in order. We therefore semitrinkle
      // both of them

      trinkle(m, p >>> 1, pshift + 1, head - LP[pshift] - 1, true);
      trinkle(m, p, pshift, head - 1, true);
    }

    head--;
  }
}

private static <C extends Comparable<? super C>> void sift(C[] m, int pshift,
    int head) {
  // we do not use Floyd's improvements to the heapsort sift, because we
  // are not doing what heapsort does - always moving nodes from near
  // the bottom of the tree to the root.

  C val = m[head];

  while (pshift > 1) {
    int rt = head - 1;
    int lf = head - 1 - LP[pshift - 2];

    if (val.compareTo(m[lf]) >= 0 && val.compareTo(m[rt]) >= 0)
      break;
    if (m[lf].compareTo(m[rt]) >= 0) {
      m[head] = m[lf];
      head = lf;
      pshift -= 1;
    } else {
      m[head] = m[rt];
      head = rt;
      pshift -= 2;
    }
  }

  m[head] = val;
}

private static <C extends Comparable<? super C>> void trinkle(C[] m, int p,
    int pshift, int head, boolean isTrusty) {

  C val = m[head];

  while (p != 1) {
    int stepson = head - LP[pshift];

    if (m[stepson].compareTo(val) <= 0)
      break; // current node is greater than head. Sift.

    // no need to check this if we know the current node is trusty,
    // because we just checked the head (which is val, in the first
    // iteration)
    if (!isTrusty && pshift > 1) {
      int rt = head - 1;
      int lf = head - 1 - LP[pshift - 2];
      if (m[rt].compareTo(m[stepson]) >= 0
          || m[lf].compareTo(m[stepson]) >= 0)
        break;
    }

    m[head] = m[stepson];

    head = stepson;
    int trail = Integer.numberOfTrailingZeros(p & ~1);
    p >>>= trail;
    pshift += trail;
    isTrusty = false;
  }

  if (!isTrusty) {
    m[head] = val;
    sift(m, pshift, head);
  }
}

- 11101101b

0

你对于无法达到O(N)的观点是正确的，但是假设我们有一台多核心的机器（我有这样的机器），我们可以通过使用并行排序算法来进行一些欺骗。

- Alexander Temerev

算法仍然以O(N)运行。O(N/2)也是O(N)。 - R. Martinho Fernandes

3

除非您拥有与输入规模成比例增长的核心数量 :) - Roee Adler

@Rax：类似于量子Bogosort（http://en.wikipedia.org/wiki/Bogosort#Quantum_Bogosort）利用平行宇宙的某些东西。 - R. Martinho Fernandes

@Martinho：太搞笑了，我之前不知道这件事。 - Roee Adler

顺便提一下，如果您使用并行排序（特别是分治类型的），不要忘记在某个时刻使用截止点切换到顺序排序，当并行开销超过其好处时。 - R. Martinho Fernandes

0

实现我们在学校称之为希尔排序的算法。这是对子数组进行冒泡排序。步长为k的子数组是具有索引0，k，2k，3k...的元素数组。

如果您选择k = 3i + 1，并从较高的i向下执行多个冒泡排序，则在几乎排序完成的数组上，时间将更短。

- P Shved

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Andrey Adamovich · Accepted Answer

实际上，维基百科包含了一个平滑排序的Java实现。你可以在这里找到它：

http://en.wikipedia.org/wiki/Smoothsort。