如何使用归并排序算法进行原地排序？

Question

如何使用归并排序算法进行原地排序？

arraysalgorithmsortingmergesortin-place

297

我知道这个问题不是太具体。

我想要的只是有人告诉我如何将普通归并排序转换为原地归并排序（或具有恒定额外空间开销的归并排序）。

我能在网络上找到的所有内容都是“它太复杂了”或“超出了本文的范围”。

已知实现原地合并（不使用任何额外空间）的唯一方法过于复杂，难以减少为实用程序。 (引自这里)

即使它太复杂了，基本概念是什么，如何使归并排序成为原地排序？

- Lazer

不错的问题，我在阅读昨天的一个问题时也问过自己：https://dev59.com/-nE85IYBdhLWcg3w3Xr6 - Chris Lercher

1

仅供参考，这里有一个不错的原地稳定归并排序实现。虽然有点复杂，但也不算太难。我最终在Java中实现了原地稳定归并排序和原地稳定快速排序。请注意，它的时间复杂度是O(n(log n)^2)。 - Thomas Mueller

这里描述了一个相当简单的方法：https://xinok.wordpress.com/2014/08/17/in-place-merge-sort-demystified-2/ - Branko Dimitrijevic

1

在通常的分割和合并算法中，如果您将指针数组定义为一个链接列表L(i)，其中您有一个条目地址，该地址是排序顺序中第一条记录的地址，并且该地址处的指针是排序顺序中第二条记录的地址，以此类推，您会发现您可以在O(n)时间内“原地”合并两个链接列表。最终，您会得到一个单独的指针作为链接列表的入口点，以及一个包含n-1个指针的链接列表。我将第n个链接列表条目设置为零，作为停止指示器，在合并中非常有用。您可以使用i=L(i)递归地遍历结果。 - Paul

10个回答

63

本文介绍了几种原地归并排序的变体，包括其“大成果” (PDF)：

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.5514&rep=rep1&type=pdf

更少移动的原地排序

Jyrki Katajainen, Tomi A. Pasanen

本文证明了可以使用 O(1) 额外空间、O(n log n / log log n) 元素移动和 n log₂n + O(n log log n) 次比较对一个由 n 个元素组成的数组进行排序。这是第一种在最坏情况下需要 o(n log n) 移动并保证 O(n log n) 比较的原地排序算法，但由于涉及到常数因子，该算法主要具有理论意义。

我认为这也很相关。我手头有一份它的打印，同事转给我的，但我还没有读过。它似乎涵盖了基本理论，但我对这个主题不熟悉，无法评判其全面性：

http://comjnl.oxfordjournals.org/cgi/content/abstract/38/8/681

最优稳定合并

Antonios Symvonis

本文展示了如何使用 O(m+n) 赋值、O(mlog(n/m+1)) 比较和仅使用常量额外空间来稳定地合并大小分别为 m 和 n，其中 m ≤ n 的两个序列 A 和 B。这个结果匹配了所有已知的下限...

- Steve Jessop

13

这并不是一种易于使用或高效的方法，我建议除非你确实需要它（而且可能只有在这是作业时才需要使用），否则不要这样做。你不能使用快速排序吗？将加上一些简单的优化以及其额外的内存是O(log N)，反正比较快。

无论如何，如果你必须这样做，那么就得这样做。这是我找到的：一个和二。虽然我不熟悉原地归并排序，但它似乎基本思想是使用旋转来促进合并两个数组，而无需使用额外的内存。

请注意，这甚至比经典的非原位归并排序还要慢。

- IVlad

16

快速排序不是稳定的，这对许多生产代码非常重要。 - Donal Fellows

9

如果在原地排序，快速排序可能是稳定的，而归并排序如果在原地排序则不一定稳定（iirc）。 - jk.

6

快速排序对于特定构造的输入也存在O(n^2)的最坏情况。 - HoboBen

1

快速排序的额外内存真的是O(log n)吗？我认为作为一个原地算法，它应该是O(1)的额外内存吧？哦，因为它是递归的，所以堆栈使用是O(log n)。 - kristianp

1

@kristianp：快速排序的额外内存取决于您如何管理递归堆栈。如果管理得聪明，可以证明它永远不会超过O(log n)；然而，一个天真的递归实现在最坏情况下可能会使用O(n)内存。 - Luatic

显示剩余2条评论

11

关键步骤是使合并本身就地完成。它并不像那些来源所说的那么困难，但是当你尝试时会失去一些东西。

看一步合并：

[...list-sorted...|x...list-A...|y...list-B...]

我们知道sorted序列小于所有其他元素，x小于A中的所有其他元素，y小于B中的所有其他元素。在x小于或等于y的情况下，只需将指针移动到其中一个A的开头。在y小于x的情况下，您必须将y整个移动到sorted中，经过整个A的洗牌是导致其变得昂贵的最后一步（除了退化情况）。

通常更便宜（特别是当数组实际上只包含每个元素的单个单词、例如指向字符串或结构的指针时）为了节省时间而牺牲一些空间，并拥有一个单独的临时数组，在其中来回排序。

- Donal Fellows

7

你的原地归并算法在最坏情况下具有O(mn)的时间复杂度，其中m是A的大小，n是B的大小。当A中第一个元素大于B中最后一个元素时，就会出现这种情况。通过在A和B之间添加一个堆，可以将复杂度改进为O(klog(k)+m+n)，其中k=min(m,n)。这个堆应该包含来自A的项，这些项比B中剩余的项大，但比A中剩余的项小。如果先耗尽了A中的元素，则必须将堆移到B的末尾；否则，堆必须移到A的开头。然后，在原地弹出堆中的项并将它们反转以完成归并。 - valyala

3

请注意，使用堆时，排序不再稳定。此外，如果您使用堆，则可以选择堆排序而不是归并排序。 - martinkunev

只想指出，原地合并在最优渐进时间复杂度下是可能的，请参见c++ - Is it possible to do an inplace merge without temporary storage? - Stack Overflow。 - user202729

7

一个在C语言中实现的无缓冲区归并排序示例。

#define SWAP(type, a, b) \
    do { type t=(a);(a)=(b);(b)=t; } while (0)

static void reverse_(int* a, int* b)
{
    for ( --b; a < b; a++, b-- )
       SWAP(int, *a, *b);
}
static int* rotate_(int* a, int* b, int* c)
/* swap the sequence [a,b) with [b,c). */
{
    if (a != b && b != c)
     {
       reverse_(a, b);
       reverse_(b, c);
       reverse_(a, c);
     }
    return a + (c - b);
}

static int* lower_bound_(int* a, int* b, const int key)
/* find first element not less than @p key in sorted sequence or end of
 * sequence (@p b) if not found. */
{
    int i;
    for ( i = b-a; i != 0; i /= 2 )
     {
       int* mid = a + i/2;
       if (*mid < key)
          a = mid + 1, i--;
     }
    return a;
}
static int* upper_bound_(int* a, int* b, const int key)
/* find first element greater than @p key in sorted sequence or end of
 * sequence (@p b) if not found. */
{
    int i;
    for ( i = b-a; i != 0; i /= 2 )
     {
       int* mid = a + i/2;
       if (*mid <= key)
          a = mid + 1, i--;
     }
    return a;
}

static void ip_merge_(int* a, int* b, int* c)
/* inplace merge. */
{
    int n1 = b - a;
    int n2 = c - b;

    if (n1 == 0 || n2 == 0)
       return;
    if (n1 == 1 && n2 == 1)
     {
       if (*b < *a)
          SWAP(int, *a, *b);
     }
    else
     {
       int* p, * q;

       if (n1 <= n2)
          p = upper_bound_(a, b, *(q = b+n2/2));
       else
          q = lower_bound_(b, c, *(p = a+n1/2));
       b = rotate_(p, b, q);

       ip_merge_(a, p, b);
       ip_merge_(b, q, c);
     }
}

void mergesort(int* v, int n)
{
    if (n > 1)
     {
       int h = n/2;
       mergesort(v, h); mergesort(v+h, n-h);
       ip_merge_(v, v+h, v+n);
     }
}

自适应归并排序的示例（优化版）。

增加支持代码和修改来加速合并操作，当有任何大小的辅助缓冲区可用时（仍可在没有额外内存的情况下工作）。使用向前和向后的合并，环形旋转，小序列的合并和排序以及迭代式归并排序。

#include <stdlib.h>
#include <string.h>

static int* copy_(const int* a, const int* b, int* out)
{
    int count = b - a;
    if (a != out)
       memcpy(out, a, count*sizeof(int));
    return out + count;
}
static int* copy_backward_(const int* a, const int* b, int* out)
{
    int count = b - a;
    if (b != out)
       memmove(out - count, a, count*sizeof(int));
    return out - count;
}

static int* merge_(const int* a1, const int* b1, const int* a2,
  const int* b2, int* out)
{
    while ( a1 != b1 && a2 != b2 )
       *out++ = (*a1 <= *a2) ? *a1++ : *a2++;
    return copy_(a2, b2, copy_(a1, b1, out));
}
static int* merge_backward_(const int* a1, const int* b1,
  const int* a2, const int* b2, int* out)
{
    while ( a1 != b1 && a2 != b2 )
       *--out = (*(b1-1) > *(b2-1)) ? *--b1 : *--b2;
    return copy_backward_(a1, b1, copy_backward_(a2, b2, out));
}

static unsigned int gcd_(unsigned int m, unsigned int n)
{
    while ( n != 0 )
     {
       unsigned int t = m % n;
       m = n;
       n = t;
     }
    return m;
}
static void rotate_inner_(const int length, const int stride,
  int* first, int* last)
{
    int* p, * next = first, x = *first;
    while ( 1 )
     {
       p = next;
       if ((next += stride) >= last)
          next -= length;
       if (next == first)
          break;
       *p = *next;
     }
    *p = x;
}
static int* rotate_(int* a, int* b, int* c)
/* swap the sequence [a,b) with [b,c). */
{
    if (a != b && b != c)
     {
       int n1 = c - a;
       int n2 = b - a;

       int* i = a;
       int* j = a + gcd_(n1, n2);

       for ( ; i != j; i++ )
          rotate_inner_(n1, n2, i, c);
     }
    return a + (c - b);
}

static void ip_merge_small_(int* a, int* b, int* c)
/* inplace merge.
 * @note faster for small sequences. */
{
    while ( a != b && b != c )
       if (*a <= *b)
          a++;
       else
        {
          int* p = b+1;
          while ( p != c && *p < *a )
             p++;
          rotate_(a, b, p);
          b = p;
        }
}
static void ip_merge_(int* a, int* b, int* c, int* t, const int ts)
/* inplace merge.
 * @note works with or without additional memory. */
{
    int n1 = b - a;
    int n2 = c - b;

    if (n1 <= n2 && n1 <= ts)
     {
       merge_(t, copy_(a, b, t), b, c, a);
     }
    else if (n2 <= ts)
     {
       merge_backward_(a, b, t, copy_(b, c, t), c);
     }
    /* merge without buffer. */
    else if (n1 + n2 < 48)
     {
       ip_merge_small_(a, b, c);
     }
    else
     {
       int* p, * q;

       if (n1 <= n2)
          p = upper_bound_(a, b, *(q = b+n2/2));
       else
          q = lower_bound_(b, c, *(p = a+n1/2));
       b = rotate_(p, b, q);

       ip_merge_(a, p, b, t, ts);
       ip_merge_(b, q, c, t, ts);
     }
}
static void ip_merge_chunk_(const int cs, int* a, int* b, int* t,
  const int ts)
{
    int* p = a + cs*2;
    for ( ; p <= b; a = p, p += cs*2 )
       ip_merge_(a, a+cs, p, t, ts);
    if (a+cs < b)
       ip_merge_(a, a+cs, b, t, ts);
}

static void smallsort_(int* a, int* b)
/* insertion sort.
 * @note any stable sort with low setup cost will do. */
{
    int* p, * q;
    for ( p = a+1; p < b; p++ )
     {
       int x = *p;
       for ( q = p; a < q && x < *(q-1); q-- )
          *q = *(q-1);
       *q = x;
     }
}
static void smallsort_chunk_(const int cs, int* a, int* b)
{
    int* p = a + cs;
    for ( ; p <= b; a = p, p += cs )
       smallsort_(a, p);
    smallsort_(a, b);
}

static void mergesort_lower_(int* v, int n, int* t, const int ts)
{
    int cs = 16;
    smallsort_chunk_(cs, v, v+n);
    for ( ; cs < n; cs *= 2 )
       ip_merge_chunk_(cs, v, v+n, t, ts);
}

static void* get_buffer_(int size, int* final)
{
    void* p = NULL;
    while ( size != 0 && (p = malloc(size)) == NULL )
       size /= 2;
    *final = size;
    return p;
}
void mergesort(int* v, int n)
{
    /* @note buffer size may be in the range [0,(n+1)/2]. */
    int request = (n+1)/2 * sizeof(int);
    int actual;
    int* t = (int*) get_buffer_(request, &actual);

    /* @note allocation failure okay. */
    int tsize = actual / sizeof(int);
    mergesort_lower_(v, n, t, tsize);
    free(t);
}

- Johnny Cage

5

是你写的吗？它是如何克服其他答案中提到的困难的？它的运行时间是多少？ - Thomas Ahle

这是从我的自定义库中改编的，但如果你问我是否创建了这些算法，那么不是。增长是O(n(log n)^2)，没有辅助内存；使用完整缓冲区为O(n log n)。这试图成为一个实用的实现，并结构化显示组成算法。 - Johnny Cage

为什么需要递归或额外的缓冲区来合并两个已排序的列表？我认为可以通过将两个指针向前移动并在左侧大于右侧时进行交换来完成。 - jack

@jack：你确定可以在一个单一的、快速的循环中完成合并吗（总共需要O(n)次比较和O(n)次交换，以原地合并n个元素），而且还能保持稳定的顺序，而不需要递归或额外的缓冲区吗？如果是这样，请发表一个答案，因为这听起来像是基于比较的排序算法的圣杯。 - pts

这个无缓冲合并排序看起来非常类似于libstdc++的一个（请参见Philipp Claßen的回答），以及基于它的Java实现。后者还包含了3次对reverse(...)的调用，但是这些被注释掉了，并且提供了一个更复杂的、基于GCD的旋转代码。libstdc++使用了更复杂的基于GCD的旋转算法。 - pts

7

这个答案有一个代码示例, 它实现了Bing-Chao Huang和Michael A. Langston在论文Practical In-Place Merging中描述的算法。我必须承认我不理解细节，但合并步骤的复杂度是O(n)。

从实际角度来看，有证据表明纯原地实现在真实世界的场景中表现并不更好。例如，C++标准定义了std::inplace_merge, 正如其名称所暗示的，它是一个原地合并操作。

假设C++库通常经过很好的优化，看看它是如何实现的确实很有趣：

1) libstdc++（GCC代码库的一部分）：std::inplace_merge

该实现委托给__inplace_merge，它通过尝试分配临时缓冲区来规避问题：

typedef _Temporary_buffer<_BidirectionalIterator, _ValueType> _TmpBuf;
_TmpBuf __buf(__first, __len1 + __len2);

if (__buf.begin() == 0)
  std::__merge_without_buffer
    (__first, __middle, __last, __len1, __len2, __comp);
else
  std::__merge_adaptive
   (__first, __middle, __last, __len1, __len2, __buf.begin(),
     _DistanceType(__buf.size()), __comp);

否则，它将退回到一个实现（__merge_without_buffer），它不需要额外的内存，但不再以O(n)的时间运行。

2）libc++（Clang代码库的一部分）：std::inplace_merge

看起来相似。它委托给一个function，该函数也试图分配缓冲区。根据是否获得足够的元素，它将选择实现方式。常数内存回退函数称为__buffered_inplace_merge。

甚至回退函数仍然是O(n)时间，但关键是如果有临时内存可用，则不使用实现方式。

请注意，C++标准明确赋予实现选择此方法的自由，将所需复杂度从O(n)降低到O(N log N)：

复杂度：如果有足够的额外内存，则需要进行N-1次比较。如果内存不足，则需要进行O(N log N)次比较。

当然，这并不能被视为常数空间原地合并在O(n)时间中永远不应该使用的证明。另一方面，如果更快的话，优化后的C++库可能会切换到那种类型的实现方式。

- Philipp Claßen

这个Java实现基于libstdc++的*__merge_without_buffer*。 - pts

这个Java实现是基于libstdc++的*__merge_without_buffer*。 - undefined

这个Java实现基于libstdc++的*__merge_adaptive*（因此它使用了一个临时缓冲区）。 - pts

3

这是我的C语言版本：

void mergesort(int *a, int len) {
  int temp, listsize, xsize;

  for (listsize = 1; listsize <= len; listsize*=2) {
    for (int i = 0, j = listsize; (j+listsize) <= len; i += (listsize*2), j += (listsize*2)) {
      merge(& a[i], listsize, listsize);
    }
  }

  listsize /= 2;

  xsize = len % listsize;
  if (xsize > 1)
    mergesort(& a[len-xsize], xsize);

  merge(a, listsize, xsize);
}

void merge(int *a, int sizei, int sizej) {
  int temp;
  int ii = 0;
  int ji = sizei;
  int flength = sizei+sizej;

  for (int f = 0; f < (flength-1); f++) {
    if (sizei == 0 || sizej == 0)
      break;

    if (a[ii] < a[ji]) {
      ii++;
      sizei--;
    }
    else {
      temp = a[ji];

      for (int z = (ji-1); z >= ii; z--)
        a[z+1] = a[z];  
      ii++;

      a[f] = temp;

      ji++;
      sizej--;
    }
  }
}

- Dylan Nissley

1

请注意，该实现在最坏情况下（反转数组）需要Θ(n^2 log n)的时间。 - martinkunev

确实，归并排序的难点在于快速、原地、稳定地合并。这个答案中的 merge 函数确实是原地的，但不够快：它在最坏情况下的时间复杂度为 O(flength*2)，当输入数组是反向的（递减顺序）；快速合并的期望时间复杂度为 O(flength) 或 O(flengthlog(flength))。但它是否是稳定的？ - pts

2

我知道我来晚了，但这是昨天写的一个解决方案。我还在其他地方发布了这个方案，但这似乎是SO上最受欢迎的合并方案线程。我也没有看到这个算法在其他地方发布过，所以希望这对一些人有所帮助。

这个算法是以最简单的形式呈现的，以便理解。它可以进行大量的调整以提高速度。平均时间复杂度为：稳定的原地数组合并为O(n.log₂n)，总排序为O(n.(log₂n)²)。

//                     Stable Merge In Place Sort
//
//
// The following code is written to illustrate the base algorithm. A good
// number of optimizations can be applied to boost its overall speed
// For all its simplicity, it does still perform somewhat decently.
// Average case time complexity appears to be: O(n.(log₂n)²)

#include <stddef.h>
#include <stdio.h>

#define swap(x, y)    (t=(x), (x)=(y), (y)=t)

// Both sorted sub-arrays must be adjacent in 'a'
// Assumes that both 'an' and 'bn' are always non-zero
// 'an' is the length of the first sorted section in 'a', referred to as A
// 'bn' is the length of the second sorted section in 'a', referred to as B
static void
merge_inplace(int A[], size_t an, size_t bn)
{
    int t, *B = &A[an];
    size_t  pa, pb;     // Swap partition pointers within A and B

    // Find the portion to swap.  We're looking for how much from the
    // start of B can swap with the end of A, such that every element
    // in A is less than or equal to any element in B.  This is quite
    // simple when both sub-arrays come at us pre-sorted
    for(pa = an, pb = 0; pa>0 && pb<bn && B[pb] < A[pa-1]; pa--, pb++);

    // Now swap last part of A with first part of B according to the
    // indicies we found
    for (size_t index=pa; index < an; index++)
        swap(A[index], B[index-pa]);

    // Now merge the two sub-array pairings.  We need to check that either array
    // didn't wholly swap out the other and cause the remaining portion to be zero
    if (pa>0 && (an-pa)>0)
        merge_inplace(A, pa, an-pa);

    if (pb>0 && (bn-pb)>0)
        merge_inplace(B, pb, bn-pb);
} // merge_inplace

// Implements a recursive merge-sort algorithm with an optional
// insertion sort for when the splits get too small.  'n' must
// ALWAYS be 2 or more.  It enforces this when calling itself
static void
merge_sort(int a[], size_t n)
{
    size_t  m = n/2;

    // Sort first and second halves only if the target 'n' will be > 1
    if (m > 1)
        merge_sort(a, m);

    if ((n-m)>1)
        merge_sort(a+m, n-m);

    // Now merge the two sorted sub-arrays together. We know that since
    // n > 1, then both m and n-m MUST be non-zero, and so we will never
    // violate the condition of not passing in zero length sub-arrays
    merge_inplace(a, m, n-m);
} // merge_sort

// Print an array */
static void
print_array(int a[], size_t size)
{
    if (size > 0) {
        printf("%d", a[0]);
        for (size_t i = 1; i < size; i++)
            printf(" %d", a[i]);
    }
    printf("\n");
} // print_array
 
// Test driver
int
main()
{
    int a[] = { 17, 3, 16, 5, 14, 8, 10, 7, 15, 1, 13, 4, 9, 12, 11, 6, 2 };
    size_t  n = sizeof(a) / sizeof(a[0]);
 
    merge_sort(a, n);
 
    print_array(a, n);

    return 0;
} // main

- Stew Forster

最坏情况时间复杂度是什么？ - pts

最坏情况时间复杂度是什么？ - undefined

2

通过使用C++中的std::inplace_merge函数，可以实现原地归并排序，具体实现如下：

template< class _Type >
inline void merge_sort_inplace(_Type* src, size_t l, size_t r)
{
    if (r <= l) return;

    size_t m = l + ( r - l ) / 2;             // computes the average without overflow

    merge_sort_inplace(src, l,     m);
    merge_sort_inplace(src, m + 1, r);

    std::inplace_merge(src + l, src + m + 1, src + r + 1);
}

更多排序算法，包括并行实现，可在https://github.com/DragonSpit/ParallelAlgorithms仓库中找到。该仓库是开源且免费的。

- DragonSpit

-6

我刚刚尝试了使用插入排序算法在JAVA中实现归并排序的原地合并算法，具体步骤如下：
1）有两个已排序的数组。
2）比较这两个数组的第一个值，并将较小的值放入第一个数组中。
3）使用插入排序（从左到右遍历）将较大的值放入第二个数组中。
4）然后再次比较第一个数组的第二个值和第二个数组的第一个值，并执行相同的操作。但是当发生交换时，有一些提示可以跳过进一步比较，只需要进行交换。

我在这里进行了一些优化，以减少插入排序中的比较次数。
唯一的缺点是需要在第二个数组中进行更多的元素交换。

e.g)

第一个数组：3、7、8、9

第二个数组：1、2、4、5

然后，7、8、9使第二个数组每次将其所有元素向左移动一个位置以将自己放在最后。

因此，这里的假设是与比较两个项相比，交换项是可以忽略不计的。

https://github.com/skanagavelu/algorithams/blob/master/src/sorting/MergeSort.java

package sorting;

import java.util.Arrays;

public class MergeSort {
    public static void main(String[] args) {
    int[] array = { 5, 6, 10, 3, 9, 2, 12, 1, 8, 7 };
    mergeSort(array, 0, array.length -1);
    System.out.println(Arrays.toString(array));

    int[] array1 = {4, 7, 2};
    System.out.println(Arrays.toString(array1));
    mergeSort(array1, 0, array1.length -1);
    System.out.println(Arrays.toString(array1));
    System.out.println("\n\n");

    int[] array2 = {4, 7, 9};
    System.out.println(Arrays.toString(array2));
    mergeSort(array2, 0, array2.length -1);
    System.out.println(Arrays.toString(array2));
    System.out.println("\n\n");

    int[] array3 = {4, 7, 5};
    System.out.println(Arrays.toString(array3));
    mergeSort(array3, 0, array3.length -1);
    System.out.println(Arrays.toString(array3));
    System.out.println("\n\n");

    int[] array4 = {7, 4, 2};
    System.out.println(Arrays.toString(array4));
    mergeSort(array4, 0, array4.length -1);
    System.out.println(Arrays.toString(array4));
    System.out.println("\n\n");

    int[] array5 = {7, 4, 9};
    System.out.println(Arrays.toString(array5));
    mergeSort(array5, 0, array5.length -1);
    System.out.println(Arrays.toString(array5));
    System.out.println("\n\n");

    int[] array6 = {7, 4, 5};
    System.out.println(Arrays.toString(array6));
    mergeSort(array6, 0, array6.length -1);
    System.out.println(Arrays.toString(array6));
    System.out.println("\n\n");

    //Handling array of size two
    int[] array7 = {7, 4};
    System.out.println(Arrays.toString(array7));
    mergeSort(array7, 0, array7.length -1);
    System.out.println(Arrays.toString(array7));
    System.out.println("\n\n");

    int input1[] = {1};
    int input2[] = {4,2};
    int input3[] = {6,2,9};
    int input4[] = {6,-1,10,4,11,14,19,12,18};
    System.out.println(Arrays.toString(input1));
    mergeSort(input1, 0, input1.length-1);
    System.out.println(Arrays.toString(input1));
    System.out.println("\n\n");

    System.out.println(Arrays.toString(input2));
    mergeSort(input2, 0, input2.length-1);
    System.out.println(Arrays.toString(input2));
    System.out.println("\n\n");

    System.out.println(Arrays.toString(input3));
    mergeSort(input3, 0, input3.length-1);
    System.out.println(Arrays.toString(input3));
    System.out.println("\n\n");

    System.out.println(Arrays.toString(input4));
    mergeSort(input4, 0, input4.length-1);
    System.out.println(Arrays.toString(input4));
    System.out.println("\n\n");
}

private static void mergeSort(int[] array, int p, int r) {
    //Both below mid finding is fine.
    int mid = (r - p)/2 + p;
    int mid1 = (r + p)/2;
    if(mid != mid1) {
        System.out.println(" Mid is mismatching:" + mid + "/" + mid1+ "  for p:"+p+"  r:"+r);
    }

    if(p < r) {
        mergeSort(array, p, mid);
        mergeSort(array, mid+1, r);
//      merge(array, p, mid, r);
        inPlaceMerge(array, p, mid, r);
        }
    }

//Regular merge
private static void merge(int[] array, int p, int mid, int r) {
    int lengthOfLeftArray = mid - p + 1; // This is important to add +1.
    int lengthOfRightArray = r - mid;

    int[] left = new int[lengthOfLeftArray];
    int[] right = new int[lengthOfRightArray];

    for(int i = p, j = 0; i <= mid; ){
        left[j++] = array[i++];
    }

    for(int i = mid + 1, j = 0; i <= r; ){
        right[j++] = array[i++];
    }

    int i = 0, j = 0;
    for(; i < left.length && j < right.length; ) {
        if(left[i] < right[j]){
            array[p++] = left[i++];
        } else {
            array[p++] = right[j++];
        }
    }
    while(j < right.length){
        array[p++] = right[j++];
    } 
    while(i < left.length){
        array[p++] = left[i++];
    }
}

//InPlaceMerge no extra array
private static void inPlaceMerge(int[] array, int p, int mid, int r) {
    int secondArrayStart = mid+1;
    int prevPlaced = mid+1;
    int q = mid+1;
    while(p < mid+1 && q <= r){
        boolean swapped = false;
        if(array[p] > array[q]) {
            swap(array, p, q);
            swapped = true;
        }   
        if(q != secondArrayStart && array[p] > array[secondArrayStart]) {
            swap(array, p, secondArrayStart);
            swapped = true;
        }
        //Check swapped value is in right place of second sorted array
        if(swapped && secondArrayStart+1 <= r && array[secondArrayStart+1] < array[secondArrayStart]) {
            prevPlaced = placeInOrder(array, secondArrayStart, prevPlaced);
        }
        p++;
        if(q < r) {     //q+1 <= r) {
            q++;
        }
    }
}

private static int placeInOrder(int[] array, int secondArrayStart, int prevPlaced) {
    int i = secondArrayStart;
    for(; i < array.length; i++) {
        //Simply swap till the prevPlaced position
        if(secondArrayStart < prevPlaced) {
            swap(array, secondArrayStart, secondArrayStart+1);
            secondArrayStart++;
            continue;
        }
        if(array[i] < array[secondArrayStart]) {
            swap(array, i, secondArrayStart);
            secondArrayStart++;
        } else if(i != secondArrayStart && array[i] > array[secondArrayStart]){
            break;
        }
    }
    return secondArrayStart;
}

private static void swap(int[] array, int m, int n){
    int temp = array[m];
    array[m] = array[n];
    array[n] = temp;
}
}

- Kanagavelu Sugumar

4

由于偶尔出现语法错误和不一致/糟糕的风格，它既是O(n²)，也极难阅读。 - glaba

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Larry LIU Xinyu · Accepted Answer

Knuth将此作为练习留下来（Vol 3, 5.2.5）。确实存在原地合并排序算法，但必须仔细实现。

首先，像这里描述的天真原地合并排序不是正确的解决方案。它会将性能降级到O(N²)。

思路是在排序部分数组的同时使用其余部分作为合并的工作区域。

例如像下面这个合并函数。

void wmerge(Key* xs, int i, int m, int j, int n, int w) {
    while (i < m && j < n)
        swap(xs, w++, xs[i] < xs[j] ? i++ : j++);
    while (i < m)
        swap(xs, w++, i++);
    while (j < n)
        swap(xs, w++, j++);
}

该算法接受数组 xs，将两个已排序的子数组表示为范围 [i, m) 和 [j, n)。工作区从 w 开始。与大多数教科书中给出的标准合并算法相比，这个算法交换已排序子数组和工作区之间的内容。因此，先前的工作区包含合并后的已排序元素，而之前存储在工作区中的元素被移动到两个子数组中。

然而，必须满足以下两个约束条件：

工作区应在数组边界内。换句话说，它应该足够大，以容纳交换元素而不会导致任何越界错误。
工作区可以与两个已排序数组重叠；但是，它必须确保未合并的元素没有被覆盖。

有了这个合并算法的定义，很容易想象一个解决方案，可以对数组的一半进行排序；下一个问题是如何处理存储在工作区中的未排序部分，如下所示：

... unsorted 1/2 array ... | ... sorted 1/2 array ...

一个直观的想法是对工作区域的另一半进行递归排序，这样只有1/4的元素还没有被排序。

... unsorted 1/4 array ... | sorted 1/4 array B | sorted 1/2 array A ...

在这个阶段的关键是，我们必须早晚将已排序的四分之一元素B与已排序的一半元素A合并。

剩下的工作区只持有四分之一的元素，能否足够大以合并A和B呢？不幸的是，它不够大。

然而，上面提到的第二个约束条件给了我们一个提示，即我们可以通过使工作区与任一子数组重叠来利用它，如果我们能确保未合并的元素不会被覆盖。

实际上，我们可以对工作区的前一半进行排序，并将工作区放置在两个已排序的数组之间，像这样：

... sorted 1/4 array B | unsorted work area | ... sorted 1/2 array A ...

这个设置有效地安排了工作区域与子数组A的重叠部分。该想法是在[Jyrki Katajainen，Tomi Pasanen，Jukka Teuhola。"Practical in-place mergesort"。Nordic Journal of Computing，1996]中提出的。

因此，唯一剩下的就是重复上述步骤，将工作区域缩小为1/2、1/4、1/8等。当工作区域足够小（例如，只剩下两个元素）时，我们可以切换到简单的插入排序来结束这个算法。

这是基于该论文的 ANSI C 实现。

void imsort(Key* xs, int l, int u);

void swap(Key* xs, int i, int j) {
    Key tmp = xs[i]; xs[i] = xs[j]; xs[j] = tmp;
}

/* 
 * sort xs[l, u), and put result to working area w. 
 * constraint, len(w) == u - l
 */
void wsort(Key* xs, int l, int u, int w) {
    int m;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        imsort(xs, l, m);
        imsort(xs, m, u);
        wmerge(xs, l, m, m, u, w);
    }
    else
        while (l < u)
            swap(xs, l++, w++);
}

void imsort(Key* xs, int l, int u) {
    int m, n, w;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        w = l + u - m;
        wsort(xs, l, m, w); /* the last half contains sorted elements */
        while (w - l > 2) {
            n = w;
            w = l + (n - l + 1) / 2;
            wsort(xs, w, n, l);  /* the first half of the previous working area contains sorted elements */
            wmerge(xs, l, l + n - w, n, u, w);
        }
        for (n = w; n > l; --n) /*switch to insertion sort*/
            for (m = n; m < u && xs[m] < xs[m-1]; ++m)
                swap(xs, m, m - 1);
    }
}

其中wmerge在之前已经被定义。

完整的源代码可以在这里找到，详细的解释可以在这里找到。

顺便说一下，这个版本不是最快的归并排序，因为它需要更多的交换操作。根据我的测试，它比标准版本要快，标准版本在每次递归中都分配额外的空间。但它比优化版本要慢，优化版本提前将原始数组加倍，并将其用于进一步的合并。