对于初学者的归并排序解释

25

我在网上找到了这段代码:

def merge(left, right):
    result = []
    i ,j = 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result += left[i:]
    result += right[j:]
    return result

def mergesort(list):
    if len(list) < 2:
        return list
    middle = len(list) / 2
    left = mergesort(list[:middle])
    right = mergesort(list[middle:])
    return merge(left, right)

在我运行它的时候,它是百分之百有效的。但我并不真正明白归并排序是如何工作的,也不理解递归函数如何能够正确地对左侧和右侧进行排序。


10
请查看维基页面上的动态GIF图:http://en.wikipedia.org/wiki/Merge_sort - Andrew Barrett
可能是归并排序中最干净的代码之一。这段代码的来源是什么?请提供链接。 - Surya
8个回答

61

我认为理解归并排序的关键在于理解以下原则,我称之为归并原则:

给定两个按最小到最大排序的单独列表 A 和 B,通过重复比较 A 中的最小值和 B 中的最小值、删除较小的值并将其附加到 C 上来构建列表 C。当一个列表耗尽时,以顺序方式将另一个列表中的剩余项附加到 C 上。列表 C 也是一个排序好的列表。

如果您手动执行几次,就会发现它是正确的。例如:

A = 1, 3
B = 2, 4
C = 
min(min(A), min(B)) = 1

A = 3
B = 2, 4
C = 1
min(min(A), min(B)) = 2

A = 3
B = 4
C = 1, 2
min(min(A), min(B)) = 3

A = 
B = 4
C = 1, 2, 3

现在A已经用尽,因此使用B中剩余的值扩展C:

C = 1, 2, 3, 4
合并原则很容易证明。A的最小值小于所有其他A的值,B的最小值小于所有其他B的值。如果A的最小值小于B的最小值,则它也必定小于所有B的值。因此它小于所有A和B的值。只要将满足这些条件的值追加到C中,就可以得到一个排序的列表。这就是上面的“合并”函数所做的事情。
现在,有了这个原则,理解一种通过将列表分成较小的列表、对这些列表进行排序,然后将这些排序后的列表合并在一起的排序技术就非常容易了。"merge_sort"函数只是将一个列表分成两半,对这两个列表进行排序,然后按照上述方式将这两个列表合并在一起的函数。
唯一需要注意的是,因为它是递归的,当它对两个子列表进行排序时,会将它们传递给自身!如果您难以理解这里的递归,建议先学习更简单的问题。但是如果您已经掌握了递归的基础知识,那么您只需意识到一个单项列表已经排序。合并两个单项列表生成一个排序的两项列表;合并两个两项列表生成一个排序的四项列表;依此类推。

2
非常好。对归并排序的基本单元进行了非常好的讲解。 - agf
8
我浏览了10-20个搜索结果,其中有可怕的图形解释,大段大段的文字谈论一般的排序。这正是我想要的。切入正题,直接给我核心思想的内容。非常感谢。现在我需要在Google上搜索类似的快排描述。 - Amogh Talpallikar
我理解算法中的“排序”部分是如何工作的。但我不明白算法如何将这些小的已排序列表进行“合并”操作。 - user3932000
@user3932000,我并没有太多地谈论排序部分。我主要是在谈论合并操作。在上面,AB是合并操作的输入,而C是输出。有什么具体的不清楚吗? - senderle

15

当我遇到理解算法过程中的困难时,我会添加调试输出来检查算法内部真正发生了什么。

以下是带有调试输出的代码。尝试通过mergesort的递归调用和merge对输出的处理来理解所有步骤:

def merge(left, right):
    result = []
    i ,j = 0, 0
    while i < len(left) and j < len(right):
        print('left[i]: {} right[j]: {}'.format(left[i],right[j]))
        if left[i] <= right[j]:
            print('Appending {} to the result'.format(left[i]))           
            result.append(left[i])
            print('result now is {}'.format(result))
            i += 1
            print('i now is {}'.format(i))
        else:
            print('Appending {} to the result'.format(right[j]))
            result.append(right[j])
            print('result now is {}'.format(result))
            j += 1
            print('j now is {}'.format(j))
    print('One of the list is exhausted. Adding the rest of one of the lists.')
    result += left[i:]
    result += right[j:]
    print('result now is {}'.format(result))
    return result

def mergesort(L):
    print('---')
    print('mergesort on {}'.format(L))
    if len(L) < 2:
        print('length is 1: returning the list withouth changing')
        return L
    middle = len(L) / 2
    print('calling mergesort on {}'.format(L[:middle]))
    left = mergesort(L[:middle])
    print('calling mergesort on {}'.format(L[middle:]))
    right = mergesort(L[middle:])
    print('Merging left: {} and right: {}'.format(left,right))
    out = merge(left, right)
    print('exiting mergesort on {}'.format(L))
    print('#---')
    return out


mergesort([6,5,4,3,2,1])

输出:

---
mergesort on [6, 5, 4, 3, 2, 1]
calling mergesort on [6, 5, 4]
---
mergesort on [6, 5, 4]
calling mergesort on [6]
---
mergesort on [6]
length is 1: returning the list withouth changing
calling mergesort on [5, 4]
---
mergesort on [5, 4]
calling mergesort on [5]
---
mergesort on [5]
length is 1: returning the list withouth changing
calling mergesort on [4]
---
mergesort on [4]
length is 1: returning the list withouth changing
Merging left: [5] and right: [4]
left[i]: 5 right[j]: 4
Appending 4 to the result
result now is [4]
j now is 1
One of the list is exhausted. Adding the rest of one of the lists.
result now is [4, 5]
exiting mergesort on [5, 4]
#---
Merging left: [6] and right: [4, 5]
left[i]: 6 right[j]: 4
Appending 4 to the result
result now is [4]
j now is 1
left[i]: 6 right[j]: 5
Appending 5 to the result
result now is [4, 5]
j now is 2
One of the list is exhausted. Adding the rest of one of the lists.
result now is [4, 5, 6]
exiting mergesort on [6, 5, 4]
#---
calling mergesort on [3, 2, 1]
---
mergesort on [3, 2, 1]
calling mergesort on [3]
---
mergesort on [3]
length is 1: returning the list withouth changing
calling mergesort on [2, 1]
---
mergesort on [2, 1]
calling mergesort on [2]
---
mergesort on [2]
length is 1: returning the list withouth changing
calling mergesort on [1]
---
mergesort on [1]
length is 1: returning the list withouth changing
Merging left: [2] and right: [1]
left[i]: 2 right[j]: 1
Appending 1 to the result
result now is [1]
j now is 1
One of the list is exhausted. Adding the rest of one of the lists.
result now is [1, 2]
exiting mergesort on [2, 1]
#---
Merging left: [3] and right: [1, 2]
left[i]: 3 right[j]: 1
Appending 1 to the result
result now is [1]
j now is 1
left[i]: 3 right[j]: 2
Appending 2 to the result
result now is [1, 2]
j now is 2
One of the list is exhausted. Adding the rest of one of the lists.
result now is [1, 2, 3]
exiting mergesort on [3, 2, 1]
#---
Merging left: [4, 5, 6] and right: [1, 2, 3]
left[i]: 4 right[j]: 1
Appending 1 to the result
result now is [1]
j now is 1
left[i]: 4 right[j]: 2
Appending 2 to the result
result now is [1, 2]
j now is 2
left[i]: 4 right[j]: 3
Appending 3 to the result
result now is [1, 2, 3]
j now is 3
One of the list is exhausted. Adding the rest of one of the lists.
result now is [1, 2, 3, 4, 5, 6]
exiting mergesort on [6, 5, 4, 3, 2, 1]
#---

2
非常出色的解释方式..没有@senderle的回答我无法理解你的回答,也无法理解senderle的回答! - ebram khalil

4
一些帮助理解它的方法:
1. 通过调试器逐步执行代码并观察发生了什么。 2. 或者,在纸上逐步执行它(用一个非常小的例子)并观察发生了什么。(我个人认为在纸上做这种事情更有教育意义。)
从概念上讲,它的工作原理是这样的:输入列表不断地被切成越来越小的片段,每次都将其分成两半(例如,list[:middle]是第一半)。每半都再次分成两半,直到长度小于2,即变成空或单个元素。然后通过合并例程将这些单个片段重新组合在一起,通过将两个子列表附加或交错插入到result列表中,从而获得排序后的列表。因为这两个子列表必须是已经排序的,所以附加/交错插入是一个快速(O(n))操作。
我的看法是,这个问题的关键不在于合并例程,一旦您了解到输入始终是排序的,那就很显而易见了。 "诀窍"(我用引号是因为它不是诀窍,而是计算机科学:-))在于为了确保合并的输入已排序,您必须继续递归下去,直到达到必须排序的列表,这就是为什么您要不断地调用mergesort,直到列表长度小于2。

当您第一次遇到递归和归并排序时,可能会感到不太清楚。您可能需要查阅一本好的算法书(例如DPV可以在线上合法免费获取),但是通过逐步执行您拥有的代码,您也可以走得很远。如果您真的想深入了解,斯坦福/ Coursera algo course 将很快再次运行,他将详细介绍合并排序。

如果你真的想理解它,读一下那本书的第二章,然后扔掉上面的代码,从头开始重新编写。说真的。

4

归并排序一直是我最喜欢的算法之一。

你从短的已排序序列开始,不断将它们按顺序合并为更大的已排序序列。如此简单。

递归部分意味着你在向后工作-从整个序列开始,对两半进行排序。每一半也被拆分,直到排序变得微不足道,当序列中只有零或一个元素时。随着递归函数返回已排序的序列变得更大,正如我在最初的描述中所说的那样。


2
一张图片胜过千言万语,一段动画胜过一万个。 查看以下动画从维基百科中获取,它将帮助您可视化归并排序算法的实际工作原理。 详细了解每个步骤的详细说明的动画可以在此处查看:https://www.hackerearth.com/practice/algorithms/sorting/merge-sort/visualize/。 另外,还有一个有趣的动画介绍各种类型的排序算法,可以在此处查看:https://www.toptal.com/developers/sorting-algorithms/。请注意,保留HTML标签。

0

基本上,你会得到一个列表,然后将其拆分并排序,但是你会递归地应用此方法,因此你会一遍又一遍地拆分它,直到你有一个可以轻松排序的微不足道的集合,然后合并所有简单的解决方案以获得完全排序的数组。


但是你如何实际排序“平凡集”,以及如何“合并简单的解决方案”?这并没有解释一个__归并排序__的合并或排序,因此并没有回答这个问题。 - agf

0

0

正如维基百科文章所解释的那样,有许多宝贵的方法可以实现归并排序。完成合并的方法也取决于要合并的东西集合,某些集合使得该集合拥有的某些工具能够使用。

我不打算使用 Python 来回答这个问题,因为我无法写出来;但是,“合并排序”算法的一部分似乎真的是这个问题的关键。一个帮助我的资源是 K.I.T.E 的有点过时的网页(由一位教授编写),因为内容的作者消除了上下文有意义的标识符。

我的答案来自这个资源。

记住,归并排序算法通过拆分提供的集合,然后将每个单独的部分重新组合起来,当重新构建集合时,将这些部分相互比较。

这里是“代码”(最后附上 Java "fiddle"):

public class MergeSort {

/**
 * @param a     the array to divide
 * @param low   the low INDEX of the array
 * @param high  the high INDEX of the array
 */
public void divide (int[] a, int low, int high, String hilo) {


    /* The if statement, here, determines whether the array has at least two elements (more than one element). The
     * "low" and "high" variables are derived from the bounds of the array "a". So, at the first call, this if 
     * statement will evaluate to true; however, as we continue to divide the array and derive our bounds from the 
     * continually divided array, our bounds will become smaller until we can no longer divide our array (the array 
     * has one element). At this point, the "low" (beginning) and "high" (end) will be the same. And further calls 
     * to the method will immediately return. 
     * 
     * Upon return of control, the call stack is traversed, upward, and the subsequent calls to merge are made as each 
     * merge-eligible call to divide() resolves
     */
    if (low < high) {
        String source = hilo;
        // We now know that we can further divide our array into two equal parts, so we continue to prepare for the division 
        // of the array. REMEMBER, as we progress in the divide function, we are dealing with indexes (positions)

        /* Though the next statement is simple arithmetic, understanding the logic of the statement is integral. Remember, 
         * at this juncture, we know that the array has more than one element; therefore, we want to find the middle of the 
         * array so that we can continue to "divide and conquer" the remaining elements. When two elements are left, the
         * result of the evaluation will be "1". And the element in the first position [0] will be taken as one array and the
         * element at the remaining position [1] will be taken as another, separate array.
         */
        int middle = (low + high) / 2;

        divide(a, low, middle, "low");
        divide(a, middle + 1, high, "high");


        /* Remember, this is only called by those recursive iterations where the if statement evaluated to true. 
         * The call to merge() is only resolved after program control has been handed back to the calling method. 
         */
        merge(a, low, middle, high, source);
    }
}


public void merge (int a[], int low, int middle, int high, String source) {
// Merge, here, is not driven by tiny, "instantiated" sub-arrays. Rather, merge is driven by the indexes of the 
// values in the starting array, itself. Remember, we are organizing the array, itself, and are (obviously
// using the values contained within it. These indexes, as you will see, are all we need to complete the sort.  

    /* Using the respective indexes, we figure out how many elements are contained in each half. In this 
     * implementation, we will always have a half as the only way that merge can be called is if two
     * or more elements of the array are in question. We also create to "temporary" arrays for the 
     * storage of the larger array's elements so we can "play" with them and not propogate our 
     * changes until we are done. 
     */
    int first_half_element_no       = middle - low + 1;
    int second_half_element_no      = high - middle;
    int[] first_half                = new int[first_half_element_no];
    int[] second_half               = new int[second_half_element_no];

    // Here, we extract the elements. 
    for (int i = 0; i < first_half_element_no; i++) {  
        first_half[i] = a[low + i]; 
    }

    for (int i = 0; i < second_half_element_no; i++) {  
        second_half[i] = a[middle + i + 1]; // extract the elements from a
    }

    int current_first_half_index = 0;
    int current_second_half_index = 0;
    int k = low;


    while (current_first_half_index < first_half_element_no || current_second_half_index < second_half_element_no) {

        if (current_first_half_index >= first_half_element_no) {
            a[k++] = second_half[current_second_half_index++];
            continue;
        }

        if (current_second_half_index >= second_half_element_no) {
            a[k++] = first_half[current_first_half_index++];
            continue;
        }

        if (first_half[current_first_half_index] < second_half[current_second_half_index]) {
            a[k++] = first_half[current_first_half_index++];
        } else {
            a[k++] = second_half[current_second_half_index++];
        }
    }
}

我也有一个版本,在这里,它会打印出有用的信息,并对上面的情况提供更直观的表示。如果这有帮助的话,语法高亮效果也更好。


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接