Heap's 算法排列生成器

Question

Heap's 算法排列生成器

pythonalgorithmpermutationheaps-algorithm

16

我需要迭代整数元组的排列。顺序必须通过每步交换一对元素来生成。

我找到了Heap算法的维基百科文章（http://en.wikipedia.org/wiki/Heap%27s_algorithm），该算法应该可以做到这一点。伪代码如下：

procedure generate(n : integer, A : array of any):
    if n = 1 then
        output(A)
    else
        for i := 1; i ≤ n; i += 1 do
            generate(n - 1, A)
            if n is odd then
                j ← 1
            else
                j ← i
            swap(A[j], A[n])

我尝试在Python中编写了一个生成器：

def heap_perm(A):
    n = len(A)
    Alist = [el for el in A]
    for hp in _heap_perm_(n, Alist):
        yield hp


def _heap_perm_(n, A):
    if n == 1:
        yield A
    else:
        for i in range(1,n+1):
            for hp in _heap_perm_(n-1, A):
                yield hp
            if (n % 2) == 1:
                j = 1
            else:
                j = i
            swap = A[j-1]
            A[j-1] = A[n-1]
            A[n-1] = swap

这将按以下顺序生成排列（对于输入[0,1,2,3]）：

0,1,2,3
1,0,2,3
2,0,1,3
0,2,1,3
1,2,0,3
2,1,0,3
3,1,2,0

and so on

在最后一步时出现了问题，这不是仅交换一对的问题。

我做错了什么？

- EvenAdam

此外，在Python中，swap(A[j], A[n])可以简单地实现为A[n]，A[j] = A[j]，A[n]，这比使用临时变量进行多步操作更清晰。 - aruisdante

最后，为什么itertools.permutations()不符合要求呢？对于你的输入示例，itertools.permutations([0,1,2,3])会产生正确的输出样本。由于它是用C实现的，对于大型输入序列来说，速度会快得多。 - aruisdante

@aruisdante：itertools.permutations 每一步并不会交换一对元素。 - matiasg

@MartijnPieters 我这样做是为了尽可能地使其像伪代码一样进行故障排除。我是否不一致？ - EvenAdam

@MartijnPieters，您能否再解释一下？我同意我使用的索引方式很不寻常，但我找不到不一致之处。 - EvenAdam

显示剩余5条评论

3个回答

-1

我刚刚将维基百科关于 Heap 算法的非递归伪代码解释成了 Python 代码。

A = ['a', 'b', 'c', 'd', 'e', 'f', 'g'] #the initial list to permute
n = len(A) #length of A
print(A) # output the first permutation (i.e. the original list A itself)
number_of_permutations = 1 #a variable to count a number of permutations produced by the script
total = [] #a list contaning all generated permutations
t = tuple(A) #tuple containing each permutation
total.append(t) #adding each permutation to total
c = [0] * n #c is the list used to track swapping of positions in each iteration of the Heap`s alrorythm.
i = 0 #index of positions in A and c. Serves as a pointer for swapping of positions in A.

while i < n:
    if c[i] < i: # test whether ith position was swapped more than i - 1 times
        if i % 2 == 0: #swapping the poitions in A
            A[0], A[i] = A[i], A[0]
        else:
            A[c[i]], A[i] = A[i], A[c[i]]
        print(A) #output of each permutation
        t = tuple(A) #convert each permutations into unmutable object (tuple)
        total.append(t) # appending each permutation as tuple in the list of all generated permutations
        number_of_permutations += 1 #increment number of performed permutations
        c[i] += 1 # incrementing c[i] is used to indicate that ith position was swapped
        i = 0 #returns the pointer to the 0th position (imitates recursion)
    else:
        c[i] = 0 #when ith position in A was swapped i - 1 times c[i] resets to zero
        i += 1 #pointer moves to the next position

print('Number of permutations generated by the script = ', number_of_permutations)

#Examining the correctness of permutions. Total number of permutations should be n! and there should not be any repeates
def factorial(n):
    """ Factorial function"""
    result = 1
    for number in range(1, n + 1):
        result *= number
    return result

print('Theoretical number of permutations (calculated as n!) = ', factorial(n))

for seq in total: #testing repeates
    count = total.count(seq)
    if count > 1:
        x = False
    else:
        x = True
print('The script does not generates repeats' if x == True else 'The script generates repeats')

我还介绍了一个用于检验结果正确性的测试（所有不重复排列的数量应该等于n！）。看起来脚本运行良好。我仍然不完全理解它是如何工作的，但我已经根据自己的理解在代码中添加了注释。

- Mikhail Tarasov

Python内置的math模块中有一个阶乘函数。要检查列表中的所有对象是否唯一，可以使用len(total) == len(set(total))，这比循环遍历所有元素要快得多。(O(n) vs. (O(n²))。此外，你的检查不起作用。一旦检测到失败，应该离开循环；如果继续，可能会将x重置为True。最后，实现的正确性还要求每个步骤都涉及单个交换，这一点你没有检查。 - rici

-4

获取列表的排列组合最简单的方法是使用itertools模块中的permutations函数。因此，如果算法不受限制，我会选择这种方法：

from itertools import permutations

a = [1,2,3,4]
for item in permutations(a):
    print item

- Saksham Varma

你忽略了一个交换的限制条件。 - Paddy3118

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- rici · Accepted Answer

历史序言

自从这个答案被写出来以来，关于Heap's算法的维基百科文章已经多次被更正、污损和修复。提问者和原回答所参考的版本是错误的；您可以在维基百科的历史记录中看到。当前版本可能正确，也可能不正确；在此编辑时（2022年3月），该页面包含了正确和错误的版本。

如果您打算实现维基百科伪代码，则您的代码（在算法上）没有问题。您已经成功地实现了所呈现的算法。

然而，所呈现的算法并不是Heap's算法，并且它不能保证连续的排列是单个交换的结果。正如您在维基百科页面中所看到的，有时会在生成的排列之间发生多个交换。例如，参见以下行：

CBAD
DBCA

它们之间有三次交换（其中一次交换是无操作）。

这正是您的代码输出的内容，这并不奇怪，因为您的代码是所呈现算法的准确实现。

Heap算法的正确实现和错误的来源

有趣的是，该伪代码基本上是从Sedgewick在2002年的演讲幻灯片中派生出来的（维基百科页面上的参考文献3，也可在Sedgewick的主页上找到）。错误的伪代码在第13张幻灯片上，很明显是错误的，因为它与直接前面的算法工作展示有用的图形不符。我做了些调查，并发现了许多此错误伪代码的副本，足以开始对我的分析产生怀疑。

幸运的是，我也看了Heap的短文（维基百科页面上的参考文献1），它相当清晰。他说的是：（重点加粗）

程序使用相同的通用方法......即对于n个对象，首先排列前（n-1）个对象，然后将第n个单元格的内容与指定单元格的内容互换。在这种方法中，如果n是奇数，则此指定单元格始终是第一个单元格，但如果n是偶数，则其编号连续从1到（n-1）。

问题在于，递归函数在返回之前总是进行交换（除非N为1）。因此，它实际上会连续从1到n进行交换，而不是像Heap规定的那样连续从1到（n-1）。因此，例如当函数使用N==3调用时，在下一次yield之前将有两个交换：一个来自N==2的调用末尾，另一个来自i==N的循环。如果N==4，则将进行三次交换，依此类推。（其中一些将是无操作。）

最后一次交换是错误的：该算法应在递归调用之间进行交换，而不是在它们之后进行交换；它永远不应与i==N进行交换。

因此，这应该可行：

def _heap_perm_(n, A):
    if n == 1: yield A
    else:
        for i in range(n-1):
            for hp in _heap_perm_(n-1, A): yield hp
            j = 0 if (n % 2) == 1 else i
            A[j],A[n-1] = A[n-1],A[j]
        for hp in _heap_perm_(n-1, A): yield hp

注意：以上函数是为了模仿原问题中同名函数的功能而编写的，该函数执行原地置换。在原地执行置换可以节约每次返回置换时的复制成本，使得完整的生成过程的时间复杂度为O(n!)（或每个生成的置换的时间复杂度为O(1)），而不是O(n·n!)（或每个生成的置换的时间复杂度为O(n)）。如果您打算立即处理每个置换，那么这很好，但如果您计划保留它们，则会变得混乱，因为下一个调用生成器将修改先前生成的列表。在这种情况下，您可能需要将函数调用为

for perm in map(tuple, heap_perm(n, A)):
    # Now each perm is independent of the previous one

或者将它们全部收集到一个巨大的列表中（不建议这样做！；这些列表非常庞大），可以使用类似于perms = list(map(tuple, heap_perm(n, A)))的方法。

Sedgewick的原始论文

当我发现Sedgewick 1977年的论文时[见注1]，我很高兴发现他在那里提出的算法是正确的。然而，它使用了一个循环控制结构（归功于Donald Knuth），可能对Python或C程序员来说似乎陌生：一个中间循环测试：

Algorithm 1:

  procedure permutations (N);
      begin c:=1;
          loop:
              if N>2 then permutations(N-1)
              endif;
          while c<N:
              # Sedgewick uses :=: as the swap operator
              # Also see note below
              if N odd then P[N]:=:P[1] else P[N]:=:P[c] endif;
              c:=c+l
          repeat
       end;

注：交换行取自第141页，在那里Sedgewick解释了如何修改算法1的原始版本（第140页）以匹配Heap的算法。除了那一行之外，算法的其余部分没有改变。还介绍了几个变体。

注意事项：

当我撰写这篇答案时，该论文唯一的公开链接是维基百科页面上引用的需要付费的URL。现在可以在Sedgewick的主页上下载。