Python中的重复排列

Question

Python中的重复排列

6

我想要遍历大小为1的维立方体的所有顶点。我知道可以使用itertools.product来实现：

>>> n = 3
>>> for j in it.product((0,1), repeat=n) :
...     print j
... 
(0, 0, 0)
(0, 0, 1)
(0, 1, 0)
(0, 1, 1)
(1, 0, 0)
(1, 0, 1)
(1, 1, 0)
(1, 1, 1)

但我需要根据坐标中的 1 的数量，对每个顶点进行不同的处理，例如 (0, 1, 1)、(1, 0, 1) 和 (1, 1, 0) 将会接受相同的处理，因为它们都有两个 1。我不想使用上面的迭代器，然后再计算 1 的数量，我想生成按 1 的数量排序的笛卡尔积，例如：

>>> for ones in xrange(n) :
...     for seq in magic_expression(ones, n) :
...         print ones, seq
... 
0 (0, 0, 0)
1 (0, 0, 1)
1 (0, 1, 0)
1 (1, 0, 0)
2 (0, 1, 1)
2 (1, 0, 1)
2 (1, 1, 0)
3 (1, 1, 1)

我的高中数学老师会称这些为2个元素的排列组合，每次取n个元素，第一个元素重复n-ones次，第二个元素重复ones次，很容易证明共有n！/ ones！/（n-ones）！种排列组合。

根据维基百科的描述，我们可以使用类似以下的方式以字典序生成它们：

def lexicographical(ones, n) :
    perm = [0] * (n - ones) + [1] * ones
    yield tuple(perm)
    while True :
        k = None
        for j in xrange(n - 1) :
            if perm[j] < perm[j + 1] :
                k = j
        if k is None :
            break
        l = k + 1
        for j in xrange(k + 2, n) :
            if perm[k] < perm[j] :
                l = j
        perm[k], perm[l] = perm[l], perm[k]
        perm[k+1:] = perm[-1:k:-1]
        yield tuple(perm)

但是，仅在完整的笛卡尔积中计数时，这只有在n >= 10且ones < 2的情况下才会开始产生回报，而这并不是典型的用例。是否有一种优雅的方式可以加速上述代码，也许使用一些强大的itertools魔法或完全使用不同的算法？如果有任何区别，我对生成的排列顺序毫不在意。或者我应该放弃计数吗？

编辑

我对提出的解决方案进行了一些时间测量。按照itertools.product生成它们的顺序来消耗顶点，计数始终是最快的。但是，为了按照1的数量排序生成它们的顺序，Eevee对列表进行排序的解决方案在n <= 6时最快，并且从那时起，Cam的解决方案是两个解决方案中最快的。

我接受了Cam的解决方案，因为它更好地回答了问题。但就我要在我的代码中实现的内容而言，我将放弃计数。

- Jaime

5个回答

3

一种（低效的）替代方法：

>>> ['{0:03b}'.format(x) for x in range(8)]
['000', '001', '010', '011', '100', '101', '110', '111']

或者作为元组：

>>> [tuple(int(j) for j in list('{0:03b}'.format(x))) for x in range(8)]

[(0, 0, 0),
 (0, 0, 1),
 (0, 1, 0),
 (0, 1, 1),
 (1, 0, 0),
 (1, 0, 1),
 (1, 1, 0),
 (1, 1, 1)]

按顶点数量排序：

>>> sorted(_, key=lambda x: sum(x))

[(0, 0, 0),
 (0, 0, 1),
 (0, 1, 0),
 (1, 0, 0),
 (0, 1, 1),
 (1, 0, 1),
 (1, 1, 0),
 (1, 1, 1)]

或者使用itertools：

>>> sorted(itertools.product((0, 1), repeat=3), key=lambda x: sum(x))

[(0, 0, 0),
 (0, 0, 1),
 (0, 1, 0),
 (1, 0, 0),
 (0, 1, 1),
 (1, 0, 1),
 (1, 1, 0),
 (1, 1, 1)]

- Alex L

2

针对你的3D立方体使用场景，Eevee的解决方案是正确的。

然而，为了展示itertools的强大之处，这里提供一种线性时间解决方案，适用于更高维度的情况：

from itertools import combinations

# n is the number of dimensions of the cube (3 for a 3d cube)
def generate_vertices(n):
    for number_of_ones in xrange(0, n + 1):
        for location_of_ones in combinations(xrange(0, n), number_of_ones):
            result = [0] * n
            for location in location_of_ones:
                result[location] = 1
            yield result

for vertex in generate_vertices(3):
    print vertex


# result:
# [0, 0, 0]
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]
# [1, 1, 0]
# [1, 0, 1]
# [0, 1, 1]
# [1, 1, 1]

- Cam

不错！这就是我在寻找的东西。set 操作是不是没必要？ - Jaime

@Jaime：不，因为集合具有常数时间访问，这是下面部分所需的。但是你提出了一个很好的观点，即该部分有点冗长。我已经更新了它，使其更清晰，可能也更快一些。 - Cam

以下是关于编程的内容，翻译成中文。仅返回翻译后的文本：供参考，这是我原来的解决方案:https://gist.github.com/4536013 - Cam

1

根据顶点的用途进行计数并不是一个坏主意，因为如果您必须迭代所有顶点并执行某些操作，则O(f(n))至少为O(f(n)*2n)，对它们进行排序是O(n*2n)。因此，这基本上取决于f(n)是否大于n。

除此之外，这里有一个可能的神奇表达式：

def magic_expression(ones, n):
    a = (0,) * (n - ones) + (1,) * ones
    previous = tuple()
    for p in itertools.permutations(a):
        if p > previous:
            previous = p
            yield p

借助具有唯一值的排列的帮助。

这是有效的，因为itertools.permutations产生排序后的结果。请注意，a最初是排序的，因为零首先出现。

- Jan Segre

这并没有解决问题。它要求一种按照立方体顶点中的1的数量排序迭代的方法 - 这并不是那样做的。 - Cam

虽然这不是最好的解决方案，但它可以完成问题中所期望的魔法表达式的功能，当与 for ones in xrange(3): ... 结合使用时，将产生预期的输出。 - Jan Segre

哎呀！确实是这样。谢谢你澄清了。 - Cam

1

以下是一些比Cam或Eevee的代码运行速度更快（对于中等n）且多次更快（对于大型n）。接下来是一个时间比较。

def cornersjc (n):   # Re: jw code
    from itertools import product
    m = (n+1)/2
    k = n-m
    # produce list g of lists of tuples on k bits
    g = [[] for i in range(k+1)]
    for j in product((0,1), repeat=k):
        g[sum(j)].append(tuple(j))
    # produce list h of lists of tuples on m bits
    if k==m:
        h = g
    else:
        h = [[] for i in range(m+1)]
        for j in product((0,1), repeat=m):
            h[sum(j)].append(tuple(j))
    # Now deliver n-tuples in proper order
    for b in range(n+1):  # Deliver tuples with b bits set
        for lb in range(max(0, b-m), min(b+1,k+1)):
            for l in g[lb]:
                for r in h[b-lb]:
                    yield l+r

下面的时间结果来自于ipython中一系列的%timeit调用。每个调用都是以下形式之一：
%timeit [x for x in cube1s.f(n)]
其中f的位置分别为cornersjc，cornerscc，cornersec，cornerses（代表我的代码，Cam的代码，Eevee的代码和我对Eevee方法的版本），n的位置为一个数字。

n    cornersjc    cornerscc    cornersec    cornerses

5      40.3 us      45.1 us      36.4 us      25.2 us    
6      51.3 us      85.2 us      77.6 us      46.9 us    
7      87.8 us      163 us       156 us       88.4 us    
8     132 us       349 us       327 us       178 us    
9     250 us       701 us       688 us       376 us    
10    437 us      1.43 ms      1.45 ms       783 us
11    873 us      3 ms         3.26 ms      1.63 ms
12   1.87 ms      6.66 ms      8.34 ms      4.9 ms

上面给出了cornersjc的代码。 cornerscc，cornersec和cornerses的代码如下。这些代码产生与cornersjc相同的输出，但是Cam的代码生成一个列表的列表，而不是元组的列表，并且在每个位计数组内部以相反的顺序生成。

def cornerscc(n):   # Re: Cam's code
    from itertools import combinations
    for number_of_ones in xrange(0, n + 1):
        for location_of_ones in combinations(xrange(0, n), number_of_ones):
            result = [0] * n
            for location in location_of_ones:
                result[location] = 1
            yield result

def cornersec (n):   # Re:  Eevee's code
    from itertools import product
    vertices = ((v.count(1), v)
                for v in product((0, 1), repeat=n))
    for count, vertex in sorted(vertices):
        yield vertex

def cornerses (n):   # jw mod. of Eevee's code
    from itertools import product
    for vertex in sorted(product((0, 1), repeat=n), key=sum):
        yield vertex

请注意，cornersjc 的最后三行可以替换为：

            for v in product(g[lb], h[b-lb]):
                yield v[0]+v[1]

这个版本更干净但速度较慢。请注意，如果使用yield v而不是yield v[0]+v[1]，则代码运行速度比cornersjc快，但（在n=5时）会产生类似((1, 0), (1, 1, 0))的元组对结果；当使用yield v[0]+v[1]时，代码运行速度比cornersjc慢，但产生相同的结果，即元组列表如(1, 0, 1, 1, 0)。下面是一个示例计时，其中cornersjp是修改后的cornersjc。

In [93]: for n in range(5,13):
    %timeit [x for x in cube1s.cornersjp(n)]
   ....:     
10000 loops, best of 3: 49.3 us per loop
10000 loops, best of 3: 64.9 us per loop
10000 loops, best of 3: 117 us per loop
10000 loops, best of 3: 178 us per loop
1000 loops, best of 3: 351 us per loop
1000 loops, best of 3: 606 us per loop
1000 loops, best of 3: 1.28 ms per loop
100 loops, best of 3: 2.74 ms per loop

- James Waldby - jwpat7

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Eevee · Accepted Answer

如果你写了超过八行的代码来生成八个常量值，那么一定有问题。

除非我只是嵌入我想要的列表，否则我会以愚蠢的方式解决：

vertices = (
    (v.count(1), v)
    for v in itertools.product((0, 1), repeat=3)
)
for count, vertex in sorted(vertices):
    print vertex

除非你正在处理1000维的超立方体，否则你不必担心性能问题。