多重集合是一个集合,其中所有元素可能不唯一。如何枚举集合元素之间的所有可能排列?
多重集合是一个集合,其中所有元素可能不唯一。如何枚举集合元素之间的所有可能排列?
生成所有可能的排列,然后丢弃重复的排列是非常低效的。存在各种算法可以直接按字典顺序或其他类型的顺序生成多重集合的排列。高冈算法是一个很好的例子,但可能Aaron Williams的算法更好。
http://webhome.csc.uvic.ca/~haron/CoolMulti.pdf
此外,它已经在R包“multicool”中实现。
顺便说一下,如果您只想要不同排列的总数,答案是多项式系数: 例如,如果您有n_a个元素'a',n_b个元素'b',n_c个元素'c',则不同排列的总数为(n_a+n_b+n_c)!/(n_a!n_b!n_c!)
sympy
提供了 multiset_permutations
方法。
来自文档:
>>> from sympy.utilities.iterables import multiset_permutations
>>> from sympy import factorial
>>> [''.join(i) for i in multiset_permutations('aab')]
['aab', 'aba', 'baa']
>>> factorial(len('banana'))
720
>>> sum(1 for _ in multiset_permutations('banana'))
60
def msp(items):
'''Yield the permutations of `items` where items is either a list
of integers representing the actual items or a list of hashable items.
The output are the unique permutations of the items given as a list
of integers 0, ..., n-1 that represent the n unique elements in
`items`.
Examples
========
>>> for i in msp('xoxox'):
... print(i)
[1, 1, 1, 0, 0]
[0, 1, 1, 1, 0]
[1, 0, 1, 1, 0]
[1, 1, 0, 1, 0]
[0, 1, 1, 0, 1]
[1, 0, 1, 0, 1]
[0, 1, 0, 1, 1]
[0, 0, 1, 1, 1]
[1, 0, 0, 1, 1]
[1, 1, 0, 0, 1]
Reference: "An O(1) Time Algorithm for Generating Multiset Permutations", Tadao Takaoka
https://pdfs.semanticscholar.org/83b2/6f222e8648a7a0599309a40af21837a0264b.pdf
'''
def visit(head):
(rv, j) = ([], head)
for i in range(N):
(dat, j) = E[j]
rv.append(dat)
return rv
u = list(set(items))
E = list(reversed(sorted([u.index(i) for i in items])))
N = len(E)
# put E into linked-list format
(val, nxt) = (0, 1)
for i in range(N):
E[i] = [E[i], i + 1]
E[-1][nxt] = None
head = 0
afteri = N - 1
i = afteri - 1
yield visit(head)
while E[afteri][nxt] is not None or E[afteri][val] < E[head][val]:
j = E[afteri][nxt] # added to algorithm for clarity
if j is not None and E[i][val] >= E[j][val]:
beforek = afteri
else:
beforek = i
k = E[beforek][nxt]
E[beforek][nxt] = E[k][nxt]
E[k][nxt] = head
if E[k][val] < E[head][val]:
i = k
afteri = E[i][nxt]
head = k
yield visit(head)
优化smichr的答案,我解压了nxt
,使用accumulate()
使访问函数更加高效(map()
比列表推导更快,而且在第二个常量索引中嵌套它似乎是肤浅和学究式的)
from itertools import accumulate
def msp(items):
def visit(head):
'''(rv, j) = ([], head)
for i in range(N):
(dat, j) = E[j]
rv.append(dat)
return(rv)'''
#print(reduce(lambda e,dontCare: (e[0]+[E[e[1]]],nxts[e[1]]),range(N),([],head))[0])
#print(list(map(E.__getitem__,accumulate(range(N-1),lambda e,N: nxts[e],initial=head))))
return(list(map(E.__getitem__,accumulate(range(N-1),lambda e,N: nxts[e],initial=head))))
u=list(set(items))
E=list(sorted(map(u.index,items)))
N=len(E)
nxts=list(range(1,N))+[None]
head=0
i,ai,aai=N-3,N-2,N-1
yield(visit(head))
while aai!=None or E[ai]>E[head]:
beforek=(i if aai==None or E[i]>E[aai] else ai)
k=nxts[beforek]
if E[k]>E[head]:
i=k
nxts[beforek],nxts[k],head = nxts[k],head,k
ai=nxts[i]
aai=nxts[ai]
yield(visit(head))
这里是测试结果(第二个由于更多的multiset,我想有(13!/2!/3!/3!/4!)/10! = 143/144
倍的排列组合,但需要更长时间),我的似乎分别快了9%和7%:
cProfile.run("list(msp(list(range(10))))")
cProfile.run("list(msp([0,1,1,2,2,2,3,3,3,3,4,4,4]))")
original:
43545617 function calls in 28.452 seconds
54054020 function calls in 32.469 seconds
modification:
39916806 function calls in 26.067 seconds
50450406 function calls in 30.384 seconds
我声望不够,在评论回答时无法发表意见,但对于一个items
输入列表,Martin Böschen的答案时间复杂度是每个元素值的实例数的阶乘之积加一。
reduce(int.__mul__,map(lambda n: reduce(int.__mul__,range(1,n+1)),map(items.count,set(items))))
当计算具有许多出现次数的大型多重集时,这可能会迅速增长。例如,对于我的第二个示例,每个排列所需的时间比第一个示例长1728倍。
import itertools as it
for i in it.permutations([1,2,2]):
print i
然后你将会得到输出
(1, 2, 2)
(1, 2, 2)
(2, 1, 2)
(2, 2, 1)
(2, 1, 2)
(2, 2, 1)
import itertools as it
permset=set([i for i in it.permutations([1,2,2])])
for x in permset:
print x
输出:
(1, 2, 2)
(2, 2, 1)
(2, 1, 2)