Unique combinations from a list of tuples

Score: 11

Given a list of triples, e.g. [(1,2,3), (4,5,6), (7,8,9)], how do I compute all possible combinations, including combinations of subsets?

In this case, the result should look like this:

[
(1), (1,4), (1,5), (1,6), (1,7), (1,8), (1,9), (1,4,7), (1,4,8), (1,4,9), (1,5,7), (1,5,8), (1,5,9), (1,6,7), (1,6,8), (1,6,9),
(2), ...,
(3), ...,
(4), (4,7), (4,8), (4,9), 
(5), (5,7), (5,8), (5,9), 
(6), (6,7), (6,8), (6,9), 
(7), (8), (9)
]
  • Tuples that contain the same elements are considered identical
  • Combinations whose elements come from the same input tuple are not allowed (for example, (1,2), (4,6) and (7,8,9) must not appear in the solution); see the sketch below this list
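
To make rule 2 concrete, here is a minimal sketch of the constraint (the helper respects_rule_2 is hypothetical and not part of the question): a combination is valid only if it uses at most one element from each input tuple.

def respects_rule_2(combo, groups):
    # at most one element of the candidate may come from any single input tuple
    return all(sum(x in group for x in combo) <= 1 for group in groups)

groups = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
assert respects_rule_2((1, 4, 7), groups)
assert not respects_rule_2((1, 2), groups)  # two elements taken from the same tuple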

But if the second rule does not allow (1,2), why is (1)(9) part of the solution? - Drey
It looks like there are three groups of tuples: 1) [(x,) for x in the_list[0]], 2) [(x,y) for x in the_list[0] for y in the_list[1]], and 3) [(x,y,z) for x in the_list[0] for y in the_list[1] for z in the_list[2]] - chepner
Possible duplicate of Picking unordered combinations from pools with overlap. - Joseph Wood
5 Answers

Score: 10

You can use recursion inside a generator:

data = [(1,2,3), (4,5,6), (7,8,9)]

def combos(d, c = []):
    # yield every ordering (permutation) of the groups in d
    if len(c) == len(d):
        yield c
    else:
        for i in d:
            if i not in c:
                yield from combos(d, c + [i])

def product(d, c = []):
    # for one ordering of the groups, yield every non-empty "prefix pick":
    # one element from the first group, optionally one from the second, and so on
    if c:
        yield tuple(c)
    if d:
        for i in d[0]:
            yield from product(d[1:], c + [i])

result = sorted({i for b in combos(data) for i in product(b)})
# keep only the first tuple for each set of elements, dropping the reorderings
# produced by different group orderings
final_result = [a for i, a in enumerate(result) if all(len(c) != len(a) or len(set(c) & set(a)) != len(a) for c in result[:i])]

Output:

[(1,), (1, 4), (1, 4, 7), (1, 4, 8), (1, 4, 9), (1, 5), (1, 5, 7), (1, 5, 8), (1, 5, 9), (1, 6), (1, 6, 7), (1, 6, 8), (1, 6, 9), (1, 7), (1, 8), (1, 9), (2,), (2, 4), (2, 4, 7), (2, 4, 8), (2, 4, 9), (2, 5), (2, 5, 7), (2, 5, 8), (2, 5, 9), (2, 6), (2, 6, 7), (2, 6, 8), (2, 6, 9), (2, 7), (2, 8), (2, 9), (3,), (3, 4), (3, 4, 7), (3, 4, 8), (3, 4, 9), (3, 5), (3, 5, 7), (3, 5, 8), (3, 5, 9), (3, 6), (3, 6, 7), (3, 6, 8), (3, 6, 9), (3, 7), (3, 8), (3, 9), (4,), (4, 7), (4, 8), (4, 9), (5,), (5, 7), (5, 8), (5, 9), (6,), (6, 7), (6, 8), (6, 9), (7,), (8,), (9,)]
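
As a quick sanity check (a sketch, not part of the original answer): each of the three input triples contributes either nothing or exactly one of its three elements, so there are 4 ** 3 - 1 = 63 non-empty combinations, which matches the length of the output above.

assert len(final_result) == 63  # 4 ** 3 - 1: skip or pick one from each of the three triples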

Score: 6
Here is a non-recursive solution using simple for loops. Uniqueness is enforced by applying set to the list of output tuples.
lsts = [(1,2,3), (4,5,6), (7,8,9)]

res = [[]]
for lst in lsts:
    # extend every selection found so far with one element of the current tuple
    res += [(*r, x) for r in res for x in lst]

# print({tuple(lst) for lst in res[1:]})
# {(5, 9), (4, 7), (6, 9), (1, 4, 7), (2, 6, 9), (4, 8), (3, 4, 7), (2,
# 8), (2, 6, 8), (9,), (2, 5, 8), (1, 6), (3, 6, 8), (2, 5, 9), (3, 5,
# 9), (3, 7), (2, 5), (3, 6, 9), (5, 8), (1, 6, 8), (3, 5, 8), (2, 6,
# 7), (4, 9), (6, 7), (1,), (2, 9), (1, 6, 9), (3,), (1, 5), (5,), (3,
# 6), (7,), (3, 6, 7), (1, 5, 9), (2, 6), (2, 4, 7), (1, 5, 8), (3, 4,
# 8), (8,), (3, 4, 9), (1, 4), (1, 6, 7), (3, 9), (1, 9), (2, 5, 7), (3,
# 5), (2, 7), (2, 4, 9), (6, 8), (1, 5, 7), (2,), (2, 4, 8), (5, 7), (1,
# 4, 8), (3, 5, 7), (4,), (3, 8), (1, 8), (1, 4, 9), (6,), (1, 7), (3,
# 4), (2, 4)}

Brilliant! Just three lines of pure Python, no imports, the fastest solution so far, and currently the only one that includes the empty set (which, in my opinion, should be part of the solution). - wovano
By the way: you mention enforcing uniqueness by applying set, but I believe the list is already correct (i.e., it contains only unique values, no duplicates). - wovano
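
A quick check of that point (a sketch, not from the original answer, assuming the loop above has been run): the list built by the loop already contains 63 distinct selections, so applying set only changes the container type rather than removing duplicates.

unique = {tuple(r) for r in res[1:]}
assert len(unique) == len(res[1:]) == 63  # nothing is deduplicated away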

Score: 3

Using itertools:

import itertools as it

def all_combinations(groups):
    result = set()
    # each product takes exactly one element from every group; its
    # sub-combinations of length 1..len(groups) are the valid selections
    for prod in it.product(*groups):
        for length in range(1, len(groups) + 1):
            result.update(it.combinations(prod, length))
    return result

all_combinations([(1,2,3), (4,5,6), (7,8,9)])
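
The function returns a set, so sorting it (a usage sketch, not part of the original answer) gives output comparable to the other answers:

print(sorted(all_combinations([(1, 2, 3), (4, 5, 6), (7, 8, 9)])))
# [(1,), (1, 4), (1, 4, 7), ..., (8,), (9,)]  (63 tuples in total)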

Score: 1
Another version:
from itertools import product, combinations

lst = [(1,2,3), (4,5,6), (7,8,9)]

def generate(lst):
    for idx in range(len(lst)):
        for val in lst[idx]:
            yield (val,)
            # pair val with one element from each of j of the later groups
            for j in range(1, len(lst)):
                for c in combinations(lst[idx+1:], j):
                    yield from tuple((val,) + i for i in product(*c))

l = [*generate(lst)]
print(l)

Output:

[(1,), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (1, 4, 7), (1, 4, 8), (1, 4, 9), (1, 5, 7), (1, 5, 8), (1, 5, 9), (1, 6, 7), (1, 6, 8), (1, 6, 9), (2,), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (2, 4, 7), (2, 4, 8), (2, 4, 9), (2, 5, 7), (2, 5, 8), (2, 5, 9), (2, 6, 7), (2, 6, 8), (2, 6, 9), (3,), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (3, 4, 7), (3, 4, 8), (3, 4, 9), (3, 5, 7), (3, 5, 8), (3, 5, 9), (3, 6, 7), (3, 6, 8), (3, 6, 9), (4,), (4, 7), (4, 8), (4, 9), (5,), (5, 7), (5, 8), (5, 9), (6,), (6, 7), (6, 8), (6, 9), (7,), (8,), (9,)]

Score: 1

Thanks to @wovano for clarifying rule 2. That makes the solution much shorter:

from itertools import chain, combinations, product

blubb = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# every subset of the groups, from the empty subset up to all of them
set_combos = chain.from_iterable(combinations(blubb, i) for i in range(len(blubb) + 1))
# for each subset of groups, take one element from every selected group
result_func = list(chain.from_iterable(map(lambda x: product(*x), set_combos)))

As a bonus, here is a speed comparison. @hilberts_drinking_problem's solution is very nice, but it carries some overhead.
def pure_python(list_of_tuples):
    res = [tuple()]
    for lst in list_of_tuples:
        res += [(*r, x) for r in res for x in lst]
    return res


def with_itertools(list_of_tuples):
    set_combos = chain.from_iterable(combinations(list_of_tuples, i) for i in range(len(list_of_tuples) + 1))
    return list(chain.from_iterable(map(lambda x: product(*x), set_combos)))


assert sorted(with_itertools(blubb), key=str) == sorted(pure_python(blubb), key=str)

Both compute the same result, but...

%timeit with_itertools(blubb)
7.18 µs ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit pure_python(blubb)
10.5 µs ± 46 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Although itertools is only slightly faster on a small sample, the gap becomes large as the input grows.
import toolz
large_blubb = list(toolz.partition_all(3, range(3*10)))
assert sorted(with_itertools(large_blubb), key=str) == sorted(pure_python(large_blubb), key=str)

%timeit with_itertools(large_blubb)
106 ms ± 307 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit pure_python(large_blubb)
262 ms ± 1.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This makes the itertools solution about 2.5 times faster.


Edit: corrected the solution for rule 2 and added a speed comparison. Edit: fixed another bug - the speed comparison is now accurate.


It looks like you did not fully understand the question. Please check the last requirement again: "_Combinations derived from the same tuple cannot be used_". This means you should choose 0 or 1 elements from (1,2,3), 0 or 1 elements from (4,5,6), and 0 or 1 elements from (7,8,9). See the output of the other answers for reference. Your solution also returns invalid combinations, e.g. (1,2,4). - wovano
Thanks for the explanation, @wovano! I have edited the answer. It should now follow the rules. - Drey
Nice improvement!! I would change the third line to set_combos = chain.from_iterable(combinations(blubb, i) for i in range(len(blubb) + 1)) so that it works for inputs of any length rather than a fixed length of 3. It also adds the empty list, so you don't have to add it manually (i.e., remove the [[]] + on the next line). Your solution is faster for large input sets, well done :-) - wovano
Thanks for the suggestion. If I understood correctly, I used len(blubb[0])+1 instead of len(blubb)+1 - Drey
No, it really has to be len(blubb) and not len(blubb[0]), because you are creating combinations of the groups (the groups of numbers, not Python sets), so you have to specify the number of groups, i.e. len(blubb) (see the sketch after this comment thread). Note: why the name blubb? A more descriptive name might help ;-) - wovano
Oh yes. There was indeed a bug in with_itertools. It is still faster, but not by an order of magnitude. - Drey
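
To illustrate that last point (a sketch added for clarity, not part of the original comments): combinations(blubb, i) selects i of the groups, so i has to range over the number of groups, len(blubb):

from itertools import combinations

blubb = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
print(list(combinations(blubb, 2)))
# [((1, 2, 3), (4, 5, 6)), ((1, 2, 3), (7, 8, 9)), ((4, 5, 6), (7, 8, 9))]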
