Python combinations without repetitions

Question

35

I have a list of numbers and I want to generate combinations from it. If my list is:

import itertools

t = [2,2,2,2,4]
c = list(itertools.combinations(t, 4))

The result is:

(2, 2, 2, 2)
(2, 2, 2, 4)
(2, 2, 2, 4)
(2, 2, 2, 4)
(2, 2, 2, 4)

But I want to get:

(2, 2, 2, 2)
(2, 2, 2, 4)

Is it possible to eliminate the duplicates other than by creating a new list and iterating over the first one?

- GoobyPrs

4 Answers

25

As Donkey Kong pointed out, you can get the unique values in the list by converting it to a set:

import itertools

t = [2,2,2,2,4]
c = list(itertools.combinations(t, 4))
unq = set(c)  # a set keeps only one copy of each tuple
print(unq)

The result will be:

{(2, 2, 2, 4), (2, 2, 2, 2)}

If you want to use it as a list, you can convert it back with:

result = list(unq)

A cleaner, more direct way is to skip the intermediate list:

t = [2,2,2,2,4]
c = set(itertools.combinations(t, 4))
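
A set is unordered, so the tuples may print in any order. As a minimal sketch, assuming a stable order is wanted, the snippet below shows two options: `dict.fromkeys` keeps the first occurrence of each tuple in the order `combinations()` produced it, and `sorted()` returns a sorted list. The names `ordered_unique` and `sorted_unique` are illustrative and not part of the answer above.

```python
import itertools

t = [2, 2, 2, 2, 4]

# dict keys preserve insertion order (Python 3.7+), so this drops repeated
# tuples while keeping the order in which combinations() first yields them.
ordered_unique = list(dict.fromkeys(itertools.combinations(t, 4)))

# Alternatively, sort the set to get a deterministic list.
sorted_unique = sorted(set(itertools.combinations(t, 4)))

print(ordered_unique)
print(sorted_unique)
```

Both variants still consume every tuple that `combinations()` yields; they only change how the result is ordered.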

- Randhawa

3

Just pass itertools.combinations(t, 4) to set(). - miradulo

Yes, that is the cleanest way. - Randhawa

Don't do that. Why would he convert to a list first, then to a set, then back to a list? - miradulo

2

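@Randhawa, your answer should provide the clean, fixed code... :) - Iron Fist

Downvoted. There is no point in keeping a list => set => list approach. "Continuing his code" has no value if that code is wrong or unnecessary. - miradulo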

13

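Technically, what you get are not actually duplicates; this is simply how itertools.combinations works. If you read the description in the linked documentation, you will see: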

itertools.combinations(iterable, r)

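Return r length subsequences of elements from the input iterable.

Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.

Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each combination.

Demo: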

>>> import itertools as it
>>> list(it.combinations([1,2,3,4,5], 4))
[(1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 4, 5), (1, 3, 4, 5), (2, 3, 4, 5)]

As the previous answer said, set() will give you the unique values you want:

>>> set(it.combinations(t, 4))
{(2, 2, 2, 4), (2, 2, 2, 2)}
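
To see the position-based behaviour on the question's own list, here is a small sketch that pairs each tuple with the index positions it was drawn from. The name `picks` is only illustrative.

```python
import itertools

t = [2, 2, 2, 2, 4]

# Choose positions rather than values, then look the values up.
# This yields the same tuples, in the same order, as combinations(t, 4).
picks = [(idx, tuple(t[i] for i in idx))
         for idx in itertools.combinations(range(len(t)), 4)]

for idx, values in picks:
    print(idx, values)
```

Every index tuple is different, which is why combinations() treats the four (2, 2, 2, 4) results as distinct even though their values are equal.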

- Iron Fist

So, is there a way to build combinations based on value rather than position? That is, if the list is [0,1,2,2], it would give (0,1), (0,2), (1,2)? - Just_Newbie
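
One possible reading of this comment, sketched under the assumption that each value should appear at most once per combination: collapse the input to its distinct values first, then combine those. The names `items`, `distinct_values` and `pairs` are illustrative, and the elements are assumed to be hashable and sortable.

```python
import itertools

items = [0, 1, 2, 2]

# Collapse equal values first, then combine; sorted() makes the order stable.
distinct_values = sorted(set(items))
pairs = list(itertools.combinations(distinct_values, 2))

print(pairs)  # [(0, 1), (0, 2), (1, 2)]
```

This differs from set(itertools.combinations(items, 2)), which would also contain (2, 2) because the value 2 occurs twice in the input.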

@Iron Fist. Well explained. - Shreyas Singh

7

This can now be done with the more-itertools package, which as of version 8.7 provides a function called distinct_combinations for exactly this purpose.

>>> from itertools import combinations
>>> t = [2,2,2,2,4]
>>> set(combinations(t, 4))
{(2, 2, 2, 2), (2, 2, 2, 4)}

>>> from more_itertools import distinct_combinations
>>> t = [2,2,2,2,4]
>>> list(distinct_combinations(t,4))
[(2, 2, 2, 2), (2, 2, 2, 4)]

From my limited testing, its performance is similar to that of the function written by @hahho.

- JJR4

2

This solution needs to be ranked higher. Anyone who tries to wrap a permutations call in set() will run into trouble once n > 7. - NaiveBae

- hahho · Accepted Answer

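I know this is a bit late, but I would like to add a point.

set(itertools.combinations(t, 4)) is good enough for most cases, but internally it still iterates over every duplicate combination, so it can be computationally expensive, especially when there are not actually many unique combinations.

This approach iterates over the unique combinations only: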

from itertools import chain, repeat, count, islice
from collections import Counter


def repeat_chain(values, counts):
    return chain.from_iterable(map(repeat, values, counts))


def unique_combinations_from_value_counts(values, counts, r):
    n = len(counts)
    indices = list(islice(repeat_chain(count(), counts), r))
    if len(indices) < r:
        return
    while True:
        yield tuple(values[i] for i in indices)
        for i, j in zip(reversed(range(r)), repeat_chain(reversed(range(n)), reversed(counts))):
            if indices[i] != j:
                break
        else:
            return
        j = indices[i] + 1
        for i, j in zip(range(i, r), repeat_chain(count(j), counts[j:])):
            indices[i] = j


def unique_combinations(iterable, r):
    values, counts = zip(*Counter(iterable).items())
    return unique_combinations_from_value_counts(values, counts, r)

Usage:

>>> list(unique_combinations([2, 2, 2, 2, 4], 4)) # elements must be hashable
[(2, 2, 2, 2), (2, 2, 2, 4)]

# You can pass values and counts separately. For this usage, values don't need to be hashable
# Say you have ['a','b','b','c','c','c'], then since there is 1 of 'a', 2 of 'b', and 3 of 'c', you can do as follows:
>>> list(unique_combinations_from_value_counts(['a', 'b', 'c'], [1, 2, 3], 3))
[('a', 'b', 'b'), ('a', 'b', 'c'), ('a', 'c', 'c'), ('b', 'b', 'c'), ('b', 'c', 'c'), ('c', 'c', 'c')]

# unique_combinations() is a generator (and thus an iterator)
# so you can iterate it
>>> for comb in unique_combinations([2, 2, 2, 2, 4], 4):
...     print(sum(comb))
...
8   # 2+2+2+2
10  # 2+2+2+4

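Note that itertools.combinations() is implemented in C, which means that in most cases it is faster than my Python script. This code beats the set(itertools.combinations()) approach only when the duplicate combinations far outnumber the unique ones.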
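
To give a feel for how large that gap can get, here is a rough sketch with a made-up input (twenty 2s and one 4, choosing 10); it counts the tuples that combinations() generates against the distinct ones that remain. It assumes Python 3.8+ for math.comb, and the names `total` and `unique` are illustrative.

```python
import itertools
import math

t = [2] * 20 + [4]   # 21 elements, only two distinct values
r = 10

# combinations() yields one tuple per choice of positions: C(21, 10) of them.
total = math.comb(len(t), r)

# set() still has to consume every one of those tuples before it can
# report how few distinct ones there are.
unique = set(itertools.combinations(t, r))

print(total, len(unique))
```

Here hundreds of thousands of tuples collapse to just two distinct combinations, which is the kind of input where iterating only the unique combinations may pay off.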