itertools.product should not include combinations with repeated values

Question

5

I am trying to create combinations. Sample code is below:

import itertools

a = [1, 2, 3], [1, 4, 5]
combinations = list(itertools.product(*a))

Output:

[(1, 1), (1, 4), (1, 5), (2, 1), (2, 4), (2, 5), (3, 1), (3, 4), (3, 5)]

I do not need the combination (1, 1). I have already tried the following code:

for comb in combinations:
    if comb[0] == comb[1]:
        combinations.remove(comb)

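But I need to do this on large data, and this approach takes far too long.

Also, the number of elements in each combination should equal the number of lists. For example, with a = [1,2,3], [2,3,7], [4,5,1], every combination should have 3 elements, such as (1, 2, 4).

Please suggest a way to avoid such combinations.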

- panr

2

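Do you only want to avoid (1, 1, 1), or also (1, 1, 2) and (1, 2, 1)? - poke

@poke: Only combinations made up of unique values should be kept, so (1, 1, 2) and (1, 2, 1) should be discarded as well. - panr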

4 Answers

4

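If you have more than two sublists, you can filter out tuples with repeated elements by comparing the length of each tuple with the length of the set built from it.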

>>> import itertools
>>> b = [1,2,3],[1,4,5],[1,2,6]
>>> [x for x in itertools.product(*b) if len(x) == len(set(x))]
[(1, 4, 2), (1, 4, 6), (1, 5, 2), (1, 5, 6), 
 (2, 1, 6), (2, 4, 1), (2, 4, 6), (2, 5, 1), (2, 5, 6), 
 (3, 1, 2), (3, 1, 6), (3, 4, 1), (3, 4, 2), (3, 4, 6), (3, 5, 1), (3, 5, 2), (3, 5, 6)]

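It works well, but takes too long when the number of lists exceeds 20. [...] The maximum number of lists will be 40, and each list can have up to 20 items.

That is a lot of lists, and they are large. Even with 20 lists of 3 elements each, you have to go through 3^20 = 3,486,784,401 combinations. That is still feasible, but you should use a generator expression instead of a list comprehension, i.e. (...) instead of [...], so the results are not all held in memory at once: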

gen = (x for x in itertools.product(*b) if len(x) == len(set(x)))
for x in gen:
    print(x)  # do stuff with each combination
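Before iterating, it can help to estimate how many tuples the unfiltered product contains. The sketch below is only illustrative: `product_size` is a hypothetical helper name, not part of itertools, and it assumes Python 3.8+ for `math.prod`.

```python
import math


def product_size(lists):
    """Return how many tuples itertools.product(*lists) would yield."""
    # The size of a Cartesian product is the product of the input lengths.
    return math.prod(len(lst) for lst in lists)


# 20 lists of 3 items each
print(product_size([[1, 2, 3]] * 20))  # 3486784401

# 40 lists of 20 items each, shown in scientific notation
print(f"{product_size([list(range(20))] * 40):.2e}")  # 1.10e+52
```

This counts tuples before any filtering, so it is an upper bound on the number of duplicate-free combinations.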

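For 40 lists of 20 elements each, you get about 10^52 combinations. The universe will probably have died before you generate them all. Given that most (in fact almost all) of them contain duplicates, you could try a smarter algorithm that skips an entire "branch" of combinations as soon as it hits the first duplicate, but I doubt even that would help much.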
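The branch-skipping idea could look roughly like the sketch below. This is one possible implementation, not code from the original answer: `unique_product` is a hypothetical name, and it assumes all items are hashable. It builds each tuple one position at a time and abandons a prefix as soon as the next item repeats a value already used, so it avoids enumerating tuples that share a bad prefix. The worst case is still exponential.

```python
import itertools


def unique_product(*lists):
    """Yield one-item-per-list tuples whose values are all distinct."""

    def extend(prefix, used, depth):
        if depth == len(lists):
            yield tuple(prefix)
            return
        for item in lists[depth]:
            if item in used:
                continue  # prune: every tuple below this prefix has a repeat
            prefix.append(item)
            used.add(item)
            yield from extend(prefix, used, depth + 1)
            used.remove(item)  # backtrack before trying the next item
            prefix.pop()

    yield from extend([], set(), 0)


b = [1, 2, 3], [1, 4, 5], [1, 2, 6]
pruned = list(unique_product(*b))
filtered = [x for x in itertools.product(*b) if len(x) == len(set(x))]
print(pruned == filtered)  # True
print(pruned[:4])  # [(1, 4, 2), (1, 4, 6), (1, 5, 2), (1, 5, 6)]
```

Because it is a generator, results can be consumed one at a time without storing the whole list.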

- tobias_k

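It works, but it takes too long when there are more than 20 lists. - panr

How big is your data set? If there are more than 20 lists, have you calculated the expected number of combinations? For example, with 3 items per list there would be 3^20, i.e. 3,486,784,401 combinations... Whatever method you use, that is going to take some time. - tobias_k

The maximum number of lists will be 40, and each list can have up to 20 items. - panr

40 lists of 20 elements each gives about 10^52 combinations. Good luck. - tobias_k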

0

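Here are three ways to do this (in Python 3, filter returns an iterator, so wrap it in list() to see the results):

import itertools

a = [1, 2, 3], [1, 4, 5]

# 1. filter over a nested comprehension
list(filter(lambda x: x[0] != x[1], [(x, y) for x in a[0] for y in a[1]]))

# 2. list comprehension over itertools.product
combinations = list(itertools.product(*a))
[value for value in combinations if value[0] != value[1]]

# 3. filter over the same product
list(filter(lambda x: x[0] != x[1], combinations))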

- Rakesh Kumar

-1

Please change the second part of your code to the following:

for comb in combinations:
    if comb[0] == comb[1]:
        combinations.remove(comb)

- Larry Clifford

Removing elements from a list while iterating over it is a bad idea. - Alon Gouldman

- Eugene Yarmash · Accepted Answer

For two iterables, a simple list comprehension is enough:

>>> from itertools import product
>>> a = [1, 2, 3], [1, 4, 5]
>>> [(x, y) for x, y in product(*a) if x != y]
[(1, 4), (1, 5), (2, 1), (2, 4), (2, 5), (3, 1), (3, 4), (3, 5)]

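If you need to filter the product of an arbitrary number of iterables, it is better to use a set to check that all elements of a combination are distinct: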

>>> a = [1, 2, 3], [1, 4, 5], [1, 8, 9]
>>> [p for p in product(*a) if len(set(p)) == len(p)]
[(1, 4, 8), (1, 4, 9), (1, 5, 8), (1, 5, 9), (2, 1, 8), (2, 1, 9), (2, 4, 1), (2, 4, 8), (2, 4, 9), (2, 5, 1), (2, 5, 8), (2, 5, 9), (3, 1, 8), (3, 1, 9), (3, 4, 1), (3, 4, 8), (3, 4, 9), (3, 5, 1), (3, 5, 8), (3, 5, 9)]

By the way, never modify a list while you are looping over it, because this will most likely make the loop behave incorrectly.
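To make that warning concrete, here is a small sketch of how the remove-while-iterating loop from the question can miss items. The input is an assumption chosen for illustration (the second list contains 1 twice, so two (1, 1) tuples sit next to each other). With the question's own data the loop happens to give the right answer.

```python
import itertools

# Illustrative input, not from the question: 1 appears twice in the second list.
a = [1, 2, 3], [1, 1, 5]
combinations = list(itertools.product(*a))

# Buggy: removing an item shifts the rest left, so the loop skips the next one.
buggy = list(combinations)
for comb in buggy:
    if comb[0] == comb[1]:
        buggy.remove(comb)

# Safe: build a new list instead of mutating the one being iterated.
safe = [comb for comb in combinations if comb[0] != comb[1]]

print((1, 1) in buggy)  # True, one (1, 1) survived the loop
print((1, 1) in safe)   # False
```

Building a new list also avoids the repeated linear scans that list.remove performs.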