寻找最高效的成对组合

Question

寻找最高效的成对组合

pythonalgorithmcombinatorics

25

问题

我有一组人，我希望每个人都与这个组中的其他人进行1:1会面。一个人一次只能与另一个人见面，因此我想要做到以下几点：

找到所有可能的配对组合
将配对组合分组为“轮次”会议，其中每个人只能参加一次，而且每个轮次应包含尽可能多的配对以满足在最少轮次内的所有可能配对组合。

为了演示所需输入/输出的问题，请假设我有以下列表：

>>> people = ['Dave', 'Mary', 'Susan', 'John']

我想要生成以下输出：

>>> for round in make_rounds(people):
>>>     print(round)
[('Dave', 'Mary'), ('Susan', 'John')]
[('Dave', 'Susan'), ('Mary', 'John')]
[('Dave', 'John'), ('Mary', 'Susan')]

如果我有奇数个人，那么我会期望得到这个结果:

>>> people = ['Dave', 'Mary', 'Susan']
>>> for round in make_rounds(people):
>>>     print(round)
[('Dave', 'Mary')]
[('Dave', 'Susan')]
[('Mary', 'Susan')]

这个问题的关键是我需要我的解决方案具有较好的性能（在合理范围内）。我已经编写了可以使用的代码，但随着 people 的大小增长，它变得指数级缓慢。我不知道足够有效率算法的编写，无法确定我的代码是否低效，或者是否仅受问题参数的限制。

我尝试过的方法

第一步很容易：我可以使用 itertools.combinations 获取所有可能的配对：

>>> from itertools import combinations
>>> people_pairs = set(combinations(people, 2))
>>> print(people_pairs)
{('Dave', 'Mary'), ('Dave', 'Susan'), ('Dave', 'John'), ('Mary', 'Susan'), ('Mary', 'John'), ('Susan', 'John')}

为了计算轮次，我按以下方式建立轮次:

创建一个空的 round 列表
迭代使用上面的 combinations 方法计算的 people_pairs 集合的副本
对于每个人，检查当前 round 中是否存在包含该个体的任何现有配对
如果已经存在包含其中一个个体的配对，则跳过此轮的该配对。如果不存在，则将该配对添加到该轮中，并从 people_pairs 列表中删除该配对。
一旦所有人员配对都进行了迭代，就将该轮追加到主要的 rounds 列表中
再次开始，因为 people_pairs 现在仅包含未进入第一轮的配对

最终这样就能得到所需的结果，并且将我的人员配对缩减到没有剩余而所有轮次都计算出来了。我已经可以看到这需要大量迭代，但是我不知道有更好的方法。

这是我的代码：

from itertools import combinations

# test if person already exists in any pairing inside a round of pairs
def person_in_round(person, round):
    is_in_round = any(person in pair for pair in round)
    return is_in_round

def make_rounds(people):
    people_pairs = set(combinations(people, 2))
    # we will remove pairings from people_pairs whilst we build rounds, so loop as long as people_pairs is not empty
    while people_pairs:
        round = []
        # make a copy of the current state of people_pairs to iterate over safely
        for pair in set(people_pairs):
            if not person_in_round(pair[0], round) and not person_in_round(pair[1], round):
                round.append(pair)
                people_pairs.remove(pair)
        yield round

使用 https://mycurvefit.com 绘制出该方法对列表大小为100-300的性能表现，结果显示计算1000人名单的回合可能需要大约100分钟。是否有更有效的方法？

注意：我实际上并不想组织1000人的会议 :) 这只是一个简单的例子，代表我要解决的匹配/组合问题。

- daveruinseverything

6个回答

6

我只生成索引（因为我很难想出1000个名称 =），但对于1000个数字，运行时间约为4秒。

所有其他方法的主要问题在于它们使用成对对象进行操作，这些成对对象非常多，运行时间变得更长。我的方法不同之处在于与人员一起工作，而不是成对对象。我有一个将人员映射到他/她必须会面的其他人员列表的dict()，这些列表最多只有N项（而不是像成对对象那样是N^2）。因此可以节省时间。

#!/usr/bin/env python

from itertools import combinations
from collections import defaultdict

pairs = combinations( range(6), 2 )

pdict = defaultdict(list)
for p in pairs :
    pdict[p[0]].append( p[1] )

while len(pdict) :
    busy = set()
    print '-----'
    for p0 in pdict :
        if p0 in busy : continue

        for p1 in pdict[p0] :
            if p1 in busy : continue

            pdict[p0].remove( p1 )
            busy.add(p0)
            busy.add(p1)
            print (p0, p1)

            break

    # remove empty entries
    pdict = { k : v for k,v in pdict.items() if len(v) > 0 }

'''
output:
-----
(0, 1)
(2, 3)
(4, 5)
-----
(0, 2)
(1, 3)
-----
(0, 3)
(1, 2)
-----
(0, 4)
(1, 5)
-----
(0, 5)
(1, 4)
-----
(2, 4)
(3, 5)
-----
(2, 5)
(3, 4)
'''

- lenik

1

有趣的是，你可以轻松使用faker生成好的虚假数据。 - daveruinseverything

@daveruinseverything 谢谢你提供的信息。我只需要这些数据来使用 len(data) 计算其长度，所以如果我可以直接输入数字，那么获取这些数据就没有意义了。 - lenik

3

当您需要快速查找时，哈希表 / 字典是最好的选择。使用字典而不是列表跟踪每轮中谁已经出现，这样速度会更快。

由于您正在学习算法，研究大 O 表示法会对您有所帮助，了解哪种数据结构适合进行什么类型的操作也很重要。请参阅此指南：https://wiki.python.org/moin/TimeComplexity 以获取 Python 内置函数的时间复杂度。您将看到，在列表中检查项目的时间复杂度为 O(n)，这意味着它随着输入大小的增加而线性扩展。因此，如果在循环中进行此操作，则最终会得到 O(n²) 或更差的结果。对于字典，查找通常为 O(1)，这意味着它不受输入大小的影响。

此外，请勿覆盖内置功能。我建议将“round”更改为“round_”。

from itertools import combinations

# test if person already exists in any pairing inside a round of pairs
def person_in_round(person, people_dict):
    return people_dict.get(person, False)

def make_rounds(people):
    people_pairs = set(combinations(people, 2))
    people_in_round = {}
    # we will remove pairings from people_pairs whilst we build rounds, so loop as long as people_pairs is not empty
    while people_pairs:
        round_ = []
        people_dict = {}
        # make a copy of the current state of people_pairs to iterate over safely
        for pair in set(people_pairs):
            if not person_in_round(pair[0], people_dict) and not person_in_round(pair[1], people_dict):
                round_.append(pair)
                people_dict[pair[0]] = True
                people_dict[pair[1]] = True


                people_pairs.remove(pair)
        yield round_

- Zev

这个任务完成1000个大约花了4分钟。这让我稍微有点担心。 - Zev

1

谢谢你指出我使用了 round - 我甚至没有意识到！ - daveruinseverything

2

没问题，这个问题应该被用作在这个网站上提问的指南。太棒了！ - Zev

@Zev 完全同意！ - Mike Housky

3

你可以立即做两件事：

不要在每次遍历列表时都复制集合。这是一种非常浪费时间/内存的做法。相反，每次迭代后只修改一次集合。
在每一轮中保持一个单独的人员集合。在集合中查找一个人将比循环整个轮次快上一个数量级。

示例：

def make_rounds(people):
    people_pairs = set(combinations(people, 2))

    while people_pairs:
        round = set()
        people_covered = set()
        for pair in people_pairs:
            if pair[0] not in people_covered \
               and pair[1] not in people_covered:
                round.add(pair)
                people_covered.update(pair)
        people_pairs -= round # remove thi
        yield round

对比：

- soundstripe

2

也许我漏掉了什么（这并不罕见），但这听起来像是一个普通的循环赛，每个团队都恰好与其他团队比赛一次。

有O(n^2)的方法可以手动处理这个问题，在机器上也可以很好地工作。一个很好的描述可以在维基百科关于循环赛的文章中找到。

关于O(n^2)：将有n-1或n轮，每一轮需要O(n)步来旋转除一个表项以外的所有表项，并且需要O(n)步来枚举每轮中n//2场比赛。你可以使用双向链表使旋转变为O(1)，但是比赛的枚举仍然是O(n)。因此，O(n)*O(n) = O(n^2)。

- Mike Housky

@juanpa.arrivillaga 谢谢！我甚至都没想到去那里看。 - Mike Housky

1

这在我的电脑上大约需要45秒。

def make_rnds(people):
    people_pairs = set(combinations(people, 2))
    # we will remove pairings from people_pairs whilst we build rnds, so loop as long as people_pairs is not empty
    while people_pairs:
        rnd = []
        rnd_set = set()
        peeps = set(people)
        # make a copy of the current state of people_pairs to iterate over safely
        for pair in set(people_pairs):
            if pair[0] not in rnd_set and pair[1] not in rnd_set:
                rnd_set.update(pair)
                rnd.append(pair)

                peeps.remove(pair[0])
                peeps.remove(pair[1])

                people_pairs.remove(pair)
                if not peeps:
                    break
        yield rnd

我删除了函数person_in_rnd以减少函数调用所花费的时间，并添加了一个名为rnd_set和peeps的变量。rnd_set是迄今为止参加轮次的每个人的集合，用于检查与该对的匹配情况。peeps是人的一个复制集合，每次我们向rnd添加一对时，就会从peeps中删除那些个体。这使我们可以在peeps为空时停止迭代所有组合，也就是说，一旦每个人都参加了一轮。

- Sam Kowalski

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Steven Rumbalski · Accepted Answer

这是维基百科文章轮流比赛赛制中描述的算法的实现。

from itertools import cycle , islice, chain

def round_robin(iterable):
    items = list(iterable)
    if len(items) % 2 != 0:
        items.append(None)
    fixed = items[:1]
    cyclers = cycle(items[1:])
    rounds = len(items) - 1
    npairs = len(items) // 2
    return [
        list(zip(
            chain(fixed, islice(cyclers, npairs-1)),
            reversed(list(islice(cyclers, npairs)))
        ))
        for _ in range(rounds)
        for _ in [next(cyclers)]
    ]