寻找按偏好排序的组合生成算法。

4

我正在尝试找到一种算法来识别如下有序组合的结果:

  • 有 N 场比赛,每场比赛有三个互斥的结果(胜利、失败或平局),总共有 3N 个可能的结果和3^N种组合。
  • 每个可能的结果都被分配了一个唯一的排名,最喜欢的结果排名为1,最不喜欢的结果排名为3N。
  • 找出每场比赛的前 M 种结果组合,从包含最喜欢排名结果的组合开始。

例如,假设 N = 3,结果按以下方式排名:

Contest 1 Win = 1
Contest 1 Tie = 4
Contest 1 Loss = 7
Contest 2 Win = 2
Contest 2 Tie = 5
Contest 2 Loss = 8
Contest 3 Win = 3
Contest 3 Tie = 6
Contest 3 Loss = 9

考虑到这些排名,组合应按以下顺序排序:
Contest 1 Win  (1), Contest 2 Win  (2), Contest 3 Win  (3)
Contest 1 Win  (1), Contest 2 Win  (2), Contest 3 Tie  (6)
Contest 1 Win  (1), Contest 2 Win  (2), Contest 3 Loss (9)
Contest 1 Win  (1), Contest 2 Tie  (5), Contest 3 Win  (3)
Contest 1 Win  (1), Contest 2 Loss (8), Contest 3 Win  (3)
Contest 1 Win  (1), Contest 2 Tie  (5), Contest 3 Tie  (6)
Contest 1 Win  (1), Contest 2 Tie  (5), Contest 3 Loss (9)
Contest 1 Win  (1), Contest 2 Loss (8), Contest 3 Win  (6)
Contest 1 Win  (1), Contest 2 Loss (8), Contest 3 Loss (9)
Contest 1 Tie  (4), Contest 2 Win  (2), Contest 3 Win  (3)
Contest 1 Loss (7), Contest 2 Win  (2), Contest 3 Win  (3)
Contest 1 Tie  (4), Contest 2 Win  (2), Contest 3 Tie  (6)
Contest 1 Tie  (4), Contest 2 Win  (2), Contest 3 Loss (9)
Contest 1 Loss (7), Contest 2 Win  (2), Contest 3 Tie  (6)
Contest 1 Loss (7), Contest 2 Win  (2), Contest 3 Loss (9)
Contest 1 Tie  (4), Contest 2 Tie  (5), Contest 3 Win  (3)
Contest 1 Tie  (4), Contest 2 Loss (8), Contest 3 Win  (3)
Contest 1 Loss (7), Contest 2 Tie  (5), Contest 3 Win  (3)
Contest 1 Loss (7), Contest 2 Loss (8), Contest 3 Win  (3)
Contest 1 Tie  (4), Contest 2 Tie  (5), Contest 3 Tie  (6)
Contest 1 Tie  (4), Contest 2 Tie  (5), Contest 3 Loss (9)
Contest 1 Tie  (4), Contest 2 Loss (8), Contest 3 Tie  (6)
Contest 1 Tie  (4), Contest 2 Loss (8), Contest 3 Loss (9)
Contest 1 Loss (7), Contest 2 Tie  (5), Contest 3 Tie  (6)
Contest 1 Loss (7), Contest 2 Tie  (5), Contest 3 Loss (9)
Contest 1 Loss (7), Contest 2 Loss (8), Contest 3 Tie  (6)
Contest 1 Loss (7), Contest 2 Loss (8), Contest 3 Loss (9)

我正在寻找一种方法,可以为任意大的N生成这些组合,尽管我并不希望得到所有的组合。例如,当N = 256且总共有3^256个组合时,我希望找到前500个组合。


“7,2,3”不应该在“4,5,3”之前吗?因为第一个数字中最大的是2,而第二个数字中最大的是3,且2 < 3。如果不是这样,我真的不明白你是如何得出输出结果的。 - Bernhard Barker
@Dukeling - 你是正确的。我想现在我弄对了。很明显,我无法像将其放入算法中那样高效地在脑海中解决这个问题 ;) - Tim Dean
@PieterGeerkens - 好的,我可以用三进制表示来模拟,但这怎么帮助我按等级排序组合呢? - Tim Dean
@TimDean:从0数到3的N次方,然后执行与评分矩阵的点积。然后排序。 - Pieter Geerkens
1
@PieterGeerkens(请阅读问题的最后一句话)生成多达3^256个组合将需要永远的时间,无论你如何高效地完成它。 - Bernhard Barker
显示剩余5条评论
4个回答

4
这个算法似乎可行。下面是Python实现代码。
基本上,我会按照给定结果的值对输入进行排序:
Contest 1 Win = 1
Contest 2 Win = 2
Contest 3 Win = 3
Contest 1 Tie = 4
Contest 2 Tie = 5
Contest 3 Tie = 6
Contest 1 Loss = 7
Contest 2 Loss = 8
Contest 3 Loss = 9

我称之为“排序”。然后,我生成一个空结果列表:
[None, None, None]

递归算法非常简单,具体如下:
  1. 如果所有插槽都已经填满,则打印结果并返回。
  2. 否则,按升序迭代未使用的排序。如果排序是未使用过的插槽,则填充插槽、标记排序为已用,并进行递归。否则,继续迭代。
以下是代码。还有一个额外的技巧避免重复,例如我们刚刚填满了排序#6,那么我们只会使用排序#7、8和9。
#rankings as tuple of (winval, tieval, lossval) for each
#contest
rankings = [(1, 4, 7), (2, 5, 8), (3, 6, 9)]

#first sort the rankings by their values into
#list of (contestnum, w/t/l, value)
orderings = []
for i, (w, t, l) in enumerate(rankings):
    orderings.append((i, 'w', w))
    orderings.append((i, 't', t))
    orderings.append((i, 'l', l))
orderings.sort(key=lambda (i,res,val): val)

#now, find solution recursively as follows:
#- if list is full then print result & return
#- else, iterate thru the rankings & recur for each unused slot

def solve(orderings, slots, used_orderings, first_ordering):
    if all(slot is not None for slot in slots):
        yield slots
        return

    i = first_ordering
    while i < len(orderings):
        slot, result, value = orderings[i]

        if used_orderings[i]:
            i += 1
            continue
        if slots[slot] is not None:
            i += 1
            continue

        slots[slot] = (result, value)
        used_orderings[i] = True
        for solution in solve(orderings, slots, used_orderings, i):
            yield solution
        #backtrack
        slots[slot] = None
        used_orderings[i] = False

        i += 1

#print the first 40 solutions
num_solutions = 0
for solution in solve(orderings, [None]*len(rankings), [False]*len(orderings), 0):
    print "Solution #%d: %s" % (num_solutions+1, solution)
    num_solutions += 1
    if num_solutions >= 40:
        break

以下是给定输入所打印的结果,与问题相匹配:
Solution #1: [('w', 1), ('w', 2), ('w', 3)]
Solution #2: [('w', 1), ('w', 2), ('t', 6)]
Solution #3: [('w', 1), ('w', 2), ('l', 9)]
Solution #4: [('w', 1), ('t', 5), ('w', 3)]
Solution #5: [('w', 1), ('l', 8), ('w', 3)]
Solution #6: [('w', 1), ('t', 5), ('t', 6)]
Solution #7: [('w', 1), ('t', 5), ('l', 9)]
Solution #8: [('w', 1), ('l', 8), ('t', 6)]
Solution #9: [('w', 1), ('l', 8), ('l', 9)]
Solution #10: [('t', 4), ('w', 2), ('w', 3)]
Solution #11: [('l', 7), ('w', 2), ('w', 3)]
Solution #12: [('t', 4), ('w', 2), ('t', 6)]
Solution #13: [('t', 4), ('w', 2), ('l', 9)]
Solution #14: [('l', 7), ('w', 2), ('t', 6)]
Solution #15: [('l', 7), ('w', 2), ('l', 9)]
Solution #16: [('t', 4), ('t', 5), ('w', 3)]
Solution #17: [('t', 4), ('l', 8), ('w', 3)]
Solution #18: [('l', 7), ('t', 5), ('w', 3)]
Solution #19: [('l', 7), ('l', 8), ('w', 3)]
Solution #20: [('t', 4), ('t', 5), ('t', 6)]
Solution #21: [('t', 4), ('t', 5), ('l', 9)]
Solution #22: [('t', 4), ('l', 8), ('t', 6)]
Solution #23: [('t', 4), ('l', 8), ('l', 9)]
Solution #24: [('l', 7), ('t', 5), ('t', 6)]
Solution #25: [('l', 7), ('t', 5), ('l', 9)]
Solution #26: [('l', 7), ('l', 8), ('t', 6)]
Solution #27: [('l', 7), ('l', 8), ('l', 9)]

如果我随机生成了256个比赛的排名,似乎程序立即运行。


最终,最佳解决方案!我猜复杂度应该是O(M) - justhalf
@justhalf:嗯,让我们看看...将X视为输入大小,即3N,需要O(X log X)进行排序,然后每个插槽填充需要O(X)(最坏情况下需要遍历所有排序来填充一个插槽),而且您必须填充O(X/3)个插槽,因此每个解决方案的打印成本为O(X^2),因此总成本为O(M * X^2),其中M是您想要的解决方案数量,大致上是这样。 - Claudiu
@justhalf:实际上这可能更好。现在我想了想,对于每个解决方案,您最多只需一次遍历订单列表,因为每当您进入下一步时,您不会搜索已经搜索过的任何条目。因此,这将使其达到O(M * X)-每个解决方案最多搜索整个列表一次。 - Claudiu
感谢@Claudiu提供的出色解决方案-它确实似乎做到了我所要求的,并且效率也很高。我目前正在研究在我的应用程序中使用它的影响,并将其与nailtoon的解决方案进行比较,后者也因不同的原因吸引着我。 - Tim Dean
非常流畅的解决方案和分析。实际上,这个解决方案精确地描述了所要求的顺序枚举。我提出的方法更集中于修剪策略而不是枚举(对于较小的M值来说这是可以的)。使用这两种解决方案可以帮助构建一个能够有效地枚举范围内所有元组的算法,即使M不是很小也可以。 - naitoon

1

首先,让我们重新表述问题以抽象出细节并确保我们在谈论同一个问题。

长度为N的元组有3^N个。每个元组(a_1,a_2,...,a_N)的组成部分a_i是介于1到3N之间的不同整数,并且对于固定的i,a_i只能取值于基数为3的子集S_i中。对于[1,3N]中的每个值,元组中只有一个位置可以承担该值。

现在,用自然数排序将元组S的组成部分排序后得到sorted(S)。如果sorted(S)在字典序中小于sorted(T),则称元组S小于元组T。

问题在于在存在的3^N个元组中按照给定顺序找到前M个元组,其中M << 3^N。

我看到的解决原则本质上是带有修剪的回溯。

为了以关键方式修剪搜索空间,计算不大于M的最高3次幂。假设这个幂是H。我们有3^(H+1) > M >= 3^H。此时,您知道您的M个元组位于一个集合中,其中(N-H-1)个元组成分取最小可能值。可以按以下方式找到并固定这些组件:首先,取可以取值1的组件i,并将其固定为1。然后,在[1,3N]的值中未被组件i占用的值中选择最小值,并将唯一能够取该值的组件j固定为该值。以相同方式继续固定(N-H-1)个组件。之后,您已确定了至多3M个元组的集合。您可以通过对剩余的H+1个组件进行详尽的搜索来生成该完整元组集,并对该集进行排序,从而获得前M个元组。

1
感谢@naitoon - 我真的很喜欢这个解决方案。虽然Claudiu提供的解决方案是一个非常好的通用解决方案,但你的解决方案可能更适合我的特定用途。具体而言,我认为我可以将此解决方案调整为在3^n个可能解的任意范围内查找解决方案。 - Tim Dean
感谢TimDean。也许你可以将@Claudiu的解决方案与这些关于修剪的建议结合起来,以获得一个非常高效的算法,按顺序枚举范围内的所有元组。有关修剪的考虑将确定Claudiu描述的给定范围的枚举的初始和最终状态。 - naitoon

0

您可以在O(N.M)的时间内构建解决方案。这里的解决方案将是一个包含M个排列及其相关分数的列表,按分数降序排列。

如果N为1,则解决方案很简单:根据它们的得分对您获得的3个结果进行排序(如果M<3,则减少解决方案)。也就是说,如果结果得分为胜利、失败和平局,则解决方案为:

  • sorted([('W', win), ('L', lose), ('D', draw)])

(当然是按得分降序排序)。

现在是迭代步骤。给定大小为N-1的问题的大小为M的解决方案,然后考虑具有结果得分(胜利、失败、平局)的新比赛。

只需将胜利、失败、平局得分添加到每个M解决方案中,并重新排序即可。也就是说,假设解决方案为[(res_1, score_1), (res_2, score_2), ... (res_M, score_M)],则:

  • wins = [(res_1 + 'W', score_1+win), (res_2 + 'W', score_2+win), ... (res_M + 'W', score_M+win)]
  • loses = [(res_1 + 'L', score_1+lose), (res_2 + 'L', score_2+lose), ... (res_M + 'L', score_M+lose)]
  • draws = [(res_1 + 'D', score_1+draw), (res_2 + 'D', score_2+draw), ... (res_M + 'D', score_M+draw)]

新的解决方案将是sorted(wins + loses + draws)[:M],这是来自组合解决方案的前M个最大解决方案。请注意,您可以使用归并排序的合并步骤在O(M)时间内完成此排序步骤。

以下是演示算法的粗略代码。对于紧凑的解决方案,列表切片和字符串附加需要被替换为O(1)的内容,并且需要使用合并步骤而不是一般排序。

def results(outcomes, M):
    if len(outcomes) == 1:
        return sorted(zip(outcomes[0], 'WLD'), reverse=True)[:M]
    tr = results(outcomes[1:], M)
    result = []
    for outscore, outcome in zip(outcomes[0], 'WLD'):
        result.extend((sc + outscore, pr + outcome) for sc, pr in tr)
    return sorted(result, reverse=True)[:M]

# Outcome scores for each contest, in order (win, lose, draw).
testcase = [(1, 7, 4), (2, 8, 5), (3, 9, 6)]
print results(testcase, 5)

输出结果为:

[(24, 'LLL'), (21, 'LLD'), (21, 'LDL'), (21, 'DLL'), (18, 'WLL')]

谢谢您的建议,但这似乎并不符合我的要求。这个方法似乎是按照组合中所有结果的排名总和排序的组合列表。正如我在N=3时展示的组合顺序所示,组合的顺序并不是基于总和的排序。 - Tim Dean

0

GCC 4.7.3:g++ -Wall -Wextra -std=c++0x perm.cpp

#include <algorithm>
#include <cassert>
#include <functional>
#include <iostream>
#include <iterator>
#include <string>

using ints = std::vector<int>;

template <typename T>
bool equal(const std::vector<T>& a, const std::vector<T>& b) {
  return std::equal(std::begin(a), std::end(a), std::begin(b)); }

bool is_strictly_monotonic(const ints& p) {
  return std::is_sorted(std::begin(p), std::end(p), std::less_equal<int>()); }

ints gen_seq(int size, int init) {
  ints v(size);
  for (auto& e : v) { e = ++init; }
  return v; }

ints begin_combination(const ints& p, int mod) {
  return gen_seq(p.size(), -1); }

// Not the same as a normal STL end. This is actually the last combination.
ints end_combination(const ints& p, int mod) {
  return gen_seq(p.size(), mod - p.size()); }

// This is the most complicated bit of code, but what it does is
// straightforward. This function treats the input sequence as the
// (radix m+1) digits in a counter. It increments the counter by one, while
// maintaining the constraint that the digits in the sequence be strictly
// monotonic. This means that some numbers in the regular counter sequence
// will be skipped, for example the sequence after {1, 2, 9} (radix 10)
// is {1, 3, 4}. Like all digital counters it wraps on overflow.
void inc_monotonic_seq(ints& p, const int m) {
  assert(is_strictly_monotonic(p));

  int i = p.size() - 1;
  // scan right to left for number to increment
  while (i != -1) {
    if (p[i] < m) {
      ++p[i];
      ints::size_type j = i + 1;
      // propogate carry left to right
      while (j != p.size()) {
        p[j] = p[j - 1] + 1;
        if (m < p[j]) { break; }
        ++j; }
      if (j == p.size()) { break; } }
    --i; }

  // wrap around
  if (i == -1) { p = begin_combination(p, m); }

  assert(is_strictly_monotonic(p));
}

// A combination is valid if each contest is represented once.
bool is_valid_combination(const ints& p, const ints& contests) {
  auto t(p);
  for (auto& e : t) { e = contests[e]; }
  std::sort(std::begin(t), std::end(t));
  return is_strictly_monotonic(t); }

// This is the second most complicated bit of code. It calculates the
// combination following p in the ordered combination sequence. Combinations
// are ordered lexically by the sort of their elements, for example:
// {6, 1, 2} < {3, 1, 5} because {1, 2, 6} < {1, 3, 5}. Further, it enforces
// the constraint that each digit in the combination must be drawn from the
// contest in that position.
bool next_combination(ints& p, const ints& contests) {
  std::sort(std::begin(p), std::end(p));
  const auto mx = end_combination(p, contests.size() - 1);

  do {
    if (equal(p, mx)) { return false; }
    inc_monotonic_seq(p, contests.size() - 1);
  } while (!is_valid_combination(p, contests));

  // Sort in contest order: contest0, contest1, ...
  for (int i = 0; i < p.size(); ++i) {
    while (i != contests[p[i]]) {
      std::swap(p[i], p[contests[p[i]]]); } }

  return true;
}

int main() {
  const int N = 256; // number of contests
  const int M = 500; // number of most preferably ranked outcomes to display

  // This is the third most complicated bit of code. The following vector is
  // a map from priorities to contests. For example, the following 3 contest
  // win-tie-loss priorities {{0, 3, 6}, {1, 4, 7}, {2, 4, 8}} are stored as
  // {0, 1, 2, 0, 1, 2, 0, 1, 2}. This inversion is possible because the
  // contest outcome priorities are required to be disjoint since there is
  // a total ordering over the outcomes. Note, the highest priority is 0
  // (not 1 as in the specification).
  ints contests(3 * N); // map priorities to contests
  int c  = 0;
  for (auto& e : contests) { e = c % N; ++c; }

  // Highest priority combination.
  ints p(N);
  p = begin_combination(p, contests.size() - 1);

  int total = 1;
  do {
    // Finally, doing some sort of mapping here from priorities to strings,
    // as in the problem specification, is trivially done.
    std::copy(std::begin(p), std::end(p), std::ostream_iterator<int>(std::cout, " "));
    std::cout << "\n";
    if (M < ++total) { break; }
  } while(next_combination(p, contests));

  return 0;
}

一些注释:
  1. 如果按照现有写法,使用 N=256 和 M=500 运行代码会发现速度太慢了。我相信有聪明的方法可以避免所有排序操作。
  2. 如果你想要 M=500,那么 N 并不需要很大,N=6 就足够了(3^6=729)。如果 N 很大,你会看到一个长的恒定前缀/头部,后面跟着一个短的变化尾部。如果你思考一下,就不难理解其中的原因。

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接