将整数列表分成K个子列表,使每个子列表的和相等

6

类似的问题有12,但是答案并没有帮助。 假设我们有一个整数列表。我们想要找到K个不相交的子列表,使它们完全覆盖给定的列表,并且它们都具有相同的总和。例如,如果A = [4, 3, 5, 6, 4, 3, 1],并且K = 2,则答案应该是:

[[3, 4, 6], [1, 3, 4, 5]]
or
[[4, 4, 5], [1, 3, 3, 6]]

我写了一段只在K = 2时起作用的代码,对于小列表的输入它可以正常运行,但是因为代码高复杂度,当输入列表非常大时,操作系统会终止任务。我的代码如下:

def subarrays_equal_sum(l):
    from itertools import combinations

    if len(l) < 2 or sum(l) % 2 != 0:
        return []
    l = sorted(l)
    list_sum = sum(l)
    all_combinations = []
    for i in range(1, len(l)):
        all_combinations += (list(combinations(l, i)))

    combinations_list = [i for i in all_combinations if sum(i) == list_sum / 2]
    if not combinations_list:
        return []
    final_result = []
    for i in range(len(combinations_list)):
        for j in range(i + 1, len(combinations_list)):
            first = combinations_list[i]
            second = combinations_list[j]
            concat = sorted(first + second)
            if concat == l and [list(first), list(second)] not in final_result:
                final_result.append([list(first), list(second)])

    return final_result

任何值的答案可以在这里找到here。但是如果我们传递参数A = [4, 3, 5, 6, 4, 3, 1]K = 2,他们的代码只返回了[[5, 4, 3, 1],[4, 3, 6]],而我的代码返回了所有可能的列表,即

[[[3, 4, 6], [1, 3, 4, 5]], [[4, 4, 5], [1, 3, 3, 6]]]

我的问题是:

  1. 如何改进我的代码的复杂度和成本?
  2. 如何使我的代码适用于任何值的k

1
如果我误解了问题,请原谅 - 但在我看来,该问题并没有保证“k”个子列表是否存在,也没有保证存在的符合条件的子列表数量不超过k。例如,如果整数输入列表为[1,2],则没有有效的答案...我是否遗漏了任何问题的条件/保证? - Vin
4
请注意,即使找到单个k分区也是k-way number partitioning的一个实例,这是一个NP难问题,因此没有真正有效的算法被发现(而且可能永远不会)。 - collapsar
1
“subarray” 不是这里的正确术语。 - n. m.
2
@Vin 您说得对。这就是为什么我在代码中有两个条件,即 if len(l) < 2 or sum(l) % 2 != 0: 。在您的示例 [1,2] 中,列表的总和(3)不可被2整除,因此我的代码返回[] - Bsh
1
@KellyBundy 这里的大输入可以是长度超过100的列表。 - Bsh
显示剩余13条评论
2个回答

4

这里有一个处理重复项的解决方案。

首先,如前所述,找到任何解决方案的问题是NP完全的。因此,在某些情况下,这将花费很长时间才能意识到没有解决方案。我应用了合理的启发式方法来限制这种情况发生的频率。这些启发式方法可以改进。但请注意,有时根本没有解决方案。

这个解决方案的第一步是将数字列表转换为[(value1, repeat), (value2, repeat), ...]。其中一个启发式要求首先按绝对值降序排序,然后按值递减排序。这是因为我尝试首先使用第一个元素,我们期望一堆小的剩余数字仍然能给我们带来总和。

接下来,我将尝试将其拆分为可能的最大子集,具有正确的目标总和和所有剩余元素。

然后,我将把剩余部分拆分为可能的最大剩余子集不大于第一个,以及在此之后产生的那些元素。

递归执行此操作,我们就会找到一个解决方案。然后我们将其返回到上层。

但是,这里有一个棘手的地方,我不会通过查看组合来进行拆分。相反,我将使用动态规划,就像我们通常用于子集和伪多项式算法一样,只是我将用它来构建一个数据结构,从中我们可以进行拆分。这个数据结构将包含以下字段:
  1. value 是该元素的值。
  2. repeat 是在子集和中使用它的次数。
  3. skip 是我们拥有但没有在子集和中使用的副本数量。
  4. tail 是这些解决方案的尾部。
  5. prev 是其他一些解决方案,其中我们做了其他事情。
下面是一个构建此数据结构的类,其中包含将元素拆分为子集和仍可用于进一步拆分的方法。
from collections import namedtuple

class RecursiveSums (
      namedtuple('BaseRecursiveSums',
                 ['value', 'repeat', 'skip', 'tail', 'prev'])):

    def sum_and_rest(self):
        if self.tail is None:
            if self.skip:
                yield ([self.value] * self.repeat, [(self.value, self.skip)])
            else:
                yield ([self.value] * self.repeat, [])
        else:
            for partial_sum, rest in self.tail.sum_and_rest():
                for _ in range(self.repeat):
                    partial_sum.append(self.value)
                if self.skip:
                    rest.append((self.value, self.skip))
                yield (partial_sum, rest)
        if self.prev is not None:
            yield from self.prev.sum_and_rest()

你可能需要多看几次才能理解它的工作原理。

接下来,记住我说过我使用一种启发式方法尝试在使用小元素之前使用大元素。这是我们需要进行比较的一些代码。

class AbsComparator(int):
    def __lt__ (self, other):
        if abs(int(self)) < abs(int(other)):
            return True
        elif abs(other) < abs(self):
            return False
        else:
            return int(self) < int(other)

def abs_lt (x, y):
    return AbsComparator(x) < AbsComparator(y)

我们需要两种形式。一种是用于直接比较的函数,另一种是用于Python的sort函数中的key参数的类。有关后者的更多信息,请参见使用比较器函数进行排序
现在是方法的核心。这会找到所有分成子集(在我们使用的比较度量中不大于bound)和更多拆分的剩余元素的方法。
这个想法与动态规划方法解决子集和问题的方法相同https://www.geeksforgeeks.org/count-of-subsets-with-sum-equal-to-x/,但有两个主要区别。第一个区别是,我们正在构建数据结构而不是计算答案。第二个区别是,我们的键是(partial_sum, bound_index),因此我们知道我们的bound当前是否满足,如果不满足,我们知道下一个要比较的元素是什么来测试它。
def lexically_maximal_subset_rest (elements, target, bound=None):
    """
        elements = [(value, count), (value, count), ...]
            with largest absolute values first.
        target = target sum
        bound = a lexical bound on the maximal subset.
    """
    # First let's deal with all of the trivial cases.
    if 0 == len(elements):
        if 0 == target:
            yield []
    elif bound is None or 0 == len(bound):
        # Set the bound to something that trivially works.
        yield from lexically_maximal_subset_rest(elements, target, [abs(elements[0][0]) + 1])
    elif abs_lt(bound[0], elements[0][0]):
        pass # we automatically use more than the bound.
    else:
        # The trivial checks are done.

        bound_satisfied = (bound[0] != elements[0][0])

        # recurse_by_sum will have a key of (partial_sum, bound_index).
        # If the bound_index is None, the bound is satisfied.
        # Otherwise it will be the last used index in the bound.
        recurse_by_sum = {}
        # Populate it with all of the ways to use the first element at least once.
        (init_value, init_count) = elements[0]
        for i in range(init_count):
            if not bound_satisfied:
                if len(bound) <= i or abs_lt(bound[i], init_value):
                    # Bound exceeded.
                    break
                elif abs_lt(init_value, bound[i]):
                    bound_satisfied = True
            if bound_satisfied:
                key = (init_value * (i+1), None)
            else:
                key = (init_value * (i+1), i)

            recurse_by_sum[key] = RecursiveSums(
                init_value, i+1, init_count-i-1, None, recurse_by_sum.get(key))

        # And now we do the dynamic programming thing.
        for j in range(1, len(elements)):
            value, repeat = elements[j]
            next_recurse_by_sum = {}
            for key, tail in recurse_by_sum.items():
                partial_sum, bound_index = key
                # Record not using this value at all.
                next_recurse_by_sum[key] = RecursiveSums(
                    value, 0, repeat, tail, next_recurse_by_sum.get(key))
                # Now record the rest.
                for i in range(1, repeat+1):
                    if bound_index is not None:
                        # Bounds check.
                        if len(bound) <= bound_index + i:
                            break # bound exceeded.
                        elif abs_lt(bound[bound_index + i], value):
                            break # bound exceeded.
                        elif abs_lt(value, bound[bound_index + i]):
                            bound_index = None # bound satisfied!
                    if bound_index is None:
                        next_key = (partial_sum + value * i, None)
                    else:
                        next_key = (partial_sum + value * i, bound_index + i)

                    next_recurse_by_sum[next_key] = RecursiveSums(
                        value, i, repeat - i, tail, next_recurse_by_sum.get(next_key))
            recurse_by_sum = next_recurse_by_sum

        # We now have all of the answers in recurse_by_sum, but in several keys.
        # Find all that may have answers.
        bound_index = len(bound)
        while 0 < bound_index:
            bound_index -= 1
            if (target, bound_index) in recurse_by_sum:
                yield from recurse_by_sum[(target, bound_index)].sum_and_rest()
        if (target, None) in recurse_by_sum:
            yield from recurse_by_sum[(target, None)].sum_and_rest()

现在我们来实现剩下的部分。

def elements_split (elements, target, k, bound=None):
    if 0 == len(elements):
        if k == 0:
            yield []
    elif k == 0:
        pass # still have elements left over.
    else:
        for (subset, rest) in lexically_maximal_subset_rest(elements, target, bound):
            for answer in elements_split(rest, target, k-1, subset):
                answer.append(subset)
                yield answer

def subset_split (raw_elements, k):
    total = sum(raw_elements)
    if 0 == (total % k):
        target = total // k
        counts = {}
        for e in sorted(raw_elements, key=AbsComparator, reverse=True):
            counts[e] = 1 + counts.get(e, 0)
        elements = list(counts.items())
        yield from elements_split(elements, target, k)

这里是使用您的列表进行演示,它被加倍了。我们将其分成4个相等的部分。在我的笔记本电脑上,它在0.084秒内找到了所有10个解决方案。

n = 0
for s in subset_split([4, 3, 5, 6, 4, 3, 1]*2, 4):
    n += 1
    print(n, s)

所以...没有性能保证。但这通常应该能够快速找到每个分割点。当然,通常还有指数级别的分割点。例如,如果您将列表复制16次并尝试将其分成32组,则在我的笔记本电脑上需要大约8分钟才能找到所有224082个解决方案。

如果我不尝试处理负数,这可以加快速度。(使用更便宜的比较,删除已超过target的所有部分和以避免计算大部分动态规划表。)

这是加速版本。对于仅包含非负数的情况,它的速度大约快两倍。如果存在负数,则会产生错误结果。

from collections import namedtuple

class RecursiveSums (
      namedtuple('BaseRecursiveSums',
                 ['value', 'repeat', 'skip', 'tail', 'prev'])):

    def sum_and_rest(self):
        if self.tail is None:
            if self.skip:
                yield ([self.value] * self.repeat, [(self.value, self.skip)])
            else:
                yield ([self.value] * self.repeat, [])
        else:
            for partial_sum, rest in self.tail.sum_and_rest():
                for _ in range(self.repeat):
                    partial_sum.append(self.value)
                if self.skip:
                    rest.append((self.value, self.skip))
                yield (partial_sum, rest)
        if self.prev is not None:
            yield from self.prev.sum_and_rest()

def lexically_maximal_subset_rest (elements, target, bound=None):
    """
        elements = [(value, count), (value, count), ...]
            with largest absolute values first.
        target = target sum
        bound = a lexical bound on the maximal subset.
    """
    # First let's deal with all of the trivial cases.
    if 0 == len(elements):
        if 0 == target:
            yield []
    elif bound is None or 0 == len(bound):
        # Set the bound to something that trivially works.
        yield from lexically_maximal_subset_rest(elements, target, [abs(elements[0][0]) + 1])
    elif bound[0] < elements[0][0]:
        pass # we automatically use more than the bound.
    else:
        # The trivial checks are done.

        bound_satisfied = (bound[0] != elements[0][0])

        # recurse_by_sum will have a key of (partial_sum, bound_index).
        # If the bound_index is None, the bound is satisfied.
        # Otherwise it will be the last used index in the bound.
        recurse_by_sum = {}
        # Populate it with all of the ways to use the first element at least once.
        (init_value, init_count) = elements[0]
        for i in range(init_count):
            if not bound_satisfied:
                if len(bound) <= i or bound[i] < init_value:
                    # Bound exceeded.
                    break
                elif init_value < bound[i]:
                    bound_satisfied = True
            if bound_satisfied:
                key = (init_value * (i+1), None)
            else:
                key = (init_value * (i+1), i)

            recurse_by_sum[key] = RecursiveSums(
                init_value, i+1, init_count-i-1, None, recurse_by_sum.get(key))

        # And now we do the dynamic programming thing.
        for j in range(1, len(elements)):
            value, repeat = elements[j]
            next_recurse_by_sum = {}
            for key, tail in recurse_by_sum.items():
                partial_sum, bound_index = key
                # Record not using this value at all.
                next_recurse_by_sum[key] = RecursiveSums(
                    value, 0, repeat, tail, next_recurse_by_sum.get(key))
                # Now record the rest.
                for i in range(1, repeat+1):
                    if target < partial_sum + value * i:
                        break # these are too big.

                    if bound_index is not None:
                        # Bounds check.
                        if len(bound) <= bound_index + i:
                            break # bound exceeded.
                        elif bound[bound_index + i] < value:
                            break # bound exceeded.
                        elif value < bound[bound_index + i]:
                            bound_index = None # bound satisfied!
                    if bound_index is None:
                        next_key = (partial_sum + value * i, None)
                    else:
                        next_key = (partial_sum + value * i, bound_index + i)

                    next_recurse_by_sum[next_key] = RecursiveSums(
                        value, i, repeat - i, tail, next_recurse_by_sum.get(next_key))
            recurse_by_sum = next_recurse_by_sum

        # We now have all of the answers in recurse_by_sum, but in several keys.
        # Find all that may have answers.
        bound_index = len(bound)
        while 0 < bound_index:
            bound_index -= 1
            if (target, bound_index) in recurse_by_sum:
                yield from recurse_by_sum[(target, bound_index)].sum_and_rest()
        if (target, None) in recurse_by_sum:
            yield from recurse_by_sum[(target, None)].sum_and_rest()

def elements_split (elements, target, k, bound=None):
    if 0 == len(elements):
        if k == 0:
            yield []
    elif k == 0:
        pass # still have elements left over.
    else:
        for (subset, rest) in lexically_maximal_subset_rest(elements, target, bound):
            for answer in elements_split(rest, target, k-1, subset):
                answer.append(subset)
                yield answer

def subset_split (raw_elements, k):
    total = sum(raw_elements)
    if 0 == (total % k):
        target = total // k
        counts = {}
        for e in sorted(raw_elements, key=AbsComparator, reverse=True):
            counts[e] = 1 + counts.get(e, 0)
        elements = list(counts.items())
        yield from elements_split(elements, target, k)

n = 0
for s in subset_split([4, 3, 5, 6, 4, 3, 1]*16, 32):
    n += 1
    print(n, s)

我刚刚运行了 subset_split([1, 0, 3, 1, -5], 2)。你的代码适用于所有整数,返回的结果如下: 1 [[-5, 3, 1, 1, 0]] 和 2 [[0], [-5, 3, 1, 1]] 第二个输出是正确的,但第一个输出不应该出现,因为它违反了规定,即每个输出应包含两个列表。 - Bsh
@Bsh 真烦人。只有当 target == 0 时才会发生这种情况。因为只有在这种情况下,您才能将集合分成错误数量的大小为 target 的元素,并使其正确相加。不过,只需对 subset_splitelements_split 进行小修补即可解决此问题。现在已经修复了。 - btilly
非常好,现在它运行正确。确实是一个详细的答案。我非常感谢您的慷慨贡献。谢谢! - Bsh
为了避免冗余执行,在def subset_split(raw_elements, k)内的第一行应该是if len(raw_elements) &lt; k: return [] - Bsh
为了避免重复执行,在def subset_split(raw_elements, k)的第一行应该加上if len(raw_elements) < k: return [] - Bsh
为了避免重复执行,在def subset_split(raw_elements, k)内的第一行应该加上if len(raw_elements) < k: return [] - undefined

-1

这个问题有很多潜在的解决方案,所以减少需要评估的可行模式数量将是提高性能的关键。

这里有一个想法:分两步走:

  1. 生成一组索引组,这些组的总和等于目标和。
  2. 合并不交的索引组(因此索引仅属于一个组),以便获得K个组。

assemble函数是一个递归生成器,将生成不重叠的n个索引组合集。由于每个组的总和为total/K,因此列表将完全覆盖原始列表元素。

def assemble(combos,n):
    if not n:
        yield []
        return
    if len(combos)<n: return
    for i,idx in enumerate(combos):
        others = [c for c in combos if c.isdisjoint(idx)]
        for rest in assemble(others,n-1):
            yield [idx] + rest
            
def equalSplit(A,K):
    total = sum(A) 
    if total%K: return       # no equal sum solution
    partSum = total//K       # sum of each resulting sub-lists
    combos = [ (0,[]) ]      # groups of indices that form sums <= partSum
    for i,n in enumerate(A): # build the list of sum,patterns 
        combos += [ (tot+n,idx+[i]) for tot,idx in combos
                    if tot+n <= partSum]
    # only keep index sets that add up to the target sum
    combos = [set(idx) for tot,idx in combos if tot == partSum]
    # ouput assembled lists of K sets that don't overlap (full coverage)
    seen = set()
    for parts in assemble(combos,K):
        sol = tuple(sorted(tuple(sorted(A[i] for i in idx)) for idx in parts))
        if sol in seen: continue # skip duplicate solutions
        yield list(sol)
        seen.add(sol)

输出:

A = [4, 3, 5, 6, 4, 3, 1]
print(*equalSplit(A,2), sep='\n')
# [(1, 3, 4, 5), (3, 4, 6)]
# [(1, 3, 3, 6), (4, 4, 5)]

A = [21,22,27,14,15,16,17,18,19,10,11,12,13]
print(*equalSplit(A,5), sep='\n')
# [(10, 15, 18), (11, 13, 19), (12, 14, 17), (16, 27), (21, 22)]
# [(10, 14, 19), (11, 15, 17), (12, 13, 18), (16, 27), (21, 22)]

对于被分成几个部分的大型列表,这仍然需要很长时间,但它应该比组合的蛮力更好一些。


改善指数时间算法的渐近常数是一场徒劳的战斗。 - user1196549
1
从无限大小问题的角度来看,确实有很多争论。对于那些范围有限且足够小的问题领域来说,这可能仍然是值得的。在实践中需要记住的一点是避免把宝贵的东西与纯粹主义的观念一起抛弃。 - Alain T.

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接