Split a list of numbers into n chunks such that the chunks have (approximately) equal sums, while preserving the original order.

24

This isn't the standard partition problem, because I need to preserve the order of elements in the list.

For example, if I have the list

[1, 6, 2, 3, 4, 1, 7, 6, 4]

and I want two chunks, the split should be

[[1, 6, 2, 3, 4, 1], [7, 6, 4]] 

with a sum of 17 on each side. For three chunks, the result would be

[[1, 6, 2, 3], [4, 1, 7], [6, 4]]

with sums of 12, 12 and 10.

Edit for additional explanation

I currently divide the total by the number of chunks and use that value as a target, iterating until I get close to it. The problem is that some data sets can trip up the algorithm, for example trying to split the following into 3 parts:

[95, 15, 75, 25, 85, 5]

The total is 300, so the target is 100. The first chunk sums to 95, the second to 90, the third to 110, with 5 left over. Appending that where it falls gives 95, 90, 115, whereas a more "sensible" solution would be 110, 100, 90.
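A minimal sketch of this kind of single-pass targeting (a reconstruction for illustration, not the exact code I'm running) reproduces the bad result:

```python
def naive_chunks(lst, n):
    # Close the current chunk just before it would exceed the target sum.
    # (A reconstruction of the single-pass approach described above.)
    target = sum(lst) / float(n)
    chunks, current = [], []
    for x in lst:
        if current and sum(current) + x > target and len(chunks) < n - 1:
            chunks.append(current)
            current = []
        current.append(x)
    chunks.append(current)
    return chunks

print([sum(c) for c in naive_chunks([95, 15, 75, 25, 85, 5], 3)])
# -> [95, 90, 115], instead of the more sensible 110, 100, 90
```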

end edit

Background:

I have a list of text (lyrics) of varying heights, and I want to split the text into an arbitrary number of columns. At the moment I compute a target height from the total height of all lines, but that is obviously a consistent underestimate and sometimes leads to suboptimal solutions (the last column noticeably taller).


4
What is "height"? - erip
Also, do you want to do this for two sublists or an arbitrary number of sublists? - erip
1
Do you think the problem could be restated as "split a list into n sublists such that the sums of the values differ minimally"? Do you need the sublists or the indices? - Pynchia
I think this is a really interesting problem, and I may have a greedy approach that runs in O(n) for any given number of chunks. I'll report back tomorrow. - timgeb
8 Answers

14
This approach defines partition boundaries that split the array into roughly equal numbers of elements, then repeatedly searches for better partitionings until no more can be found. It differs from most of the other posted solutions in that it looks for an optimal solution by trying multiple different partitionings. The other solutions attempt to create a good partition in a single pass over the array, but I can't think of a single-pass algorithm that's guaranteed optimal.
The code here is an efficient implementation of the algorithm, but it can be hard to follow, so a more readable version is included as an addendum at the end.
def partition_list(a, k):
    if k <= 1: return [a]
    if k >= len(a): return [[x] for x in a]
    partition_between = [(i+1)*len(a)/k for i in range(k-1)]
    average_height = float(sum(a))/k
    best_score = None
    best_partitions = None
    count = 0

    while True:
        starts = [0]+partition_between
        ends = partition_between+[len(a)]
        partitions = [a[starts[i]:ends[i]] for i in range(k)]
        heights = map(sum, partitions)

        abs_height_diffs = map(lambda x: abs(average_height - x), heights)
        worst_partition_index = abs_height_diffs.index(max(abs_height_diffs))
        worst_height_diff = average_height - heights[worst_partition_index]

        if best_score is None or abs(worst_height_diff) < best_score:
            best_score = abs(worst_height_diff)
            best_partitions = partitions
            no_improvements_count = 0
        else:
            no_improvements_count += 1

        if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
            return best_partitions
        count += 1

        move = -1 if worst_height_diff < 0 else 1
        bound_to_move = 0 if worst_partition_index == 0\
                        else k-2 if worst_partition_index == k-1\
                        else worst_partition_index-1 if (worst_height_diff < 0) ^ (heights[worst_partition_index-1] > heights[worst_partition_index+1])\
                        else worst_partition_index
        direction = -1 if bound_to_move < worst_partition_index else 1
        partition_between[bound_to_move] += move * direction

def print_best_partition(a, k):
    print 'Partitioning {0} into {1} partitions'.format(a, k)
    p = partition_list(a, k)
    print 'The best partitioning is {0}\n    With heights {1}\n'.format(p, map(sum, p))

a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2) 
print_best_partition(a, 3)
print_best_partition(a, 4)

b = [1, 10, 10, 1]
print_best_partition(b, 2)

import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)

d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)

Depending on your exact situation, some modifications may be in order. For example, to decide whether the best partitioning has been found, this algorithm stops when there is no height difference between partitions, when it fails to improve on the best result it has seen for 5 consecutive iterations, or after 100 total iterations as a catch-all stopping point. You may need to adjust those constants or use a different scheme. If your heights form a complex landscape of values, knowing when to stop can run into classic problems like trying to escape local maxima.

Output

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 1 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1, 7, 6, 4]]
With heights [34]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 2 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1], [7, 6, 4]]
With heights [17, 17]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 3 partitions
The best partitioning is [[1, 6, 2, 3], [4, 1, 7], [6, 4]]
With heights [12, 12, 10]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 4 partitions
The best partitioning is [[1, 6], [2, 3, 4], [1, 7], [6, 4]]
With heights [7, 9, 8, 10]

Partitioning [1, 10, 10, 1] into 2 partitions
The best partitioning is [[1, 10], [10, 1]]
With heights [11, 11]

Partitioning [7, 17, 17, 1, 8, 8, 12, 0, 10, 20, 17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9, 12, 3, 18, 9, 6, 7, 19, 20, 17, 7, 4, 3, 16, 20, 6, 7, 12, 16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16, 14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5, 13, 16, 0, 16, 7, 3, 8, 1, 20, 16, 11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18, 20, 3, 10, 9, 13, 12, 15, 6, 14, 16, 6, 12, 9, 9, 16, 14, 19, 1] into 10 partitions
The best partitioning is [[7, 17, 17, 1, 8, 8, 12, 0, 10, 20], [17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9], [12, 3, 18, 9, 6, 7, 19, 20], [17, 7, 4, 3, 16, 20, 6, 7, 12], [16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16], [14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5], [13, 16, 0, 16, 7, 3, 8, 1, 20, 16], [11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18], [20, 3, 10, 9, 13, 12, 15, 6, 14], [16, 6, 12, 9, 9, 16, 14, 19, 1]]
With heights [100, 95, 94, 92, 90, 87, 100, 93, 102, 102]

Partitioning [95, 15, 75, 25, 85, 5] into 3 partitions
The best partitioning is [[95, 15], [75, 25], [85, 5]]
With heights [110, 100, 90]

Edit

Added the test case [95, 15, 75, 25, 85, 5], which this method handles correctly.

Addendum

This version of the algorithm is easier to read and understand, but is a bit more verbose because it makes less use of built-in Python features. It seems to run in a comparable or even slightly faster amount of time, though.

#partition list a into k partitions
def partition_list(a, k):
    #check degenerate conditions
    if k <= 1: return [a]
    if k >= len(a): return [[x] for x in a]
    #create a list of indexes to partition between, using the index on the
    #left of the partition to indicate where to partition
    #to start, roughly partition the array into equal groups of len(a)/k (note
    #that the last group may be a different size) 
    partition_between = []
    for i in range(k-1):
        partition_between.append((i+1)*len(a)/k)
    #the ideal size for all partitions is the total height of the list divided
    #by the number of paritions
    average_height = float(sum(a))/k
    best_score = None
    best_partitions = None
    count = 0
    no_improvements_count = 0
    #loop over possible partitionings
    while True:
        #partition the list
        partitions = []
        index = 0
        for div in partition_between:
            #create partitions based on partition_between
            partitions.append(a[index:div])
            index = div
        #append the last partition, which runs from the last partition divider
        #to the end of the list
        partitions.append(a[index:])
        #evaluate the partitioning
        worst_height_diff = 0
        worst_partition_index = -1
        for p in partitions:
            #compare the partition height to the ideal partition height
            height_diff = average_height - sum(p)
            #if it's the worst partition we've seen, update the variables that
            #track that
            if abs(height_diff) > abs(worst_height_diff):
                worst_height_diff = height_diff
                worst_partition_index = partitions.index(p)
        #if the worst partition from this run is still better than anything
        #we saw in previous iterations, update our best-ever variables
        if best_score is None or abs(worst_height_diff) < best_score:
            best_score = abs(worst_height_diff)
            best_partitions = partitions
            no_improvements_count = 0
        else:
            no_improvements_count += 1
        #decide if we're done: if all our partition heights are ideal, or if
        #we haven't seen improvement in >5 iterations, or we've tried 100
        #different partitionings
        #the criteria to exit are important for getting a good result with
        #complex data, and changing them is a good way to experiment with getting
        #improved results
        if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
            return best_partitions
        count += 1
        #adjust the partitioning of the worst partition to move it closer to the
        #ideal size. the overall goal is to take the worst partition and adjust
        #its size to try and make its height closer to the ideal. generally, if
        #the worst partition is too big, we want to shrink the worst partition
        #by moving one of its ends into the smaller of the two neighboring
        #partitions. if the worst partition is too small, we want to grow the
        #partition by expanding the partition towards the larger of the two
        #neighboring partitions
        if worst_partition_index == 0:   #the worst partition is the first one
            if worst_height_diff < 0: partition_between[0] -= 1   #partition too big, so make it smaller
            else: partition_between[0] += 1   #partition too small, so make it bigger
        elif worst_partition_index == len(partitions)-1: #the worst partition is the last one
            if worst_height_diff < 0: partition_between[-1] += 1   #partition too small, so make it bigger
            else: partition_between[-1] -= 1   #partition too big, so make it smaller
        else:   #the worst partition is in the middle somewhere
            left_bound = worst_partition_index - 1   #the divider before the partition
            right_bound = worst_partition_index   #the divider after the partition
            if worst_height_diff < 0:   #partition too big, so make it smaller
                if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]):   #the partition on the left is bigger than the one on the right, so make the one on the right bigger
                    partition_between[right_bound] -= 1
                else:   #the partition on the left is smaller than the one on the right, so make the one on the left bigger
                    partition_between[left_bound] += 1
            else:   #partition too small, make it bigger
                if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]): #the partition on the left is bigger than the one on the right, so make the one on the left smaller
                    partition_between[left_bound] -= 1
                else:   #the partition on the left is smaller than the one on the right, so make the one on the right smaller
                    partition_between[right_bound] += 1

def print_best_partition(a, k):
    #simple function to partition a list and print info
    print '    Partitioning {0} into {1} partitions'.format(a, k)
    p = partition_list(a, k)
    print '    The best partitioning is {0}\n    With heights {1}\n'.format(p, map(sum, p))

#tests
a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2) 
print_best_partition(a, 3)
print_best_partition(a, 4)
print_best_partition(a, 5)

b = [1, 10, 10, 1]
print_best_partition(b, 2)

import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)

d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)

Thanks @Shawn Sullivan, your comment about a single pass probably being impossible matches what I was thinking while looking at the other solutions. I've tried related single-pass approaches but they always seem to fall short. I'll need some time to digest your solution first... - Ng Oon-Ee
Cool, let me know if you have any questions about how it works. I also made a shorter version of the algorithm by converting some of the for loops into other expressions, and by working through a truth table for the final conditionals so the partition adjustment can be expressed in one line. I posted that version too in case you're interested, though the code is a bit harder to read. - Shawn Sullivan
@NgOon-Ee I made some improvements in Edit 2 that raise the quality of the code, but in my opinion it's still harder to understand than the original version. That said, as long as how the approach works is clear, I consider the Edit 2 code my current answer. I've left the original version largely untouched in case it's easier to understand, but if this answer is judged best I'll make the second implementation the primary one. - Shawn Sullivan
While I think I'll probably end up using timgeb's other answer, given the unpredictability of the problem this is clearly the "correct" answer. I also think the second implementation should become the main answer, with the first kept as an addendum for easier understanding (even if it took me quite a while to work through it). - Ng Oon-Ee
1
@ShawnSullivan: thanks a lot! Python 3 version of the code: https://gist.github.com/laowantong/ee675108eee64640e5f94f00d8edbcb4 - Aristide

5
This is the best I've come up with so far, a greedy O(n) algorithm. The idea is to greedily append items from the list to a chunk until the current chunk's sum exceeds the average expected chunk sum at that point. The average expected sum is continually updated. This solution isn't perfect, but as I said it's O(n) and it performed well in my tests. I'm eager to hear feedback and suggestions for improvement.
I've left debugging print statements in the code to provide some documentation. Feel free to comment them back in to see what happens at each step. Code:
def split_list(lst, chunks):
    #print(lst)
    #print()
    chunks_yielded = 0
    total_sum = sum(lst)
    avg_sum = total_sum/float(chunks)
    chunk = []
    chunksum = 0
    sum_of_seen = 0

    for i, item in enumerate(lst):
        #print('start of loop! chunk: {}, index: {}, item: {}, chunksum: {}'.format(chunk, i, item, chunksum))
        if chunks - chunks_yielded == 1:
            #print('must yield the rest of the list! chunks_yielded: {}'.format(chunks_yielded))
            yield chunk + lst[i:]
            return  # PEP 479: raising StopIteration inside a generator is an error on Python 3.7+

        to_yield = chunks - chunks_yielded
        chunks_left = len(lst) - i
        if to_yield > chunks_left:
            #print('must yield remaining list in single item chunks! to_yield: {}, chunks_left: {}'.format(to_yield, chunks_left))
            if chunk:
                yield chunk
            yield from ([x] for x in lst[i:])
            return  # PEP 479: raising StopIteration inside a generator is an error on Python 3.7+

        sum_of_seen += item
        if chunksum < avg_sum:
            #print('appending {} to chunk {}'.format(item, chunk))
            chunk.append(item)
            chunksum += item
        else:
            #print('yielding chunk {}'.format(chunk))
            yield chunk
            # update average expected sum, because the last yielded chunk was probably not perfect:
            avg_sum = (total_sum - sum_of_seen)/(to_yield - 1)
            chunks_yielded += 1
            chunksum = item
            chunk = [item]

Test code

import random
lst = [1, 6, 2, 3, 4, 1, 7, 6, 4]
#lst = [random.choice(range(1,101)) for _ in range(100)]
chunks = 3
print('list: {}, avg sum: {}, chunks: {}\n'.format(lst, sum(lst)/float(chunks), chunks))
for chunk in split_list(lst, chunks):
    print('chunk: {}, sum: {}'.format(chunk, sum(chunk)))

Tests with your list:

list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 17.0, chunks: 2

chunk: [1, 6, 2, 3, 4, 1], sum: 17
chunk: [7, 6, 4], sum: 17

---

list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 11.33, chunks: 3

chunk: [1, 6, 2, 3], sum: 12
chunk: [4, 1, 7], sum: 12
chunk: [6, 4], sum: 10

---

list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 8.5, chunks: 4

chunk: [1, 6, 2], sum: 9
chunk: [3, 4, 1], sum: 8
chunk: [7], sum: 7
chunk: [6, 4], sum: 10

---

list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 6.8, chunks: 5

chunk: [1, 6], sum: 7
chunk: [2, 3, 4], sum: 9
chunk: [1, 7], sum: 8
chunk: [6], sum: 6
chunk: [4], sum: 4

Tests with a random list of length 100 and elements from 1 to 100 (printing of the random list omitted):

avg sum: 2776.0, chunks: 2

chunk: [25, 8, 71, 39, 5, 69, 29, 64, 31, 2, 90, 73, 72, 58, 52, 19, 64, 34, 16, 8, 16, 89, 70, 67, 63, 36, 9, 87, 38, 33, 22, 73, 66, 93, 46, 48, 65, 55, 81, 92, 69, 94, 43, 68, 98, 70, 28, 99, 92, 69, 24, 74], sum: 2806
chunk: [55, 55, 64, 93, 97, 53, 85, 100, 66, 61, 5, 98, 43, 74, 99, 56, 96, 74, 63, 6, 89, 82, 8, 25, 36, 68, 89, 84, 10, 46, 95, 41, 54, 39, 21, 24, 8, 82, 72, 51, 31, 48, 33, 77, 17, 69, 50, 54], sum: 2746

---

avg sum: 1047.6, chunks: 5

chunk: [19, 76, 96, 78, 12, 33, 94, 10, 38, 87, 44, 76, 28, 18, 26, 29, 44, 98, 44, 32, 80], sum: 1062
chunk: [48, 70, 42, 85, 87, 55, 44, 11, 50, 48, 47, 50, 1, 17, 93, 78, 25, 10, 89, 57, 85], sum: 1092
chunk: [30, 83, 99, 62, 48, 66, 65, 98, 94, 54, 14, 97, 58, 53, 3, 98], sum: 1022
chunk: [80, 34, 63, 20, 27, 36, 98, 97, 7, 6, 9, 65, 91, 93, 2, 27, 83, 35, 65, 17, 26, 41], sum: 1022
chunk: [80, 80, 42, 32, 44, 42, 94, 31, 50, 23, 34, 84, 47, 10, 54, 59, 72, 80, 6, 76], sum: 1040

---

avg sum: 474.6, chunks: 10

chunk: [4, 41, 47, 41, 32, 51, 81, 5, 3, 37, 40, 26, 10, 70], sum: 488
chunk: [54, 8, 91, 42, 35, 80, 13, 84, 14, 23, 59], sum: 503
chunk: [39, 4, 38, 40, 88, 69, 10, 19, 28, 97, 81], sum: 513
chunk: [19, 55, 21, 63, 99, 93, 39, 47, 29], sum: 465
chunk: [65, 88, 12, 94, 7, 47, 14, 55, 28, 9, 98], sum: 517
chunk: [19, 1, 98, 84, 92, 99, 11, 53], sum: 457
chunk: [85, 79, 69, 78, 44, 6, 19, 53], sum: 433
chunk: [59, 20, 64, 55, 2, 65, 44, 90, 37, 26], sum: 462
chunk: [78, 66, 32, 76, 59, 47, 82], sum: 440
chunk: [34, 56, 66, 27, 1, 100, 16, 5, 97, 33, 33], sum: 468

---

avg sum: 182.48, chunks: 25

chunk: [55, 6, 16, 42, 85], sum: 204
chunk: [30, 68, 3, 94], sum: 195
chunk: [68, 96, 23], sum: 187
chunk: [69, 19, 12, 97], sum: 197
chunk: [59, 88, 49], sum: 196
chunk: [1, 16, 13, 12, 61, 77], sum: 180
chunk: [49, 75, 44, 43], sum: 211
chunk: [34, 86, 9, 55], sum: 184
chunk: [25, 82, 12, 93], sum: 212
chunk: [32, 74, 53, 31], sum: 190
chunk: [13, 15, 26, 31, 35, 3, 14, 71], sum: 208
chunk: [81, 92], sum: 173
chunk: [94, 21, 34, 71], sum: 220
chunk: [1, 55, 70, 3, 92], sum: 221
chunk: [38, 59, 56, 57], sum: 210
chunk: [7, 20, 10, 81, 100], sum: 218
chunk: [5, 71, 19, 8, 82], sum: 185
chunk: [95, 14, 72], sum: 181
chunk: [2, 8, 4, 47, 75, 17], sum: 153
chunk: [56, 69, 42], sum: 167
chunk: [75, 45], sum: 120
chunk: [68, 60], sum: 128
chunk: [29, 25, 62, 3, 50], sum: 169
chunk: [54, 63], sum: 117
chunk: [57, 37, 42], sum: 136

As you can see, producing more chunks makes things worse, which is to be expected. I hope I could help a little.

Edit: if you're using an older version of Python (below 3.3), the yield from syntax is not available; convert that statement into a plain for loop.
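For example, the `yield from` line could be replaced along these lines (a sketch of the pre-3.3 equivalent):

```python
def singles(lst, i):
    # Pre-3.3 replacement for `yield from ([x] for x in lst[i:])`:
    # yield each remaining element as its own single-item chunk.
    for x in lst[i:]:
        yield [x]

print(list(singles([1, 6, 2, 3], 2)))  # -> [[2], [3]]
```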


Thanks for this, but the edge case I mentioned (the consistent underestimate) is still a problem for this approach. I've added an example data set that causes trouble; this method actually produces [95, 15], [75] and [25, 85, 5], which isn't a bad guess, but still not as good as [95, 15], [75, 25] and [85, 5]. - Ng Oon-Ee
@NgOon-Ee Well, my solution is geared more toward giving good guesses than perfect ones. I'm not sure how much it can be improved while keeping the greedy algorithm and O(n) complexity; I'll have to think about it some more. One idea would be to use my solution to get the chunks and then make another pass over the chunks to optimize them, swapping first and last elements between neighbors. If you need a quick solution, maybe you can take a crack at that. In theory, with a few extra passes over the chunks you should get very good guesses. - timgeb
@NgOon-Ee Try this in the else clause: chunksum = chunksum - avg_sum + item instead of chunksum = item, and comment out/remove the line that updates avg_sum. That seems to give better results for some cases, e.g. the 3-split of [95, 15, 75, 25, 85, 5] into [95, 15] [75, 25] [85, 5]. - timgeb
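The carry-over tweak from this comment can be sketched as a standalone greedy (a reconstruction of the described change, not timgeb's exact code):

```python
def split_carry(lst, chunks):
    # Minimal greedy with a carried-over error term, reconstructing the
    # tweak suggested in the comment above (not the original answer's code).
    avg = sum(lst) / float(chunks)
    out, cur, cursum = [], [], 0
    for item in lst:
        if cursum < avg or len(out) == chunks - 1:
            cur.append(item)
            cursum += item
        else:
            out.append(cur)
            # carry the over/undershoot relative to the target forward
            cursum = cursum - avg + item
            cur = [item]
    out.append(cur)
    return out

print(split_carry([95, 15, 75, 25, 85, 5], 3))
# -> [[95, 15], [75, 25], [85, 5]]
```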
Thanks, this solution is probably the most user-friendly. Unfortunately, the more I look at it the more I realize it just postpones the inevitable error, mainly because the problem itself is, as Shawn Sullivan said, ill-defined for a single-pass approach. I'll upvote this, but on that technicality I think his answer is more correct. - Ng Oon-Ee

4

A simple and concise approach using numpy. Assume

import numpy.random as nr
import numpy as np

a = (nr.random(10000000)*1000).astype(int)

Suppose you need to split the list into p parts such that each part's sum is roughly equal:

def equisum_partition(arr,p):
    ac = arr.cumsum()

    #sum of the entire array
    partsum = ac[-1]//p 

    #generates the cumulative sums of each part
    cumpartsums = np.array(range(1,p))*partsum

    #finds the indices where the cumulative sums are sandwiched
    inds = np.searchsorted(ac,cumpartsums) 

    #split into approximately equal-sum arrays
    parts = np.split(arr,inds)

    return parts

Importantly, this is vectorized:

In [3]: %timeit parts = equisum_partition(a,20)
53.5 ms ± 962 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

You can check the quality of the split with
partsums = np.array([part.sum() for part in parts]).std()

The splits aren't amazing, but I suspect they're as good as possible given that the order can't change.

1
I think a good approach is to sort the input list. Then add the smallest and largest values to one list, the second smallest and second largest to the next list, and so on until all elements have been assigned.
def divide_list(A):
    A.sort()
    l = 0
    r = len(A) - 1
    l1,l2= [],[]
    i = 0
    while l < r:
        ends = [A[l], A[r]]
        if i %2 ==0:
            l1.extend(ends)
        else:
            l2.extend(ends)
        i +=1
        l +=1
        r -=1
    if r == l:
        smaller = l1 if sum(l1) < sum(l2) else l2
        smaller.append(A[r])

    return l1, l2

myList = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print divide_list(myList)

myList = [1,10,10,1]
print divide_list(myList)

Output

([1, 7, 2, 6], [1, 6, 3, 4, 4])
([1, 10], [1, 10])

1
Given that the numbers represent words/lyrics, I think the original order of the elements matters. - Pynchia
What about splitting into three? - danidee
4
OP says order matters. - erip

1
This may be a bit late, but I came up with a function that does what you need. It takes a second argument telling it how the list should be split.
import math

my_list = [1, 6, 2, 3, 4, 1, 7, 6, 4]

def partition(my_list, split):
    solution = []

    total = sum(my_list)
    div = total / split
    div = math.ceil(div)

    criteria = [div] * (total // div)
    criteria.append(total - sum(criteria)) if sum(criteria) != total else criteria

    temp = []
    pivot = 0
    for crit in criteria:
        for count in range(len(my_list) + 1):
            if sum(my_list[pivot:count]) == crit:
                solution.append(my_list[pivot:count])
                pivot = count
                break



    return solution

print(partition(my_list, 2)) # Outputs [[1, 6, 2, 3, 4, 1], [7, 6, 4]]

print(partition(my_list, 3)) # Outputs [[1, 6, 2, 3], [4, 1, 7], [6, 4]]

If, as stated in your question, you want to keep the order, it fails for 4 parts, since

4 divisions = [9, 9, 9, 7]

and your ordering can't match that.
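A quick prefix-sum check (a standalone sketch, independent of the code above) makes this concrete:

```python
from itertools import accumulate

my_list = [1, 6, 2, 3, 4, 1, 7, 6, 4]
# The first chunk [1, 6, 2] hits the target of 9, but from index 3 onward
# the running totals never hit 9 again, so the scan finds no cut point.
prefix = list(accumulate(my_list[3:]))
print(prefix)  # -> [3, 7, 8, 15, 21, 25] -- 9 never appears
```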


1
Here is some code that returns the two slice indices for each sublist.
weights = [1, 6, 2, 3, 4, 1, 7, 6, 4]

def balance_partitions(weights:list, n:int=2) -> tuple:
    if n < 2:
        raise ValueError("Parameter 'n' must be 2+")

    target = sum(weights) // n
    results = []
    cost = 0
    start = 0

    for i, w in enumerate(weights):
        delta = target - cost
        cost += w
        if cost >= target:
            if i == 0 or cost - target <= delta:
                results.append( (start, i+1) )
                start = i+1
            elif cost - target > delta:
                # Better if we didn't include this one.
                results.append( (start, i) )
                start = i

            cost -= target

            if len(results) == n-1:
                results.append( (start, len(weights)) )
                break

    return tuple(results)

def print_parts(w, n):
    result = balance_partitions(w, n)
    print("Suggested partition indices: ", result)
    for t in result:
        start,end = t
        sublist = w[start:end]
        print(" - ", sublist, "(sum: {})".format(sum(sublist)))

print(weights, '=', sum(weights))

for i in range(2, len(weights)+1):
    print_parts(weights, i)

The output is:

[1, 6, 2, 3, 4, 1, 7, 6, 4] = 34
Suggested partition indices:  ((0, 6), (6, 9))
 -  [1, 6, 2, 3, 4, 1] (sum: 17)
 -  [7, 6, 4] (sum: 17)
Suggested partition indices:  ((0, 4), (4, 7), (7, 9))
 -  [1, 6, 2, 3] (sum: 12)
 -  [4, 1, 7] (sum: 12)
 -  [6, 4] (sum: 10)
Suggested partition indices:  ((0, 3), (3, 5), (5, 7), (7, 9))
 -  [1, 6, 2] (sum: 9)
 -  [3, 4] (sum: 7)
 -  [1, 7] (sum: 8)
 -  [6, 4] (sum: 10)
Suggested partition indices:  ((0, 2), (2, 4), (4, 6), (6, 7), (7, 9))
 -  [1, 6] (sum: 7)
 -  [2, 3] (sum: 5)
 -  [4, 1] (sum: 5)
 -  [7] (sum: 7)
 -  [6, 4] (sum: 10)
Suggested partition indices:  ((0, 2), (2, 3), (3, 5), (5, 6), (6, 7), (7, 9))
 -  [1, 6] (sum: 7)
 -  [2] (sum: 2)
 -  [3, 4] (sum: 7)
 -  [1] (sum: 1)
 -  [7] (sum: 7)
 -  [6, 4] (sum: 10)
Suggested partition indices:  ((0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 9))
 -  [1, 6] (sum: 7)
 -  [2] (sum: 2)
 -  [3] (sum: 3)
 -  [4] (sum: 4)
 -  [1] (sum: 1)
 -  [7] (sum: 7)
 -  [6, 4] (sum: 10)
Suggested partition indices:  ((0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
 -  [1, 6] (sum: 7)
 -  [2] (sum: 2)
 -  [3] (sum: 3)
 -  [4] (sum: 4)
 -  [1] (sum: 1)
 -  [7] (sum: 7)
 -  [6] (sum: 6)
 -  [4] (sum: 4)
Suggested partition indices:  ((0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
 -  [1] (sum: 1)
 -  [6] (sum: 6)
 -  [2] (sum: 2)
 -  [3] (sum: 3)
 -  [4] (sum: 4)
 -  [1] (sum: 1)
 -  [7] (sum: 7)
 -  [6] (sum: 6)
 -  [4] (sum: 4)

1
Here is a slightly tuned version of @Milind R's numpy approach above (and thanks a lot, by the way). I found that in practice the script can suggest suboptimal partitions if the element values aren't distributed "uniformly" across the array. To address this, I "uniformify" the array by rearranging the elements in the order smallest, largest, second smallest, second largest, and so on. The downside is that this makes the script considerably slower (roughly 5x).
import numpy.random as nr
import numpy as np
a = (nr.random(10000000)*1000).astype(int)

The edited partition algorithm:

def equisum_partition(arr,p, uniformify=True):

    #uniformify: rearrange to ['smallest', 'largest', 'second smallest', 'second largest', etc..]
    if uniformify:
        l = len(arr)
        odd = l%2!=0
        arr = np.sort(arr)

        #add a dummy element if odd length
        if odd:
            arr = np.append(np.min(arr)-1, arr)
            l = l+1

        idx = np.arange(l)
        idx = np.multiply(idx, 
                  np.subtract(1,
                              np.multiply(
                                  np.mod(idx, 2),
                                  2))
                 )
        arr = arr[idx]

        #remove the dummy element
        if odd:
            arr = arr[1:]

    #cumulative summation
    ac = arr.cumsum()

    #sum of the entire array
    partsum = ac[-1]//p 

    #generates the cumulative sums of each part
    cumpartsums = np.array(range(1,p))*partsum

    #finds the indices where the cumulative sums are sandwiched
    inds = np.searchsorted(ac,cumpartsums) 

    #split into approximately equal-sum arrays
    parts = np.split(arr,inds)

    return parts

On the original answer's example this doesn't matter much, because the example array is random.
With uniformify:
%%time
parts = equisum_partition(a,20)
partsums = np.array([part.sum() for part in parts])#
partsums.std()
Wall time: 624 ms
266.6111212984185

Without uniformify:

%%time
parts = equisum_partition(a,20, uniformify=False)
partsums = np.array([part.sum() for part in parts])#
partsums.std()
Wall time: 105 ms
331.19071544957296

The best solution for me, thanks! - benjamin berhault

0
Here's how I might try to solve this for the case of two desired sublists. It's probably not the most efficient, but it's a first attempt.
def divide(l):
    total = sum(l)
    half = total / 2
    l1 = []
    l2 = []
    for e in l:
        if half - e >= 0 or half > abs(half - e):
          l1.append(e)
          half -= e
        else:
          l2.append(e)
    return (l1, l2)

You can see it in action here:

(l1, l2) = divide([1, 6, 2, 3, 4, 1, 7, 6, 4])

print(l1)
# [1, 6, 2, 3, 4, 1]

print(l2)
#[7, 6, 4]

(l1, l2) = divide([1,1,10,10])

print(l1)
# [1, 1, 10]

print(l2)
#[10]

I'll leave the other cases to you as an exercise. :)


Please explain the downvote. Nothing can be learned without feedback. - erip
1
I didn't downvote, but I'm trying to understand how this works. It looks like you greedily add elements to l1 until you exceed half the total, then add to l2. If you had a list like [1,1,10,10], wouldn't that produce [1,1], [10,10]? - Garrett R
Oops, you're right, I do need to check whether taking the next element would bring the difference closer to half. Will update shortly. - erip
Thanks, I'm currently using something very similar to this (nearly identical except for variable naming and that I handle more than two sublists), but the problem is that it tends to overshoot when the data leans toward producing smaller-than-expected lists (I've added an example). - Ng Oon-Ee
