用贪心算法在Python中将一个数字列表的列表分成两个分区，使得每个数字的数量相同。

Question

用贪心算法在Python中将一个数字列表的列表分成两个分区，使得每个数字的数量相同。

3

我有一个包含随机正整数的子列表的列表。这个列表由 3 个参数控制：

max_num: 每个子列表中允许的最大整数，例如如果 max_num=3，那么列表将如下所示： [[1,3], [3], [1,2,3], [1], ...]；
max_length: 每个子列表中的最大整数个数；
n_gen: 生成的子列表数量，即列表的长度。

您可以使用以下代码生成此类列表：

import random

random.seed(10)
def random_numbers(length, max_num):
    return [random.randint(1, max_num) for _ in range(length)]

max_num = 3
max_length = 3 # I want max_length=10
n_gen = 10 # I want n_gen=200

lst = [random_numbers(random.randint(1, max_length), max_num) for _ in range(n_gen)]

现在我想把这个列表分成两个部分，每一部分都有相同数量的每个数字。例如，如果lst = [[1,2,3]，[2,3]，[1,3]，[3]]，其中一个解决方案是bipartition = [[[1,2,3]，[3]]，[[2,3]，[1,3]]]。

我成功地编写了以下暴力枚举所有可能的二分法，对于小参数而言它运行良好。

from itertools import product

lst1 = []
lst2 = []
for pattern in product([True, False], repeat=len(lst)):
    lst1.append([x[1] for x in zip(pattern, lst) if x[0]])
    lst2.append([x[1] for x in zip(pattern, lst) if not x[0]])

bipartitions = []
for l1, l2 in zip(lst1, lst2):
    flat1 = [i for l in l1 for i in l]
    flat2 = [i for l in l2 for i in l]
    if sorted(flat1) == sorted(flat2):
        bipartitions.append([l1, l2])

for bipartition in bipartitions:
    print(bipartition)

输出:

[[[1, 2, 2], [1, 1, 2], [2, 3], [3, 2]], [[1], [2, 2, 1], [3], [1, 2], [3], [2, 2]]]
[[[1, 2, 2], [1, 1, 2], [3], [3], [2, 2]], [[2, 3], [1], [2, 2, 1], [1, 2], [3, 2]]]
[[[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]], [[1, 1, 2], [1, 2], [3], [2, 2], [3, 2]]]
[[[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]], [[1, 1, 2], [3], [1, 2], [2, 2], [3, 2]]]
[[[1, 2, 2], [2, 3], [1], [1, 2], [3, 2]], [[1, 1, 2], [2, 2, 1], [3], [3], [2, 2]]]
[[[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]], [[1, 1, 2], [2, 3], [1, 2], [3], [2, 2]]]
[[[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]], [[1, 1, 2], [2, 3], [3], [1, 2], [2, 2]]]
[[[1, 2, 2], [1], [3], [1, 2], [3], [2, 2]], [[1, 1, 2], [2, 3], [2, 2, 1], [3, 2]]]
[[[1, 2, 2], [2, 2, 1], [3], [1, 2], [3]], [[1, 1, 2], [2, 3], [1], [2, 2], [3, 2]]]
[[[1, 1, 2], [2, 3], [1], [2, 2], [3, 2]], [[1, 2, 2], [2, 2, 1], [3], [1, 2], [3]]]
[[[1, 1, 2], [2, 3], [2, 2, 1], [3, 2]], [[1, 2, 2], [1], [3], [1, 2], [3], [2, 2]]]
[[[1, 1, 2], [2, 3], [3], [1, 2], [2, 2]], [[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]]]
[[[1, 1, 2], [2, 3], [1, 2], [3], [2, 2]], [[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]]]
[[[1, 1, 2], [2, 2, 1], [3], [3], [2, 2]], [[1, 2, 2], [2, 3], [1], [1, 2], [3, 2]]]
[[[1, 1, 2], [3], [1, 2], [2, 2], [3, 2]], [[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]]]
[[[1, 1, 2], [1, 2], [3], [2, 2], [3, 2]], [[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]]]
[[[2, 3], [1], [2, 2, 1], [1, 2], [3, 2]], [[1, 2, 2], [1, 1, 2], [3], [3], [2, 2]]]
[[[1], [2, 2, 1], [3], [1, 2], [3], [2, 2]], [[1, 2, 2], [1, 1, 2], [2, 3], [3, 2]]]

然而，当参数变得更大时，这将变得不可行。现在我想生成具有相同数量的每个数字的随机二分图，我猜贪心算法可以解决。对于我的当前任务，我需要使用

max_num = 3
max_length = 10
n_gen = 200

有什么建议吗？

编辑：我知道有些情况根本不可能进行这种二分法。我的想法是，在贪心算法建议的二分法在最大建议次数（例如，如果足够快，则为1000）后，我们应该相信不存在这样的二分法。当参数很大时，即使检查这种二分法是否存在也将是不可行的。

- Shaun Han

1

@aneroid 1. 我已经意识到了。我的想法是：如果贪心算法建议最大数量（例如1000，如果足够快）的二分图没有满足要求的二分图，则应将其视为“没有这样的二分图”。 2. 是的，现在我想生成一个有效的二分图，但它必须快速且随机，并且我想多次生成这样的随机二分图。 - Shaun Han

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Peatherfed · Accepted Answer

哇，这个确实很难。首先，让我说一个显而易见的事实，贪心算法是确定性的，因为它总是选择最优路径。其次，能够将某物二分的可能性非常小，非常小。我建议如果你想要生成二分图，从像这样的随机集合中找到它们并不是一个好主意。

无论如何，让我们来看看代码。首先，我要说的是，代码不太美观，也不完全优化。在那里面，我甚至没有按照Pythonic的方式编写代码，但这些都很容易修复。我已经在这上面花了几个小时，但这是一个有趣的项目。列表的复制是主要嫌疑人。您可以在自己的时间内重新编写和优化它。我也不能保证它是没有错误的，但我相信它是没有问题的。唯一的例外是，如果您想要任何结果，需要确保它至少进行一次“小心”的搜索。这就引出了下一个问题，算法本身。

我们首先执行一个相当标准的贪心算法。我们从我们的partitionee中选择一个索引，并将其指定给左边的二分图。接下来，我们查看所有插入所有剩余列表的可能方法。我们选择使我们最接近0的那个。我们重复此过程，直到我们遇到某个断点，之后我们切换到您的穷举算法。

现在，对于高值常量，我们很有可能找不到分区。我认为这只是一个统计问题，而不是算法的问题，但我可能错了。

我还实现了一个粗略的可行性测试，你会很快发现，大约90%的所有随机生成的嵌套列表可以立即被舍弃，因为它们无法二分。

然而，贪心算法的添加现在允许我在我的机器上从测试大约15个长度的分区到大约30个长度的分区，并成功地找到一个。它也在不到1/10秒的时间内运行，例如3、3、40、12作为其常量。

最后，这里是代码。请注意，我只让它生成一个分区进行测试，所以您可能需要运行它几次，才能得到一个可行的分区：

from itertools import product
import random
import datetime
import time
import sys

MAX_NUM = 3
MAX_LEN = 3
NUM_GEN = 200
NSWITCH = 12

random.seed(time.time())

def feasible(partitionee):
    '''Does a rough test to see if it is feasible to find a partition'''
    counts = [0 for _ in range(MAX_NUM)]
    flat = [x for sublist in partitionee for x in sublist]
    for n in flat:
        counts[n-1] += 1
    for n in counts:
        if n % 2 != 0:
            return False
    return True 

def random_numbers(length, max_num, n_lists):
    '''Create a random list of lists of numbers.'''

    lst = []
    for i in range(n_lists):
        sublist_length = random.randint(1, length)
        lst.append([random.randint(1, max_num) for _ in range(sublist_length)])
    return lst


def diff(lcounts, rcounts):
    '''Calculate the difference between the counts in our dictionaries.'''

    difference = 0
    for i in range(MAX_NUM):
        difference += abs(lcounts[i] - rcounts[i])

    return difference


def assign(partition, d, sublist):
    '''Assign a sublist to a partition, and update its dictionary.'''

    partition.append(sublist)
    for n in sublist:
        d[n-1] += 1


def assign_value(d1, d2, sublist):
    '''Calculates the loss of assigning sublist.'''

    for n in sublist:
        d1[n-1] += 1
    left_score = diff(d1, d2)
    for n in sublist:
        d1[n-1] -= 1
        d2[n-1] += 1
    right_score = diff(d1, d2)
    for n in sublist:
        d2[n-1] -= 1

    return (left_score, right_score)


def greedy_partition(left, right, lcounts, rcounts, i, partitionee):
    # Assign the i:th sublist to the left partition.
    assign(left, lcounts, partitionee[i])
    del partitionee[i]

    for _ in range(NUM_GEN - NSWITCH):
        # Go through all unassigned sublists and get their loss.
        value_for_index = {}
        for i, sublist in enumerate(partitionee):
            value = assign_value(lcounts, rcounts, sublist)
            value_for_index[i]  = value

        # Find which choice would be closest to 0 difference.
        min_value    = 100000000000 # BIG NUMBER
        best_index  = -1
        choose_left = True
        for key, value in value_for_index.items():
            if min(value) < min_value:
                min_value    = min(value)
                choose_left = value[0] < value[1]
                best_index  = key

        # Assign it to the proper list.
        if choose_left:
            assign(left, lcounts, partitionee[best_index])
        else:
            assign(right, rcounts, partitionee[best_index])
        del partitionee[best_index]

    return diff(lcounts, rcounts)



# Create our list to partition.
partition_me = random_numbers(MAX_LEN, MAX_NUM, NUM_GEN)

start_time = datetime.datetime.now()

# Start by seeing if it's even feasible to partition.
if not feasible(partition_me):
    print('No bipartition possible!')
    sys.exit()


# Go through all possible starting arrangements.
min_score_seen = 100000000000 # BIG NUMBER
best_bipartition = []
for i in range(NUM_GEN):
    # Create left and right partitions, as well as maps to count how many of each
    # number each partition has accumulated.
    left  = []
    right = []
    lcounts  = [0 for i in range(MAX_NUM)]
    rcounts  = [0 for i in range(MAX_NUM)]

    # Copy partitionee since it will be consumed.
    partition = partition_me.copy()

    # Do greedy partition.
    score = greedy_partition(left, right, lcounts, rcounts, i, partition)
    if score < min_score_seen:
        min_score_seen = score
        best_bipartition = [left] + [right]

# Now that we've been greedy and fast, we will be careful and slow.
# Consider all possible remaining arrangements.
print('Done with greedy search, starting careful search.')
left = best_bipartition[0]
right = best_bipartition[1]

for pattern in product([True, False], repeat=len(partition)):
    lst1 = left  + ([x[1] for x in zip(pattern, partition) if x[0]])
    lst2 = right +([x[1] for x in zip(pattern, partition) if not x[0]])
    left_flat  = [x for sublist in lst1 for x in sublist]
    right_flat  = [x for sublist in lst2 for x in sublist]
    if sorted(left_flat) == sorted(right_flat):
        print('Found bipartition by careful search:')
        print([lst1] + [lst2])
        break

end_time = datetime.datetime.now()
print('Time taken: ', end='')
print(end_time - start_time)