用贪心算法在Python中将一个数字列表的列表分成两个分区,使得每个数字的数量相同。

3
我有一个包含随机正整数的子列表的列表。这个列表由 3 个参数控制:
  1. max_num: 每个子列表中允许的最大整数,例如如果 max_num=3,那么列表将如下所示: [[1,3], [3], [1,2,3], [1], ...]
  2. max_length: 每个子列表中的最大整数个数;
  3. n_gen: 生成的子列表数量,即列表的长度。
您可以使用以下代码生成此类列表:
import random

random.seed(10)
def random_numbers(length, max_num):
    return [random.randint(1, max_num) for _ in range(length)]

max_num = 3
max_length = 3 # I want max_length=10
n_gen = 10 # I want n_gen=200

lst = [random_numbers(random.randint(1, max_length), max_num) for _ in range(n_gen)]

现在我想把这个列表分成两个部分,每一部分都有相同数量的每个数字。 例如,如果lst = [[1,2,3],[2,3],[1,3],[3]],其中一个解决方案是bipartition = [[[1,2,3],[3]],[[2,3],[1,3]]]
我成功地编写了以下暴力枚举所有可能的二分法,对于小参数而言它运行良好。
from itertools import product

lst1 = []
lst2 = []
for pattern in product([True, False], repeat=len(lst)):
    lst1.append([x[1] for x in zip(pattern, lst) if x[0]])
    lst2.append([x[1] for x in zip(pattern, lst) if not x[0]])

bipartitions = []
for l1, l2 in zip(lst1, lst2):
    flat1 = [i for l in l1 for i in l]
    flat2 = [i for l in l2 for i in l]
    if sorted(flat1) == sorted(flat2):
        bipartitions.append([l1, l2])

for bipartition in bipartitions:
    print(bipartition)

输出:

[[[1, 2, 2], [1, 1, 2], [2, 3], [3, 2]], [[1], [2, 2, 1], [3], [1, 2], [3], [2, 2]]]
[[[1, 2, 2], [1, 1, 2], [3], [3], [2, 2]], [[2, 3], [1], [2, 2, 1], [1, 2], [3, 2]]]
[[[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]], [[1, 1, 2], [1, 2], [3], [2, 2], [3, 2]]]
[[[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]], [[1, 1, 2], [3], [1, 2], [2, 2], [3, 2]]]
[[[1, 2, 2], [2, 3], [1], [1, 2], [3, 2]], [[1, 1, 2], [2, 2, 1], [3], [3], [2, 2]]]
[[[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]], [[1, 1, 2], [2, 3], [1, 2], [3], [2, 2]]]
[[[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]], [[1, 1, 2], [2, 3], [3], [1, 2], [2, 2]]]
[[[1, 2, 2], [1], [3], [1, 2], [3], [2, 2]], [[1, 1, 2], [2, 3], [2, 2, 1], [3, 2]]]
[[[1, 2, 2], [2, 2, 1], [3], [1, 2], [3]], [[1, 1, 2], [2, 3], [1], [2, 2], [3, 2]]]
[[[1, 1, 2], [2, 3], [1], [2, 2], [3, 2]], [[1, 2, 2], [2, 2, 1], [3], [1, 2], [3]]]
[[[1, 1, 2], [2, 3], [2, 2, 1], [3, 2]], [[1, 2, 2], [1], [3], [1, 2], [3], [2, 2]]]
[[[1, 1, 2], [2, 3], [3], [1, 2], [2, 2]], [[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]]]
[[[1, 1, 2], [2, 3], [1, 2], [3], [2, 2]], [[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]]]
[[[1, 1, 2], [2, 2, 1], [3], [3], [2, 2]], [[1, 2, 2], [2, 3], [1], [1, 2], [3, 2]]]
[[[1, 1, 2], [3], [1, 2], [2, 2], [3, 2]], [[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]]]
[[[1, 1, 2], [1, 2], [3], [2, 2], [3, 2]], [[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]]]
[[[2, 3], [1], [2, 2, 1], [1, 2], [3, 2]], [[1, 2, 2], [1, 1, 2], [3], [3], [2, 2]]]
[[[1], [2, 2, 1], [3], [1, 2], [3], [2, 2]], [[1, 2, 2], [1, 1, 2], [2, 3], [3, 2]]]

然而,当参数变得更大时,这将变得不可行。现在我想生成具有相同数量的每个数字的随机二分图,我猜贪心算法可以解决。对于我的当前任务,我需要使用

max_num = 3
max_length = 10
n_gen = 200

有什么建议吗?

编辑:我知道有些情况根本不可能进行这种二分法。我的想法是,在贪心算法建议的二分法在最大建议次数(例如,如果足够快,则为1000)后,我们应该相信不存在这样的二分法。当参数很大时,即使检查这种二分法是否存在也将是不可行的。


1
@aneroid 1. 我已经意识到了。我的想法是:如果贪心算法建议最大数量(例如1000,如果足够快)的二分图没有满足要求的二分图,则应将其视为“没有这样的二分图”。 2. 是的,现在我想生成一个有效的二分图,但它必须快速且随机,并且我想多次生成这样的随机二分图。 - Shaun Han
1个回答

2

哇,这个确实很难。首先,让我说一个显而易见的事实,贪心算法是确定性的,因为它总是选择最优路径。其次,能够将某物二分的可能性非常小,非常小。我建议如果你想要生成二分图,从像这样的随机集合中找到它们并不是一个好主意。

无论如何,让我们来看看代码。首先,我要说的是,代码不太美观,也不完全优化。在那里面,我甚至没有按照Pythonic的方式编写代码,但这些都很容易修复。我已经在这上面花了几个小时,但这是一个有趣的项目。列表的复制是主要嫌疑人。您可以在自己的时间内重新编写和优化它。我也不能保证它是没有错误的,但我相信它是没有问题的。唯一的例外是,如果您想要任何结果,需要确保它至少进行一次“小心”的搜索。这就引出了下一个问题,算法本身。

我们首先执行一个相当标准的贪心算法。我们从我们的partitionee中选择一个索引,并将其指定给左边的二分图。接下来,我们查看所有插入所有剩余列表的可能方法。我们选择使我们最接近0的那个。我们重复此过程,直到我们遇到某个断点,之后我们切换到您的穷举算法。

现在,对于高值常量,我们很有可能找不到分区。我认为这只是一个统计问题,而不是算法的问题,但我可能错了。

我还实现了一个粗略的可行性测试,你会很快发现,大约90%的所有随机生成的嵌套列表可以立即被舍弃,因为它们无法二分。

然而,贪心算法的添加现在允许我在我的机器上从测试大约15个长度的分区到大约30个长度的分区,并成功地找到一个。它也在不到1/10秒的时间内运行,例如3、3、40、12作为其常量。

最后,这里是代码。请注意,我只让它生成一个分区进行测试,所以您可能需要运行它几次,才能得到一个可行的分区:

from itertools import product
import random
import datetime
import time
import sys

MAX_NUM = 3
MAX_LEN = 3
NUM_GEN = 200
NSWITCH = 12

random.seed(time.time())

def feasible(partitionee):
    '''Does a rough test to see if it is feasible to find a partition'''
    counts = [0 for _ in range(MAX_NUM)]
    flat = [x for sublist in partitionee for x in sublist]
    for n in flat:
        counts[n-1] += 1
    for n in counts:
        if n % 2 != 0:
            return False
    return True 

def random_numbers(length, max_num, n_lists):
    '''Create a random list of lists of numbers.'''

    lst = []
    for i in range(n_lists):
        sublist_length = random.randint(1, length)
        lst.append([random.randint(1, max_num) for _ in range(sublist_length)])
    return lst


def diff(lcounts, rcounts):
    '''Calculate the difference between the counts in our dictionaries.'''

    difference = 0
    for i in range(MAX_NUM):
        difference += abs(lcounts[i] - rcounts[i])

    return difference


def assign(partition, d, sublist):
    '''Assign a sublist to a partition, and update its dictionary.'''

    partition.append(sublist)
    for n in sublist:
        d[n-1] += 1


def assign_value(d1, d2, sublist):
    '''Calculates the loss of assigning sublist.'''

    for n in sublist:
        d1[n-1] += 1
    left_score = diff(d1, d2)
    for n in sublist:
        d1[n-1] -= 1
        d2[n-1] += 1
    right_score = diff(d1, d2)
    for n in sublist:
        d2[n-1] -= 1

    return (left_score, right_score)


def greedy_partition(left, right, lcounts, rcounts, i, partitionee):
    # Assign the i:th sublist to the left partition.
    assign(left, lcounts, partitionee[i])
    del partitionee[i]

    for _ in range(NUM_GEN - NSWITCH):
        # Go through all unassigned sublists and get their loss.
        value_for_index = {}
        for i, sublist in enumerate(partitionee):
            value = assign_value(lcounts, rcounts, sublist)
            value_for_index[i]  = value

        # Find which choice would be closest to 0 difference.
        min_value    = 100000000000 # BIG NUMBER
        best_index  = -1
        choose_left = True
        for key, value in value_for_index.items():
            if min(value) < min_value:
                min_value    = min(value)
                choose_left = value[0] < value[1]
                best_index  = key

        # Assign it to the proper list.
        if choose_left:
            assign(left, lcounts, partitionee[best_index])
        else:
            assign(right, rcounts, partitionee[best_index])
        del partitionee[best_index]

    return diff(lcounts, rcounts)



# Create our list to partition.
partition_me = random_numbers(MAX_LEN, MAX_NUM, NUM_GEN)

start_time = datetime.datetime.now()

# Start by seeing if it's even feasible to partition.
if not feasible(partition_me):
    print('No bipartition possible!')
    sys.exit()


# Go through all possible starting arrangements.
min_score_seen = 100000000000 # BIG NUMBER
best_bipartition = []
for i in range(NUM_GEN):
    # Create left and right partitions, as well as maps to count how many of each
    # number each partition has accumulated.
    left  = []
    right = []
    lcounts  = [0 for i in range(MAX_NUM)]
    rcounts  = [0 for i in range(MAX_NUM)]

    # Copy partitionee since it will be consumed.
    partition = partition_me.copy()

    # Do greedy partition.
    score = greedy_partition(left, right, lcounts, rcounts, i, partition)
    if score < min_score_seen:
        min_score_seen = score
        best_bipartition = [left] + [right]

# Now that we've been greedy and fast, we will be careful and slow.
# Consider all possible remaining arrangements.
print('Done with greedy search, starting careful search.')
left = best_bipartition[0]
right = best_bipartition[1]

for pattern in product([True, False], repeat=len(partition)):
    lst1 = left  + ([x[1] for x in zip(pattern, partition) if x[0]])
    lst2 = right +([x[1] for x in zip(pattern, partition) if not x[0]])
    left_flat  = [x for sublist in lst1 for x in sublist]
    right_flat  = [x for sublist in lst2 for x in sublist]
    if sorted(left_flat) == sorted(right_flat):
        print('Found bipartition by careful search:')
        print([lst1] + [lst2])
        break

end_time = datetime.datetime.now()
print('Time taken: ', end='')
print(end_time - start_time)


该死,这真的是一项艰巨的工作。我非常感谢你的努力。我运行了你的代码(没有改变任何东西)30次。只有6次是可行的,但在仔细搜索后都找不到二分图。这正常吗?说实话,我认为二分图比这更有可能,特别是对于一个大的“NUM_GEN”和一个小的“MAX_NUM”。你有一个有效的种子吗?另外,“NSWITCH”是做什么的? - Shaun Han
首先,关于可行性。这种情况非常正常。要能够将列表的列表二分,必须在列表中拥有每个数字的偶数个数(忽略嵌套)。对于大的MAX_NUM来说,这是不太可能的。至于完全找不到二分法,你得到了和我一样的结果。我在原始答案中谈到了一些关于它的内容。我认为这是因为二分法很少见,但如果你确定它们不是,那么也许贪心算法并不擅长找到它们。NSWITCH确定何时应该切换搜索方法。 - Peatherfed

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接