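I am trying to extract groups from a list: the main list contains sub-lists of varying lengths. I want to efficiently group together all sub-lists that share at least one value with another sub-list. For example, I would like the following: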
[[2],[5],[5,8,16],[7,9,12],[9,20]]
to be grouped like this:
my_groups = {"group1":[2], "group2":[5,8,16], "group3":[7,9,12,20]}
My idea was to convert the sub-lists to sets and then use reduce with np.intersect1d:
reduce(np.intersect1d, (SUPERLIST))
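A quick check of what such a reduction computes may help here. np.intersect1d returns the sorted unique values found in both of its inputs, so folding it over every sub-list leaves only the values common to all of them. The sketch below uses plain Python sets as a stand-in for the NumPy call (an assumption made so the snippet runs without NumPy); the name super_list and its contents are taken from the example above.

```python
from functools import reduce

super_list = [[2], [5], [5, 8, 16], [7, 9, 12], [9, 20]]

# Pure-Python stand-in for reduce(np.intersect1d, ...): intersect each
# sub-list's set with the next, keeping only values present in every one.
common_to_all = reduce(set.intersection, map(set, super_list))

# For comparison: the pairwise overlap of two sub-lists that should be grouped.
shared_by_pair = set([7, 9, 12]) & set([9, 20])

print(common_to_all)   # values shared by every sub-list
print(shared_by_pair)  # values shared by just this pair
```

With this input nothing is common to every sub-list, so the reduction alone does not reveal which sub-lists overlap pairwise.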
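I would like to put the result into a dictionary. However, I don't know how to convert the list into an array of sets without iterating over it.
Is there a way to do this, or am I overlooking a more efficient approach?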
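One possible way to get the grouping described above is a union-find (disjoint-set) pass over the values. This is only a sketch, not a NumPy solution: the function name group_overlapping and the "groupN" key naming are assumptions chosen to match the desired output shown earlier.

```python
def group_overlapping(super_list):
    """Merge sub-lists that share at least one value (union-find over values)."""
    parent = {}

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for sub_list in super_list:
        for value in sub_list:
            parent.setdefault(value, value)
        # Link every value in the sub-list to the sub-list's first value.
        for value in sub_list[1:]:
            root_a, root_b = find(sub_list[0]), find(value)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect the values under their root.
    groups = {}
    for value in parent:
        groups.setdefault(find(value), []).append(value)

    ordered = sorted(sorted(g) for g in groups.values())
    return {f"group{i}": g for i, g in enumerate(ordered, start=1)}


my_groups = group_overlapping([[2], [5], [5, 8, 16], [7, 9, 12], [9, 20]])
print(my_groups)
```

Each value is touched a small number of times, so the work grows roughly with the total number of values rather than with the number of sub-list pairs.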
Edit
Without numpy, I would do something like this:
my_dict = dict()
unique_id = 0
for sub_list_ref in super_list:
    sub_set_ref = set(sub_list_ref)
    for sub_list_comp in super_list:
        sub_set_comp = set(sub_list_comp)
        if len(sub_set_ref.intersection(sub_set_comp))>0:
            my_dict[unique_id] = sub_set_ref.union(sub_set_comp)
            updated_id = unique_id+1
    if updated_id == unique_id:
        my_dict[unique_id] = sub_list_ref
    else:
        unique_id = updated_id
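For comparison, here is a minimal pure-Python sketch of the same idea that keeps a running list of disjoint sets and folds each sub-list into whichever of them it overlaps. The function name merge_overlapping and the integer keys are assumptions for illustration; empty sub-lists are skipped.

```python
def merge_overlapping(super_list):
    groups = []  # disjoint sets built so far
    for sub_list in super_list:
        if not sub_list:
            continue  # an empty sub-list cannot overlap anything
        merged = set(sub_list)
        remaining = []
        for group in groups:
            if group & merged:
                merged |= group          # absorb every group that overlaps
            else:
                remaining.append(group)  # untouched groups are kept as-is
        remaining.append(merged)
        groups = remaining
    return {i: sorted(g) for i, g in enumerate(groups)}


my_dict = merge_overlapping([[2], [5], [5, 8, 16], [7, 9, 12], [9, 20]])
print(my_dict)
```

Because the stored groups never share values, one pass over them per sub-list is enough: a group that does not overlap the incoming sub-list cannot start overlapping after other groups are absorbed.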
Edit 2
Yesterday DarryIG gave a very good answer, and today I managed to put together a more efficient version.
def coalesce_groups(current_list, super_list, iterations):
    if iterations <= 0:
        print("YOU HAVE ITERATED MORE THAN 3 TIMES")
    tmp_list = current_list
    for element in current_list:
        tmp_list = tmp_list + super_list[element]
    # Take only the unique elements
    tmp_list = list(set(tmp_list))
    if tmp_list == list(set(current_list)):
        return tmp_list
    else:
        iterations -= 1
        return coalesce_groups(tmp_list, super_list, iterations)

def dict_of_groups(original_list):
    lst = list(original_list).copy()
    result_list = []
    for it in lst:
        result = coalesce_groups(it, lst, iterations=3)
        if len(result) != 0:
            result_list.append(result)
            for t in result:
                lst[t] = []
    result_dict = {x: result_list[x] for x in range(0, len(result_list))}
    return result_dict
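The recursion above can also be written as an explicit stack-based traversal, which avoids the iteration counter altogether. The sketch below assumes, as in the test data, that the sub-list at index i holds the indices linked to i; the name groups_by_traversal and the small input are hypothetical. Unlike the version above, it always includes the starting index in its group and assigns each index to the first group that reaches it.

```python
def groups_by_traversal(adjacency):
    visited = set()
    result = {}
    for start in range(len(adjacency)):
        if start in visited:
            continue  # already placed in an earlier group
        group = []
        stack = [start]
        visited.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            # Follow the links stored at this index.
            for neighbour in adjacency[node]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    stack.append(neighbour)
        result[len(result)] = sorted(group)
    return result


small = [[0], [3, 1], [2], [4, 3], [4]]
small_groups = groups_by_traversal(small)
print(small_groups)
```

Each index is pushed at most once, so the cost is proportional to the total number of stored links.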
Testing it (in a Jupyter notebook):
lst = [[0], [1], [2], [3], [4], [5], [16, 6], [8, 10, 18, 21, 27, 7], [10, 19, 21, 27, 40, 8], [13, 20, 22, 26, 28, 30, 34, 41, 42, 50, 9], [18, 21, 27, 10], [11], [12], [20, 22, 26, 28, 30, 34, 41, 42, 50, 13], [14], [15], [16], [25, 17], [21, 27, 40, 18], [21, 27, 40, 19], [22, 26, 28, 30, 34, 41, 42, 50, 20], [27, 40, 21], [26, 28, 30, 34, 41, 42, 50, 22], [23], [24], [25], [28, 30, 34, 41, 42, 50, 26], [40, 27], [30, 34, 41, 42, 50, 28], [29], [34, 41, 42, 50, 30], [33, 31], [32], [33], [41, 42, 50, 34], [35], [36], [37], [38], [39], [40], [42, 50, 41], [50, 42], [43], [44], [45], [46], [47], [49, 48], [49], [50]]
%lprun -T lprof0 -f generate_groups generate_groups(lst)
print(open('lprof0', 'r').read())
%lprun -T lprof1 -f dict_of_groups dict_of_groups(lst)
print(open('lprof1', 'r').read())
Other answers will still be considered, but I think his answer remains valid and is more comprehensive.
DarryIG is still king.
Why put set objects into numpy.ndarray objects? That seems pointless. - juanpa.arrivillaga