我有一个列表的字典,每个列表都是一组数字序列。没有两个列表是相同的,但是两个或更多的列表可能以相同的数字序列开头(见下面的示例输入)。我想做的是找到这些共同的序列,并将它们作为字典中的新元素。
示例输入:
每个列表的关键在于其最后一个元素。顺序无所谓。
我有一段可用的代码,但它很混乱。能否有人建议一个更简单/更清晰的解决方案?
示例输入:
sequences = {
18: [1, 3, 5, 6, 8, 12, 15, 17, 18],
19: [1, 3, 5, 6, 9, 13, 14, 16, 19],
25: [1, 3, 5, 6, 9, 13, 14, 20, 25],
11: [0, 2, 4, 7, 11],
20: [0, 2, 4, 10, 20],
26: [21, 23, 26],
}
示例输出:
expected_output = {
6: [1, 3, 5, 6],
18: [8, 12, 15, 17, 18],
14: [9, 13, 14],
19: [16, 19],
25: [20, 25],
4: [0, 2, 4],
11: [7, 11],
20: [10, 20],
26: [21, 23, 26],
}
每个列表的关键在于其最后一个元素。顺序无所谓。
我有一段可用的代码,但它很混乱。能否有人建议一个更简单/更清晰的解决方案?
from collections import Counter
def split_lists(sequences):
# get first elem from each sequence
firsts = list(map(lambda s: s[0], sequences))
# get non-duplicate first elements
not_duplicates = list(map(lambda c: c[0], filter(lambda c: c[1] == 1, Counter(firsts).items())))
# start the new_sequences with the non-duplicate lists
new_sequences = dict(map(lambda s: (s[-1], s), filter(lambda s: s[0] in not_duplicates, sequences)))
# get duplicate first elements
duplicates = list(map(lambda c: c[0], filter(lambda c: c[1] > 1, Counter(firsts).items())))
for duplicate in duplicates:
# get all lists that start with the duplicate element
duplicate_lists = list(filter(lambda s: s[0] == duplicate, sequences))
# get the common elements from the duplicate lists and make it a new
# list to add to our new_sequences dict
repeated_sequence = sorted(list(set.intersection(*list(map(set, duplicate_lists)))))
new_sequences[repeated_sequence[-1]] = repeated_sequence
# get lists from where I left of
i = len(repeated_sequence)
sub_lists = list(filter(lambda s: len(s) > 0, map(lambda s: s[i:], duplicate_lists)))
# recursively split them and store them in new_sequences
new_sequences.update(split_lists(sub_lists))
return new_sequences
另外,你能帮我确定我算法的复杂度吗?递归让我头晕。 我最好的猜测是 O(n*m)
,其中n
是列表数量,m
是最长列表的长度。
18: [8, 12, 15, 17, 18]
? - Reut Sharabani