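I have a long nested list. Each sublist contains 2 elements. I want to iterate over the whole list and drop any sublist whose first element has already been seen 3 times, so that each first element is kept at most 3 times.
Example: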
ls = [[1,1], [1,2], [1,3], [1,4], [2,2], [2,3], [3,4], [3,5], [3,6], [3,7]]
desired_result = [[1,1], [1,2], [1,3], [2,2], [2,3], [3,4], [3,5], [3,6]]
from itertools import groupby, islice
from operator import itemgetter
ls = [[1, 1], [1, 2], [1, 3], [1, 4], [2, 2], [2, 3], [3, 4], [3, 5], [3, 6], [3, 7]]
result = [e for _, group in groupby(ls, key=itemgetter(0)) for e in islice(group, 3)]
print(result)
Output
[[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 4], [3, 5], [3, 6]]
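The idea is to use groupby to group the elements by their first value, then use islice to take the first three values of each group (if there are that many).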
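One caveat worth spelling out: groupby only groups consecutive items with the same key, so the approach above relies on the list already being ordered by first element, as it is in the question. A minimal sketch of what happens otherwise, using a made-up input (unsorted_ls and first_three_per_run are hypothetical names, not from the question):

```python
from itertools import groupby, islice
from operator import itemgetter

# Hypothetical input: the key 1 appears in two separate runs.
unsorted_ls = [[1, 1], [2, 1], [1, 2], [1, 3], [1, 4], [2, 2]]

def first_three_per_run(pairs):
    # groupby starts a new group whenever the key changes,
    # so the limit of three applies per run, not per key.
    return [e for _, group in groupby(pairs, key=itemgetter(0))
            for e in islice(group, 3)]

by_run = first_three_per_run(unsorted_ls)
# sorted() is stable, so pairs with equal keys keep their relative order.
by_key = first_three_per_run(sorted(unsorted_ls, key=itemgetter(0)))

print(by_run)  # [[1, 1], [2, 1], [1, 2], [1, 3], [1, 4], [2, 2]]
print(by_key)  # [[1, 1], [1, 2], [1, 3], [2, 1], [2, 2]]
```

Sorting first fixes the count but reorders the output by key. If the original order matters for unsorted data, a counting approach is probably the safer choice.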
Probably not the shortest answer.
The idea is to count occurrences while iterating over ls.
from collections import defaultdict
filtered_ls = []
counter = defaultdict(int)
for l in ls:
    counter[l[0]] += 1
    if counter[l[0]] > 3:
        continue
    filtered_ls.append(l)
print(filtered_ls)
# [[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 4], [3, 5], [3, 6]]
Use itertools.groupby, then keep only the first three items of each group.
>>> import itertools
>>> ls = [[1,1], [1,2], [1,3], [1,4], [2,2], [2,3], [3,4], [3,5], [3,6], [3,7]]
>>> list(itertools.chain.from_iterable(list(g)[:3] for _,g in itertools.groupby(ls, key=lambda i: i[0])))
[[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 4], [3, 5], [3, 6]]
ls = [[1,1], [1,2], [1,3], [1,4], [2,2], [2,3], [3,4], [3,5], [3,6], [3,7]]
val_count = dict.fromkeys(set([i[0] for i in ls]), 0)
new_ls = []
for i in ls:
    if val_count[i[0]] < 3:
        val_count[i[0]] += 1
        new_ls.append(i)
print(new_ls)
Output:
[[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 4], [3, 5], [3, 6]]
You can use collections.defaultdict to aggregate by first value in O(n) time, then use itertools.chain to build a list of lists.
from collections import defaultdict
from itertools import chain
dd = defaultdict(list)
for key, val in ls:
    if len(dd[key]) < 3:
        dd[key].append([key, val])
res = list(chain.from_iterable(dd.values()))
print(res)
# [[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 4], [3, 5], [3, 6]]
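Note that aggregating into a dict regroups the output by key. For the question's input that makes no difference, but with interleaved keys the result comes out grouped rather than in the original order. A small sketch with a made-up input (mixed is hypothetical), assuming Python 3.7+ where dicts preserve insertion order:

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input with interleaved keys.
mixed = [[2, 1], [1, 1], [2, 2], [1, 2]]

dd = defaultdict(list)
for key, val in mixed:
    if len(dd[key]) < 3:
        dd[key].append([key, val])

# Values come back in first-seen key order, one whole group at a time.
grouped = list(chain.from_iterable(dd.values()))
print(grouped)  # [[2, 1], [2, 2], [1, 1], [1, 2]]
```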
Ghillas BELHADJ's answer is good. But you should consider using defaultdict for this task. The idea comes from Raymond Hettinger, who recommends defaultdict for grouping and counting tasks.
from collections import defaultdict
def remove_sub_lists(a_list, nth_occurence):
    # Count how often each first element has been seen so far.
    found = defaultdict(int)
    for sublist in a_list:
        first_index = sublist[0]
        found[first_index] += 1
        # Yield the sublist only while its key is within the limit.
        if found[first_index] <= nth_occurence:
            yield sublist

max_3_times_first_index = list(remove_sub_lists(ls, 3))
countDict = {}
kept = []
for i in ls:
    key = i[0]
    countDict[key] = countDict.get(key, 0) + 1
    # Build a new list: calling ls.remove(i) while looping over ls
    # makes the loop skip the element after each removal.
    if countDict[key] <= 3:
        kept.append(i)
ls = kept
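A note on in-place removal: calling list.remove inside a loop over the same list shifts the remaining items left, so the item right after each removed one is never visited. It happens to give the right answer for the question's example, but not in general. A minimal sketch of the difference (remove_in_place, keep_first and data are made-up names for illustration):

```python
def remove_in_place(pairs, limit=3):
    """Removes from the list being iterated (buggy on purpose)."""
    counts = {}
    for item in pairs:
        counts[item[0]] = counts.get(item[0], 0) + 1
        if counts[item[0]] > limit:
            pairs.remove(item)  # later items shift left; the next one is skipped
    return pairs

def keep_first(pairs, limit=3):
    """Builds a new list instead of mutating the input."""
    counts = {}
    kept = []
    for item in pairs:
        counts[item[0]] = counts.get(item[0], 0) + 1
        if counts[item[0]] <= limit:
            kept.append(item)
    return kept

data = [[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]
buggy = remove_in_place(list(data))  # pass a copy so data stays intact
fixed = keep_first(data)

print(buggy)  # [[1, 1], [1, 2], [1, 3], [1, 5]]
print(fixed)  # [[1, 1], [1, 2], [1, 3]]
```

Here [1, 5] survives the in-place version because the loop ends right after [1, 4] is removed.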