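How can I remove duplicates from a list while preserving its original order, and also remember the first index of each item?
For example, removing duplicates from [1, 1, 2, 3] gives [1, 2, 3], but I also need to remember the indices [0, 2, 3].
I am using Python 2.7.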
I would take a different approach, using an OrderedDict together with the list's index method, which returns the lowest index of an item.
>>> from collections import OrderedDict
>>> lst = [1, 1, 2, 3]
>>> d = OrderedDict((x, lst.index(x)) for x in lst)
>>> d
OrderedDict([(1, 0), (2, 2), (3, 3)])
If you need the deduplicated list and the indices separately, you can use:
>>> d.keys()
[1, 2, 3]
>>> d.values()
[0, 2, 3]
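As a side note, the lst.index call above rescans the list for each element. A minimal sketch of a single-pass variant, assuming the items are hashable, uses enumerate with OrderedDict.setdefault so that only the first index of each item is stored. The function name first_indices is hypothetical, chosen only for illustration:

```python
from collections import OrderedDict


def first_indices(lst):
    """Map each distinct item to the index of its first occurrence, in order."""
    d = OrderedDict()
    for i, x in enumerate(lst):
        # setdefault stores i only if x is not already a key,
        # so later duplicates do not overwrite the first index.
        d.setdefault(x, i)
    return d


d = first_indices([1, 1, 2, 3])
unique_items = list(d.keys())    # [1, 2, 3]
first_index = list(d.values())   # [0, 2, 3]
```

Wrapping keys() and values() in list() gives plain lists on both Python 2.7 and Python 3, where these methods return views.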
Use enumerate to keep track of the index and a set to keep track of the elements already seen:
l = [1, 1, 2, 3]
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append(i)
        seen.add(ele)
If you want both the index and the element:
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i, ele))
        seen.add(ele)
Or, if you want them in separate lists:
l = [1, 1, 2, 3]
inds, unq = [], []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append(i)
        unq.append(ele)
        seen.add(ele)
Using a set is by far the fastest approach, since list.index performs a linear scan on every call:
In [13]: l = [randint(1,10000) for _ in range(10000)]
In [14]: %%timeit
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i,ele))
        seen.add(ele)
....:
100 loops, best of 3: 3.08 ms per loop
In [15]: timeit OrderedDict((x, l.index(x)) for x in l)
1 loops, best of 3: 442 ms per loop
In [16]: l = [randint(1,10000) for _ in range(100000)]
In [17]: timeit OrderedDict((x, l.index(x)) for x in l)
1 loops, best of 3: 10.3 s per loop
In [18]: %%timeit
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i,ele))
        seen.add(ele)
....:
10 loops, best of 3: 22.6 ms per loop
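To reproduce a comparison like this outside IPython, a rough sketch using the standard timeit module might look as follows. The function names, the list size, and the repeat count are assumptions chosen to keep the run short; absolute timings will differ by machine and Python version.

```python
import timeit
from collections import OrderedDict
from random import randint


def with_set(l):
    """First-occurrence (index, element) pairs using a set of seen elements."""
    inds = []
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            inds.append((i, ele))
            seen.add(ele)
    return inds


def with_index(l):
    """Element -> first index via OrderedDict and list.index (a scan per element)."""
    return OrderedDict((x, l.index(x)) for x in l)


# Smaller list than in the session above so the script finishes quickly.
data = [randint(1, 1000) for _ in range(2000)]

t_set = timeit.timeit(lambda: with_set(data), number=5)
t_index = timeit.timeit(lambda: with_index(data), number=5)
print("set: %.4f s, list.index: %.4f s" % (t_set, t_index))
```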
The same logic can also be written as a generator:
def yield_un(l):
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            yield (i, ele)
            seen.add(ele)
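A short usage sketch for a generator of this kind: the pairs can be consumed lazily, or collected and split into two sequences with zip. The generator is repeated here so the snippet runs on its own, and the variable names are only illustrative.

```python
def yield_un(l):
    """Yield (index, element) for the first occurrence of each element."""
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            yield (i, ele)
            seen.add(ele)


l = [1, 1, 2, 3]
pairs = list(yield_un(l))        # [(0, 1), (2, 2), (3, 3)]

# zip(*pairs) transposes the pairs into indices and unique elements.
# This assumes the list is non-empty; an empty input would need a guard.
inds, unq = zip(*pairs)          # (0, 2, 3) and (1, 2, 3)
```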