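I know that in Python we can use set to check whether a list contains duplicates. But I was wondering whether it is possible to find the duplicates in a list without using set.
For example, say my list is:
a=['1545','1254','1545']
How can I find the duplicates?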
a = ['1545', '1254', '1545']
from collections import Counter
print([item for item, count in Counter(a).items() if count != 1])
Output:
['1545']
This solution runs in O(N) time, which is a big advantage when the list contains many elements.
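The O(N) bound comes from counting every element in a single pass over a hash table. A minimal sketch of the same idea with a plain dict instead of Counter, assuming Python 3.7+ (where dicts keep insertion order, so duplicates come back in first-seen order); find_duplicates is a hypothetical name, not part of any library:

```python
def find_duplicates(items):
    """Return each value that occurs more than once, in first-seen order.

    A plain dict serves as the counter (no set, no Counter). One pass to
    count, one pass over the distinct keys, so O(N) overall.
    """
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return [item for item, count in counts.items() if count > 1]


a = ['1545', '1254', '1545']
print(find_duplicates(a))  # ['1545']
```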
If you only want to know whether the list contains any duplicates, you can simply do:
a = ['1545', '1254', '1545']
from collections import Counter
print(any(count != 1 for count in Counter(a).values()))
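A variant of this check looks only at the highest count, via Counter.most_common(1), which returns a list holding at most one (value, count) pair. A sketch, assuming the empty list should count as "no duplicates"; has_duplicates is a hypothetical helper name:

```python
from collections import Counter


def has_duplicates(items):
    """True if any value occurs more than once.

    most_common(1) returns [(value, highest_count)], or [] for empty input,
    so the empty case is guarded before indexing.
    """
    top = Counter(items).most_common(1)
    return bool(top) and top[0][1] > 1


print(has_duplicates(['1545', '1254', '1545']))  # True
print(has_duplicates([]))  # False
```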
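As @gnibbler suggested, this is probably the fastest solution in practice, since it stops as soon as it sees the first repeated item:
from collections import defaultdict

def has_dup(a):
    result = defaultdict(int)
    for item in a:
        result[item] += 1
        if result[item] > 1:
            return True  # first repeat found, no need to keep counting
    return False  # loop finished without finding a repeat

a = ['1545', '1254', '1545']
print(has_dup(a))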
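Counter(a).most_common() returns the most frequent elements first. It also accepts an optional argument n (a number). any(count != 1 for count in Counter(a).values()) can be replaced with any(count != 1 for _, count in Counter(a).most_common(1)), or, even shorter, Counter(a).most_common(1)[0][1] > 1 (assuming a is not empty). - falsetru
most_common has to sort internally, right? That makes it O(N log N) :( - thefourtheye
When n is given, Counter.most_common uses a heap queue (heapq.nlargest) internally, so it does not actually do a full sort. - falsetru
… defaultdict version :) - thefourtheye
Using list.count:
In [309]: a=['1545','1254','1545']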
...: a.count('1545')>1
Out[309]: True
Using list.count:
>>> a = ['1545','1254','1545']
>>> any(a.count(x) > 1 for x in a) # To check whether there's any duplicate
True
>>> # To retrieve any single element that is duplicated
>>> next((x for x in a if a.count(x) > 1), None)
'1545'
>>> # To get the duplicated elements (note: this uses a set comprehension!)
>>> {x for x in a if a.count(x) > 1}
{'1545'}
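Since the question asks for a solution without set, here is a sketch of the same list.count idea that collects each duplicated value once into a plain list, preserving first-seen order. The function name is hypothetical, and like the answer above it is O(N^2), because both list.count and the "not in result" check scan a list:

```python
def duplicates_with_count(items):
    """Duplicated values, each reported once, without using set."""
    result = []
    for x in items:
        # items.count(x) scans the whole list; 'x not in result' avoids repeats
        if x not in result and items.count(x) > 1:
            result.append(x)
    return result


print(duplicates_with_count(['1545', '1254', '1545']))  # ['1545']
```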
a.sort()  # sorting puts equal items next to each other (modifies a in place)
last_x = None
for x in a:
    if x == last_x:
        print("duplicate: %s" % x)
        break  # existence of duplicates is enough
    last_x = x
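The sort-then-compare-neighbours approach can also report every duplicated value instead of stopping at the first one. A sketch under two assumptions: the caller's list should not be modified (so sorted() makes a copy instead of a.sort()), and the elements are mutually comparable. sorted_duplicates is a hypothetical name; the cost is O(N log N) for the sort:

```python
def sorted_duplicates(items):
    """Each duplicated value once, found by comparing neighbours in a sorted copy."""
    s = sorted(items)  # copy; the input list is left untouched
    result = []
    for prev, cur in zip(s, s[1:]):
        # Equal neighbours mean a repeat; the second check skips runs of 3+.
        if prev == cur and (not result or result[-1] != cur):
            result.append(cur)
    return result


a = ['1545', '1254', '1545']
print(sorted_duplicates(a))  # ['1545']
print(a)                     # ['1545', '1254', '1545'] (unchanged)
```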
This can be simplified with itertools.groupby. - John La Rooy
>>> lis = []
>>> a=['1545','1254','1545']
>>> for i in a:
...     if i not in lis:
...         lis.append(i)
...
>>> lis
['1545', '1254']
>>> set(a)
{'1254', '1545'}
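The lis loop above already detects duplicates: whenever i is already in lis, it is a repeat. A sketch that keeps those repeats instead of silently skipping them, still without set; the function name is hypothetical, and the membership test on a list makes it O(N^2):

```python
def split_unique_and_duplicates(items):
    """Return (first occurrences, later repeats) using only lists."""
    seen = []     # first occurrence of each value, like lis above
    repeats = []  # every occurrence after the first
    for i in items:
        if i not in seen:
            seen.append(i)
        else:
            repeats.append(i)
    return seen, repeats


print(split_unique_and_duplicates(['1545', '1254', '1545']))
# (['1545', '1254'], ['1545'])
```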
… set :) - thefourtheye
>>> a = ['1545','1254','1545']
>>> D = {}
>>> for i in a:
...     if i in D:
...         print("duplicate", i)
...         break
...     D[i] = i
... else:
...     print("no duplicate")
...
duplicate 1545
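The dict-plus-for/else session above can be packaged as a function that returns the first repeated element instead of printing. A sketch, with the hypothetical name first_duplicate; it assumes None is not itself a list element, since None is used to signal "no duplicate":

```python
def first_duplicate(items):
    """Return the first element seen a second time, or None if all are unique."""
    seen = {}  # dict used as a lookup table; only the keys matter
    for item in items:
        if item in seen:
            return item
        seen[item] = True
    return None


print(first_duplicate(['1545', '1254', '1545']))  # 1545
print(first_duplicate(['a', 'b']))  # None
```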
>>> from itertools import groupby
>>> a = ['1545','1254','1545']
>>> next(k for k, g in groupby(sorted(a)) if sum(1 for i in g) > 1)
'1545'
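The groupby expression above stops at the first duplicated value. A sketch that collects all of them instead, assuming the elements are sortable; groupby only groups consecutive equal elements, which is why the input is sorted first. The function name is hypothetical:

```python
from itertools import groupby


def grouped_duplicates(items):
    """Each value that occurs more than once, in sorted order."""
    return [key for key, group in groupby(sorted(items))
            if sum(1 for _ in group) > 1]  # size of each run of equal values


print(grouped_duplicates(['1545', '1254', '1545']))  # ['1545']
```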
Thanks, everyone, for the effort you put into this question. I learned a lot from the different answers. Here is my own answer:
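a = ['1545', '1254', '1545']
d = []  # each distinct item once
duplicates = False
for i in a:
    if i not in d:
        d.append(i)
# if d is shorter than a, some item was skipped because it repeated
if len(d) < len(a):
    duplicates = True
else:
    duplicates = False
print(duplicates)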
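The same list-based idea can stop at the first repeat instead of building the full list of distinct items and comparing lengths afterwards. A sketch with the hypothetical name has_duplicates_early_exit; it is still O(N^2) in the worst case, because "in" on a list is a linear scan:

```python
def has_duplicates_early_exit(items):
    """True as soon as an item is seen twice, without using set."""
    seen = []
    for i in items:
        if i in seen:
            return True  # no need to look at the rest of the list
        seen.append(i)
    return False


print(has_duplicates_early_exit(['1545', '1254', '1545']))  # True
```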
Without using set...
>>> original = ['1545','1254','1545']
# Non-duplicated elements
>>> [x for i, x in enumerate(original) if i == original.index(x)]
['1545', '1254']
# Duplicated elements
>>> [x for i, x in enumerate(original) if i != original.index(x)]
['1545']
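The "duplicated elements" comprehension above lists a value once per extra occurrence, so a value appearing three times shows up twice. A sketch that reports each duplicated value exactly once by combining the same index() test with list.count; the function name is hypothetical, and both calls scan the list, so it is O(N^2):

```python
def duplicated_values(items):
    """Values that occur more than once, each listed once, in first-seen order."""
    # i == items.index(x) keeps only the first occurrence of x;
    # items.count(x) > 1 keeps only values that repeat.
    return [x for i, x in enumerate(items)
            if i == items.index(x) and items.count(x) > 1]


print(duplicated_values(['1545', '1254', '1545']))  # ['1545']
print(duplicated_values(['a', 'a', 'a']))  # ['a']
```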