如何在Python中查找列表中的重复项，而不使用set？

Question

如何在Python中查找列表中的重复项，而不使用set？

3

我知道在Python中，我们可以使用set函数来查找列表中是否有重复项。但是我在想，是否可以在不使用set的情况下查找列表中的重复项。

比如说，我的列表是：

a=['1545','1254','1545']

那么如何找到重复项呢？

- user3260982

1

你想要查找是否存在重复项，获取重复项列表还是找到第一个重复项？ - demented hedgehog

重复的存在已经足够了。 - user3260982

你为什么不能使用集合（set）呢？ - David Ehrmann

我想从迭代的角度来探索这个问题。 - user3260982

8个回答

1

使用 list.count：

In [309]: a=['1545','1254','1545']
     ...: a.count('1545')>1
Out[309]: True

- zhangxaochen

1

使用 list.count：

>>> a = ['1545','1254','1545']
>>> any(a.count(x) > 1 for x in a) # To check whether there's any duplicate
True

>>> # To retrieve any single element that is duplicated
>>> next((x for x in a if a.count(x) > 1), None)
'1545'

# To get duplicate elements (used set literal!)
>>> {x for x in a if a.count(x) > 1}
set(['1545'])

- falsetru

1

对列表进行排序并检查下一个值是否等于上一个值。

a.sort()
last_x = None
for x in a:
    if x == last_x:
       print "duplicate: %s" % x
       break # existence of duplicates is enough

    last_x = x

这应该是O(n log n)，对于大的n来说比计数器解决方案慢（但计数器在底层使用字典..实际上与集合相似）。

另一种选择是插入元素并保持列表排序..参见bisect模块。它会使您的插入变慢，但检查重复项很快。

- demented hedgehog

这个模式可以通过 itertools.groupby 进行简化。 - John La Rooy

这可能是最佳答案的唯一原因是不使用 set 的唯一原因是因为内存使用。这种方法可以避免这个问题。 - David Ehrmann

@DavidEhrmann，不要使用itertools.groupby，它只能对连续的项进行分组。这与集合完全不同。SQL和Ruby的groupby是另一回事。 - John La Rooy

1

>>> lis = []
>>> a=['1545','1254','1545']
>>> for i in a:
...     if i not in lis:
...         lis.append(i)
... 
>>> lis
['1545', '1254']
>>> set(a)
set(['1254', '1545'])

- Tanveer Alam

他不想使用 set :) - thefourtheye

我明白，我刚刚展示了我的答案与set的输出相同。 - Tanveer Alam

谢谢你的想法，很棒。 - user3260982

1

如果这是一道作业题，你的老师可能会要求使用效率极低的 .count() 方法来回答。

实际上，如果 set 被禁用，使用 dict 是你的下一个最佳选择。

>>> a = ['1545','1254','1545']
>>> D = {}
>>> for i in a:
...     if i in D:
...         print "duplicate", i
...         break
...     D[i] = i
... else:
...     print "no duplicate"
... 
duplicate 1545

这里有一个使用groupby的版本，仍然比.count()方法要好得多。

>>> from itertools import groupby
>>> a = ['1545','1254','1545']
>>> next(k for k, g in groupby(sorted(a)) if sum(1 for i in g) > 1)
'1545'

- John La Rooy

0

感谢大家为解决这个问题所做的努力。我也从不同的答案中学到了很多。以下是我的回答：

a=['1545','1254','1545']
d=[]
duplicates=False
for i in a:
    if i not in d:
        d.append(i)
        if len(d)<len(a):
            duplicates=True
        else:
            duplicates=False
print(duplicates)

- user3260982

0

不使用集合...

original = ['1545','1254','1545']

# Non-duplicated elements
>>> [x for i, x in enumerate(original) if i == original.index(x)]
['1545', '1254']

# Duplicated elements
>>> [x for i, x in enumerate(original) if i != original.index(x)]
['1545']

- squillero

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- thefourtheye · Accepted Answer

a=['1545','1254','1545']
from collections import Counter
print [item for item, count in Counter(a).items() if count != 1]

输出

['1545']

这个解决方案的运行时间为O(N)，如果使用的列表元素很多，这将是一个巨大的优势。

如果你只想找出列表中是否有重复项，可以简单地执行以下操作：

a=['1545','1254','1545']
from collections import Counter
print any(count != 1 for count in Counter(a).values())

正如@gnibbler建议的那样，这将是实际上最快的解决方案

from collections import defaultdict
def has_dup(a):
    result = defaultdict(int)
    for item in a:
        result[item] += 1
        if result[item] > 1:
            return True
    else:
        return False

a=['1545','1254','1545']
print has_dup(a)