Find duplicates in an array/list of integers

Question

3

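Given an array/list of integers, output the duplicates. What I'd really like to know is: which approach has the best time performance? The best space performance? Is it possible to get the best time and the best space performance at once? Just curious. Thanks!

For example, given the list [4,1,7,9,4,5,2,7,6,5,3,6,7], the answer is [4,7,6,5] (output order doesn't matter).

I wrote a solution in Python that uses a hash and binary search. Here is one of my solutions: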

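def binarySearch(array, number):
    # Search the sorted list `array` for `number`.
    # Returns (index, found); when not found, index is the insertion point.
    start = 0
    end = len(array)
    mid = (end + start) // 2
    while (end > start):
        mid = start + (end - start) // 2
        if array[mid] == number:
            return (mid, True)
        elif number > array[mid]:
            if start == mid:
                # Nothing left on the right: insert just after mid.
                return (mid + 1, False)
            start = mid
        else:
            end = mid

    return (mid, False)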

def findDuplicatesWithHash(array):
    duplicatesHash = {}
    duplicates = []
    for number in array:
        try:
            index,found = binarySearch(duplicates, number)
            if duplicatesHash[number] == 0 and not found: 
                duplicates.insert(index, number)
        except KeyError as error:
            duplicatesHash[number] = 0

    duplicatesSorted = sorted(duplicates, key=lambda tup: tup)
    return duplicatesSorted

- OhaiMac

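What is the expected output for this input array? [1,1,2,3,4,4,5,5,5,6] - wookie919

"Is it possible to have the best time and space performance at the same time?" Good question! Probably not ;) - user1511956

@wookie919: [1,4,5] - OhaiMac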

3 Answers

1

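Finding duplicates is very similar to sorting. That is, every element has to be compared, directly or indirectly, with every other element to find out whether it has a duplicate. Quicksort can be modified to output the elements that have an adjacent matching element, with O(n) space complexity and O(n*log(n)) average time complexity.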
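
One possible reading of the "modified quicksort" idea is sketched below. It is an assumption about what such a modification could look like, not code from this answer: the function name `find_duplicates_quicksort` and the three-way partition with a random pivot are illustrative choices. Any pivot whose "equal" group holds more than one element is reported as a duplicate.

```python
import random


def find_duplicates_quicksort(values):
    """Return the set of values that occur more than once (hypothetical helper)."""
    duplicates = set()

    def visit(items):
        # Lists with fewer than two elements cannot contain a duplicate.
        if len(items) < 2:
            return
        pivot = random.choice(items)
        # Three-way partition around the pivot, as in quicksort.
        smaller = [x for x in items if x < pivot]
        larger = [x for x in items if x > pivot]
        equal_count = len(items) - len(smaller) - len(larger)
        if equal_count > 1:
            duplicates.add(pivot)
        # Recurse only into the parts that do not contain the pivot value.
        visit(smaller)
        visit(larger)

    visit(list(values))
    return duplicates


print(sorted(find_duplicates_quicksort([4, 1, 7, 9, 4, 5, 2, 7, 6, 5, 3, 6, 7])))  # [4, 5, 6, 7]
```

The partitions are built as new lists for readability, which costs O(n) extra space, in line with the bound stated above. The average running time is O(n*log(n)), as for ordinary quicksort.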

- Alden

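Good point! One of the solutions I came up with uses a dictionary and binary search. Unfortunately, it is nowhere near as good as another one I came up with, which is basically the approach you describe above. - OhaiMac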

1

One way to get the duplicates:

l = [4,1,7,9,4,5,2,7,6,5,3,6]
import collections

print([item for item, count in collections.Counter(l).items() if count > 1])

- GAVD

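Great solution! Very short and clear. It is slightly faster than the solution I came up with using sorting. What is the space complexity of this solution, though? - OhaiMac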

- p0lAris · Accepted Answer

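There are several ways to find duplicates. Since the question is completely generic, one can assume that in a list of n values the number of duplicates lies in the range [0, n/2].

What are the possible approaches you can think of?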

Hash Table approach:

While traversing the list, store each value in the hash table if it is not already there. If the value already exists, you have found a duplicate.
```
Algorithm FindDuplicates(list)
    hash_table <- HashTable()
    duplicates <- Set()
    for value in list:
        if value in hash_table:
            duplicates.add(value)
        else:
            hash_table.add(value, true)
    return duplicates
```
- Time: O(n) to traverse through all values
- Space: O(n) to save all possible values in the hash table.
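
As a rough illustration, the hash table pseudocode above could be written in Python as follows. The function name `find_duplicates_hash` is made up for this sketch, and a `set` stands in for the hash table; duplicates are collected in a set so that a value occurring three or more times is reported only once.

```python
def find_duplicates_hash(values):
    """Return the set of values that appear more than once in `values`."""
    seen = set()        # every value encountered so far (the "hash table")
    duplicates = set()  # values encountered at least twice
    for value in values:
        if value in seen:
            duplicates.add(value)
        else:
            seen.add(value)
    return duplicates


print(sorted(find_duplicates_hash([4, 1, 7, 9, 4, 5, 2, 7, 6, 5, 3, 6, 7])))  # [4, 5, 6, 7]
```

Set lookups and insertions take O(1) time on average, which is where the O(n) overall estimate comes from.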

Sort Array:

Sort the array, then compare each value with its neighbour.
```
Algorithm FindDuplicates(list)
    list.sort()
    duplicates <- Set()
    for i <- [1, len(list)-1]:
        if list[i] = list[i-1]:
            duplicates.add(list[i])
    return duplicates
```
- Time: O(n log n) + O(n) = O(n log n) to sort and then traverse all values
- Space: O(1), assuming an in-place sort, as no extra space is needed to find the duplicates
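
A minimal Python sketch of the sorting approach, under a couple of assumptions: the name `find_duplicates_sorted` is illustrative, and `sorted()` is used so the caller's list is left untouched, which costs an O(n) copy. Calling `values.sort()` instead would avoid the copy, although Python's built-in sort may still use auxiliary memory internally.

```python
def find_duplicates_sorted(values):
    """Return the duplicated values in ascending order, each listed once."""
    ordered = sorted(values)  # sorted copy; equal values end up adjacent
    duplicates = []
    for i in range(1, len(ordered)):
        is_repeat = ordered[i] == ordered[i - 1]
        already_reported = bool(duplicates) and duplicates[-1] == ordered[i]
        if is_repeat and not already_reported:
            duplicates.append(ordered[i])
    return duplicates


print(find_duplicates_sorted([1, 1, 2, 3, 4, 4, 5, 5, 5, 6]))  # [1, 4, 5]
```

Because equal values are adjacent after sorting, checking only the last reported duplicate is enough to avoid listing a value twice, so no extra set is needed.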

Check for every value:

For every value, check whether the same value exists elsewhere in the array.

Algorithm Search(i, list):
    for j <- [0, len(list)-1] - [i]:
        if list[j] = list[i]:
            return true
    return false

Algorithm FindDuplicates(list)
    duplicates <- Set()
    for i <- [0, len(list)-1]:
        if Search(i, list):
            duplicates.add(list[i])
    return duplicates

- Time: O(n^2), as the number of comparisons is n*(n-1)
- Space: O(1) as no extra space is needed to find the duplicates
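
For completeness, here is one way the brute-force check might look in Python. The function name `find_duplicates_bruteforce` is an assumption for this sketch, and the inner loop plays the role of `Search` in the pseudocode above.

```python
def find_duplicates_bruteforce(values):
    """Return the set of values that also occur at some other index."""
    duplicates = set()
    n = len(values)
    for i in range(n):
        for j in range(n):
            # Compare position i against every other position j.
            if i != j and values[i] == values[j]:
                duplicates.add(values[i])
                break  # one match is enough for this value
    return duplicates


print(sorted(find_duplicates_bruteforce([4, 1, 7, 9, 4, 5, 2, 7, 6, 5, 3, 6, 7])))  # [4, 5, 6, 7]
```

Apart from the result set, the function keeps only two loop indices, which matches the O(1) working space claimed for this approach.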

Note: the space taken by the duplicates array is not counted in the space complexity, since it is the result we want.

Can you think of any others?