如何改进去重算法？

Question

如何改进去重算法？

6

我的面试问题是需要返回一个去除重复项但最多可以保留2个重复项的数组的长度。

例如，[1, 1, 1, 2, 2, 3] 的新数组将是 [1, 1, 2, 2, 3]。因此，新长度将为5。我想出了一个 O(2n) 的算法，如何将其改进为最快的算法？

def removeDuplicates(nums):
    if nums is None:
        return 0

    if len(nums) == 0:
        return 0

    if len(nums) == 1:
        return 1

    new_array = {}
    for num in nums:
        new_array[num] = new_array.get(num, 0) + 1

    new_length = 0
    for key in new_array:
        if new_array[key] > 2:
            new_length = new_length + 2
        else:
            new_length = new_length + new_array[key]

    return new_length

new_length = removeDuplicates([1, 1, 1, 2, 2, 3])
assert new_length == 5

我的第一个问题是我的算法是否正确？

- toy

1

@JonathonReinhart 代码审查只接受按预期工作的代码 - 本帖底部的问题可能需要重新措辞，以避免吸引“不清楚你在问什么”或“错误代码”关闭投票。 - Mathieu Guindon

3

O(2n) = O(n)，由于您必须读取整个数组，因此您无法做得更好。如果您正在寻找恒定时间加速，请考虑分析。 - James

1

你可以“最多留下2个重复项”，这意味着你可以没有重复项，对吗？那为什么不用 list(set([1,1,1,2,2,3])) 呢？这也是O(n)，但更简单易懂。 - logee

sum(min(2, count) for count in collections.Counter(nums).values()) - Steven Rumbalski

这是“每个值最多两个实例”、“最多两个例外的唯一值”还是“最多两个值出现两次的唯一值”？输入数组中的值是否被指定为分组？ - greybeard

显示剩余4条评论

5个回答

1

如果您的原始数组大小为n。

Count distinct numbers in your array.

If you have d distinct numbers, then your answer will be

 d        (when n == d)
 d+1      (when n == d+1)
 d+2      (when n >= d+2)

如果数组中所有的数字都小于 n-1，甚至可以在不使用任何额外空间的情况下解决这个问题。如果是这种情况，请查看 this，您可以轻松地计算不同的数字，而无需使用额外的空间。

- Karthik

根据duplicate的解释而定。（如果所有数字都是唯一的，但有一个数字三次出现，那么d+1正确吗？d+2，两者都是或两者都不是？） - greybeard

1

我建议您放弃生成新数组的想法，转而专注于计数：

from collections import Counter

def count_non_2dups(nums):
    new_len = 0
    for num, count in Counter(nums).items():
        new_len += min(2, count)
    return new_len

- gonz

0

int removeDuplicates(vector<int>& nums) {
    if (nums.size() == 0) return nums.size();
    int state = 1;
    int idx = 1;
    for (int i = 1; i < nums.size(); ++i) {
        if (nums[i] != nums[i-1]) {
            state = 1;
            nums[idx++] = nums[i];
        }
        else if (state == 1) {
            state++;
            nums[idx++] = nums[i];
        }
        else {
            state++;
        }
    }
    return idx;
}

思路：维护一个变量（状态），记录当前重复次数（更准确地说，状态记录了当前元素左侧相邻元素的重复次数）。该算法通过一次数组扫描实现O(n)时间复杂度。

- matrixit

0

def removeDuplicates(nums):
    if nums is None:
        return 0

    if len(nums) == 0:
        return 0

    if len(nums) == 1:
        return 1

    new_array_a = set()
    new_array_b = set()
    while nums:
        i = nums.pop()
        if i not in new_array_a:
            new_array_a.add(i)
        elif i not in new_array_b:
            new_array_b.add(i)

    return len(new_array_a) + len(new_array_b)

- Sergey Nosov

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- rickydj · Accepted Answer

您的逻辑是正确的，然而有一种更简单的方法可以达到您在问题中提到的目标。

这是我的逻辑。

myl = [1, 1, 1, 2, 2, 3, 1, 1, 1, 2, 2, 3, 1, 1, 1, 2, 2, 3]

newl = []

for i in myl:
    if newl.count(i) != 2:
        newl.append(i)

print newl
[1, 1, 2, 2, 3, 3]

希望这可以帮助到您。