Removing some duplicates from a list in Python

Question

7

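I want to remove a specific number of duplicates from a list, not all of them. For example, I have the list [1,2,3,4,4,4,4,4] and I want to remove 3 of the 4s, so that I am left with [1,2,3,4,4]. A naive way to do it would probably be: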

def remove_n_duplicates(remove_from, what, how_many):
    for j in range(how_many):
        remove_from.remove(what)

Is there a way to remove the three 4s in a single pass over the list, while keeping the other two?

- Jacob Bond

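@dot.Py: Definitely not a duplicate of that question, since we are only trying to remove a limited number of items from the list, not eliminate duplicates entirely. - user2357112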

2

Do you want to remove n duplicates? Or to assert that any given item has at most m duplicates? - mgilson

2

Also, does it matter which copies are removed? (For example, can you remove the _first_ copies of 4, or does it have to be the last ones?) - mgilson

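You could iterate over the list in reverse and pop the indices where the element is found. By iterating in reverse, you ensure that popping an element does not interfere with the next iteration, so: for i in reversed(range(len(seq))): if seq[i] == what: seq.pop(i), stopping once you have popped enough elements. - Bakuriu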

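@mgilson I want to remove n duplicates. I might have [4,4,4,6,6,6,6,6] and want to remove one 4 while leaving the number of 6s unchanged. It doesn't matter which duplicate is removed, and the order doesn't need to be preserved. - Jacob Bond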
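
The reverse-iteration idea from the comments can be written out as a small function. This is only a sketch: it walks the indices from the end (rather than enumerating reversed(seq), whose counter starts at 0 for the last element), and the choice to return the number of removed items is an assumption made here, not something the thread specifies.

```python
def remove_n_duplicates(seq, what, how_many):
    """Remove up to how_many copies of `what` from seq in place, scanning from the end."""
    removed = 0
    # Walk the indices backwards so that deleting an element never shifts
    # the positions that are still to be visited.
    for i in range(len(seq) - 1, -1, -1):
        if removed == how_many:
            break
        if seq[i] == what:
            del seq[i]
            removed += 1
    return removed


lst = [1, 2, 3, 4, 4, 4, 4, 4]
remove_n_duplicates(lst, 4, 3)
print(lst)  # [1, 2, 3, 4, 4]
```

Unlike repeated calls to list.remove, this makes a single pass and does not raise if fewer than how_many copies exist.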

5 Answers

0

You can use Python's set functionality and the & operator to create a list of lists and then flatten it. The resulting list will be [1, 2, 3, 4, 4].

x = [1,2,3,4,4,4,4,4]
x2 = [val for sublist in [[item]*max(1, x.count(item)-3) for item in set(x) & set(x)] for val in sublist]

As a function, you would have the following.

def remove_n_duplicates(remove_from, what, how_many):
    # Repeat each distinct item by its (reduced) count, then flatten; max(1, ...) keeps at least one copy of `what`.
    return [val for sublist in [[item]*max(1, remove_from.count(item)-how_many) if item == what else [item]*remove_from.count(item)
                                for item in set(remove_from) & set(remove_from)] for val in sublist]

- David N. Sanchez

0

If the list is already sorted, there is a quick solution:

def remove_n_duplicates(remove_from, what, how_many):
    # Assumes remove_from is sorted, so all copies of `what` are contiguous.
    try:
        index = remove_from.index(what)  # position of the first copy
    except ValueError:
        # `what` does not occur at all.
        return remove_from
    end_index = index + how_many
    # The next how_many items must all be copies of `what`.
    if remove_from[index:end_index] != [what] * how_many:
        # There aren't enough things to remove.
        return remove_from
    return remove_from[:index] + remove_from[end_index:]

Note that this returns a new list, so you need to do arr = remove_n_duplicates(arr, 4, 3).

- Checkmate
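
For a sorted list, the run of equal items could also be located by binary search instead of a linear scan. The following is a sketch using the standard bisect module, not part of the answer above; the name remove_n_from_sorted and the decision to raise ValueError when too few copies exist are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def remove_n_from_sorted(sorted_list, what, how_many):
    """Return a new list with how_many copies of `what` removed.

    Assumes sorted_list is sorted in ascending order. Raises ValueError
    if it holds fewer than how_many copies (a choice made for this sketch).
    """
    start = bisect_left(sorted_list, what)   # index of the first copy
    end = bisect_right(sorted_list, what)    # one past the last copy
    if end - start < how_many:
        raise ValueError("not enough copies to remove")
    return sorted_list[:start] + sorted_list[start + how_many:]


print(remove_n_from_sorted([1, 2, 3, 4, 4, 4, 4, 4], 4, 3))  # [1, 2, 3, 4, 4]
```

Finding the run takes O(log n) comparisons, although building the new list is still linear in its length.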

-1

I can solve it with a different approach, using collections.

from collections import Counter
li = [1,2,3,4,4,4,4]
cntLi = Counter(li)
print(list(cntLi.keys()))  # [1, 2, 3, 4]

- Saravanan Subramanian

1

But this removes all duplicates and doesn't really make full use of Counter... - mgilson

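That can be done by using the values for the corresponding keys. cntLi.items() gives a list of tuples in which the unique numbers are the keys and their counts are the values. By working with the values, you can decide what to do. - Saravanan Subramanian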

Right. That approach would certainly work (and it's not a bad solution), but as it stands, your answer is missing that crucial step. - mgilson
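
The missing step discussed in these comments might look roughly like the sketch below: lower the count stored for the target item, then expand the counts back into a list. The function name and the clamping at zero are assumptions for illustration. On Python 3.7+ the items come out grouped by value in first-seen order, which is acceptable here only because the asker said order need not be preserved.

```python
from collections import Counter


def remove_n_duplicates(remove_from, what, how_many):
    """Return a new list with up to how_many copies of `what` removed.

    Items come out grouped by value, so the original interleaving
    of different values is not preserved.
    """
    counts = Counter(remove_from)
    # Lower the count of `what`, but never below zero.
    counts[what] = max(0, counts[what] - how_many)
    # elements() repeats each key as many times as its count.
    return list(counts.elements())


print(remove_n_duplicates([4, 4, 4, 6, 6, 6, 6, 6], 4, 1))
# [4, 4, 6, 6, 6, 6, 6]
```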

-1

Here is another trick that can sometimes be useful. It is not meant as a recommended recipe.

def remove_n_duplicates(remove_from, what, how_many):
    exec('remove_from.remove(what);'*how_many)

- Aguy

- mgilson · Accepted Answer

If you just want to remove the first n occurrences of an item from a list, this is easy to do with a generator:

def remove_n_dupes(remove_from, what, how_many):
    count = 0
    for item in remove_from:
        if item == what and count < how_many:
            count += 1
        else:
            yield item

Usage looks like this:

lst = [1,2,3,4,4,4,4,4]
print(list(remove_n_dupes(lst, 4, 3)))  # [1, 2, 3, 4, 4]
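
The generator leaves lst itself untouched. If the existing list object has to be updated in place, as the remove-based version in the question does, one option is slice assignment. This is a sketch rather than part of the original answer; the variable alias exists only to show that other references see the change.

```python
def remove_n_dupes(remove_from, what, how_many):
    # Same generator as above, repeated so this snippet runs on its own.
    count = 0
    for item in remove_from:
        if item == what and count < how_many:
            count += 1
        else:
            yield item


lst = [1, 2, 3, 4, 4, 4, 4, 4]
alias = lst  # a second reference to the same list object
# Materialize the generator first, then overwrite the list's contents.
lst[:] = list(remove_n_dupes(lst, 4, 3))
print(alias)  # [1, 2, 3, 4, 4]
```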

Keeping at most a specified number of copies of any item is just as easy if we use a little auxiliary storage:

from collections import Counter
def keep_n_dupes(remove_from, how_many):
    counts = Counter()
    for item in remove_from:
        counts[item] += 1
        if counts[item] <= how_many:
            yield item

Usage is similar:

lst = [1,1,1,1,2,3,4,4,4,4,4]
print(list(keep_n_dupes(lst, 2)))  # [1, 1, 2, 3, 4, 4]

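Here the inputs are the list and the maximum number of copies of each item that you want to keep. The caveat is that the items must be hashable...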
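
When the items are not hashable (lists, for instance), one possible fallback is to count by equality with a linear search instead of a Counter. The sketch below assumes only that the items support ==; the name keep_n_dupes_unhashable is hypothetical, and the cost grows with the number of distinct items, so it is suited to small inputs.

```python
def keep_n_dupes_unhashable(remove_from, how_many):
    """Yield items, keeping at most how_many copies of each (compared with ==)."""
    seen = []  # list of [item, count] pairs, searched linearly
    for item in remove_from:
        for entry in seen:
            if entry[0] == item:
                entry[1] += 1
                count = entry[1]
                break
        else:
            # No equal item seen so far: start a new counter for it.
            seen.append([item, 1])
            count = 1
        if count <= how_many:
            yield item


rows = [[1, 2], [1, 2], [1, 2], [3], [3]]
print(list(keep_n_dupes_unhashable(rows, 2)))  # [[1, 2], [1, 2], [3], [3]]
```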