Get the difference between two lists with unique entries

Question

1230

I have two lists in Python:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

Assuming the elements in each list are unique, I want to create a third list containing the items from the first list that are not in the second list:

temp3 = ['Three', 'Four']

Is there a fast way to do this without loops and checks?

- Max Frai

26

Are the elements guaranteed to be unique? If you have temp1 = ['One', 'One', 'One'] and temp2 = ['One'], do you want ['One', 'One'] back, or []? - Michael Mrozek

1

@michael-mrozek They are unique. - Max Frai

19

Do you want to preserve the order of the elements? - Mark Byers

1

Does this answer your question? Finding elements not in a list - Gonçalo Peres

33 Answers

- arekolek · Answer 1

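I wanted something that would take two lists and do what diff does. Since this question comes up first when you search for "python diff two lists" and is not very specific, I will post what I came up with.

Using SequenceMatcher from difflib, you can compare two lists the way diff does. None of the other answers tell you where a difference occurs, but this one does. Some answers give the difference in only one direction. Some reorder the elements. Some do not handle duplicates. This solution gives you a true difference between the two lists: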

a = 'A quick fox jumps the lazy dog'.split()
b = 'A quick brown mouse jumps over the dog'.split()

from difflib import SequenceMatcher

for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
  if tag == 'equal': print('both have', a[i:j])
  if tag in ('delete', 'replace'): print('  1st has', a[i:j])
  if tag in ('insert', 'replace'): print('  2nd has', b[k:l])

This outputs:

both have ['A', 'quick']
  1st has ['fox']
  2nd has ['brown', 'mouse']
both have ['jumps']
  2nd has ['over']
both have ['the']
  1st has ['lazy']
both have ['dog']

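Of course, if your application makes the same assumptions the other answers make, you will benefit from them the most. But if you are looking for true diff functionality, this is the only way to go.

For example, none of the other answers can handle: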

a = [1,2,3,4,5]
b = [5,4,3,2,1]

But this one does:

  2nd has [5, 4, 3, 2]
both have [1]
  1st has [2, 3, 4, 5]
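
To make the reversed-list output above easy to reproduce, the loop can be wrapped in a small helper. This is a minimal sketch: the name diff_lists and the list-of-tuples return value are assumptions made for illustration and are not part of the original answer.

```python
from difflib import SequenceMatcher


def diff_lists(a, b):
    """Return (label, items) pairs describing how list a differs from list b."""
    result = []
    # get_opcodes() yields (tag, i1, i2, j1, j2): a[i1:i2] corresponds to b[j1:j2]
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == 'equal':
            result.append(('both have', a[i1:i2]))
        if tag in ('delete', 'replace'):
            result.append(('1st has', a[i1:i2]))
        if tag in ('insert', 'replace'):
            result.append(('2nd has', b[j1:j2]))
    return result


for label, items in diff_lists([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]):
    print(label, items)
```

Returning the pairs instead of printing them lets a caller keep only the labels it cares about, for example just the '1st has' entries.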

- Taylor D. Edmiston · Answer 2

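Here is a Counter answer for the simplest case.

It is shorter than the one above that does two-way diffs because it does only what the question asks: generate a list of what is in the first list but not in the second.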

from collections import Counter

lst1 = ['One', 'Two', 'Three', 'Four']
lst2 = ['One', 'Two']

c1 = Counter(lst1)
c2 = Counter(lst2)
diff = list((c1 - c2).elements())

Alternatively, depending on your readability preferences, it makes a decent one-liner:

diff = list((Counter(lst1) - Counter(lst2)).elements())

Output:

['Three', 'Four']

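Note that you can drop the list(...) call if you only need to iterate over the result.

Because this solution uses counters, it handles quantities properly, unlike many of the set-based answers. For example, on this input: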

lst1 = ['One', 'Two', 'Two', 'Two', 'Three', 'Three', 'Four']
lst2 = ['One', 'Two']

The output is:

['Two', 'Two', 'Three', 'Three', 'Four']
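
The contrast with a set-based difference can be seen side by side. The sketch below is illustrative only; the helper name multiset_diff is an assumption and does not appear in the answer.

```python
from collections import Counter


def multiset_diff(first, second):
    """Items of first left over after removing one copy per item in second."""
    return list((Counter(first) - Counter(second)).elements())


lst1 = ['One', 'Two', 'Two', 'Two', 'Three', 'Three', 'Four']
lst2 = ['One', 'Two']

# Counter subtraction removes only as many copies as lst2 contains
counted = multiset_diff(lst1, lst2)

# A set-based difference drops every 'Two' and collapses the duplicates
as_set = set(lst1) - set(lst2)

print(counted)
print(sorted(as_set))
```

Counter subtraction also discards counts that fall to zero or below, so an item that occurs more often in the second list simply disappears from the result.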

- Mohammed · Answer 3

This may be even faster than Mark's list comprehension:

import itertools

list(itertools.filterfalse(set(temp2).__contains__, temp1))
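
As a rough illustration of how the one-liner behaves, it can be wrapped in a function and compared with the equivalent list comprehension. The name ordered_diff is hypothetical, and the sketch assumes every element is hashable, since the second list is turned into a set.

```python
from itertools import filterfalse


def ordered_diff(first, second):
    """Items of first that are absent from second, in their original order."""
    lookup = set(second)  # built once; membership tests are O(1) on average
    return list(filterfalse(lookup.__contains__, first))


temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

result = ordered_diff(temp1, temp2)

# Equivalent list comprehension, shown for comparison
lookup = set(temp2)
same = [item for item in temp1 if item not in lookup]

print(result == same)
```

Whether this is actually faster than the comprehension depends on the input and the interpreter, so it is worth timing on real data before relying on it.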

- sreemanth pulagam · Answer 4

7

Here is a single-line version of arulmr's solution

def diff(listA, listB):
    return set(listA) - set(listB) | set(listB) -set(listA)

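This makes no sense and is very unclear. Is it (set(a) - set(b)) | (set(a) - set(b)) (a union of two differences?) or set(a) - (set(b) | set(a)) - set(b) (which would subtract the whole set a from itself, always leading to an empty result)? Given operator precedence, I can tell you it is the first one, but still, the union and the repetition here are useless. - Victor Schröder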
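
The precedence question raised in the comment can be checked directly. This is a small sketch with made-up sample sets; it only confirms how Python groups the unparenthesized expression from the answer.

```python
a = {1, 2, 3}
b = {3, 4}

# Binary - binds more tightly than |, so both differences are taken first
unparenthesized = a - b | b - a
explicit = (a - b) | (b - a)

# The ^ operator computes the same symmetric difference in one step
shorthand = a ^ b

print(unparenthesized == explicit == shorthand)
```

Writing the parentheses, or using ^, spares the reader from having to recall the precedence table.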

- pylang · Answer 5

Here are a few simple, order-preserving ways of diffing two lists of strings.

Code

An unusual approach using pathlib:

import pathlib


temp1 = ["One", "Two", "Three", "Four"]
temp2 = ["One", "Two"]

p = pathlib.Path(*temp1)
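# Pass a single path: extra positional arguments to relative_to are deprecated since Python 3.12
r = p.relative_to(pathlib.Path(*temp2))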
list(r.parts)
# ['Three', 'Four']

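This assumes that both lists contain strings with the same beginning. See the docs for more details. Note that it is not particularly fast compared to set operations.

A straightforward implementation using itertools.zip_longest: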

import itertools as it


[x for x, y in it.zip_longest(temp1, temp2) if x != y]
# ['Three', 'Four']
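
Both approaches depend on the second list being a prefix of the first. One way to make that assumption explicit is sketched below; strip_prefix is a hypothetical helper that is not part of the answer, and the three-item list is made-up sample data.

```python
from itertools import zip_longest


def strip_prefix(items, prefix):
    """Return the part of items that follows prefix; fail if prefix does not match."""
    if items[:len(prefix)] != prefix:
        raise ValueError("second list is not a prefix of the first")
    return items[len(prefix):]


temp1 = ["One", "Two", "Three", "Four"]
temp2 = ["One", "Two"]
tail = strip_prefix(temp1, temp2)

# Without the prefix assumption, the zip_longest comparison can return
# an item that does appear in the second list ('Two' here)
misleading = [x for x, y in zip_longest(["One", "Three", "Two"], ["Two"]) if x != y]

print(tail)
print(misleading)
```

Raising an error on a mismatch is a design choice made for this sketch; returning the full list unchanged would be another reasonable option.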

- Jenobi · Answer 6

Let's say we have two lists

list1 = [1, 3, 5, 7, 9]
list2 = [1, 2, 3, 4, 5]

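From the two lists above we can see that items 1, 3 and 5 exist in list2 and items 7 and 9 do not. On the other hand, items 1, 3 and 5 exist in list1 and items 2 and 4 do not.

What is the best way to return a new list containing items 7, 9 and 2, 4?

All of the answers above find a solution, but which one is optimal?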

def difference(list1, list2):
    new_list = []
    for i in list1:
        if i not in list2:
            new_list.append(i)

    for j in list2:
        if j not in list1:
            new_list.append(j)
    return new_list

versus

def sym_diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))

Using timeit we can see the results.

import timeit

t1 = timeit.Timer("difference(list1, list2)", "from __main__ import difference, list1, list2")
t2 = timeit.Timer("sym_diff(list1, list2)", "from __main__ import sym_diff, list1, list2")

# Timer.timeit returns the total time in seconds for all 100000 runs
print('Using two for loops', t1.timeit(number=100000), 'seconds')
print('Using symmetric_difference', t2.timeit(number=100000), 'seconds')

Returns

[7, 9, 2, 4]
Using two for loops 0.11572412995155901 seconds
Using symmetric_difference 0.11285737506113946 seconds
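
With five-element lists the two timings are nearly identical, so the measurement says little about how the approaches scale. The sketch below repeats the comparison on larger random lists. The list size, the repetition count and the benchmark helper are assumptions chosen for illustration, and difference is written here as two comprehensions with the same behavior as the loop version above. No timings are claimed; run it to see the figures on your own machine.

```python
import random
import timeit


def difference(list1, list2):
    """Two-pass version: every 'in' test scans a whole list."""
    return ([i for i in list1 if i not in list2]
            + [j for j in list2 if j not in list1])


def sym_diff(list1, list2):
    """Set-based version: hashing replaces the linear scans."""
    return list(set(list1).symmetric_difference(list2))


def benchmark(size=500, number=10):
    """Time both versions on two random lists of the given size."""
    big1 = random.sample(range(size * 10), size)
    big2 = random.sample(range(size * 10), size)
    loops = timeit.timeit(lambda: difference(big1, big2), number=number)
    sets = timeit.timeit(lambda: sym_diff(big1, big2), number=number)
    return loops, sets


if __name__ == "__main__":
    loops, sets = benchmark()
    print('two for loops:', loops, 'seconds')
    print('symmetric_difference:', sets, 'seconds')
```

Note that the set-based version does not preserve the order of the input lists, so the two functions are only interchangeable when order does not matter.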

- soundcorner · Answer 7

If the elements of the lists are sorted and unique (and the second list is a prefix of the first), you can use a naive method.

list1=[1,2,3,4,5]
list2=[1,2,3]

print(list1[len(list2):])

or with native set methods:

subset=set(list1).difference(list2)

print(subset)

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print("Naive solution: ", timeit.timeit('temp1[len(temp2):]', init, number = 100000))
print("Native set solution: ", timeit.timeit('set(temp1).difference(temp2)', init, number = 100000))

Naive solution: 0.0787101593292

Native set solution: 0.998837615564
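
The slicing shortcut and the set difference only agree when the second list is a prefix of the first. The sketch below checks this on the same inputs the timing code uses; it illustrates that caveat and is not a replacement for the benchmark.

```python
temp1 = list(range(100))
temp2 = [i * 2 for i in range(50)]  # even numbers 0..98, not a prefix of temp1

sliced = temp1[len(temp2):]                    # simply the last 50 items
by_set = sorted(set(temp1).difference(temp2))  # the odd numbers below 100

print(sliced == by_set)

# When the second list really is a prefix, the two approaches agree
list1 = [1, 2, 3, 4, 5]
list2 = [1, 2, 3]
print(list1[len(list2):] == sorted(set(list1).difference(list2)))
```

On the benchmark inputs the two expressions therefore compute different results, which is worth keeping in mind when reading the timings.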

- manhgd · Answer 8

Here is another solution:

def diff(a, b):
    xa = [i for i in set(a) if i not in b]
    xb = [i for i in set(b) if i not in a]
    return xa + xb
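
A possible variation, sketched under the assumption that all elements are hashable: iterating over dict.fromkeys(...) instead of set(...) keeps the items in their original order while still dropping duplicates, and testing membership against sets avoids rescanning the lists. The name ordered_sym_diff is hypothetical.

```python
def ordered_sym_diff(a, b):
    """Items found in exactly one list: those from a first, then those from b."""
    set_a, set_b = set(a), set(b)
    # dict.fromkeys removes duplicates but, unlike set, preserves insertion order
    only_a = [item for item in dict.fromkeys(a) if item not in set_b]
    only_b = [item for item in dict.fromkeys(b) if item not in set_a]
    return only_a + only_b


print(ordered_sym_diff([1, 3, 5, 7, 9], [1, 2, 3, 4, 5]))
```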

- Abercrombie · Answer 9

This is a modified version of @SuperNova's answer.

def get_diff(a: list, b: list) -> list:
    return list(set(a) ^ set(b))

- Alex Jacob · Answer 10

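I am late to the game, but you can use this to compare the performance of some of the code above. The two fastest contenders are: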

list(set(x).symmetric_difference(set(y)))
list(set(x) ^ set(y))

I apologize for the elementary level of coding.

import time
import random
from itertools import filterfalse

# 1 - performance (time taken)
# 2 - correctness (answer - 1,4,5,6)
# set performance
performance = 1
numberoftests = 7

def answer(x,y,z):
    # time.clock() was removed in Python 3.8; time.perf_counter() replaces it
    if z == 0:
        start = time.perf_counter()
        # second term is set(y)-set(x); the original set(y)-set(y) was always empty
        lists = (str(list(set(x)-set(y))+list(set(y)-set(x))))
        times = ("1 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 1:
        start = time.perf_counter()
        lists = (str(list(set(x).symmetric_difference(set(y)))))
        times = ("2 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 2:
        start = time.perf_counter()
        lists = (str(list(set(x) ^ set(y))))
        times = ("3 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 3:
        start = time.perf_counter()
        # filterfalse is lazy, so consume it inside the timed region
        lists = (list(filterfalse(set(y).__contains__, x)))
        times = ("4 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 4:
        start = time.perf_counter()
        lists = (tuple(set(x) - set(y)))
        times = ("5 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 5:
        start = time.perf_counter()
        lists = ([tt for tt in x if tt not in y])
        times = ("6 = " + str(time.perf_counter() - start))
        return (lists,times)

    else:
        start = time.perf_counter()
        Xarray = [iDa for iDa in x if iDa not in y]
        Yarray = [iDb for iDb in y if iDb not in x]
        lists = (str(Xarray + Yarray))
        times = ("7 = " + str(time.perf_counter() - start))
        return (lists,times)

n = numberoftests

if performance == 2:
    a = [1,2,3,4,5]
    b = [3,2,6]
    for c in range(0,n):
        d = answer(a,b,c)
        print(d[0])

elif performance == 1:
    for tests in range(0,10):
        print("Test Number" + str(tests + 1))
        a = random.sample(range(1, 900000), 9999)
        b = random.sample(range(1, 900000), 9999)
        for c in range(0,n):
            #if c not in (1,4,5,6):
            d = answer(a,b,c)
            print(d[1])
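
When reading the timings, it may help to keep in mind that the seven approaches do not all compute the same thing: some return a two-way (symmetric) difference and some only the items of the first list that are missing from the second. The sketch below groups them on the small correctness inputs from the script. Converting every result to a set for comparison is a choice made here for illustration.

```python
from itertools import filterfalse

a = [1, 2, 3, 4, 5]
b = [3, 2, 6]

# Two-way (symmetric) difference: items found in exactly one of the lists
two_way = [
    set(a) - set(b) | set(b) - set(a),
    set(a).symmetric_difference(b),
    set(a) ^ set(b),
    set([i for i in a if i not in b] + [i for i in b if i not in a]),
]

# One-way difference: items of a that are missing from b
one_way = [
    set(filterfalse(set(b).__contains__, a)),
    set(a) - set(b),
    set(i for i in a if i not in b),
]

print(all(result == {1, 4, 5, 6} for result in two_way))
print(all(result == {1, 4, 5} for result in one_way))
```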