Python: difference between two lists

I have two lists, shown below:
found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1', 'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3']

I want to find the differences between these two lists.
So far I have done

list(set(expected)-set(found))

and
list(set(found)-set(expected))

These return ['E3'] and ['E5'] respectively.
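A quick check with the lists above shows why the set approach alone cannot produce the copy counts: a set keeps each element only once, so the multiplicity is lost before any subtraction happens.

```python
found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1',
         'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3']

# A set stores each distinct element once, so repeated items collapse.
print(len(found), len(set(found)))   # 15 11

# 'E2BS' occurs three times in the list...
print(found.count('E2BS'))           # 3
# ...but the set only records that it is a member, not how often.
print('E2BS' in set(found))          # True
```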

However, the output I need is:

'E3' is missing from found.
'E5' is missing from expected.
There are 2 copies of 'E5' in found.
There are 3 copies of 'E2BS' in found.
There are 2 copies of 'E2' in found.

Any help or suggestions are welcome!
3 Answers


The collections.Counter class is excellent at enumerating the differences between multisets:

>>> from collections import Counter
>>> found = Counter(['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1', 'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5'])
>>> expected = Counter(['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3'])
>>> sorted((found - expected).elements())
['E2', 'E2BS', 'E2BS', 'E5', 'E5']
>>> sorted((expected - found).elements())
['E3']
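Because the - operator on Counter objects discards zero and negative counts, each direction needs its own subtraction. If a single signed view is more convenient, Counter.subtract() keeps zero and negative counts. A minimal sketch (delta and changes are just illustrative names):

```python
from collections import Counter

found = Counter(['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1',
                 'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5'])
expected = Counter(['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2',
                    'CG', 'E2BS', 'E3'])

# Start from a copy of found and subtract expected in place.
# Unlike `found - expected`, subtract() keeps zero and negative counts.
delta = found.copy()
delta.subtract(expected)

# Positive: surplus copies in found; negative: copies found is short of.
changes = {item: n for item, n in delta.items() if n != 0}
print(changes)   # {'E2': 1, 'E5': 2, 'E2BS': 2, 'E3': -1}
```

Unary +delta keeps only the positive entries, which gives the same result as found - expected.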

You might also be interested in difflib.Differ:

>>> from difflib import Differ
>>> found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1', 'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
>>> expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3']
>>> for d in Differ().compare(expected, found):
...     print(d)

+ CG
+ E6
  E1
  E2
  E4
+ L2
+ E7
+ E5
+ L1
+ E2BS
+ E2BS
+ E2BS
+ E2
  E1^E4
+ E5
- E6
- E7
- L1
- L2
- CG
- E2BS
- E3
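Differ's output is positional: 'E6' appears as both + and - because it sits at a different position in each list, not because its count differs. To get multiset differences from this output, one option is to tally the + and - lines and cancel them against each other. A sketch under that assumption (net_changes is a hypothetical helper name); the netted result agrees with the Counter answer:

```python
from collections import Counter
from difflib import Differ

def net_changes(expected, found):
    """Tally Differ's '+'/'-' lines and cancel items that only moved."""
    added, removed = Counter(), Counter()
    for line in Differ().compare(expected, found):
        tag, item = line[:2], line[2:]
        if tag == '+ ':
            added[item] += 1
        elif tag == '- ':
            removed[item] += 1
        # '  ' lines are unchanged items; '? ' lines are intraline hints.
    # An item that is both added and removed was only moved, so net it out.
    return added - removed, removed - added

found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1',
         'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3']

extra, missing = net_changes(expected, found)
print(sorted(extra.elements()))    # ['E2', 'E2BS', 'E2BS', 'E5', 'E5']
print(sorted(missing.elements()))  # ['E3']
```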

The difflib.Differ answer is great. I've sometimes started using it instead of sets. - COCO


Use Python's set and Counter instead of writing your own solution:

  1. symmetric_difference: finds elements that are in one set or the other, but not in both.
  2. intersection: finds elements common to both sets.
  3. difference: essentially what you did by subtracting one set from the other.

Code examples

  • set(found).difference(expected) # {'E5'}
    
  • set(expected).difference(found) # {'E3'}
    
  • set(found).symmetric_difference(expected) # {'E5', 'E3'} (order may vary)
    
  • Finding copies of objects: this question has already been referenced. That technique gets you all the duplicates, and the resulting Counter object tells you how many copies of each there are. For example:

    collections.Counter(found)['E5'] # 2
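In Python 3 the same set operations are also available as operators (-, ^ and &). One difference worth knowing: the methods accept any iterable as their argument, while the operators require sets on both sides. A small sketch using the lists from the question:

```python
found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1',
         'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3']

found_set, expected_set = set(found), set(expected)

# Method forms accept a plain list as the argument...
print(found_set.difference(expected))            # {'E5'}
print(found_set.symmetric_difference(expected))  # {'E3', 'E5'}, order may vary
print(sorted(found_set.intersection(expected)))
# ['CG', 'E1', 'E1^E4', 'E2', 'E2BS', 'E4', 'E6', 'E7', 'L1', 'L2']

# ...while the operator forms need sets on both sides.
assert found_set - expected_set == found_set.difference(expected)
assert found_set ^ expected_set == found_set.symmetric_difference(expected)
assert found_set & expected_set == found_set.intersection(expected)

try:
    found_set - expected   # set minus list
except TypeError as exc:
    print('operator form needs a set:', exc)
```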
    


You've already solved the first two parts:

print('{0} missing from found'.format(list(set(expected) - set(found))))
print('{0} missing from expected'.format(list(set(found) - set(expected))))

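For the remaining two, you need to look at how to count duplicates in a list. There are plenty of solutions online (including this one: Finding and listing duplicates in Python?).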
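Putting the two parts together, here is a minimal sketch that produces a report in the format the question asks for. It assumes that "copies" means any item occurring more than once in found, which matches the sample output; diff_report is a hypothetical helper name. The copy lines come out in first-occurrence order rather than the order shown in the question.

```python
from collections import Counter

def diff_report(expected, found):
    """Hypothetical helper: describe how `found` differs from `expected`."""
    found_set, expected_set = set(found), set(expected)
    lines = []
    # Parts 1 and 2: membership differences, as in the set-based answer.
    # dict.fromkeys() drops repeats while keeping list order, so output is stable.
    for item in dict.fromkeys(expected):
        if item not in found_set:
            lines.append(f"{item!r} is missing from found.")
    for item in dict.fromkeys(found):
        if item not in expected_set:
            lines.append(f"{item!r} is missing from expected.")
    # Parts 3 and 4: items that occur more than once in found.
    for item, n in Counter(found).items():
        if n > 1:
            lines.append(f"There are {n} copies of {item!r} in found.")
    return lines

found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1',
         'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3']

report = diff_report(expected, found)
for line in report:
    print(line)
```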


Web page content provided by Stack Overflow.