如何获取两个字典的对称差异

8
我正在寻找一种解决方法,用于在Python中查找两个字典之间的对称差异。
例如,如果我有两个字典A和B,并且我想创建一个第三个字典C,该字典包含所有在A和B中找不到的其他项,或者换句话说,是独特的。
我找不到标准答案,因此我决定提出这个问题并给出自己的答案。如果您认为您有更好的方法,我很乐意看到它。
一些数据:
a = {'a': 1, 'b':2}
b = {'b': 2, 'c':3}

期望输出:

{'a': 1, 'c': 3}

如果键匹配而值不匹配,会发生什么?比如 a = {'a': 1, 'b':21} , b = {'b': 22, 'c':3}。此外,你的值是否应该是可哈希的? - JL Peyret
1
@JLPeyret 不多,因为匹配的键不会出现在对称差中。----- 字典的值不需要是可哈希的。 - Elmex80s
8个回答

8
要获取两个字典之间的对称差,请使用以下强大的函数:
def dict_symmetric_difference(a, b):
    return {k: a[k] if k in a else b[k] for k in  # break here to fit without scrolling
            set(a.keys()).symmetric_difference(b.keys())}

只需要逻辑:
{k: a[k] if k in a else b[k] for k in set(a.keys()).symmetric_difference(b.keys())}

这里是一个更简单的函数版本用于解释:
def dict_symmetric_difference(a, b):
    # first make sets of the dictionary keys
    keys_in_a = set(a.keys())
    keys_in_b = set(b.keys())
    unique_keys = keys_in_a.symmetric_difference(keys_in_b)  # get the unique keys
    c = {}  # start an empty dictionary
    for key in unique_keys:  # iterate over the keys
        if key in a: # if the key is from a dictionary, take the value from there.
            c[key] = a[key]
        else:  # the key is in b dictionary, take the value from there.
            c[key] = b[key]
    return c

a[k] if k in a else b[k]的解释:

这是一个三目运算符,它允许我像这样使用:a if condition else b

通过这个技巧,我可以得到键的值,无论它在哪个字典中。


使用任一函数:

>>> dict_symmetric_difference({'a': 1, 'b':2}, {'b':2, 'c':3})
{'a': 1, 'c': 3}

1
那么使用三元运算符怎么样?a[k] if k in a else b[k] 我认为这应该可以很好地工作。 - Inbar Rose
我建议使用 ^ 来计算两个集合的对称差:set(a.keys()) ^ set(b.keys()) - Elmex80s
1
@InbarRose 我在我的回答中提到使用方法调用而不是运算符形式的一个优点:您可以将任何可迭代对象作为该方法的参数传递,无需显式地将其转换为集合。 - PM 2Ring
1
我已经在社区维基答案中添加了一些timeit代码和输出。 - PM 2Ring
据Martijn Pieters所说,FWIW,这些set方法调用不会将它们的参数转换为集合,它们只是直接迭代项目。 - PM 2Ring
显示剩余2条评论

5

这里有一些代码,可以对各种算法进行timeit速度测试。

测试使用相等大小的字典对。键是短的随机字母字符串,在两个字典之间具有不同比例的共享键。字典是从洗牌列表构建的,因此即使它们包含许多共享键,两个字典的底层哈希表结构也应该非常不同。

共享键的确切数量是随机的,共享键的比例由make_dictsshared参数控制。

这段代码的主体可以在Python 2.6+和Python 3上运行。我已经在这台机器上安装了Python 2.6.6和Python 3.6.0(这是一台单核32位机器,配备有2GB RAM,并运行在一个旧版的Debian派生版Linux系统上)。其中一些字典对称差函数使用了字典推导式,在Python 2.6中不可用,所以我无法在Python 2上测试这些函数。此外,elmex_dsd_py2无法在Python 3上运行,因此我已将其注释掉。我最初打算发布Python 2.6的结果,但我必须缩小输出范围以适应消息大小限制。
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

''' Dictionary symmetric difference

    Speed tests of various implementations

    See https://dev59.com/Np_ha4cB1Zd3GeqP4ciP

    Speed test code by PM 2Ring 2017.03.08
'''

from __future__ import print_function

from itertools import product
from random import random, seed, shuffle
from string import ascii_letters
from timeit import Timer

seed(163)

# The dict symmetric difference functions ------------------------------

def inbar_dsd_long(a, b):
    # first make sets of the dictionary keys
    keys_in_a = set(a.keys())
    keys_in_b = set(b.keys())
    # get the unique keys
    unique_keys = keys_in_a.symmetric_difference(keys_in_b)
    # start an empty dictionary
    c = {}
    # iterate over the keys
    for key in unique_keys:
        if key in a: 
            # if the key is from a dictionary, take the value from there.
           c[key] = a[key]
        else:
            # the key is in b dictionary, take the value from there.
            c[key] = b[key]
    return c

def pm2r_dsd_py2(a, b):
    return dict((k, a[k] if k in a else b[k]) for k in set(a.keys()) ^ set(b.keys()))

#def elmex_dsd_py2(a, b):
    #symm_diff = set(a) ^ set(b)
    #return dict((k, v) for k, v in a.items() + b.items() if k in symm_diff) 

def raymond_dsd(a, b):
    c = a.copy()
    c.update(b)
    for k in (a.keys() & b.keys()):
        del c[k]
    return c

def inbar_dsd_short(a, b):
    return {k: a[k] if k in a else b[k] for k in  
        set(a.keys()).symmetric_difference(b.keys())}

def pm2r_dsd_py3(a, b):
    return {k: a[k] if k in a else b[k] for k in a.keys() ^ b.keys()}

def evkounis_dsd(a, b):
    res = {k:v for k, v in a.items() if k not in b}
    res.update({k:v for k, v in b.items() if k not in a})
    return res

def elmex_dsd_py3(a, b):
    symm_diff = set(a) ^ set(b)
    return {k: v for k, v in list(a.items()) + list(b.items()) if k in symm_diff} 

funcs = (
    inbar_dsd_long,
    pm2r_dsd_py2,
    #elmex_dsd_py2,
    raymond_dsd,
    inbar_dsd_short,
    pm2r_dsd_py3,
    evkounis_dsd,
    elmex_dsd_py3,
)

# ----------------------------------------------------------------------

# Random key strings
all_keys = [''.join(t) for t in product(ascii_letters, repeat=3)]
shuffle(all_keys)

def make_dicts(size, shared):
    ''' Make a pair of dicts of length `size`, with random key strings.
        `shared` is a real number 0 <= shared <= 1 giving the approximate 
        ratio of shared keys.
    '''
    a, b = [], []
    keys = iter(all_keys)
    shared_count = 0
    for i in range(size):
        ka = next(keys)
        if random() < shared:
            kb = ka
            shared_count += 1
        else:
            kb = next(keys)
        a.append((ka, i))
        b.append((kb, i))
    shuffle(a)
    shuffle(b)
    return dict(a), dict(b), shared_count

def verify(a, b):
    ''' Verify that all functions return the same result '''
    results = [func(a, b) for func in funcs]
    last = results[-1]
    print(all(last == u for u in results[:-1]))

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    timings = []
    for func in funcs:
        fname = func.__name__
        setup = 'from __main__ import a, b, ' + fname
        cmd = '{0}(a, b)'.format(fname)
        t = Timer(cmd, setup)
        result = t.repeat(reps, loops)
        result.sort()
        timings.append((result, fname))

    timings.sort()
    for result, fname in timings:
        print('{0:16} {1}'.format(fname, result))

# ----------------------------------------------------------------------

print('Verifying')
size = 1000
a, b, shared_count = make_dicts(size, 0.1)
print('size: {0}, shared count: {1}'.format(size, shared_count))
verify(a, b)

# Timeit tests
reps = 3
fmt = '\nsize: {0}, shared count: {1}, loops: {2}'
for shared in (0.1, 0.25, 0.5, 0.75, 0.9):
    print('\nSHARED: {0:0.2f}'.format(shared))
    #for size in (5, 10, 50, 100, 500, 1000, 5000, 10000, 50000):
    for size in (10, 100, 1000, 10000):
        a, b, shared_count = make_dicts(size, shared)
        loops = 100000 // size
        print(fmt.format(size, shared_count, loops))
        time_test(loops, reps)    

输出

Verifying
size: 1000, shared count: 100
True

SHARED: 0.10

size: 10, shared count: 1, loops: 10000
raymond_dsd      [0.13777699099955498, 0.13792390800153953, 0.1381044740010111]
evkounis_dsd     [0.23560065399942687, 0.23752641000100994, 0.2455631840020942]
pm2r_dsd_py3     [0.23770248700020602, 0.23880975800057058, 0.24221741200017277]
inbar_dsd_long   [0.25206301800062647, 0.285963577000075, 0.28780356199786183]
inbar_dsd_short  [0.2636144610005431, 0.2653795980004361, 0.2666834120027488]
elmex_dsd_py3    [0.3290278729982674, 0.33175632400161703, 0.3384615989998565]
pm2r_dsd_py2     [0.3978280019982776, 0.43710133700005827, 0.4523775029992976]

size: 100, shared count: 14, loops: 1000
raymond_dsd      [0.09872918600012781, 0.09888040100122453, 0.10413656799937598]
evkounis_dsd     [0.1804931380029302, 0.1811683220003033, 0.18133216399655794]
pm2r_dsd_py3     [0.20522897000046214, 0.20773609400202986, 0.20979003499815008]
inbar_dsd_short  [0.21217649699974572, 0.21281453499977943, 0.21295483400172088]
inbar_dsd_long   [0.22985933599920827, 0.23097444899758557, 0.24446944000010262]
elmex_dsd_py3    [0.24242248500013375, 0.24477665499944123, 0.24785449900082313]
pm2r_dsd_py2     [0.3103436530000181, 0.31146229099977063, 0.3152951789998042]

size: 1000, shared count: 94, loops: 100
raymond_dsd      [0.10726087399962125, 0.10726979699757067, 0.10853421000138042]
evkounis_dsd     [0.19798667299983208, 0.19957152200004202, 0.20145120699817198]
pm2r_dsd_py3     [0.24767412599976524, 0.25033419099781895, 0.25519442899894784]
inbar_dsd_long   [0.25753367499783053, 0.259813735003263, 0.2615334299989627]
inbar_dsd_short  [0.25835196700063534, 0.2647503340012918, 0.26879757099959534]
elmex_dsd_py3    [0.3065065359987784, 0.3129320820007706, 0.3159641370002646]
pm2r_dsd_py2     [0.32748841799912043, 0.34595297499981825, 0.3797209490003297]

size: 10000, shared count: 987, loops: 10
raymond_dsd      [0.2801321059996553, 0.2831085340003483, 0.28407657299976563]
evkounis_dsd     [0.36119127300116816, 0.36392319399965345, 0.36926983400189783]
pm2r_dsd_py3     [0.5073807749977277, 0.5122791090034298, 0.5579565990010451]
inbar_dsd_short  [0.5086212060014077, 0.5168500030013092, 0.5182715480004845]
inbar_dsd_long   [0.602521363998676, 0.6031914080012939, 0.6047401769974385]
pm2r_dsd_py2     [0.6753699099972437, 0.6772755890015105, 0.6782451350009069]
elmex_dsd_py3    [0.7430517110005894, 0.7464511920006771, 0.7468688779990771]

SHARED: 0.25

size: 10, shared count: 3, loops: 10000
raymond_dsd      [0.1376171269985207, 0.13765478899949812, 0.13801490599871613]
pm2r_dsd_py3     [0.20131645299989032, 0.20166713100115885, 0.20322838700303691]
inbar_dsd_long   [0.20759937799812178, 0.2079929980027373, 0.21979623799779802]
evkounis_dsd     [0.2186124869986088, 0.2202955180000572, 0.223359776999132]
inbar_dsd_short  [0.23444793200178538, 0.23780764999901294, 0.23976211099943612]
elmex_dsd_py3    [0.3178573650002363, 0.3193927319989598, 0.32410190099835745]
pm2r_dsd_py2     [0.3520881920012471, 0.3543025139988458, 0.3581208620016696]

size: 100, shared count: 23, loops: 1000
raymond_dsd      [0.10508492400185787, 0.10563860000183922, 0.10888238600091427]
evkounis_dsd     [0.15686738300064462, 0.15824111300025834, 0.15863642399926903]
pm2r_dsd_py3     [0.1829918709991034, 0.184900373002165, 0.18732255400027498]
inbar_dsd_short  [0.18875792199833086, 0.19031438200181583, 0.19102797700179508]
inbar_dsd_long   [0.21139359699736815, 0.22990316799769062, 0.2418856490003236]
elmex_dsd_py3    [0.22641843899691594, 0.2265430750012456, 0.23143781299950206]
pm2r_dsd_py2     [0.2681290770015039, 0.2703527909980039, 0.27255326500016963]

size: 1000, shared count: 263, loops: 100
raymond_dsd      [0.10895683100170572, 0.11233176399764488, 0.11593639900092967]
evkounis_dsd     [0.17859331599902362, 0.17949835600302322, 0.18466946999978973]
pm2r_dsd_py3     [0.2147589500018512, 0.21515577800164465, 0.21701817199937068]
inbar_dsd_long   [0.21823484400010784, 0.2254721450008219, 0.22556141600216506]
inbar_dsd_short  [0.22114897099891095, 0.22157548800169025, 0.22668778500155895]
pm2r_dsd_py2     [0.2780861230021401, 0.27864550599770155, 0.28336624599978677]
elmex_dsd_py3    [0.28186336900034803, 0.2837228559983487, 0.29606761199829634]

size: 10000, shared count: 2480, loops: 10
raymond_dsd      [0.278912030000356, 0.28916871899855323, 0.2898256120024598]
evkounis_dsd     [0.33290919899809523, 0.3355702890003158, 0.3366183610014559]
pm2r_dsd_py3     [0.4445611189985357, 0.45341551800083835, 0.4544847100005427]
inbar_dsd_short  [0.4466933030016662, 0.4632708070021181, 0.48025122500257567]
inbar_dsd_long   [0.5405201060020772, 0.5567013979998592, 0.5911358039993502]
pm2r_dsd_py2     [0.586115582002094, 0.600204237998696, 0.6029243630000565]
elmex_dsd_py3    [0.7058123890019488, 0.7067292030005774, 0.7115862030004791]

SHARED: 0.50

size: 10, shared count: 6, loops: 10000
raymond_dsd      [0.15135921700129984, 0.1533788429987908, 0.17841531700105406]
pm2r_dsd_py3     [0.15311526600271463, 0.15356177799912984, 0.15895434199774172]
inbar_dsd_long   [0.16137141400031396, 0.1618921000008413, 0.17238240400183713]
inbar_dsd_short  [0.1808154470018053, 0.18266997299724608, 0.1863039679992653]
evkounis_dsd     [0.18221631199776311, 0.18251911100014695, 0.18520446800175705]
pm2r_dsd_py2     [0.2700158850020671, 0.2743520539988822, 0.28932957600045484]
elmex_dsd_py3    [0.28983224500188953, 0.2912340100010624, 0.2933940119983163]

size: 100, shared count: 51, loops: 1000
raymond_dsd      [0.10294843999872683, 0.10327848499946413, 0.10685922099946765]
evkounis_dsd     [0.13586801600104081, 0.13726477299860562, 0.142784658997698]
pm2r_dsd_py3     [0.1435330319982313, 0.14396326799760573, 0.14474550500017358]
inbar_dsd_short  [0.15043617100309348, 0.15080328300246038, 0.1527250040016952]
inbar_dsd_long   [0.1667091649978829, 0.17330403699816088, 0.17601154400108499]
pm2r_dsd_py2     [0.20728979400155367, 0.20776088099955814, 0.2079896369978087]
elmex_dsd_py3    [0.21078268400015077, 0.2123827169998549, 0.21517163300086395]

size: 1000, shared count: 491, loops: 100
raymond_dsd      [0.11212847299975692, 0.11414236799828359, 0.11498476199994911]
evkounis_dsd     [0.14059560900204815, 0.14112727400060976, 0.150327464001748]
pm2r_dsd_py3     [0.14733014900048147, 0.15143406900097034, 0.1542897660001472]
inbar_dsd_short  [0.15075810700000147, 0.151888833999692, 0.15750856500017107]
inbar_dsd_long   [0.16265833400029805, 0.16367860500031384, 0.17333104299905244]
pm2r_dsd_py2     [0.1993612549995305, 0.19947306600079173, 0.20446195700060343]
elmex_dsd_py3    [0.24682135100010782, 0.24862800600021728, 0.25419495800088043]

size: 10000, shared count: 4938, loops: 10
evkounis_dsd     [0.2519790539990936, 0.2573451700009173, 0.2603536310016352]
raymond_dsd      [0.2875208960031159, 0.2887761790007062, 0.30461744100102806]
pm2r_dsd_py3     [0.3364586130010139, 0.342166794998775, 0.3465069459998631]
inbar_dsd_short  [0.3490315640010522, 0.6202766900023562, 0.7155317880024086]
inbar_dsd_long   [0.42809327600116376, 0.4363977649991284, 0.4812496539998392]
pm2r_dsd_py2     [0.46369219400003203, 0.46809901899905526, 0.4706174610000744]
elmex_dsd_py3    [0.6603999830003886, 0.6629649060014344, 0.6652154759976838]

SHARED: 0.75

size: 10, shared count: 7, loops: 10000
pm2r_dsd_py3     [0.14004066000052262, 0.14024711000092793, 0.1411744200013345]
inbar_dsd_long   [0.1457400300023437, 0.1463650259975111, 0.17371471199658117]
raymond_dsd      [0.1495657380000921, 0.15151091000007, 0.1532108950013935]
inbar_dsd_short  [0.16798981899773935, 0.1684792589985591, 0.17371860500134062]
evkounis_dsd     [0.18283682300170767, 0.18351536599948304, 0.18536045300061232]
pm2r_dsd_py2     [0.24651207700298983, 0.24725952299922938, 0.3011513509991346]
elmex_dsd_py3    [0.27965197500088834, 0.2817374969999946, 0.28211258000010275]

size: 100, shared count: 83, loops: 1000
evkounis_dsd     [0.10071835599956103, 0.10109729699979653, 0.1036734150002303]
inbar_dsd_long   [0.10147314599817037, 0.1017698140021821, 0.11575333300061175]
pm2r_dsd_py2     [0.1257392070001515, 0.14690794800117146, 0.2597000979985751]
pm2r_dsd_py3     [0.16547765900031663, 0.17877282599874889, 0.1817621379996126]
elmex_dsd_py3    [0.18176361400037422, 0.18339519599976484, 0.18422297999859438]
inbar_dsd_short  [0.18878075899920077, 0.1932126639985654, 0.201184026998817]
raymond_dsd      [0.23026226100046188, 0.2342098570006783, 0.24134657600006904]

size: 1000, shared count: 751, loops: 100
inbar_dsd_short  [0.0925550639985886, 0.09375216300031752, 0.09518678500171518]
pm2r_dsd_py3     [0.09365715600142721, 0.0952552939997986, 0.0984138530002383]
raymond_dsd      [0.10659463599949959, 0.10675223399812239, 0.1076178000002983]
inbar_dsd_long   [0.10787330499806558, 0.10813268299898482, 0.1191909779990965]
evkounis_dsd     [0.11020168100003502, 0.11101243599841837, 0.11369209199983743]
pm2r_dsd_py2     [0.1283391249999113, 0.12977415000204928, 0.13450328500039177]
elmex_dsd_py3    [0.20605224600149086, 0.20856778099914663, 0.21231961700323154]

size: 10000, shared count: 7525, loops: 10
evkounis_dsd     [0.19238157699874137, 0.19369199399807258, 0.20787687100164476]
pm2r_dsd_py3     [0.237352975000249, 0.2393961540001328, 0.24592895499881706]
inbar_dsd_short  [0.24010049900243757, 0.24383026600116864, 0.246290401002625]
inbar_dsd_long   [0.31666912799846614, 0.3353785740000603, 0.3762496050003392]
raymond_dsd      [0.3268343650015595, 0.3270019219999085, 0.32956799900057376]
pm2r_dsd_py2     [0.3330148269997153, 0.34052117800092674, 0.3426254549995065]
elmex_dsd_py3    [0.6130798710000818, 0.6139247349965444, 0.6146237579996523]

SHARED: 0.90

size: 10, shared count: 10, loops: 10000
pm2r_dsd_py3     [0.09191049900255166, 0.09203974899719469, 0.09560386399971321]
inbar_dsd_long   [0.09304381299807574, 0.09397280899793259, 0.10319281500051147]
inbar_dsd_short  [0.0980829280015314, 0.09835117700276896, 0.0987546550022671]
raymond_dsd      [0.14094099900103174, 0.14119526200011023, 0.14634641500015277]
evkounis_dsd     [0.14480078699853038, 0.1466599049999786, 0.14705315900209825]
pm2r_dsd_py2     [0.16137886599972262, 0.16186897499937913, 0.1626489610025601]
elmex_dsd_py3    [0.24912584599951515, 0.2519607159993029, 0.2550744569998642]

size: 100, shared count: 88, loops: 1000
pm2r_dsd_py3     [0.08017906299937749, 0.08175948099960806, 0.08336899599817116]
inbar_dsd_short  [0.08394136000060826, 0.08467326000027242, 0.08476182100275764]
inbar_dsd_long   [0.09241838099842425, 0.0929719669984479, 0.10157853300188435]
evkounis_dsd     [0.09769711500121048, 0.09770239999852492, 0.10219176600003266]
pm2r_dsd_py2     [0.11295593600152642, 0.11317849099941668, 0.11382339899864746]
raymond_dsd      [0.11950065099881613, 0.11954410699763685, 0.16439275900120265]
elmex_dsd_py3    [0.17893833099878975, 0.18027151500064065, 0.18072834000122384]

size: 1000, shared count: 896, loops: 100
pm2r_dsd_py3     [0.06560493199867778, 0.06627220900190878, 0.06649829500020132]
inbar_dsd_short  [0.067232484001579, 0.06832705600027111, 0.06892605100074434]
inbar_dsd_long   [0.07928322799853049, 0.0793153419981536, 0.0874185499997111]
pm2r_dsd_py2     [0.08986150900091161, 0.09258468600091874, 0.09545781900305883]
evkounis_dsd     [0.09216968399778125, 0.09272978199805948, 0.09716289000061806]
raymond_dsd      [0.11052805100189289, 0.11131704600120429, 0.11136766299750889]
elmex_dsd_py3    [0.18965840600139927, 0.1898866600022302, 0.19107911399987643]

size: 10000, shared count: 9011, loops: 10
evkounis_dsd     [0.1584843410018948, 0.16192917299849796, 0.16836377900108346]
pm2r_dsd_py3     [0.1789340169998468, 0.17990425000243704, 0.1874260629992932]
inbar_dsd_short  [0.18104806900009862, 0.18631987900153035, 0.18891330599944922]
inbar_dsd_long   [0.2561770180000167, 0.2672927259991411, 0.27309057399907033]
pm2r_dsd_py2     [0.26508888299940736, 0.2661178109992761, 0.2812051930013695]
raymond_dsd      [0.3262405569985276, 0.32729987999846344, 0.3313657439975941]
elmex_dsd_py3    [0.5737760600022739, 0.5791283889993792, 0.5847248999998556]

我会接受这个答案,因为它不仅包括了在此讨论的所有方法,还包括了时间安排,这样每个用户都可以选择最适合他们预期情况的方法。 - Inbar Rose
@InbarRose 谢谢,我很感激,虽然我不会因为这个回答获得任何积分,因为它是社区维基。 - PM 2Ring
“各种算法”是什么?是其他答案中提供的那些吗?能否给我一个简短的总结,哪个是首选? - TT--
@TT-- 是的,该程序测试此页面上的所有算法,并通过提交它们的人的名称进行识别。抱歉,没有TL;DR。哪种算法最快取决于数据量和它们共享的项目比例。这还取决于您是否在运行Python 2或Python 3。如果您想知道哪个是最快的,我建议在系统上运行此代码,并调整“size”和“shared”范围以匹配典型数据的范围。如果速度不是优先考虑因素,只需使用您认为最易读的算法即可。 - PM 2Ring

4

对称差集等于并集减去交集:

>>> a = {'a': 1, 'b':2}
>>> b = {'b': 2, 'c':3}
>>> c = a.copy()
>>> c.update(b)
>>> for k in (a.keys() & b.keys()):
        del c[k]

>>> c
{'a': 1, 'c': 3}

Python3.5+中,c= {**a, **b}怎么样? - Chris_Rands
1
@Chris_Rands 是的,那个方法可行。当然 dict(**a, **b) 不行,因为它会在同一个键有多个值时引发 TypeError 错误。 - PM 2Ring
@PM2Ring 谢谢,是的,这个方法与您的答案相比如何?我天真地更喜欢您的答案,但我感觉因为 Raymond 建议了这个方法,它一定是最好的方法! - Chris_Rands
1
@Chris_Rands 我怀疑 Raymond 可能对高效的字典操作有所了解。:) 如果能够看到不同算法在此页面上的 timeit 结果,以及各种大小输入和重复键比例的结果将会很有趣。当然,还要看看 Python 3 和 2 之间的速度差异。 - PM 2Ring
就代码而言,当共享密钥的比例小于等于0.5时,你的代码是明显的赢家;有关详细信息,请参阅我的'timeit'代码社区wiki答案。 - PM 2Ring

4

dict.keys() 查看对象 是一种类似于集合的数据类型,它支持 ^ 对称差分操作。

根据文档:

由于键视图中的条目是唯一且可哈希的,因此键视图类似于集合。 [...] 对于类似于集合的视图,所有定义为抽象基类 collections.abc.Set 的操作都可用(例如,==、<或 ^)。

为了解决使用 Inbar Rose 的原始解决方案中出现的假值问题,我们可以使用一个 in 测试来处理;如果键不在 a 中,则必须在 b 中,因此我们只需要进行 1 次 in 测试。

def dict_symmetric_difference(a, b):
    return {k: a[k] if k in a else b[k] for k in a.keys() ^ b.keys()}

a = {'a': 1, 'b':2, 'd': ''}
b = {'b': 2, 'c':3}
print(dict_symmetric_difference(a, b))   

输出

{'d': '', 'c': 3, 'a': 1}

Python 2没有字典视图对象,因此在Python 2中,您需要使用set().keys()调用包装起来。在Python 2.7之前的版本中不支持字典推导式,但是您可以将生成器表达式传递给dict()构造函数,除非您正在运行一个非常古老的Python版本。
以下是可在Python 2.4+上正确运行的版本:
def dict_symmetric_difference(a, b):
    return dict((k, a[k] if k in a else b[k]) for k in set(a.keys()) ^ set(b.keys()))

我们可以使用symmetric_difference方法而不是^运算符来避免进行两次设置调用,因为各种集合操作的非运算符版本将接受任何可迭代对象作为参数。所以我们可以这样做:
set(a.keys()).symmetric_difference(b.keys())

替代

set(a.keys()) ^ set(b.keys())

正如Martijn Pieters在评论中指出的那样,字典视图对象已经被回溯到Python 2.7。语法与Python 3略有不同,以避免破坏使用.keys.values.items方法的代码。要获取键视图对象,请使用.viewkeys方法。mydict.viewkeys()set(mydict.keys())更有效率。字典视图对象还具有动态性,即它们反映对字典所做的任何更改,而set(mydict.keys())必须再次调用,如果对mydict进行了任何更改。虽然这段代码没有问题,但在需要时,这是一个很棒的功能。


@Elmex80s 啊,我没有注意到问题上的python-2.7标签。给我一分钟... - PM 2Ring
我的解决方案不再使用or方法。 - Inbar Rose
@InbarRose 确实如此,这就是我为什么给它点赞的原因。但是我忘记修改自己的答案了。谢谢你提醒我。 - PM 2Ring
1
不要使用 set()。在 Python 2 中使用 a.viewkeys() - Martijn Pieters
换句话说,你所说的Python 2没有字典视图是可以轻易被证明为错误的(参见https://docs.python.org/2/library/stdtypes.html#dictionary-view-objects)。 - Martijn Pieters
@MartijnPieters :抱歉:oops:我忘记了它们,因为它们没有被移植到Python 2.6。 - PM 2Ring

1

在Python 3.5+中,字典解包和Python 3.9+中的字典联合运算符'|'也非常高效。此外,就像上面提到的Raymond和Evkounis例程一样,它们还保留了字典项的顺序。

def dsimdiff_du(a, b):
    return {**{k: a[k] for k in a if k not in b}, **{k: b[k] for k in b if k not in a}}


def dsimdiff_uo(a, b):
    return {k: a[k] for k in a if k not in b} | {k: b[k] for k in b if k not in a}

使用pm2r的测试程序(适用于Windows和Linux),在我的测试中,当共享密钥数量较低(10%和25%)时,速度最快的是这个:
def dsimdiff_uo_raymond(a, b):
    c = a | b
    for k in (a.keys() & b.keys()):
        del c[k]
    return c

1
这是我会做的方式:
A = {'a': 1, 'b': 2}
B = {'b': 2, 'c': 3}


def dict_symmetric_difference(dict_A, dict_B):
    res = {k:v for k, v in dict_A.items() if k not in dict_B}
    res.update({k:v for k, v in dict_B.items() if k not in dict_A})
    return res

print(dict_symmetric_difference(A, B))  # {'a': 1, 'c': 3}

2
你的代码在各种字典大小和共享键数量上都非常高效。请查看我的timeit代码输出以获取详细信息。 - PM 2Ring
@PM2Ring 非常感谢您的计时! - Ma0

0

如果你正在使用Python 3,这是比其他解决方案更紧凑的解决方案:

{k: v for k, v in a.items() ^ b.items()}

请注意,如果这些字典中包含不可哈希类型,则无法工作。

0

非常简短

A = {'a': 1, 'b': 2}
B = {'b': 2, 'c': 3}

print dict((k, v) for k, v in A.items() + B.items() if k in set(A) ^ set(B))

如果您对速度感到不舒服,并怀疑Python在每次迭代中执行set(A) ^ set(B),那么您可以使用以下代码:

symm_diff = set(A) ^ set(B)

print dict((k, v) for k, v in A.items() + B.items() if k in symm_diff) 

2
使用 or 存在问题。请参考 OP 回答中 @Rawing 的评论。 - Ma0
1
你不是在每次迭代中计算 if k in set(A) ^ set(B)) 吗? - Inbar Rose
@InbarRose 老实说:我不确定。列表推导式是Python的一个强大特性,所以我可以想象他们为此进行了优化。但是我会更新我的答案并提到它。 - Elmex80s
2
不,if k in set(A) ^ set(B)) 不会被优化掉,Python 编译器不会执行这种优化,因为它无法知道 AB 在循环中是否会改变。 - PM 2Ring

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接