Deep merge dictionaries of dictionaries in Python

215
I need to merge multiple dictionaries. Here is what I have, for example:
dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}}

dict2 = {2:{"c":{"C"}}, 3:{"d":{"D"}}}

A, B, C and D are leaves of the tree, like {"info1":"value", "info2":"value2"}.

The dictionaries have an unknown level (depth) of nesting; it could be {2:{"c":{"z":{"y":{C}}}}}

In my case it represents a directory/file structure, where the nodes are docs and the leaves are files.

I want to merge them to obtain:

 dict3 = {1:{"a":{"A"}}, 2:{"b":{"B"},"c":{"C"}}, 3:{"d":{"D"}}}

I'm not sure how to do that easily with Python.

Check out my NestedDict class: http://stackoverflow.com/a/16296144/2334951 It can manage nested dictionary structures, including operations such as merging. - SzieberthAdam
3
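A note for everyone looking for a solution: this question is only about nested dictionaries. Most answers do not correctly handle the more complex case where the structure also contains lists of dictionaries. If you need that, try @Osiloke's answer: https://dev59.com/IWw05IYBdhLWcg3wfx80#25270947 - SHernandez
See also: python dpath merge - dreftymac
One pitfall of @andrew cooke's solution is that the changes affect the first dictionary even when a conflict error is raised. To avoid this pitfall, you can use @andrew cooke's source code to create a recursive helper function with an extra parameter that holds a clone of the first dictionary. That parameter, rather than the first dictionary, is modified and returned. See: https://dev59.com/IWw05IYBdhLWcg3wfx80#71700270 - diogo
You can use Addict to merge dictionaries: d = Dict({1:{"a":{'A'}}, 2:{"b":{'B'}}}); d.update({2:{"c":{'C'}}, 3:{"d":{'D'}}}); d => {1: {'a': {'A'}}, 2: {'b': {'B'}, 'c': {'C'}}, 3: {'d': {'D'}}} - bartolo-otrit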
36 Answers

1

How about another answer? This one also avoids mutation/side effects:

def merge(dict1, dict2):
    output = {}

    # union of both dicts (despite the variable name): holds every key of
    # either input; where a key is in both, the value from `dict1` is kept
    intersection = {**dict2, **dict1}

    for k_intersect, v_intersect in intersection.items():
        if k_intersect not in dict1:
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = v_dict2

        elif k_intersect not in dict2:
            output[k_intersect] = v_intersect

        elif isinstance(v_intersect, dict) and isinstance(dict2[k_intersect], dict):
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = merge(v_intersect, v_dict2)

        else:
            output[k_intersect] = v_intersect

    return output


dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}}
dict2 = {2:{"c":{"C"}}, 3:{"d":{"D"}}}
dict3 = {1:{"a":{"A"}}, 2:{"b":{"B"},"c":{"C"}}, 3:{"d":{"D"}}}

assert dict3 == merge(dict1, dict2)

1

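Of course, the code will depend on your rules for resolving merge conflicts. Here is a version that takes an arbitrary number of arguments and merges them recursively to an arbitrary depth, without mutating any object. It uses the following rules to resolve merge conflicts: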

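  • Dictionaries take precedence over non-dictionary values ({"foo": {...}} takes precedence over {"foo": "bar"})
  • Later arguments take precedence over earlier arguments (if you merge {"a": 1}, {"a": 2}, and {"a": 3} in order, the result will be {"a": 3})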
try:
    from collections.abc import Mapping  # Python 3.3+
except ImportError:
    from collections import Mapping  # Python 2

def merge_dicts(*dicts):                                                            
    """                                                                             
    Return a new dictionary that is the result of merging the arguments together.   
    In case of conflicts, later arguments take precedence over earlier arguments.   
    """                                                                             
    updated = {}                                                                    
    # grab all keys                                                                 
    keys = set()                                                                    
    for d in dicts:                                                                 
        keys = keys.union(set(d))                                                   

    for key in keys:                                                                
        values = [d[key] for d in dicts if key in d]                                
        # which ones are mapping types? (aka dict)                                  
        maps = [value for value in values if isinstance(value, Mapping)]            
        if maps:                                                                    
            # if we have any mapping types, call recursively to merge them          
            updated[key] = merge_dicts(*maps)                                       
        else:                                                                       
            # otherwise, just grab the last value we have, since later arguments    
            # take precedence over earlier arguments                                
            updated[key] = values[-1]                                               
    return updated  
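
To make the two conflict rules concrete, here is a small self-contained sketch. It is not the function above: `merge_pair` and `merge_all` are hypothetical names, and folding the arguments pairwise with `functools.reduce` is an assumption chosen for brevity. For the cases shown it follows the same two rules.

```python
from collections.abc import Mapping
from functools import reduce


def merge_pair(left, right):
    """Merge two mappings into a new dict, following the two rules above."""
    merged = dict(left)
    for key, right_value in right.items():
        left_value = merged.get(key)
        left_is_map = isinstance(left_value, Mapping)
        right_is_map = isinstance(right_value, Mapping)
        if left_is_map and right_is_map:
            # both sides are mappings: merge them recursively
            merged[key] = merge_pair(left_value, right_value)
        elif left_is_map:
            # rule 1: a mapping beats a non-mapping, even a later one
            merged[key] = left_value
        else:
            # rule 2: otherwise the later argument wins
            merged[key] = right_value
    return merged


def merge_all(*dicts):
    """Fold any number of mappings left to right, starting from an empty dict."""
    return reduce(merge_pair, dicts, {})


# rule 2: later arguments take precedence
assert merge_all({"a": 1}, {"a": 2}, {"a": 3}) == {"a": 3}
# rule 1: the dictionary survives although the string comes later
assert merge_all({"foo": {"x": 1}}, {"foo": "bar"}) == {"foo": {"x": 1}}
# nested dictionaries are combined at any depth
assert merge_all({"k": {"a": {"p": 1}}}, {"k": {"a": {"q": 2}}}) == {"k": {"a": {"p": 1, "q": 2}}}
```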

1

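Here is a solution I made that merges dictionaries recursively to any depth. The first dictionary passed to the function is the master dictionary: its values override the values for the same key in the second dictionary.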

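def merge(dict1: dict, dict2: dict) -> dict:
    # note: this binds a second name to dict1, so dict1 is modified in place
    merged = dict1

    for key in dict2:
        if isinstance(dict2[key], dict):
            if key not in dict1:
                # build a fresh dict so that dict2's nested dict is not shared
                merged[key] = merge({}, dict2[key])
            elif isinstance(dict1[key], dict):
                merge(dict1[key], dict2[key])
            # otherwise dict1 holds a non-dict value, which takes priority
        elif key not in dict1:
            merged[key] = dict2[key]

    return merged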
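
Note that `merged = dict1` only binds a second name to the same object, so the function above modifies and returns its first argument. If the inputs must stay untouched, one option is to work on a deep copy. The sketch below assumes that copying is acceptable for your data; `merge_copy` is a hypothetical name, not part of the answer.

```python
import copy


def merge_copy(primary: dict, secondary: dict) -> dict:
    """Return a new dict; values from `primary` win over `secondary`."""
    merged = copy.deepcopy(primary)
    for key, value in secondary.items():
        if key not in merged:
            # key only exists in `secondary`: take a copy of its value
            merged[key] = copy.deepcopy(value)
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            # both sides are dicts: merge one level down
            merged[key] = merge_copy(merged[key], value)
        # otherwise the value from `primary` is kept
    return merged


primary = {"a": 1, "b": {"x": 1}}
secondary = {"a": 2, "b": {"x": 9, "y": 2}, "c": 3}
result = merge_copy(primary, secondary)

assert result == {"a": 1, "b": {"x": 1, "y": 2}, "c": 3}
assert primary == {"a": 1, "b": {"x": 1}}  # the input is unchanged
```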


1

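Here is a pure Python 3 variant of a deep update function, based on sets. It updates nested dictionaries by looping through one level at a time and calling itself to update each dictionary value on the next level: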

def deep_update(dict_original, dict_update):
    if isinstance(dict_original, dict) and isinstance(dict_update, dict):
        output=dict(dict_original)
        keys_original=set(dict_original.keys())
        keys_update=set(dict_update.keys())
        similar_keys=keys_original.intersection(keys_update)
        similar_dict={key:deep_update(dict_original[key], dict_update[key]) for key in similar_keys}
        new_keys=keys_update.difference(keys_original)
        new_dict={key:dict_update[key] for key in new_keys}
        output.update(similar_dict)
        output.update(new_dict)
        return output
    else:
        return dict_update

A simple example:

x={'a':{'b':{'c':1, 'd':1}}}
y={'a':{'b':{'d':2, 'e':2}}, 'f':2}

print(deep_update(x, y))
>>> {'a': {'b': {'c': 1, 'd': 2, 'e': 2}}, 'f': 2}

1
I like this solution ;) - Orsiris de Jong

0
class Utils(object):

    """

    >>> a = { 'first' : { 'all_rows' : { 'pass' : 'dog', 'number' : '1' } } }
    >>> b = { 'first' : { 'all_rows' : { 'fail' : 'cat', 'number' : '5' } } }
    >>> Utils.merge_dict(b, a) == { 'first' : { 'all_rows' : { 'pass' : 'dog', 'fail' : 'cat', 'number' : '5' } } }
    True

    >>> main = {'a': {'b': {'test': 'bug'}, 'c': 'C'}}
    >>> suply = {'a': {'b': 2, 'd': 'D', 'c': {'test': 'bug2'}}}
    >>> Utils.merge_dict(main, suply) == {'a': {'b': {'test': 'bug'}, 'c': 'C', 'd': 'D'}}
    True

    """

    @staticmethod
    def merge_dict(main, suply):
        """
        Return the merged dictionary: main is the base, suply supplements it, and main wins on conflicts
        :return:
        """
        for key, value in suply.items():
            if key in main:
                if isinstance(main[key], dict):
                    if isinstance(value, dict):
                        Utils.merge_dict(main[key], value)
                    else:
                        pass
                else:
                    pass
            else:
                main[key] = value
        return main

if __name__ == '__main__':
    import doctest
    doctest.testmod()

0

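Hey, I had the same problem too, but I came up with a solution and I'm posting it here in case it is useful to others as well. It basically merges nested dictionaries and adds up the values. I needed to calculate some probabilities, so this worked great for me: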

#used to copy a nested dict to a nested dict
def deepupdate(target, src):
    for k, v in src.items():
        if k in target:
            for k2, v2 in src[k].items():
                if k2 in target[k]:
                    target[k][k2]+=v2
                else:
                    target[k][k2] = v2
        else:
            target[k] = copy.deepcopy(v)

Using the method above, we can merge:
target = {'6,6': {'6,63': 1}, '63,4': {'4,4': 1}, '4,4': {'4,3': 1}, '6,63': {'63,4': 1}}
src = {'5,4': {'4,4': 1}, '5,5': {'5,4': 1}, '4,4': {'4,3': 1}}
This becomes: {'5,5': {'5,4': 1}, '5,4': {'4,4': 1}, '6,6': {'6,63': 1}, '63,4': {'4,4': 1}, '4,4': {'4,3': 2}, '6,63': {'63,4': 1}}
Also note the changes here:
target = {'6,6': {'6,63': 1}, '6,63': {'63,4': 1}, '4,4': {'4,3': 1}, '63,4': {'4,4': 1}}

src = {'5,4': {'4,4': 1}, '4,3': {'3,4': 1}, '4,4': {'4,9': 1}, '3,4': {'4,4': 1}, '5,5': {'5,4': 1}}

merge = {'5,4': {'4,4': 1}, '4,3': {'3,4': 1}, '6,63': {'63,4': 1}, '5,5': {'5,4': 1}, '6,6': {'6,63': 1}, '3,4': {'4,4': 1}, '63,4': {'4,4': 1}, '4,4': {'4,3': 1, '4,9': 1}}

Don't forget to also add the import for copy:

import copy
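
`deepupdate` above handles exactly two levels of nesting. A possible generalization to arbitrary depth is sketched below, assuming every leaf is a number that should be added up; `deep_sum` is a hypothetical name and not part of the answer.

```python
import copy


def deep_sum(target: dict, src: dict) -> None:
    """Merge `src` into `target` in place, adding leaves that collide."""
    for key, value in src.items():
        if key not in target:
            # new key: copy so later changes to `target` do not touch `src`
            target[key] = copy.deepcopy(value)
        elif isinstance(target[key], dict) and isinstance(value, dict):
            # both sides are dicts: descend one level
            deep_sum(target[key], value)
        else:
            # both sides are assumed to be numeric leaves: add them
            # (a dict/number mismatch raises TypeError here)
            target[key] += value


target = {'6,6': {'6,63': 1}, '63,4': {'4,4': 1}, '4,4': {'4,3': 1}, '6,63': {'63,4': 1}}
src = {'5,4': {'4,4': 1}, '5,5': {'5,4': 1}, '4,4': {'4,3': 1}}
deep_sum(target, src)

assert target['4,4'] == {'4,3': 2}   # colliding leaf was summed
assert target['5,5'] == {'5,4': 1}   # new key was copied over
```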

0

Returns the merged dictionary without affecting the input dictionaries.

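from copy import deepcopy
from typing import Dict, List


def _merge_dicts(dictA: Dict = {}, dictB: Dict = {}) -> Dict:
    # pass a deep copy of `dictA` as the result: a shallow copy would share
    # the nested dicts with `dictA`, so merging into them would change the input
    return _merge_dicts_aux(dictA, dictB, deepcopy(dictA))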


def _merge_dicts_aux(dictA: Dict = {}, dictB: Dict = {}, result: Dict = {}, path: List[str] = None) -> Dict:

    # conflict path, None if none
    if path is None:
        path = []

    for key in dictB:

        # if the key doesn't exist in A, add the B element to A
        if key not in dictA:
            result[key] = dictB[key]

        else:
            # if the key value is a dict, both in A and in B, merge the dicts
            if isinstance(dictA[key], dict) and isinstance(dictB[key], dict):
                _merge_dicts_aux(dictA[key], dictB[key], result[key], path + [str(key)])

            # if the key value is the same in A and in B, ignore
            elif dictA[key] == dictB[key]:
                pass

            # if the key value differs in A and in B, raise error
            else:
                err: str = f"Conflict at {'.'.join(path + [str(key)])}"
                raise Exception(err)

    return result

Inspired by @andrew cooke's solution
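
To illustrate the behavior described above (merge where possible, raise on a conflict, leave the inputs alone), here is a compact sketch. It is a variant under stated assumptions rather than the answer's code: `merge_strict` is a hypothetical name, it raises `ValueError` instead of a bare `Exception`, and it builds a fresh result instead of passing a clone around.

```python
def merge_strict(a: dict, b: dict, path: tuple = ()) -> dict:
    """Return a new merged dict; raise ValueError when two values disagree."""
    result = {}
    for key in {**a, **b}:  # every key of either dict, in first-seen order
        here = path + (str(key),)
        if key not in b:
            value = a[key]
        elif key not in a:
            value = b[key]
        elif isinstance(a[key], dict) and isinstance(b[key], dict):
            # both sides are dicts: merge them into a new dict
            result[key] = merge_strict(a[key], b[key], here)
            continue
        elif a[key] == b[key]:
            value = a[key]
        else:
            raise ValueError(f"Conflict at {'.'.join(here)}")
        # rebuild dict values so the result shares no dict objects with the inputs
        result[key] = merge_strict(value, {}, here) if isinstance(value, dict) else value
    return result


dict1 = {1: {"a": "A"}, 2: {"b": "B"}}
dict2 = {2: {"c": "C"}, 3: {"d": "D"}}
assert merge_strict(dict1, dict2) == {1: {"a": "A"}, 2: {"b": "B", "c": "C"}, 3: {"d": "D"}}
assert dict1 == {1: {"a": "A"}, 2: {"b": "B"}}  # inputs are untouched

try:
    merge_strict({"x": {"y": 1}}, {"x": {"y": 2}})
except ValueError as err:
    assert str(err) == "Conflict at x.y"
else:
    raise AssertionError("expected a conflict")
```

Because nothing is written to either input, a conflict that is detected halfway through cannot leave a partially merged first dictionary behind.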


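Apart from backward compatibility, this solution needs very little boilerplate and no function objects to work. Note: it needs the copy import and backward-compatible handling of the path list. I don't understand why Python has no efficient implementation of this in its standard library. - Jay-Pi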

0
def m(a,b):
    # one-level merge; note that keys present only in `a` are dropped
    aa = {
        k : dict(a.get(k,{}), **v) for k,v in b.items()
        }
    print(aa)
    return aa

d1 = {1:{"a":"A"}, 2:{"b":"B"}}

d2 = {2:{"c":"C"}, 3:{"d":"D"}}

dict1 = {1:{"a":{1}}, 2:{"b":{2}}}

dict2 = {2:{"c":{222}}, 3:{"d":{3}}}

m(d1,d2)

m(dict1,dict2)

"""
Output :

{2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}}


{2: {'b': {2}, 'c': {222}}, 3: {'d': {3}}}

"""
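
The comprehension in `m` iterates only over the keys of `b`, which is why key 1 is missing from the output above, and it merges just one level of nesting. A one-level sketch that keeps the keys of both inputs might look like this (`merge_one_level` is a hypothetical name, and every value is assumed to be a dictionary):

```python
def merge_one_level(a: dict, b: dict) -> dict:
    """Merge dicts of dicts one level deep; inner values from `b` win."""
    # {**a, **b} is used only for its keys: every key of either input
    return {k: {**a.get(k, {}), **b.get(k, {})} for k in {**a, **b}}


d1 = {1: {"a": "A"}, 2: {"b": "B"}}
d2 = {2: {"c": "C"}, 3: {"d": "D"}}
assert merge_one_level(d1, d2) == {1: {"a": "A"}, 2: {"b": "B", "c": "C"}, 3: {"d": "D"}}
```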

0
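The merge function below is a more professional version of Ali's answer that avoids wastefully fetching the same values multiple times. It works in place.
The merge_new function below does not work in place. It returns a new dictionary and does not depend on copy.deepcopy.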
def merge(base: dict, update: dict) -> None:
    """Recursively merge `update` into `base` in-place."""
    for k, update_v in update.items():
        base_v = base.get(k)
        if isinstance(base_v, dict) and isinstance(update_v, dict):
            merge(base_v, update_v)
        else:
            base[k] = update_v

def merge_new(base: dict, update: dict) -> dict:
    """Return the updated result after recursively merging `update` into `base`."""
    result = base.copy()
    for k, update_v in update.items():
        base_v = result.get(k)
        if isinstance(base_v, dict) and isinstance(update_v, dict):
            result[k] = merge_new(base_v, update_v)
        else:
            result[k] = update_v
    return result

Test cases:
test_data_base = {
    'a': 1,
    'b': {'c': 1, 'd': 2},
    'c': {'d': {'e': 0, 'f': 1, 'p': {'q': 4}}},
    'x': 0,
    'y': {'x': 3},
}

test_data_update = {
    'a': 9,
    'b': {'d': 3, 'e': 3},
    'c': {'d': {'e': 1, 'g': 8, 'p': {'r': 5, 's': 6}}, 'h': 7},
    'd': 6,
    'e': {'f': 10, 'g': 10},
}

test_expected_updated_data = {
    'a': 9,
    'b': {'c': 1, 'd': 3, 'e': 3},
    'c': {'d': {'e': 1, 'f': 1, 'p': {'q': 4, 'r': 5, 's': 6}, 'g': 8}, 'h': 7},
    'x': 0,
    'y': {'x': 3},
    'd': 6,
    'e': {'f': 10, 'g': 10},
}

# Test merge_new (not in-place)
import copy
test_data_base_copy = copy.deepcopy(test_data_base)
test_actual_updated_data = merge_new(test_data_base, test_data_update)
assert(test_actual_updated_data == test_expected_updated_data)
assert(test_data_base == test_data_base_copy)

# Test merge in-place
merge(test_data_base, test_data_update)
assert(test_data_base == test_expected_updated_data)

0

I have tested your solutions and decided to use this one in my project:

def mergedicts(dict1, dict2, conflict, no_conflict):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            yield (k, conflict(dict1[k], dict2[k]))
        elif k in dict1:
            yield (k, no_conflict(dict1[k]))
        else:
            yield (k, no_conflict(dict2[k]))

dict1 = {1:{"a":"A"}, 2:{"b":"B"}}
dict2 = {2:{"c":"C"}, 3:{"d":"D"}}

from functools import reduce  # reduce is no longer a builtin in Python 3

# this helper function allows for recursion and the use of reduce
def f2(x, y):
    return dict(mergedicts(x, y, f2, lambda x: x))

print(dict(mergedicts(dict1, dict2, f2, lambda x: x)))
print(dict(reduce(f2, [dict1, dict2])))

Passing functions as parameters is the key to extending jterrace's solution so that it behaves like all the other recursive solutions.
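
One caveat: `f2` is called for every key present in both dictionaries, so it assumes that both values are dictionaries. A sketch of a conflict function that also copes with plain values follows, under the assumption that the second value should win in that case; `resolve` and `deep_merge` are hypothetical names.

```python
from functools import reduce


def mergedicts(dict1, dict2, conflict, no_conflict):
    """Yield (key, value) pairs, delegating each decision to the callbacks."""
    for k in set(dict1).union(dict2):
        if k in dict1 and k in dict2:
            yield (k, conflict(dict1[k], dict2[k]))
        elif k in dict1:
            yield (k, no_conflict(dict1[k]))
        else:
            yield (k, no_conflict(dict2[k]))


def resolve(x, y):
    """Recurse when both values are dicts; otherwise prefer the second value."""
    if isinstance(x, dict) and isinstance(y, dict):
        return dict(mergedicts(x, y, resolve, lambda v: v))
    return y


def deep_merge(*dicts):
    """Merge any number of dicts left to right."""
    return reduce(resolve, dicts, {})


assert deep_merge({1: {"a": "A"}, 2: {"b": "B"}}, {2: {"b": "X", "c": "C"}}) == {
    1: {"a": "A"},
    2: {"b": "X", "c": "C"},
}
```

Swapping in a different `resolve` (for example one that raises on unequal leaves) changes the conflict policy without touching `mergedicts`.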

