Dictionary merge by updating but not overwriting if value exists

67

If I have the following two dictionaries:

d1 = {'a': 2, 'b': 4}
d2 = {'a': 2, 'b': ''}
To "merge" them:
dict(d1.items() + d2.items())

results in

{'a': 2, 'b': ''}

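What should I do if I want to compare the values of the two dictionaries key by key, and take the value from d2 only when the value in d1 is empty / None / ''?

When the same key exists in both, I only want to keep the numeric value (from either d1 or d2) rather than the empty one. If both values are empty, keeping the empty value is fine. If both have a value, the d1 value should stay.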

For example:

d1 = {'a': 2, 'b': 8, 'c': ''}
d2 = {'a': 2, 'b': '', 'c': ''}

should result in

{'a': 2, 'b': 8, 'c': ''}

where 8 is not overwritten by ''.


Have you tried using in? - Ignacio Vazquez-Abrams
See also (PHP-based): https://dev59.com/3nRA5IYBdhLWcg3w4SDo - dreftymac
See also (Ruby-based): https://dev59.com/KHI-5IYBdhLWcg3wMFMf - dreftymac
See also (itemgetter): https://dev59.com/LWct5IYBdhLWcg3wZcqL#12118794 - dreftymac
9 Answers

58

Just switch the order:

z = dict(d2.items() + d1.items())

By the way, you may also be interested in the potentially faster update method.

In Python 3, you have to convert the view objects to lists first:

z = dict(list(d2.items()) + list(d1.items())) 
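If you want to special-case empty strings, you can do the following:
def mergeDictsOverwriteEmpty(d1, d2):
    res = d1.copy()  # start from d1 so that its non-empty values are kept
    for k, v in d2.items():
        # take d2's value only when d1 lacks the key or holds an empty string
        if k not in d1 or d1[k] == '':
            res[k] = v
    return res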
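
The question treats None as well as '' as empty, so the idea above can be widened a little. The following is a minimal sketch, not part of the original answer; the name merge_keep_non_empty and the empty parameter are made up for illustration.

```python
def merge_keep_non_empty(d1, d2, empty=(None, '')):
    """Return a new dict in which d1 wins unless its value counts as empty."""
    merged = dict(d2)  # start with everything from d2
    for key, value in d1.items():
        # d1 overrides d2 when it holds a real value,
        # or when d2 has no entry for this key at all.
        if value not in empty or key not in d2:
            merged[key] = value
    return merged


d1 = {'a': 2, 'b': 8, 'c': ''}
d2 = {'a': 2, 'b': '', 'c': ''}
print(merge_keep_non_empty(d1, d2))  # {'a': 2, 'b': 8, 'c': ''}
```

Neither input is modified, and a value such as 0 in d1 is kept because only None and '' are listed as empty.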

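I think that in this case, if d1 has an empty value for a key, it will overwrite the numeric value that d2 has for that key. - siva
@siva Updated for your special case. - phihag
1
I think it should be res = d1.copy(); otherwise no information is carried over between the dictionaries. - Richard
1
Python 3.4.3, at least, does not support "+" between dict item views, but you can get the same result by converting to list: dict(list(d2.items()) + list(d1.items())) - JellicleCat
itertools.chain() might also help. - Frozen Flame
1
In Python 3, you can simply do z = {**d2, **d1} - Brian McCutchon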

30

Update d2 with the key/value pairs of d1, but only where the d1 value is not None or '' (i.e. not falsy):

>>> d1 = dict(a=1, b=None, c=2)
>>> d2 = dict(a=None, b=2, c=1)
>>> d2.update({k: v for k, v in d1.items() if v})
>>> d2
{'a': 1, 'c': 2, 'b': 2}

(In Python 2, use iteritems() instead of items().)
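
One thing to watch with the "if v" filter is that it also drops 0 and False, not just None and ''. A small sketch of a stricter variant, assuming only None and '' should count as empty, and copying d2 first so that the inputs stay untouched (the sample data is invented for illustration):

```python
d1 = dict(a=1, b=None, c=0)
d2 = dict(a=None, b=2, c=1)

# Copy d2 first so that neither input is modified.
merged = dict(d2)
# Only None and '' count as empty here, so the 0 stored in d1['c'] is kept.
merged.update({k: v for k, v in d1.items() if v is not None and v != ''})
print(merged)  # {'a': 1, 'b': 2, 'c': 0}
```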


2
Why not use dr={}; dr.update(d1); dr.update((k,v) for (k,v) in d2.items() if v) rather than modifying the input d2? - Pierre GM
This worked for me: d2.update({k:v for k,v in d1.iteritems() if v is not None}) - Mauricio
I think a more fitting variant would be: d1.update({k: v for k, v in d2.items() if not k in d1}) - roy650

10
To add key/value pairs from d1 to d2 without overwriting any key/value pair that already exists in d2:
temp = d2.copy()
d2.update(d1)
d2.update(temp)
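
The same effect can be had without the temporary copy. This is a sketch using dict.setdefault, which assigns only when the key is missing; the sample dictionaries are made up for illustration.

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'c': 30}

for key, value in d1.items():
    # setdefault leaves d2[key] alone if the key is already present.
    d2.setdefault(key, value)

print(d2)  # {'b': 20, 'c': 30, 'a': 1}
```

As with the three-line version, this only checks whether a key exists; an empty value already stored in d2 is not replaced.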

9

Python 3.5+ dict literal

Unless you are on an obsolete version of Python, this is the approach to use.

A more Pythonic and faster way, using dictionary unpacking:

d1 = {'a':1, 'b':1}
d2 = {'a':2, 'c':2}
merged = {**d1, **d2}  # priority from right to left
print(merged)

{'a': 2, 'b': 1, 'c': 2}

It is simpler and faster than dict(list(d2.items()) + list(d1.items())):

d1 = {i: 1 for i in range(1000000)}
d2 = {i: 2 for i in range(2000000)}

%timeit dict(list(d1.items()) + list(d2.items())) 
402 ms ± 33.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit {**d1, **d2}
144 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
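
%timeit is an IPython magic, so the comparison above does not run in a plain script. A rough equivalent with the standard timeit module could look like this; the dictionaries are deliberately smaller than in the measurement above, and the absolute numbers will differ from machine to machine.

```python
import timeit

d1 = {i: 1 for i in range(10000)}
d2 = {i: 2 for i in range(20000)}


def merge_via_lists():
    return dict(list(d1.items()) + list(d2.items()))


def merge_via_unpacking():
    return {**d1, **d2}


# Both approaches must agree before timing them is meaningful.
assert merge_via_lists() == merge_via_unpacking()

print("lists    :", timeit.timeit(merge_via_lists, number=100))
print("unpacking:", timeit.timeit(merge_via_unpacking, number=100))
```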

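More on this from PEP 448:

The keys in a dictionary remain in a right-to-left priority order, so {**{'a': 1}, 'a': 2, **{'a': 3}} evaluates to {'a': 3}. There is no restriction on the number or position of unpackings.

Merging only non-zero values

To do this, we can build a dictionary that leaves out the empty values, merging the two like this: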
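
To make the priority rule concrete for the dictionaries in the question, here is a short sketch (the values are adapted from the question for illustration). Whichever dictionary is unpacked last wins, and plain unpacking does not look at whether the winning value is empty.

```python
d1 = {'a': 2, 'b': 8, 'c': ''}
d2 = {'a': 5, 'b': '', 'd': 4}

d2_wins = {**d1, **d2}  # d2 is rightmost, so its values take priority
d1_wins = {**d2, **d1}  # d1 is rightmost, so its values take priority

print(d2_wins)  # {'a': 5, 'b': '', 'c': '', 'd': 4}
print(d1_wins)  # {'a': 2, 'b': 8, 'd': 4, 'c': ''}
```

Note that d2_wins has lost the 8 under 'b', which is exactly the situation the question wants to avoid.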

d1 = {'a':1, 'b':1, 'c': '', 'd': ''}
d2 = {'a':2, 'c':2, 'd': ''}
merged_non_zero = {
    k: (d1.get(k) or d2.get(k))
    for k in set(d1) | set(d2)
}
print(merged_non_zero)

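Output:

{'a': 1, 'b': 1, 'c': 2, 'd': ''}
  • a -> the value from d1 is used, since it is the first match and 'a' exists in both d1 and d2
  • b -> exists only in d1
  • c -> the non-zero value is the one in d2
  • d -> the value is an empty string in both dictionaries

Explanation

The code above builds a dictionary using a dict comprehension.

If d1 has the key and its value is non-zero (i.e. bool(val) is True), the value d1[k] is used; otherwise the value from d2 is used.

Note that we take the set union of the keys of both dictionaries (set(d1) | set(d2)), because the two may not have exactly the same keys.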
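
Because "or" tests truthiness, a legitimate 0 in d1 would be replaced by the value from d2. If only None and '' should count as empty, a small helper can make that explicit. This is a sketch under that assumption; EMPTY and first_non_empty are names invented here.

```python
EMPTY = (None, '')


def first_non_empty(key, *dicts):
    """Return the first value for key that is not None or ''.

    If every value is empty, the last one seen is returned.
    """
    fallback = None
    for d in dicts:
        if key in d:
            fallback = d[key]
            if fallback not in EMPTY:
                return fallback
    return fallback


d1 = {'a': 0, 'b': 1, 'c': '', 'd': ''}
d2 = {'a': 2, 'c': 2, 'd': ''}
merged = {k: first_non_empty(k, d1, d2) for k in set(d1) | set(d2)}
print(merged)
```

Here 'a' stays 0, whereas the "or" version would have picked 2 from d2.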


1
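Unless you use {**d2, **d1}, this answer is flat-out wrong. If you do not want d2 to overwrite the values in d1, you still need to reverse the dictionaries. - drhagen
Thanks @drhagen, I have updated it with a better answer and a suggestion on how to merge. - ShmulikA
Please remove everything above the Python 3.5 part. - Konchog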

6
Here is an "in-place" solution (it modifies d2):
# assumptions: d2 is a temporary dict that can be discarded
# d1 is a dict that must be modified in place
# the modification is adding keys from d2 into d1 that do not exist in d1.

def update_non_existing_inplace(original_dict, to_add):
    to_add.update(original_dict) # to_add now holds the "final result" (O(n))
    original_dict.clear() # erase original_dict in-place (O(n), every entry is released)
    original_dict.update(to_add) # original_dict now holds the "final result" (O(n))
    return

Here is another "in-place" solution. It is less elegant but potentially more efficient, and it leaves d2 unmodified.
# assumptions: d2 cannot be modified
# d1 is a dict that must be modified in place
# the modification is adding keys from d2 into d1 that do not exist in d1.

def update_non_existing_inplace(original_dict, to_add):
    for key in to_add:  # iterating a dict yields its keys (Python 2 and 3)
        if key not in original_dict:
            original_dict[key] = to_add[key]
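
In Python 3 the keys to copy can also be computed up front with a set difference on the key views. A brief sketch, with add_missing_keys as a made-up name; the alias variable is only there to show that the same object is updated in place.

```python
def add_missing_keys(target, source):
    """Copy into target only the keys it lacks; source is left untouched."""
    for key in source.keys() - target.keys():
        target[key] = source[key]


d1 = {'a': 1, 'b': ''}
d2 = {'b': 2, 'c': 3}
alias = d1  # a second reference to the same dict object

add_missing_keys(d1, d2)
print(alias)
```

Only key presence is checked, so the empty string under 'b' in d1 is not replaced by the 2 from d2.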

5

d2.update(d1) instead of dict(d2.items() + d1.items())

This can be used to merge two dictionaries into one. d2.update(d1) is more concise and more efficient than dict(d2.items() + d1.items()).

11
"...which changes the contents of d2, and that may not be what the asker wants. At least dict(d1.items()+d2.items()) leaves the inputs unchanged." - Pierre GM
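
The concern about mutating d2 can be addressed by updating a shallow copy instead. A minimal sketch with invented sample data:

```python
d1 = {'a': 2, 'b': 8}
d2 = {'a': 2, 'b': '', 'c': 1}

z = d2.copy()  # shallow copy, so d2 itself stays as it was
z.update(d1)   # values from d1 take priority over those copied from d2

print(z)   # {'a': 2, 'b': 8, 'c': 1}
print(d2)  # {'a': 2, 'b': '', 'c': 1}
```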

4

If your dictionaries have the same size and the same keys, you can use the following code:

dict((k, v if k in d2 and d2[k] in [None, ''] else d2[k]) for k, v in d1.items())

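Unfortunately, my dictionaries do not have the same size and keys; only some keys occur in both, with different values. - siva
@siva: I have modified the code to check d2 starting from the keys of d1. If that is your case, please use this code. - Artsiom Rudzenka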

1
If you want to ignore empty values, for example when merging:
a = {"a": 1, "b": 2, "c": ""}
b = {"a": "", "b": 4, "c": 5}
c = {"a": "aaa", "b": ""}
d = {"a": "", "w": ""}

the result is: {'a': 'aaa', 'b': 4, 'c': 5, 'w': ''}

You can use the following two functions:

def merge_two_dicts(a, b, path=None):
    "merges b into a"
    if path is None:
        path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge_two_dicts(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass  # same leaf value
            else:
                if a[key] and not b[key]:
                    a[key] = a[key]
                else:
                    a[key] = b[key]
        else:
            a[key] = b[key]
    return a


def merge_multiple_dicts(*a):
    output = a[0]
    if len(a) >= 2:
        for n in range(len(a) - 1):
            output = merge_two_dicts(output, a[n + 1])

    return output


So you can simply call merge_multiple_dicts(a,b,c,d)
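
Note that merge_two_dicts writes into its first argument, so the first dictionary passed in ends up modified. For flat (non-nested) dictionaries, a non-mutating variant with the same precedence rule could be sketched as follows; merge_two_flat and merge_many_flat are hypothetical names, not part of the answer.

```python
from functools import reduce


def merge_two_flat(left, right):
    """Return a new dict; right wins unless its value is falsy and left's is not."""
    merged = dict(left)
    for key, value in right.items():
        # Overwrite when the key is new, the incoming value is truthy,
        # or the stored value is itself falsy.
        if key not in merged or value or not merged[key]:
            merged[key] = value
    return merged


def merge_many_flat(*dicts):
    # Starting from {} means none of the inputs is ever modified.
    return reduce(merge_two_flat, dicts, {})


a = {"a": 1, "b": 2, "c": ""}
b = {"a": "", "b": 4, "c": 5}
c = {"a": "aaa", "b": ""}
d = {"a": "", "w": ""}
print(merge_many_flat(a, b, c, d))  # {'a': 'aaa', 'b': 4, 'c': 5, 'w': ''}
```

As in the original functions, "empty" here means falsy, so 0 is treated the same way as ''.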

0

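If you want more freedom in choosing when a value gets overwritten in the merged dictionary, I have a solution. It may be a lengthy script, but its logic is not hard to follow.

Thanks to fabiocaccamo and senderle for sharing the benedict package and the nested iteration logic in lists, respectively. This knowledge was essential to developing the script.

Python requirements

pip install python-benedict==0.24.3

Python script

Definition of the Dict class.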

from __future__ import annotations

from collections.abc import Mapping
from benedict import benedict
from typing import Iterator
from copy import deepcopy


class Dict:
    def __init__(self, data: dict = None):
        """
        Instantiates a dictionary object with nested keys-based indexing.

        Parameters
        ----------
        data: dict
            Dictionary.

        References
        ----------
        [1] 'Dict' class: https://dev59.com/A2w15IYBdhLWcg3w3fjN#70908985
        [2] 'Benedict' package: https://github.com/fabiocaccamo/python-benedict
        [3] Dictionary nested iteration: https://dev59.com/iGgv5IYBdhLWcg3wPORm#10756615
        """
        self.data = deepcopy(data) if data is not None else {}

    def get(self, keys: [object], **kwargs) -> (object, bool):
        """
        Get dictionary item value based on nested keys.

        Parameters
        ----------
        keys: [object]
            Nested keys to get item value based on.

        Returns
        -------
        value, found: (object, bool)
            Item value, and whether the target item was found.
        """
        data = kwargs.get('data', self.data)
        path = kwargs.get('path', [])
        value, found = None, False

        # Looking for item location on dictionary:
        for outer_key, outer_value in data.items():
            trace = path + [outer_key]

            # Getting item value from dictionary:
            if trace == keys:
                value, found = outer_value, True
                break

            if trace == keys[:len(trace)] and isinstance(outer_value, Mapping):  # Recursion cutoff.
                value, found = self.get(
                    data=outer_value,
                    keys=keys,
                    path=trace
                )

        return value, found

    def set(self, keys: [object], value: object, **kwargs) -> bool:
        """
        Set dictionary item value based on nested keys.

        Parameters
        ----------
        keys: [object]
            Nested keys to set item value based on.
        value: object
            Item value.

        Returns
        -------
        updated: bool
            Whether the target item was updated.
        """
        data = kwargs.get('data', self.data)
        path = kwargs.get('path', [])
        updated = False

        # Looking for item location on dictionary:
        for outer_key, outer_value in data.items():
            trace = path + [outer_key]

            # Setting item value on dictionary:
            if trace == keys:
                data[outer_key] = value
                updated = True
                break

            if trace == keys[:len(trace)] and isinstance(outer_value, Mapping):  # Recursion cutoff.
                updated = self.set(
                    data=outer_value,
                    keys=keys,
                    value=value,
                    path=trace
                )

        return updated

    def add(self, keys: [object], value: object, **kwargs) -> bool:
        """
        Add dictionary item value based on nested keys.

        Parameters
        ----------
        keys: [object]
            Nested keys to add item based on.
        value: object
            Item value.

        Returns
        -------
        added: bool
            Whether the target item was added.
        """
        data = kwargs.get('data', self.data)
        added = False

        # Adding item on dictionary:
        if keys[0] not in data:
            if len(keys) == 1:
                data[keys[0]] = value
                added = True
            else:
                data[keys[0]] = {}

        # Looking for item location on dictionary:
        for outer_key, outer_value in data.items():
            if outer_key == keys[0]:  # Recursion cutoff.
                if len(keys) > 1 and isinstance(outer_value, Mapping):
                    added = self.add(
                        data=outer_value,
                        keys=keys[1:],
                        value=value
                    )

        return added

    def remove(self, keys: [object], **kwargs) -> bool:
        """
        Remove dictionary item based on nested keys.

        Parameters
        ----------
        keys: [object]
            Nested keys to remove item based on.

        Returns
        -------
        removed: bool
            Whether the target item was removed.
        """
        data = kwargs.get('data', self.data)
        path = kwargs.get('path', [])
        removed = False

        # Looking for item location on dictionary:
        for outer_key, outer_value in data.items():
            trace = path + [outer_key]

            # Removing item from dictionary:
            if trace == keys:
                del data[outer_key]
                removed = True
                break

            if trace == keys[:len(trace)] and isinstance(outer_value, Mapping):  # Recursion cutoff.
                removed = self.remove(
                    data=outer_value,
                    keys=keys,
                    path=trace
                )

        return removed

    def items(self, **kwargs) -> Iterator[object, object]:
        """
        Get dictionary items based on nested keys.

        Returns
        -------
        keys, value: Iterator[object, object]
            List of nested keys and list of values.
        """
        data = kwargs.get('data', self.data)
        path = kwargs.get('path', [])

        for outer_key, outer_value in data.items():
            if isinstance(outer_value, Mapping):
                for inner_key, inner_value in self.items(data=outer_value, path=path + [outer_key]):
                    yield inner_key, inner_value
            else:
                yield path + [outer_key], outer_value

    @staticmethod
    def merge(dict_list: [dict], overwrite: bool = False, concat: bool = False, default_value: object = None) -> dict:
        """
        Merges dictionaries, with value assignment based on order of occurrence. Overwrites values if and only if:
            - The key does not yet exist on merged dictionary;
            - The current value of the key on merged dictionary is the default value.

        Parameters
        ----------
        dict_list: [dict]
            List of dictionaries.
        overwrite: bool
            Overwrites occurrences of values. If false, keep the first occurrence of each value found.
        concat: bool
            Concatenates occurrences of values for the same key.
        default_value: object
            Default value used as a reference to override dictionary attributes.

        Returns
        -------
        md: dict
            Merged dictionary.
        """
        dict_list = [d for d in dict_list if d is not None and isinstance(d, dict)] if dict_list is not None else []
        assert len(dict_list), f"no dictionaries given."

        # Keeping the first occurrence of each value:
        if not overwrite:
            dict_list = [Dict(d) for d in dict_list]

            for i, d in enumerate(dict_list[:-1]):
                for keys, value in d.items():
                    if value != default_value:
                        for j, next_d in enumerate(dict_list[i+1:], start=i+1):
                            next_d.remove(keys=keys)

            dict_list = [d.data for d in dict_list]

        md = benedict()
        md.merge(*dict_list, overwrite=True, concat=concat)

        return md

Definition of the main method, to show examples.

import json


def main() -> None:
    dict_list = [
        {1: 'a', 2: None, 3: {4: None, 5: {6: None}}},
        {1: None, 2: None, 3: {4: 'c', 5: {6: {7: None}}}},
        {1: None, 2: 'b', 3: {4: None, 5: {6: {7: 'd'}}}},
        {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['e', 'f']}}}}}},
        {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['g', 'h']}}}}}},
    ]

    d = Dict(data=dict_list[-1])

    print("Dictionary operations test:\n")
    print(f"data = {json.dumps(d.data, indent=4)}\n")
    print(f"d = Dict(data=data)")

    keys = [11]
    value = {12: {13: 14}}
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    print(f"d.set(keys={keys}, value={value}) --> {d.set(keys=keys, value=value)}")
    print(f"d.add(keys={keys}, value={value}) --> {d.add(keys=keys, value=value)}")
    keys = [11, 12, 13]
    value = 14
    print(f"d.add(keys={keys}, value={value}) --> {d.add(keys=keys, value=value)}")
    value = 15
    print(f"d.set(keys={keys}, value={value}) --> {d.set(keys=keys, value=value)}")
    keys = [11]
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    keys = [11, 12]
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    keys = [11, 12, 13]
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    keys = [11, 12, 13, 15]
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    keys = [2]
    print(f"d.remove(keys={keys}) --> {d.remove(keys=keys)}")
    print(f"d.remove(keys={keys}) --> {d.remove(keys=keys)}")
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")

    print("\n-----------------------------\n")
    print("Dictionary values match test:\n")
    print(f"data = {json.dumps(d.data, indent=4)}\n")
    print(f"d = Dict(data=data)")

    for keys, value in d.items():
        real_value, found = d.get(keys=keys)
        status = "found" if found else "not found"
        print(f"d{keys} = {value} == {real_value} ({status}) --> {value == real_value}")

    print("\n-----------------------------\n")
    print("Dictionaries merge test:\n")

    for i, d in enumerate(dict_list, start=1):
        print(f"d{i} = {d}")

    dict_list_ = [f"d{i}" for i, d in enumerate(dict_list, start=1)]
    print(f"dict_list = [{', '.join(dict_list_)}]")

    md = Dict.merge(dict_list=dict_list)
    print("\nmd = Dict.merge(dict_list=dict_list)")
    print("print(md)")
    print(f"{json.dumps(md, indent=4)}")


if __name__ == '__main__':
    main()

Output

Dictionary operations test:

data = {
    "1": null,
    "2": "b",
    "3": {
        "4": null,
        "5": {
            "6": {
                "8": {
                    "9": {
                        "10": [
                            "g",
                            "h"
                        ]
                    }
                }
            }
        }
    }
}

d = Dict(data=data)
d.get(keys=[11]) --> (None, False)
d.set(keys=[11], value={12: {13: 14}}) --> False
d.add(keys=[11], value={12: {13: 14}}) --> True
d.add(keys=[11, 12, 13], value=14) --> False
d.set(keys=[11, 12, 13], value=15) --> True
d.get(keys=[11]) --> ({12: {13: 15}}, True)
d.get(keys=[11, 12]) --> ({13: 15}, True)
d.get(keys=[11, 12, 13]) --> (15, True)
d.get(keys=[11, 12, 13, 15]) --> (None, False)
d.remove(keys=[2]) --> True
d.remove(keys=[2]) --> False
d.get(keys=[2]) --> (None, False)

-----------------------------

Dictionary values match test:

data = {
    "1": null,
    "3": {
        "4": null,
        "5": {
            "6": {
                "8": {
                    "9": {
                        "10": [
                            "g",
                            "h"
                        ]
                    }
                }
            }
        }
    },
    "11": {
        "12": {
            "13": 15
        }
    }
}

d = Dict(data=data)
d[1] = None == None (found) --> True
d[3, 4] = None == None (found) --> True
d[3, 5, 6, 8, 9, 10] = ['g', 'h'] == ['g', 'h'] (found) --> True
d[11, 12, 13] = 15 == 15 (found) --> True

-----------------------------

Dictionaries merge test:

d1 = {1: 'a', 2: None, 3: {4: None, 5: {6: None}}}
d2 = {1: None, 2: None, 3: {4: 'c', 5: {6: {7: None}}}}
d3 = {1: None, 2: 'b', 3: {4: None, 5: {6: {7: 'd'}}}}
d4 = {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['e', 'f']}}}}}}
d5 = {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['g', 'h']}}}}}}
dict_list = [d1, d2, d3, d4, d5]

md = Dict.merge(dict_list=dict_list)
print(md)
{
    "1": "a",
    "2": "b",
    "3": {
        "4": "c",
        "5": {
            "6": {
                "7": "d",
                "8": {
                    "9": {
                        "10": [
                            "e",
                            "f"
                        ]
                    }
                }
            }
        }
    }
}
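
For readers who only need the "first non-default value wins" behaviour shown in the merge test, the core idea can also be sketched without any third-party package. This is an independent illustration, not the implementation above, and it has no concat option; merge_first_wins and _merge_into are names chosen here.

```python
from copy import deepcopy


def merge_first_wins(dicts, default=None):
    """Merge nested dicts; each leaf keeps the first value that differs from default."""
    merged = {}
    for d in dicts:
        _merge_into(merged, d, default)
    return merged


def _merge_into(target, source, default):
    for key, value in source.items():
        if key not in target or target[key] == default:
            # Nothing useful stored yet: take the incoming value
            # (copied, so the input dictionaries stay untouched).
            target[key] = deepcopy(value)
        elif isinstance(target[key], dict) and isinstance(value, dict):
            # Both sides are mappings: descend and merge key by key.
            _merge_into(target[key], value, default)
        # Otherwise target already holds a non-default leaf, which is kept.


dict_list = [
    {1: 'a', 2: None, 3: {4: None, 5: {6: None}}},
    {1: None, 2: None, 3: {4: 'c', 5: {6: {7: None}}}},
    {1: None, 2: 'b', 3: {4: None, 5: {6: {7: 'd'}}}},
    {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['e', 'f']}}}}}},
    {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['g', 'h']}}}}}},
]
print(merge_first_wins(dict_list))
```

On the five sample dictionaries this yields the same merged structure as the output above, with ['e', 'f'] kept because it appears before ['g', 'h'].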
