在Python中搜索嵌套字典中的值

19

搜索一个值并获取其父级字典的名称(键):

Dictionary = {dict1:{
        'part1': {
            '.wbxml': 'application/vnd.wap.wbxml',
            '.rl': 'application/resource-lists+xml',    
        },
        'part2':
            {'.wsdl': 'application/wsdl+xml',
            '.rs': 'application/rls-services+xml',
            '.xop': 'application/xop+xml',
            '.svg': 'image/svg+xml',
            },
        'part3':{...}, ...

   dict2:{
          'part1': {    '.dotx': 'application/vnd.openxmlformats-..'                           
            '.zaz': 'application/vnd.zzazz.deck+xml',
            '.xer': 'application/patch-ops-error+xml',}  
          },
          'part2':{...},
          'part3':{...},...  

    },...
在上述字典中,我需要查找类似于"image/svg+xml"的值。在该字典中没有任何值被重复。如何搜索"image/svg+xml",以便它应该返回一个字典中的父键{ dict1: "part2" }
请注意:解决方案应对Python 2.7和Python 3.3均有效,不需要进行修改。

请不要使用矛盾的标签...选择其中一个。 - A.J. Uppal
@aj8uppal - 这是因为我想要在两个Python版本中都得到答案。如果它在Python 2.7中有效,那么它必须在Python 3.3中有效,反之亦然。因为我们有两个服务器,一个使用Python 2.7,另一个使用最新版本。我们正在编写一个脚本,应该在这两个服务器上运行。对此我很抱歉! - Laxmikant Ratnaparkhi
这样的东西应该在标准库中。 - Jonathan Allen Grant
5个回答

32

以下是一个简单的递归版本:

def getpath(nested_dict, value, prepath=()):
    for k, v in nested_dict.items():
        path = prepath + (k,)
        if v == value: # found value
            return path
        elif hasattr(v, 'items'): # v is a dict
            p = getpath(v, value, path) # recursive call
            if p is not None:
                return p

例子:

print(getpath(dictionary, 'image/svg+xml'))
# -> ('dict1', 'part2', '.svg')

要生成多个路径(仅适用于Python 3):

def find_paths(nested_dict, value, prepath=()):
    for k, v in nested_dict.items():
        path = prepath + (k,)
        if v == value: # found value
            yield path
        elif hasattr(v, 'items'): # v is a dict
            yield from find_paths(v, value, path) 

print(*find_paths(dictionary, 'image/svg+xml'))

这是一个很好、干净的解决方案,但我想知道如何修改它以适应字典中存在多个位置相同值的情况/问题。我们不仅仅是返回结果,而是将其添加到解决路径列表中,对吗? - etnie1031
@etnie1031,你可以尝试用yield替换return,并处理可迭代的返回值(例如使用for循环枚举由find_paths产生的值(getpath对于多个值将是误导性的名称)。 - jfs

10
这是对嵌套字典进行迭代遍历,同时记录通向特定位置的所有键。因此,一旦在字典中找到正确的值,您也已经拥有到达该值所需的键。
如果将以下代码放入.py文件中,则可以直接运行find_mime_type(...)函数,并返回从原始字典到所需值的键序列。 demo()函数展示了如何使用它。
d = {'dict1':
         {'part1':
              {'.wbxml': 'application/vnd.wap.wbxml',
               '.rl': 'application/resource-lists+xml'},
          'part2':
              {'.wsdl': 'application/wsdl+xml',
               '.rs': 'application/rls-services+xml',
               '.xop': 'application/xop+xml',
               '.svg': 'image/svg+xml'}},
     'dict2':
         {'part1':
              {'.dotx': 'application/vnd.openxmlformats-..',
               '.zaz': 'application/vnd.zzazz.deck+xml',
               '.xer': 'application/patch-ops-error+xml'}}}


def demo():
    mime_type = 'image/svg+xml'
    try:
        key_chain = find_mime_type(d, mime_type)
    except KeyError:
        print ('Could not find this mime type: {0}'.format(mime_type))
        exit()
    print ('Found {0} mime type here: {1}'.format(mime_type, key_chain))
    nested = d
    for key in key_chain:
        nested = nested[key]
    print ('Confirmation lookup: {0}'.format(nested))


def find_mime_type(d, mime_type):
    reverse_linked_q = list()
    reverse_linked_q.append((list(), d))
    while reverse_linked_q:
        this_key_chain, this_v = reverse_linked_q.pop()
        # finish search if found the mime type
        if this_v == mime_type:
            return this_key_chain
        # not found. keep searching
        # queue dicts for checking / ignore anything that's not a dict
        try:
            items = this_v.items()
        except AttributeError:
            continue  # this was not a nested dict. ignore it
        for k, v in items:
            reverse_linked_q.append((this_key_chain + [k], v))
    # if we haven't returned by this point, we've exhausted all the contents
    raise KeyError


if __name__ == '__main__':
    demo()

输出:

在这里找到了image/svg+xml的mime类型:['dict1', 'part2', '.svg']

确认查找:image/svg+xml


@kobehjohn:我正在等待您的进一步解释。 - Laxmikant Ratnaparkhi
@kobehjohn:我很忙,无法处理那个任务。稍后,一个小时后会开始处理。测试通过后会接受答案。谢谢。 - Laxmikant Ratnaparkhi
@LaxmikantGurnalkar 听起来不错。希望对你有用。我更新了答案代码,以展示如何在获取密钥链后查找值。 - KobeJohn
1
+1. 这里是使用递归调用实现的相同算法。 (https://dev59.com/U2Eh5IYBdhLWcg3wrU8N#22171182) - jfs
@J.F.Sebastian,我倾向于在可能的情况下使用语言来使事情更加明确,避免让新手/多语言程序员感到困惑。是否有指南建议对于空列表使用文字表达?我一直在搜索但无法找到答案。 - KobeJohn
显示剩余5条评论

3

这里有一个解决方案,适用于嵌套列表和字典的复杂数据结构

import pprint

def search(d, search_pattern, prev_datapoint_path=''):
    output = []
    current_datapoint = d
    current_datapoint_path = prev_datapoint_path
    if type(current_datapoint) is dict:
        for dkey in current_datapoint:
            if search_pattern in str(dkey):
                c = current_datapoint_path
                c+="['"+dkey+"']"
                output.append(c)
            c = current_datapoint_path
            c+="['"+dkey+"']"
            for i in search(current_datapoint[dkey], search_pattern, c):
                output.append(i)
    elif type(current_datapoint) is list:
        for i in range(0, len(current_datapoint)):
            if search_pattern in str(i):
                c = current_datapoint_path
                c += "[" + str(i) + "]"
                output.append(i)
            c = current_datapoint_path
            c+="["+ str(i) +"]"
            for i in search(current_datapoint[i], search_pattern, c):
                output.append(i)
    elif search_pattern in str(current_datapoint):
        c = current_datapoint_path
        output.append(c)
    output = filter(None, output)
    return list(output)


if __name__ == "__main__":
    d = {'dict1':
             {'part1':
                  {'.wbxml': 'application/vnd.wap.wbxml',
                   '.rl': 'application/resource-lists+xml'},
              'part2':
                  {'.wsdl': 'application/wsdl+xml',
                   '.rs': 'application/rls-services+xml',
                   '.xop': 'application/xop+xml',
                   '.svg': 'image/svg+xml'}},
         'dict2':
             {'part1':
                  {'.dotx': 'application/vnd.openxmlformats-..',
                   '.zaz': 'application/vnd.zzazz.deck+xml',
                   '.xer': 'application/patch-ops-error+xml'}}}

    d2 = {
        "items":
            {
                "item":
                    [
                        {
                            "id": "0001",
                            "type": "donut",
                            "name": "Cake",
                            "ppu": 0.55,
                            "batters":
                                {
                                    "batter":
                                        [
                                            {"id": "1001", "type": "Regular"},
                                            {"id": "1002", "type": "Chocolate"},
                                            {"id": "1003", "type": "Blueberry"},
                                            {"id": "1004", "type": "Devil's Food"}
                                        ]
                                },
                            "topping":
                                [
                                    {"id": "5001", "type": "None"},
                                    {"id": "5002", "type": "Glazed"},
                                    {"id": "5005", "type": "Sugar"},
                                    {"id": "5007", "type": "Powdered Sugar"},
                                    {"id": "5006", "type": "Chocolate with Sprinkles"},
                                    {"id": "5003", "type": "Chocolate"},
                                    {"id": "5004", "type": "Maple"}
                                ]
                        },

                        ...

                    ]
            }
    }

pprint.pprint(search(d,'svg+xml','d'))
>> ["d['dict1']['part2']['.svg']"]

pprint.pprint(search(d2,'500','d2'))
>> ["d2['items']['item'][0]['topping'][0]['id']",
 "d2['items']['item'][0]['topping'][1]['id']",
 "d2['items']['item'][0]['topping'][2]['id']",
 "d2['items']['item'][0]['topping'][3]['id']",
 "d2['items']['item'][0]['topping'][4]['id']",
 "d2['items']['item'][0]['topping'][5]['id']",
 "d2['items']['item'][0]['topping'][6]['id']"]

1
你好,欢迎来到SO!请增加更多细节并解释您的答案。 - Devang Padhiyar
我不是一个常规的程序员。对于初学者的编码风格,我深感抱歉。 该解决方案旨在适用于具有嵌套字典和列表的任何复杂字典。逻辑非常简单。我从顶层层次开始,递归地遍历字典和列表到下一级层次。 在每个点上,我检查任何键或其值(对于字典)或任何索引或其值(对于列表)是否与搜索模式匹配。如果有匹配,则将路径推入输出列表中。 - Vinay Kumar

1

遍历嵌套字典以查找特定值。当成功找到该值时,将打印出完整的键路径。为了教育目的,我保留了所有注释和打印语句(这不是生产代码!)

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon Jan 24 17:16:46 2022

@author: wellington
"""


class Tree(dict):
    """
    allows autovivification as in Perl hashes
    """

    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

# tracking the key sequence when seeking the target
key_list = Tree()

# dict storing the target success result
success = Tree()


# example nested dict of dicts and lists
E = {
    'AA':
        {
          'BB':
               {'CC':
                     {
                      'DD':
                          {
                           'ZZ':'YY',
                           'WW':'PP'
                           },
                       'QQ':
                           {
                            'RR':'SS'
                            },
                     },
                'II': 
                     {
                      'JJ':'KK'
                     }, 
                'LL':['MM', 'GG', 'TT']
               }
        }
    }


def find_keys_from_value(data, target):
    """
    recursive function -
    given a value it returns all the keys in the path to that value within
    the dict "data"
    there are many paths and many false routes
    at the end of a given path if success has not been achieved
    the function discards keys to get back to the next possible path junction
    """

    print(f"the number of keys in the local dict is {len(data)}")
    key_counter = 0

    for key in data:
        key_counter += 1

        # if target has been located stop iterating through keys
        if success[target] == 1:
            break
        else:
            # eliminate prior key from path that did not lead to success
            if key_counter > 1:
                k_list.pop()
            # add key to new path
            k_list.append(key)
            print(f"printing k_list after append{k_list}")

        # if target located set success[target] = 1 and exit
        if key == target or data[key] == target:
            key_list[target] = k_list
            success[target] = 1
            break
        # if the target has not been located check to see if the value
        # associated with the new key is a dict and if so return to the
        # recursive function with the new dict as "data"
        elif isinstance(data[key], dict):
            print(f"\nvalue is dict\n {data[key]}")
            find_keys_from_value(data[key], target)

        # check to see if the value associated with the new key is a list
        elif isinstance(data[key], list):
            # print("\nv is list\n")
            # search through the list
            for i in data[key]:

                # check to see if the list element is a dict
                # and if so return to the recursive function with
                # the new dict as "data
                if isinstance(i, dict):
                    find_keys_from_value(i, target)

                # check to see if each list element is the target
                elif i == target:
                    print(f"list entry {i} is target")
                    success[target] = 1
                    key_list[target] = k_list
                elif i != target:
                    print(f"list entry {i} is not target")
                    print(f"printing k_list before pop_b {k_list}")
                    print(f"popping off key_b {key}")

        # so if value is not a key and not a list and not the target then
        # discard the key from the key list
        elif data[key] != target:
            print(f"value {data[key]} is not target")
            print(f"printing k_list before removing key_before {k_list}")
            print(f"removing key_c {key}")
            k_list.remove(key)


# select target values
values = ["PP", "SS", "KK", "TT"]
success = {}

for target in values:
    print(f"\nlooking for target {target}")
    success[target] = 0
    k_list = []
    find_keys_from_value(E, target)
    print(f"\nprinting key_list for target {target}")
    print(f"{key_list[target]}\n")
    print("\n****************\n\n")

1

这里有两种类似的快速且简单的方法来执行此类型的操作。函数 find_parent_dict1 使用列表推导式,但如果您对此感到不舒服,则 find_parent_dict2 使用臭名昭著的嵌套循环。

Dictionary = {'dict1':{'part1':{'.wbxml':'1','.rl':'2'},'part2':{'.wbdl':'3','.rs':'4'}},'dict2':{'part3':{'.wbxml':'5','.rl':'6'},'part4':{'.wbdl':'1','.rs':'10'}}}

value = '3'

def find_parent_dict1(Dictionary):
    for key1 in Dictionary.keys():
        item = {key1:key2 for key2 in Dictionary[key1].keys() if value in Dictionary[key1][key2].values()}
        if len(item)>0:
            return item

find_parent_dict1(Dictionary)


def find_parent_dict2(Dictionary):
    for key1 in Dictionary.keys():
        for key2 in Dictionary[key1].keys():
            if value in Dictionary[key1][key2].values():
                print {key1:key2}

find_parent_dict2(Dictionary)

2
它不能用于任意嵌套的字典。在for循环中,.keys()调用是多余的。 - jfs

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接