How to select deeply nested key:value pairs from a dictionary in Python


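I downloaded JSON data from a website and want to select specific key:value pairs from the nested JSON. I converted the JSON to a Python dictionary and then used dictionary comprehensions to select the nested key:value pairs. However, there are too many levels of nesting, and I am sure there is a better way than expanding each dictionary separately; I see redundancy in my approach. Can you suggest a better one?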

{
    "success": true,
    "payload": {
        "tag": {
            "slug": "python",
            "name": "Python",
            "postCount": 10590,
            "virtuals": {
                "isFollowing": false
            }
        },
        "metadata": {
            "followerCount": 18053,
            "postCount": 10590,
            "coverImage": {
                "id": "1*O3-jbieSsxcQFkrTLp-1zw.gif",
                "originalWidth": 550,
                "originalHeight": 300
            }
        }
    }
}    

My approach:
from datetime import datetime,timedelta
import json,re

data=r'data.json'
#reads json and converts to dictionary
def js_r(data):
    with open(data, encoding='Latin-1') as f_in:
        return json.load(f_in)

def find_key(obj, key):
    if isinstance(obj, dict):
        yield from iter_dict(obj, key, [])
    elif isinstance(obj, list):
        yield from iter_list(obj, key, [])

def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

if __name__=="__main__":
    my_dict=js_r(data)
    print("This is dictionary for python tag",my_dict)
    keys=my_dict.keys()
    print("This is the dictionary keys",my_dict.keys())
    my_payload=list(find_key(my_dict,'title'))
    print("These are my payload",my_payload)
    # Go through find_key: iter_dict expects a list of indices as its
    # third argument, so passing the string 'id' raises a TypeError
    my_post=find_key(my_dict,'User')
    print(list(my_post))

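You may be interested in my code here: http://stackoverflow.com/q/41777880/4014959 - PM 2Ring
@PM 2Ring Actually, my intent is to drill down to the bottom of the nested structure. payload_dict and paging_dict are not the end results; I want to go further and get the User keys, which is why I think this approach is redundant. - Kaleab Woldemariam
In that case, my code should do what you want, and we can close this question as a duplicate. - PM 2Ring
@PM 2Ring I believe the JSON data is valid, since it comes from an API. I edited the posted data.json for brevity. - Kaleab Woldemariam
@PM 2Ring Sorry, I may be a bit dense. I removed the "title" and "User" keys for brevity, but they are present in the original data. I would like to understand the purpose of the iter_dict() and iter_list() functions and what they yield. - Kaleab Woldemariam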
2 Answers

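Here is how to use my find_key generator, from "Functions that help to understand json(dict) structure", to get the 'id' value from that JSON data, along with a few other keys I picked at random. This code reads the JSON data from a string rather than from a file.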
import json

json_data = '''\
{
    "success": true,
    "payload": {
        "tag": {
            "slug": "python",
            "name": "Python",
            "postCount": 10590,
            "virtuals": {
                "isFollowing": false
            }
        },
        "metadata": {
            "followerCount": 18053,
            "postCount": 10590,
            "coverImage": {
                "id": "1*O3-jbieSsxcQFkrTLp-1zw.gif",
                "originalWidth": 550,
                "originalHeight": 300
            }
        }
    }
}
'''

data = r'data.json'

#def js_r(data):
    #with open(data, encoding='Latin-1') as f_in:
        #return json.load(f_in)

# Read the JSON from the inline json_data string instead of from the data file
def js_r(data):
    return json.loads(json_data)

def find_key(obj, key):
    if isinstance(obj, dict):
        yield from iter_dict(obj, key, [])
    elif isinstance(obj, list):
        yield from iter_list(obj, key, [])

def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

if __name__=="__main__":
    # Read the JSON data
    my_dict = js_r(data)
    print("This is the JSON data:")
    print(json.dumps(my_dict, indent=4), "\n")

    # Find the id key
    keypath, val = next(find_key(my_dict, "id"))
    print("This is the id: {!r}".format(val))
    print("These are the keys that lead to the id:", keypath, "\n")

    # Find the name, followerCount, originalWidth, and originalHeight
    print("Here are some more (key, value) pairs")
    keys = ("name", "followerCount", "originalWidth", "originalHeight")
    for k in keys:
        keypath, val = next(find_key(my_dict, k))
        print("{!r}: {!r}".format(k, val))

output

This is the JSON data:
{
    "success": true,
    "payload": {
        "tag": {
            "slug": "python",
            "name": "Python",
            "postCount": 10590,
            "virtuals": {
                "isFollowing": false
            }
        },
        "metadata": {
            "followerCount": 18053,
            "postCount": 10590,
            "coverImage": {
                "id": "1*O3-jbieSsxcQFkrTLp-1zw.gif",
                "originalWidth": 550,
                "originalHeight": 300
            }
        }
    }
} 

This is the id: '1*O3-jbieSsxcQFkrTLp-1zw.gif'
These are the keys that lead to the id: ['payload', 'metadata', 'coverImage', 'id'] 

Here are some more (key, value) pairs
'name': 'Python'
'followerCount': 18053
'originalWidth': 550
'originalHeight': 300
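
Because find_key is a generator, the next() calls above take only the first match. A key such as "postCount" occurs twice in this data, so it can be useful to collect every match, and the key path that comes back can be replayed to fetch the value again. The following is a minimal sketch of both ideas, reusing the generators from the listing above on a cut-down copy of the data; get_by_path is a hypothetical helper name and is not part of the original code.

```python
from functools import reduce
from operator import getitem

def find_key(obj, key):
    if isinstance(obj, dict):
        yield from iter_dict(obj, key, [])
    elif isinstance(obj, list):
        yield from iter_list(obj, key, [])

def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def get_by_path(obj, keypath):
    """Hypothetical helper: follow a list of dict keys / list indices."""
    return reduce(getitem, keypath, obj)

# Cut-down copy of the sample data
sample = {
    "payload": {
        "tag": {"name": "Python", "postCount": 10590},
        "metadata": {
            "postCount": 10590,
            "coverImage": {"id": "1*O3-jbieSsxcQFkrTLp-1zw.gif"},
        },
    }
}

# Collect every occurrence of the key, not just the first one
matches = list(find_key(sample, "postCount"))
for keypath, val in matches:
    print(keypath, val)
    # Replaying the key path leads back to the same value
    assert get_by_path(sample, keypath) == val
```

Each item produced by find_key is a (keypath, value) pair, where keypath lists the dict keys (and list indices, if any) that lead from the top of the structure to the match.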

By the way, JSON is normally encoded with a UTF encoding, not Latin-1. The default encoding is UTF-8, and you should use it if possible.
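
To illustrate why the encoding matters, here is a small sketch that writes a UTF-8 JSON file and reads it back both ways. The file contents and the temporary path are made up for the demonstration; only the encoding argument of js_r differs from the version in the question.

```python
import json
import os
import tempfile

def js_r(path):
    """Read a JSON file as UTF-8 and return the parsed object."""
    with open(path, encoding="utf-8") as f_in:
        return json.load(f_in)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    # Hypothetical data containing one non-ASCII character (o with diaeresis)
    with open(path, "w", encoding="utf-8") as f_out:
        json.dump({"name": "Pyth\u00f6n"}, f_out, ensure_ascii=False)

    as_utf8 = js_r(path)
    # Decoding the same bytes as Latin-1 does not fail, it silently garbles
    with open(path, encoding="latin-1") as f_in:
        as_latin1 = json.load(f_in)

print(as_utf8["name"])
print(as_latin1["name"])
```

The Latin-1 read does not raise an error; each byte of the two-byte UTF-8 sequence is decoded as a separate character, so the mistake is easy to miss until non-ASCII text shows up.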


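I suggest python-benedict, a solid Python dict subclass with full keypath support and many utility methods.
It provides IO support for many formats, including JSON.
You can initialize it directly from a JSON file: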
from benedict import benedict

d = benedict.from_json('data.json')

Now your dict has keypath support:
print(d['payload.metadata.coverImage.id'])

# or use get to avoid a possible KeyError
print(d.get('payload.metadata.coverImage.id'))
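
If adding a dependency is not an option, a dotted keypath lookup over plain dicts can be approximated in a few lines. This is a rough standard-library sketch and not how python-benedict is implemented; get_path and its parameters are hypothetical names, and it handles only dict keys (no list indices, and no keys that themselves contain the separator).

```python
def get_path(data, keypath, default=None, sep="."):
    """Hypothetical helper: look up 'a.b.c' in nested dicts.

    Returns default as soon as a step is missing or is not a dict.
    """
    current = data
    for key in keypath.split(sep):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Cut-down copy of the sample data
sample = {
    "payload": {
        "metadata": {
            "followerCount": 18053,
            "coverImage": {"id": "1*O3-jbieSsxcQFkrTLp-1zw.gif"},
        }
    }
}

found = get_path(sample, "payload.metadata.coverImage.id")
missing = get_path(sample, "payload.metadata.noSuchKey.id")
print(found)
print(missing)
```

Like dict.get, the helper returns a default instead of raising KeyError when any step of the path is absent.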

Installation: pip install python-benedict

Here is the library repository and documentation: https://github.com/fabiocaccamo/python-benedict

Note: I am the author of this project.

