Creating a Pandas DataFrame from a dictionary with nested dictionaries in a single column

Question


I downloaded a .json file from this web page and converted it into a dictionary with the following code:

import urllib.request, json

with urllib.request.urlopen("https://www.bcusu.com/svc/voting/stats/election/paramstats/109?groupIds=1,12,7,3,6&sortBy=itemname&sortDirection=ascending") as url:
    data = json.loads(url.read().decode())
    #print(data)

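My ultimate goal is to convert my dictionary data into a pandas DataFrame. The main problem is that the data dictionary is nested and, to complicate things further, a single column (Groups) is itself nested.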

I found this solution, which handles a "uniformly" nested dictionary of the following form:

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
              'Category 2': {'att_1': 23, 'att_2': 'another'}},
         15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
              'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

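By "uniformly nested" I mean that every level of the dictionary above has the same set of keys: 12 and 15 both have the two keys Category 1 and Category 2, and each of those in turn has the two keys att_1 and att_2. That is not the case in my data.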
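For reference, a minimal sketch of how a uniformly nested dictionary like the one above can be turned into a DataFrame. This is one possible approach and not necessarily the linked solution; it builds one small frame per outer key and stacks them with pd.concat, so the outer key and the category together form the row index.

```python
import pandas as pd

user_dict = {
    12: {"Category 1": {"att_1": 1, "att_2": "whatever"},
         "Category 2": {"att_1": 23, "att_2": "another"}},
    15: {"Category 1": {"att_1": 10, "att_2": "foo"},
         "Category 2": {"att_1": 30, "att_2": "bar"}},
}

# One frame per outer key: rows are the categories, columns are the attributes
frames = {
    outer: pd.DataFrame.from_dict(categories, orient="index")
    for outer, categories in user_dict.items()
}

# Passing a dict to concat uses its keys as the outer level of the row index
df = pd.concat(frames)
print(df)
```

This only works cleanly because every inner dictionary has the same keys; with ragged nesting the missing cells would be filled with NaN.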

- BCArg

What is your expected output? - Espoir Murhabazi

1 Answer


- Espoir Murhabazi · Accepted Answer

Looking at your data, the problem comes from Groups, so I isolated it and handled it separately, creating one DataFrame per group.

Here is the code:

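import pandas as pd

# Build one DataFrame per group, keyed by the group's name
data_df = {}
for category in data.get('Groups'):
    # each group's 'Items' is a list of records (one dict per row)
    data_df[category.get('Name')] = pd.DataFrame.from_records(category.get('Items'))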

The output for each group looks like this:

data_df['Faculty']
   Eligible  IsOtherItem  Name                                               NonVoters  RelativeTurnout  Turnout    Voters
0  7249      False        Faculty of Business, Law and Social Sciences       5880       4.779694         18.885363  1369
1  6226      False        Faculty of Arts, Design and Media                  5187       3.627540         16.688082  1039
2  6156      False        Faculty of Computing, Engineering and the Buil...  5482       2.353188         10.948668  674
3  8943      False        Faculty of Health, Education and Life Sciences     7958       3.439006         11.014201  985
4  71        True         Other                                              56         0.052371         21.126761  15

The age range group:

   Eligible  IsOtherItem  Name     NonVoters  RelativeTurnout  Turnout    Voters
0  13246     False        18 - 21  10657      9.039173         19.545523  2589
1  6785      False        22 - 25  5939       2.953704         12.468681  846
2  3133      False        26 - 30  2862       0.946163         8.649856   271
3  5392      False        Over 30  5024       1.284826         6.824926   368

And likewise for the other groups.
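If a single DataFrame is preferred over one per group, the per-group frames can be stacked, with the group name kept as an ordinary column. The following is a sketch under assumptions: the sample dictionary is hand-made to mimic the structure described above (two groups, a few rows and columns copied from the output shown, an assumed group name "Age Range", and a hypothetical top-level "Title" key), not the real payload.

```python
import pandas as pd

# Hand-made sample mimicking the assumed shape of `data`; "Title" is hypothetical
data = {
    "Title": "example election",
    "Groups": [
        {"Name": "Faculty", "Items": [
            {"Name": "Faculty of Arts, Design and Media", "Eligible": 6226, "Voters": 1039},
            {"Name": "Other", "Eligible": 71, "Voters": 15},
        ]},
        {"Name": "Age Range", "Items": [
            {"Name": "18 - 21", "Eligible": 13246, "Voters": 2589},
            {"Name": "22 - 25", "Eligible": 6785, "Voters": 846},
        ]},
    ],
}

# Same idea as the loop above: one DataFrame per group, keyed by group name
data_df = {g["Name"]: pd.DataFrame.from_records(g["Items"]) for g in data["Groups"]}

# Stack the frames; the dict keys become an index level named "Group",
# which is then moved into a regular column
combined = (
    pd.concat(data_df, names=["Group", "Row"])
    .reset_index(level="Group")
    .reset_index(drop=True)
)
print(combined)
```

Stacking like this assumes the groups share the same columns, as the two outputs above suggest; any column missing from a group would come out as NaN.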

What remains after removing Groups is just a flat dictionary of information:

del data['Groups']

You can create a Series from it, or another DataFrame.
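A minimal sketch of that last step, assuming the remaining top-level values are plain scalars. The keys "Title" and "Eligible" below are hypothetical placeholders, not the real keys of the payload, and the dictionary is filtered into a copy rather than modified with del so the original stays intact.

```python
import pandas as pd

# Hypothetical stand-in for `data`; the real top-level keys may differ
data = {"Title": "example election", "Eligible": 100, "Groups": []}

# Everything except the nested "Groups" entry
info = {key: value for key, value in data.items() if key != "Groups"}

info_series = pd.Series(info)      # one entry per top-level key
info_frame = pd.DataFrame([info])  # the same information as a single-row DataFrame
print(info_series)
```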

If you know how the data was generated, you can analyze it further and build your DataFrame accordingly.