How do I convert a list of dictionaries into a DataFrame?

3

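I have a list of dictionaries that I need to convert into a DataFrame. I tried using a MultiIndex, but I could not convert the whole structure.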

response = [{
"name": "xyz",
"empId": "007",
"details": [{
        "address": [{
            "street": "x street",
            "city": "x city"
        }, {
            "street": "xx street",
            "city": "xx city"
        }],
        "country": "xxz country"
    },
    {
        "address": [{
            "street": "y street",
            "city": "y city"
        }, {
            "street": "yy street",
            "city": "yy city"
        }],
        "country": "yyz country"
    }
]
}]

I used the following code to convert the inner list of dictionaries into a DataFrame:

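import pandas as pd

# The inner list of the first (and only) record
details = response[0]['details']
frames = []

for i in details:
    Country = i['country']

    street = []
    city = []
    # Index each address by (Country, SL No), numbering from 1
    index = pd.MultiIndex.from_arrays([[Country]*len(i['address']), list(range(1,len(i['address'])+1))], names=['Country', 'SL No'])
    df = pd.DataFrame(columns=["Street","City"], index=index)
    if i['address']:
        for row in i['address']:
            street.append(row['street'])
            city.append(row['city'])

    df["Street"] = street
    df["City"] = city

    frames.append(df)

# Stack the per-country frames into one
df_final = pd.concat(frames)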

The output I get:

Country     SL No   Street     City
xxz country 1       x street   x city
            2      xx street  xx city
yyz country 1       y street   y city
            2      yy street  yy city

How can I convert the list of dictionaries into a DataFrame while keeping all the information?

The final output I want:

Name    EmpId    Country        Street     City
xyz     007      xxz country    x street   x city
                                xx street  xx city
                 yyz country    y street   y city
                                yy street  yy city

Could you add the final output format you need? - Hadi Mir
@HadiMir, I have added the final output I need. - laplace
2 Answers

3
Use json_normalize with DataFrame.set_index:
df = pd.json_normalize(response,
                       record_path=['details','address'],
                       meta=['name','empId', ['address','country']]
                       )

df = df.set_index(['name','empId','address.country'])
print (df)
                               street     city
name empId address.country                    
xyz  007   xxz country       x street   x city
           xxz country      xx street  xx city
           yyz country       y street   y city
           yyz country      yy street  yy city
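
In the output above the country is repeated on every row, because pandas prints the innermost index level in full and only blanks out repeats in the outer levels. A minimal sketch of one way to get closer to the layout requested in the question, assuming pandas 1.0 or later: add a per-group counter as the innermost level. The `SL No` name is borrowed from the question's own attempt, and `keys` is just a helper variable for this illustration.

```python
import pandas as pd

response = [{
    "name": "xyz",
    "empId": "007",
    "details": [
        {"address": [{"street": "x street", "city": "x city"},
                     {"street": "xx street", "city": "xx city"}],
         "country": "xxz country"},
        {"address": [{"street": "y street", "city": "y city"},
                     {"street": "yy street", "city": "yy city"}],
         "country": "yyz country"},
    ],
}]

df = pd.json_normalize(response,
                       record_path=['details', 'address'],
                       meta=['name', 'empId', ['address', 'country']])

# Number the addresses within each (name, empId, country) group, starting at 1.
keys = ['name', 'empId', 'address.country']
df['SL No'] = df.groupby(keys).cumcount() + 1

# With the counter as the innermost level, repeated name/empId/country
# values in the outer levels are blanked out when the frame is printed.
df = df.set_index(keys + ['SL No'])
print(df)
```

The data itself is unchanged; only the index gains one level, so `df.reset_index()` brings back the flat table.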

For older pandas versions (before 1.0, where json_normalize was not yet available as pd.json_normalize), use:

df = pd.io.json.json_normalize(response,
                               record_path=['details','address'],
                               meta=['name','empId', ['address','country']]
                       )
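
If the same script has to run on both old and new pandas installations, the two spellings can be combined. The following is only a sketch of that idea: it tries the top-level function first and falls back to the older import path. The shortened `response` is sample data for the illustration.

```python
import pandas as pd

try:
    # Available as a top-level function since pandas 1.0
    json_normalize = pd.json_normalize
except AttributeError:
    # Fallback for older pandas versions
    from pandas.io.json import json_normalize

response = [{
    "name": "xyz",
    "empId": "007",
    "details": [
        {"address": [{"street": "x street", "city": "x city"},
                     {"street": "xx street", "city": "xx city"}],
         "country": "xxz country"},
    ],
}]

df = json_normalize(response,
                    record_path=['details', 'address'],
                    meta=['name', 'empId', ['address', 'country']])
print(df)
```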

Edit:

Tested with multiple records, and it works well:

response = [{
"name": "xyz",
"empId": "007",
"details": [{
        "address": [{
            "street": "x street",
            "city": "x city"
        }, {
            "street": "xx street",
            "city": "xx city"
        }],
        "country": "xxz country"
    },
    {
        "address": [{
            "street": "y street",
            "city": "y city"
        }, {
            "street": "yy street",
            "city": "yy city"
        }],
        "country": "yyz country"
    }
]
},
            {
"name": "xyz1",
"empId": "0071",
"details": [{
        "address": [{
            "street": "x street1",
            "city": "x city1"
        }, {
            "street": "xx stree1t",
            "city": "xx city1"
        }],
        "country": "xxz country"
    },
    {
        "address": [{
            "street": "y street",
            "city": "y city"
        }, {
            "street": "yy street",
            "city": "yy city"
        }],
        "country": "yyz country"
    }
]
}]

df = pd.json_normalize(response,
                       record_path=['details','address'],
                       meta=['name','empId', ['address','country']]
                       )

df = df.set_index(['name','empId','address.country'])

print (df)
                                street      city
name empId address.country                      
xyz  007   xxz country        x street    x city
           xxz country       xx street   xx city
           yyz country        y street    y city
           yyz country       yy street   yy city
xyz1 0071  xxz country       x street1   x city1
           xxz country      xx stree1t  xx city1
           yyz country        y street    y city
           yyz country       yy street   yy city

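Yes, this does the job. But if I have a large dataset in the same format, it would need to be processed iteratively. - laplace
@laplace - Not sure what you mean by iterating. Do you have many JSON files that my solution needs to loop over? - jezrael
@jezrael. My dataset is large, and I have to go through all of it and convert the whole thing into a df. Your solution works perfectly for response[0]; what if I have response[1....n]? - laplace
@jezrael Sorry, it does work. Is there a way to rename "address.country" to something else? - laplace
1
@laplace Sure. Before the set_index call, use df = df.rename(columns={'address.country': 'new'}), then pass 'new' instead of 'address.country' to set_index. - jezrael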
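
The rename suggested in the comment above can be sketched as follows, assuming pandas 1.0 or later. The new column name `country` is only an example. Note that `rename` returns a new DataFrame, so the result has to be assigned back, and `set_index` must then refer to the new name.

```python
import pandas as pd

response = [{
    "name": "xyz",
    "empId": "007",
    "details": [
        {"address": [{"street": "x street", "city": "x city"},
                     {"street": "xx street", "city": "xx city"}],
         "country": "xxz country"},
        {"address": [{"street": "y street", "city": "y city"},
                     {"street": "yy street", "city": "yy city"}],
         "country": "yyz country"},
    ],
}]

df = pd.json_normalize(response,
                       record_path=['details', 'address'],
                       meta=['name', 'empId', ['address', 'country']])

# rename() is not in-place by default, so keep the returned frame.
df = df.rename(columns={'address.country': 'country'})

# Use the new column name when building the index.
df = df.set_index(['name', 'empId', 'country'])
print(df)
```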

0
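As far as I know, there is no simple way to do this, because your data contains several levels of nested lists. Although it is a bit involved, the following approach should work. The code explodes each list column in turn and uses json_normalize to turn the resulting dictionaries into columns.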
df = pd.DataFrame.from_records(response)
df = df.explode('details', ignore_index=True)
df = pd.concat([df, pd.json_normalize(df['details'])], axis=1)
df = df.explode('address', ignore_index=True)
df = pd.concat([df, pd.json_normalize(df['address'])], axis=1)
df = df.drop(columns=['details', 'address'])

Result:

  name empId      country     street     city
0  xyz   007  xxz country   x street   x city
1  xyz   007  xxz country  xx street  xx city
2  xyz   007  yyz country   y street   y city
3  xyz   007  yyz country  yy street  yy city

Note: in pandas versions below 1.1.0, `explode` has no `ignore_index` parameter. Instead, call `reset_index(drop=True)` after `explode`.
Also, in older pandas versions you need to use `pd.io.json.json_normalize` instead of `pd.json_normalize`.
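
For reference, a sketch of the same explode-and-normalize steps written so that they do not rely on `ignore_index`. It assumes only that `explode` itself is available (pandas 0.25 or later) and picks whichever `json_normalize` spelling the installed version provides. Converting each column with `.tolist()` before normalizing is a choice made for this sketch, so that a plain list of dictionaries is passed in.

```python
import pandas as pd

try:
    json_normalize = pd.json_normalize          # pandas >= 1.0
except AttributeError:
    from pandas.io.json import json_normalize   # older pandas

response = [{
    "name": "xyz",
    "empId": "007",
    "details": [
        {"address": [{"street": "x street", "city": "x city"},
                     {"street": "xx street", "city": "xx city"}],
         "country": "xxz country"},
        {"address": [{"street": "y street", "city": "y city"},
                     {"street": "yy street", "city": "yy city"}],
         "country": "yyz country"},
    ],
}]

df = pd.DataFrame.from_records(response)

# One row per entry of 'details'; reset the index so concat aligns row by row.
df = df.explode('details').reset_index(drop=True)
df = pd.concat([df, json_normalize(df['details'].tolist())], axis=1)

# One row per entry of 'address', then expand each address dict into columns.
df = df.explode('address').reset_index(drop=True)
df = pd.concat([df, json_normalize(df['address'].tolist())], axis=1)

# The nested source columns are no longer needed.
df = df.drop(columns=['details', 'address'])
print(df)
```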
