How do I convert a pandas DataFrame to a nested dictionary or JSON?

4
I am running Python 3.8 and Pandas 0.19.2, and I have a DataFrame that looks like this:
id_number  name  amount  addenda
1234       ABCD  $100    Car-wash-$30
1234       ABCD  $100    Maintenance-$70
I need a dictionary/JSON like the following:
[
    {
       'id_number': 1234,
       'name': 'ABCD',
       'amount': '$100',
       'addenda': [ 
                  {'payment_related_info': 'Car-wash-$30'}, 
                  {'payment_related_info': 'Maintenance-$70'}
                  ]
    }
]

I tried using groupby and to_dict, but it did not work. Any suggestions? Thanks in advance for your help.


If "id_number" and "name" are the same, will "amount" also be the same? - Derek O
2 Answers

2
Working backwards, what you need before calling .to_dict is a DataFrame in which all the addenda for a group are collected in a single row:

id_number  name  amount  addenda
1234       ABCD  $100    [{'payment_related_info': 'Car-wash-$30'}, {'payment_related_info': 'Maintenance-$70'}]

To get there, group by id_number, name and amount, then apply a collapsing function that turns the addenda strings of each group into a list of dictionaries, each with the string 'payment_related_info' as its key.

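This approach also works if more rows are added to the original df:

id_number  name  amount  addenda
1234       ABCD  $100    Car-wash-$30
1234       ABCD  $100    Maintenance-$70
2345       BCDE  $200    Car-wash-$100
2345       BCDE  $200    Maintenance-$100

import pandas as pd

def collapse_row(x):
    # Collect every addenda string in this group.
    addenda_list = x["addenda"].tolist()
    # Copy the last row so the assignment below does not touch the original frame.
    last_row = x.iloc[-1].copy()
    # Replace the single string with a list of one-key dictionaries.
    last_row["addenda"] = [{'payment_related_info': v} for v in addenda_list]
    return last_row

grouped = df.groupby(["id_number","name","amount"]).apply(collapse_row).reset_index(drop=True)
grouped.to_dict(orient='records')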

Result:

[
    {
       "id_number":1234,
       "name":"ABCD",
       "amount":"$100",
       "addenda":[
                 {"payment_related_info":"Car-wash-$30"},
                 {"payment_related_info":"Maintenance-$70"}
                 ]
    },
    {
       "id_number":2345,
       "name":"BCDE",
       "amount":"$200",
       "addenda":[
                 {"payment_related_info":"Car-wash-$100"}, 
                 {"payment_related_info":"Maintenance-$100"}
                 ]
    }
]
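If you need an actual JSON string rather than a list of dictionaries, one option is to build the records with a plain loop over the groups and hand them to json.dumps. This is only a minimal sketch: the sample df mirrors the four-row table above, and the names keys, records and json_text are placeholders of mine, not part of the code above.

```python
import json

import pandas as pd

# Sample data mirroring the four-row table (assumed, for illustration only).
df = pd.DataFrame({
    "id_number": [1234, 1234, 2345, 2345],
    "name": ["ABCD", "ABCD", "BCDE", "BCDE"],
    "amount": ["$100", "$100", "$200", "$200"],
    "addenda": ["Car-wash-$30", "Maintenance-$70",
                "Car-wash-$100", "Maintenance-$100"],
})

keys = ["id_number", "name", "amount"]
records = []
# sort=False keeps the groups in order of first appearance.
for key_values, group in df.groupby(keys, sort=False):
    record = dict(zip(keys, key_values))
    # Group keys can come back as numpy integers, which json.dumps rejects,
    # so convert to a built-in int first.
    record["id_number"] = int(record["id_number"])
    # Wrap every addenda string of the group in its own dictionary.
    record["addenda"] = [{"payment_related_info": v} for v in group["addenda"]]
    records.append(record)

json_text = json.dumps(records, indent=4)
print(json_text)
```

The loop is more verbose than groupby().apply(), but it does not depend on how apply treats the grouping columns, which has changed between pandas releases.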

1

Just apply groupby and aggregate each group by building a DataFrame, like this:

import pandas as pd

data = {
    "id_number": [1234, 1234],
    "name": ["ABCD", "ABCD"],
    "amount": ["$100", "$100"],
    "addenda": ["Car-wash-$30", "Maintenance-$70"]
}
df = pd.DataFrame(data=data)

# Group on the repeated columns, then turn the addenda of each group into
# a small DataFrame whose column is renamed to "payment_related_info".
df.groupby(by=["id_number", "name", "amount"]) \
    .agg(lambda col: pd.DataFrame(data=col) \
         .rename(columns={"addenda": "payment_related_info"})) \
    .reset_index() \
    .to_json(orient="records")

This returns exactly the result you want!
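As a hedged alternative, in case returning a DataFrame from agg does not give you the nested list on your pandas version, the sketch below first collects the addenda strings of each group into a list and then wraps each string in a payment_related_info dictionary. The names nested and json_text are made up for illustration.

```python
import json

import pandas as pd

df = pd.DataFrame({
    "id_number": [1234, 1234],
    "name": ["ABCD", "ABCD"],
    "amount": ["$100", "$100"],
    "addenda": ["Car-wash-$30", "Maintenance-$70"],
})

# Step 1: one row per group, with the addenda strings gathered into a list.
nested = (
    df.groupby(["id_number", "name", "amount"])["addenda"]
      .agg(list)
      .reset_index()
)

# Step 2: wrap each string in a one-key dictionary.
nested["addenda"] = nested["addenda"].map(
    lambda items: [{"payment_related_info": item} for item in items]
)

# Step 3: serialize one JSON object per row.
json_text = nested.to_json(orient="records")
print(json_text)
```

Calling nested.to_dict(orient="records") instead of to_json gives the same structure as Python objects.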
