Pandas DataFrame and converting it to JSON

3

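Basically, I am reading a pandas DataFrame and converting it to JSON. I am a beginner at coding, but I know it is preferable to use apply rather than iterrows (I have already tried apply, but I had trouble understanding the syntax and working out a solution)!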

===============================

Data read from the Excel sheet

id  label      id_customer  label_customer  part_number  number_customer  product  label_product  key    country      value_product
6   Sao Paulo  CUST-99992   Brazil          982          10               sho1564  shoes          SH-99  Chile        1.5
6   Sao Paulo  CUST-99992   Brazil          982          10               sn47282  sneakers       SN-71  Germany      43.8
6   Sao Paulo  CUST-43535   Argentina       435          15               sk84393  skirt          SK-11  Netherlands  87.1
92  Hong Kong  CUST-88888   China           785          58               ca40349  cap            CA-82  Russia       3.95

Code:

import pandas as pd 
import json

df = pd.read_excel(path)

result = []
for labels, df1 in df.groupby(['id', 'label'],sort=False):
    id_, label = labels
    record = {'id': int(id_), 'label': label, 'Customer': []}
    for inner_labels, df2 in df1.groupby(['id_customer', 'label_customer'],sort=False):
        id_,label = inner_labels
        record['Customer'].append({
            'id': id_,
            'label': label,
            'Number': [{'part': str(p), 'number_customer': str(s)} for p, s in zip(df2['part_number'], df2['number_customer'])]  
            })

    result.append(record)

===============================

The JSON I get:

[
 {
  "id": 6,
  "label": "Sao Paulo",
  "Customer": [
   {
    "id": "CUST-99992",
    "label": "Brazil",
    "Number": [
     {
      "part": "982",
      "number_customer": "10"
     },
     {
      "part": "982",
      "number_customer": "10"
     }
    ]
   },
   {
    "id": "CUST-43535",
    "label": "Argentina",
    "Number": [
     {
      "part": "435",
      "number_customer": "15"
     }
    ]
   }
  ]
 },
 {
  "id": 92,
  "label": "Hong Kong",
  "Customer": [
   {
    "id": "CUST-88888",
    "label": "China",
    "Number": [
     {
      "part": "785",
      "number_customer": "58"
     }
    ]
   }
  ]
 }
]

===============================

Expected JSON format:

[
 {
  "id": 6,
  "label": "Sao Paulo",
  "Customer": [
   {
    "id": "CUST-99992",
    "label": "Brazil",
    "Number": [
     {
      "part": "982",
      "number_customer": "10",
      "Products": [
       {
        "product": "sho1564",
        "label_product": "shoes",
        "Order": [
         {
          "key": "SH-99",
          "country": "Chile",
          "value_product": "1.5"
         }
        ]
       },
       {
        "product": "sn47282",
        "label_product": "sneakers",
        "Order": [
         {
          "key": "SN-71",
          "country": "Germany",
          "value_product": "43.8"
         }
        ]
       }
      ]
     }
    ] 
   },
   {
    "id": "CUST-43535",
    "label": "Argentina",
    "Number": [
     {
      "part": "435",
      "number_customer": "15",
      "Products": [
       {
        "product": "sk84393",
        "label_product": "skirt",
        "Order": [
         {
          "key": "SK-11",
          "country": "Netherlands",
          "value_product": "87.1"
         }
        ]
       }
      ]
     }
    ]
   }
  ]
 },
 {
  "id": 92,
  "label": "Hong Kong",
  "Customer": [
   {
    "id": "CUST-88888",
    "label": "China",
    "Number": [
     {
      "part": "785",
      "number_customer": "58",
      "Products": [
       {
        "product": "ca40349",
        "label_product": "cap",
        "Order": [
         {
          "key": "CA-82",
          "country": "Russia",
          "value_product": "3.95"
         }
        ]
       }
      ]
     }
    ]
   }
  ]
 }
]

===============================

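Note that id and label form one group of information, just as id_customer and label_customer form another, part_number and number_customer another, product and label_product another, and key, country and value_product another.

My expected JSON depends on the information in my DataFrame.

Can anyone help me in any way?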


1
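The expected JSON is a bit odd: 'Number' is a list containing only one object. Could it contain several? - morganics
Yes @lan.. the 'Number' list can have multiple objects.. it depends entirely on what is in the DataFrame I am reading.. - Lucas
2 Answers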

1
import pandas as pd
import json

df = pd.read_excel(path)  # path: location of the Excel file from the question

result = []
# One groupby per nesting level; sort=False keeps groups in the order they appear in the sheet.
for labels, df1 in df.groupby(['id', 'label'], sort=False):
    id_, label = labels
    record = {'id': int(id_), 'label': label, 'Customer': []}
    for inner_labels, df2 in df1.groupby(['id_customer', 'label_customer'], sort=False):
        id_, label = inner_labels
        customer = {'id': id_, 'label': label, 'Number': []}
        for inner_labels, df3 in df2.groupby(['part_number', 'number_customer'], sort=False):
            p, s = inner_labels
            number = {'part': str(p), 'number_customer': str(s), 'Products': []}
            for inner_labels, df4 in df3.groupby(['product', 'label_product'], sort=False):
                p, lp = inner_labels
                product = {'product': p, 'label_product': lp, 'Order': []}
                # Innermost level: one Order entry per remaining row.
                for k, c, v in zip(df4['key'], df4['country'], df4['value_product']):
                    # str(v) matches the string values shown in the expected JSON.
                    product['Order'].append({'key': k, 'country': c, 'value_product': str(v)})
                number['Products'].append(product)
            customer['Number'].append(number)
        record['Customer'].append(customer)
    result.append(record)

# Serialize the nested structure.
print(json.dumps(result, indent=1))
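
The nested loops above repeat one pattern per level: group by a pair of columns, emit those columns as keys, then recurse into the group. A minimal sketch of that pattern as a recursive helper follows. The names LEVELS, LEAF and nest are hypothetical (they are not part of the answer above), and the question's sample rows are inlined as CSV so the sketch runs without the Excel file. Note that it converts every value to a string, so the top-level id comes out as "6" rather than 6.

```python
from io import StringIO
import json

import pandas as pd

# Sample rows from the question, inlined so no Excel file is needed.
CSV = """id,label,id_customer,label_customer,part_number,number_customer,product,label_product,key,country,value_product
6,Sao Paulo,CUST-99992,Brazil,982,10,sho1564,shoes,SH-99,Chile,1.5
6,Sao Paulo,CUST-99992,Brazil,982,10,sn47282,sneakers,SN-71,Germany,43.8
6,Sao Paulo,CUST-43535,Argentina,435,15,sk84393,skirt,SK-11,Netherlands,87.1
92,Hong Kong,CUST-88888,China,785,58,ca40349,cap,CA-82,Russia,3.95"""

# Hypothetical level spec: (columns to group by, output key names, key of the child list).
LEVELS = [
    (['id', 'label'], ['id', 'label'], 'Customer'),
    (['id_customer', 'label_customer'], ['id', 'label'], 'Number'),
    (['part_number', 'number_customer'], ['part', 'number_customer'], 'Products'),
    (['product', 'label_product'], ['product', 'label_product'], 'Order'),
]
# Columns emitted at the innermost level, one dict per row.
LEAF = ['key', 'country', 'value_product']


def nest(df, levels):
    """Build the nested list of dicts for the remaining levels."""
    if not levels:
        # Innermost level: one dict per row, all values as strings.
        return [
            {col: str(val) for col, val in zip(LEAF, row)}
            for row in df[LEAF].itertuples(index=False)
        ]
    (cols, names, child), rest = levels[0], levels[1:]
    out = []
    # sort=False keeps groups in order of first appearance.
    for keys, group in df.groupby(cols, sort=False):
        node = {name: str(key) for name, key in zip(names, keys)}
        node[child] = nest(group, rest)
        out.append(node)
    return out


df = pd.read_csv(StringIO(CSV))
result = nest(df, LEVELS)
print(json.dumps(result, indent=1))
```

Adding or removing a nesting level then only means editing the LEVELS list, not adding another loop.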

1
Hope this works for you!
from io import StringIO
import pandas as pd
import json

csv = """id,label,id_customer,label_customer,part_number,number_customer,product,label_product,key,country,value_product
6,Sao Paulo,CUST-99992,Brazil,982,10,sho1564,shoes,SH-99,Chile,1.5
6,Sao Paulo,CUST-99992,Brazil,982,10,sn47282,sneakers,SN-71,Germany,43.8
6,Sao Paulo,CUST-43535,Argentina,435,15,sk84393,skirt,SK-11,Netherlands,87.1
92,Hong Kong,CUST-88888,China,785,58,ca40349,cap,CA-82,Russia,3.95"""
csv = StringIO(csv)

df = pd.read_csv(csv)

def split(df, groupby, json_func):
    # Yield one JSON-ready dict per group; sort=False keeps the row order of the input.
    for x, group in df.groupby(groupby, sort=False):
        yield json_func(group, *x)

# Each lambda builds the dict for one level and recurses into its group for the child list.
a = list(split(df, ['id', 'label'], lambda grp, id_, label: {"id": id_, "label": label, "Customer": list(
    split(grp, ['id_customer', 'label_customer'], lambda grp_1, id_cust, label_cust: {"id": id_cust, "label": label_cust, "Number": list(
        split(grp_1, ['part_number', 'number_customer'], lambda grp_2, part, num_cust: {"part": part, "number_customer": num_cust, "Products": list(
            split(grp_2, ['product', 'label_product'], lambda grp_3, product, label_product: {"product": product, "label_product": label_product, "Order": list(
                split(grp_3, ['key', 'country', 'value_product'], lambda _, key, country, value_product: {"key": key, "country": country, "value_product": value_product})
            )})
        )})
    )})
)}))

# Group keys may be NumPy scalars, which json cannot serialize directly; .item() converts them.
print(json.dumps(a, indent=1, default=lambda o: o.item()))
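
One detail worth checking in the snippet above: its innermost level groups by key, country and value_product together, so rows that are identical in all three columns collapse into a single Order entry. If one entry per row is wanted instead, DataFrame.to_dict(orient='records') may be a simpler fit for that level. A small sketch, using a hand-made frame with hypothetical duplicate rows (not data from the question):

```python
import json

import pandas as pd

# Hypothetical stand-in for one product group, with two identical order rows.
grp = pd.DataFrame({
    'key': ['SH-99', 'SH-99'],
    'country': ['Chile', 'Chile'],
    'value_product': [1.5, 1.5],
})

# Grouping on all three columns merges the duplicates into one group.
n_groups = grp.groupby(['key', 'country', 'value_product']).ngroups

# to_dict(orient='records') keeps one dict per row.
orders = grp.to_dict(orient='records')

print(n_groups, len(orders))
print(json.dumps(orders))
```

Which behaviour is right depends on whether repeated rows in the sheet are meaningful orders or accidental duplicates.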
