Converting JSON with nested arrays to CSV

Question

3

This is my JSON template:

{
  "field 1": [
    {
      "id": "123456"
    },
    {
      "about": "YESH"
    },
    {
      "can_post": true
    },
    {
      "category": "Community"
    }
  ],
  "field 2": [
    {
      "id": "123456"
    },
    {
      "about": "YESH"
    },
    {
      "can_post": true
    },
    {
      "category": "Community"
    }
  ]
}

I want to convert this JSON to CSV in the following format using Python:

0 field 1, id, about, can_post, category

1 field 2, id, about, can_post, category

I tried reading the JSON with pandas and then converting it to CSV, but it did not work. Any help would be appreciated.

Thanks

- Guy Shoshan

1

I don't understand why you are using an array as the value of the key "field1". - Nihal

3 Answers

1

How about this, assuming your JSON data is shaped like data below:

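import pandas as pd

data = [
   {
    "site": "field1",
    "id": "123456",
    "about": "YESH",
    "can_post": True,
    "category": "Community"
  },
  {
    "site": "field2",
    "id": "123456",
    "about": "YESH",
    "can_post": True,
    "category": "Community"
  }
]
# note: a Python literal needs True, not JSON's true

# a list of flat dicts becomes one row per dict
df = pd.DataFrame(data)

print(df)
# use df.to_csv('filename.csv') to write the CSV file
# (column order in the printed output can differ between pandas/Python versions)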

output:

  about  can_post   category      id    site
0  YESH      True  Community  123456  field1
1  YESH      True  Community  123456  field2

- Nihal

This is the result I get:

                        field 1                     field 2
0            {u'id': u'123456'}          {u'id': u'123456'}
1           {u'about': u'YESH'}         {u'about': u'YESH'}
2            {u'can_post': True}         {u'can_post': True}
3  {u'category': u'Community'}  {u'category': u'Community'}

- Guy Shoshan
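The result in the comment above is what pandas produces when it is handed the nested structure directly: each cell holds one of the small dicts. One way to bridge the gap is to reshape the question's structure into the flat list of records that this answer assumes. The sketch below is only an illustration; the helper name flatten and the column name "field" are assumptions, not part of the original answer.

```python
# Same shape as the JSON in the question, written as a Python literal.
data = {
    "field 1": [{"id": "123456"}, {"about": "YESH"},
                {"can_post": True}, {"category": "Community"}],
    "field 2": [{"id": "123456"}, {"about": "YESH"},
                {"can_post": True}, {"category": "Community"}],
}

def flatten(nested):
    """Merge each list of single-key dicts into one flat record per field.

    `flatten` and the "field" column name are hypothetical choices
    made for this sketch.
    """
    records = []
    for field, parts in nested.items():
        record = {"field": field}
        for part in parts:
            record.update(part)   # fold each small dict into the record
        records.append(record)
    return records

records = flatten(data)
print(records[0])
```

The resulting list has the same shape as data in the answer, so it could be passed to pd.DataFrame(records) and written out with to_csv(..., index=False).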

1

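The difficulty here is that the initial structure of your JSON is not simply a list of mappings, but a mapping whose values are themselves lists of mappings.

In my opinion, you have to pre-process the input, or process it element by element, to get a list or mapping that can be converted to a CSV row. Here is a possible solution:

* extract the keys from the first element and use them to build a DictWriter
* build one mapping per element and write it with the DictWriter

The code could be: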

import json
import csv

# read the json data
with open("input.json") as fd:
    data = json.load(fd)

# extract the field names (using 'field' for the key):
names = ['field']
for d in next(iter(data.values())):
    names.extend(d.keys())

# open the csv file as a DictWriter using those names
with open("output.csv", "w", newline='') as fd:
    wr = csv.DictWriter(fd, names)
    wr.writeheader()
    for field, vals in data.items():
        # start a fresh row for each top-level key
        row = {'field': field}
        for inner in vals:
            # merge every single-key dict into the row
            row.update(inner)
        wr.writerow(row)

With your data, it gives:

field,id,about,can_post,category
field 1,123456,YESH,True,Community
field 2,123456,YESH,True,Community
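The approach above takes the column names from the first element only, which is fine when every field carries the same keys. If that cannot be guaranteed, a variant could collect the names from all elements and let DictWriter fill the gaps through its restval parameter. This is a sketch under that assumption; rows_to_csv, key_column and the sample data are made up for illustration, and it writes to an in-memory buffer instead of a file.

```python
import csv
import io

def rows_to_csv(data, key_column="field"):
    """Write one CSV row per top-level key, tolerating differing inner keys.

    `rows_to_csv` and `key_column` are hypothetical names for this sketch.
    """
    # Collect column names in first-seen order across every field.
    names = [key_column]
    for parts in data.values():
        for part in parts:
            for k in part:
                if k not in names:
                    names.append(k)

    out = io.StringIO(newline="")
    # restval is the value written for columns missing from a row.
    writer = csv.DictWriter(out, fieldnames=names, restval="")
    writer.writeheader()
    for field, parts in data.items():
        row = {key_column: field}
        for part in parts:
            row.update(part)
        writer.writerow(row)
    return out.getvalue()

# Made-up sample in which the two fields do not share all keys.
sample = {
    "field 1": [{"id": "123456"}, {"about": "YESH"}],
    "field 2": [{"id": "654321"}, {"category": "Community"}],
}
print(rows_to_csv(sample))
```

To write a real file, the StringIO buffer could be swapped for open("output.csv", "w", newline='') as in the code above.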

- Serge Ballesta

I get the following error: "names.extend(d.key()) AttributeError: 'dict' object has no attribute 'key'". - Guy Shoshan

@GuyShoshan: not sure whether it is a typo, but I wrote keys, not key as in your comment. - Serge Ballesta

- Tanmay jain · Accepted Answer

import csv
import json

json.load(json_data) deserializes json_data (a JSON document in a text or binary file) into a Python object.

with open('jsn.txt','r') as json_data:
    json_dict = json.load(json_data)
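As a small illustration of what that call returns, the sketch below feeds json.load an in-memory file-like object and compares it with json.loads, which takes a string. The io.StringIO object is only a stand-in for the opened file, and the JSON text is a shortened version of the question's data; both are assumptions made so the example runs without a file on disk.

```python
import io
import json

# Shortened version of the question's JSON, as text.
text = '{"field 1": [{"id": "123456"}, {"can_post": true}]}'

# io.StringIO stands in for a file opened with open(...).
fake_file = io.StringIO(text)

json_dict = json.load(fake_file)   # reads from a file-like object
same_dict = json.loads(text)       # reads from a str

print(type(json_dict))                        # the JSON object becomes a dict
print(type(json_dict["field 1"]))             # the JSON array becomes a list
print(json_dict["field 1"][1]["can_post"])    # JSON true becomes Python True
```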

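Because your field names (the keys that serve as column names) live in separate dicts, we have to iterate over those dicts and collect the keys into the list field_names.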

field_names = [ 'field']
for d in json_dict['field 1']:
    field_names.extend(d.keys())

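# newline='' stops the csv module from writing blank lines on Windows
with open('mycsvfile.csv', 'w', newline='') as f:
    w = csv.DictWriter(f, fieldnames=field_names)
    w.writeheader()

    for k1, arr_v in json_dict.items():
        # merge the list of single-key dicts into one row dict
        temp = {k2: v for d in arr_v for k2, v in d.items()}
        temp['field'] = k1
        w.writerow(temp)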

Output

field,id,about,can_post,category
field 1,123456,YESH,True,Community
field 2,123456,YESH,True,Community

If you find the dict comprehension above hard to follow:

      k1  : arr_v 
'field 1' = [{ "id": "123456" },...{"category": "Community"}]

            for d in arr_v:                 
                        k2 : v
               d --> { "id": "123456" }
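The walk-through above can also be written as plain nested loops. The sketch below puts the comprehension from the answer next to an unrolled equivalent; the name unrolled is introduced here only for comparison, and arr_v is the list from the question written as a Python literal.

```python
arr_v = [{"id": "123456"}, {"about": "YESH"},
         {"can_post": True}, {"category": "Community"}]

# The comprehension from the answer.
temp = {k2: v for d in arr_v for k2, v in d.items()}

# The same logic as nested loops ("unrolled" is a name made up for this sketch).
unrolled = {}
for d in arr_v:                  # d is one small dict, e.g. {"id": "123456"}
    for k2, v in d.items():      # k2 is "id", v is "123456"
        unrolled[k2] = v

print(unrolled == temp)
```

The for clauses in the comprehension are read left to right in the same order as the nested loops, outer loop first.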