Convert CSV to JSON with Python and output an array

Question

3

I'm trying to take data from a CSV file and put it into a top-level array in JSON format.

Currently I am running this code:

import csv
import json

csvfile = open('music.csv', 'r')
jsonfile = open('file.json', 'w')

fieldnames = ("ID","Artist","Song", "Artist")
reader = csv.DictReader( csvfile, fieldnames)
for row in reader:
    json.dump(row, jsonfile)
    jsonfile.write('\n')

The CSV file is formatted like this:

| 1 | Empire of the Sun | We Are The People | Walking on a Dream |
| 2 | M83 | Steve McQueen | Hurry Up We're Dreaming |

Where column 1 = ID | column 2 = Artist | column 3 = Song | column 4 = Album

And getting this output:

    {"Song": "Empire of the Sun", "ID": "1", "Artist": "Walking on a   Dream"}
    {"Song": "M83", "ID": "2", "Artist": "Hurry Up We're Dreaming"}

I'm trying to get it to look like this instead, which is easier to read:

{             
    "Music": [

    {
        "id": 1,
        "Artist": "Empire of the Sun",
        "Name": "We are the People",
        "Album": "Walking on a Dream"
    },
    {
        "id": 2,
        "Artist": "M83",
        "Name": "Steve McQueen",
        "Album": "Hurry Up We're Dreaming"
    },
    ]
}

- orpheus

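For question 1 only, use this snippet for the DictReader setup: `import collections ; reader = DictReader(csvfile, fieldnames, dict_class=collections.OrderedDict)` - Michel Müller

2 and 3 are unclear. If you want people to help, please specify the expected output. - Michel Müller

The import statement belongs with the other imports at the top, and the reader = line can directly replace your original DictReader initialization. - Michel Müller

The expected output is the first thing I stated - orpheus

When I put "import collections;" at the top and add "reader = DictReader(csv..)", it gives NameError: name 'DictReader' is not defined. - orpheus

The field is skipped because you have "Artist" twice in this tuple: ("ID","Artist","Song", "Artist"). - chthonicdaemon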

4 Answers

3

OK, this is untested, but try the following:

import csv
import json
from collections import OrderedDict

fieldnames = ("ID","Artist","Song", "Album")

entries = []
#the with statement is better since it closes your file properly after use.
with open('music.csv', 'r') as csvfile:
    #before Python 3.7, the standard dict does not guarantee any order,
    #but if you write into an OrderedDict, the order of writes is kept in the output.
    reader = csv.DictReader(csvfile, fieldnames)
    for row in reader:
        entry = OrderedDict()
        for field in fieldnames:
            entry[field] = row[field]
        entries.append(entry)

output = {
    "Music": entries
}

with open('file.json', 'w') as jsonfile:
    json.dump(output, jsonfile)
    jsonfile.write('\n')

- Michel Müller

Traceback (most recent call last): File "spotPy.py", line 9, in <module> reader = csv.DictReader( csvfile, fieldnames, dict_class=collections.OrderedDict) File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 79, in __init__ self.reader = reader(f, dialect, *args, **kwds) TypeError: 'dict_class' is an invalid keyword argument for this function - orpheus

Sorry, that was some non-standard DictReader, hold on. - Michel Müller

Note the error in the OP's code: they used fieldnames = ("ID","Artist","Song", "Artist"). It should be fieldnames = ("ID","Artist","Song", "Album"). - chthonicdaemon
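To illustrate the duplicate-key point raised in the comments, here is a minimal sketch. It assumes a comma-separated file and uses a hypothetical in-memory sample in place of music.csv; the variable names are made up for the illustration.

```python
import csv
import io

# Hypothetical in-memory stand-in for music.csv (comma-separated is an assumption).
sample = io.StringIO("1,Empire of the Sun,We Are The People,Walking on a Dream\n")

# "Artist" appears twice, so the fourth column overwrites the second
# and no "Album" key is ever created.
bad_fields = ("ID", "Artist", "Song", "Artist")
bad_row = dict(next(csv.DictReader(sample, bad_fields)))

# Rewind and read the same line again with unique field names.
sample.seek(0)
good_fields = ("ID", "Artist", "Song", "Album")
good_row = dict(next(csv.DictReader(sample, good_fields)))

print(bad_row)
print(good_row)
```

With the repeated name the row ends up with only three keys, which is why one column seems to disappear.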

0

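Your logic is in the wrong order. json is designed to convert a single object into JSON, recursively. So you should always think in terms of building up one single object before calling dump or dumps.

First collect the rows into an array: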

music = [r for r in reader]

Then put it in a dict:

result = {'Music': music}

Then dump it to JSON:

json.dump(result, jsonfile)

Or all in one line:

json.dump({'Music': [r for r in reader]}, jsonfile)
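Put together, the steps above might look like the following sketch. The in-memory io.StringIO objects are hypothetical stand-ins for music.csv and file.json, and the comma-separated layout and the corrected field names are assumptions.

```python
import csv
import io
import json

# Hypothetical in-memory stand-ins for music.csv and file.json.
csvfile = io.StringIO(
    "1,Empire of the Sun,We Are The People,Walking on a Dream\n"
    "2,M83,Steve McQueen,Hurry Up We're Dreaming\n"
)
jsonfile = io.StringIO()

fieldnames = ("ID", "Artist", "Song", "Album")
reader = csv.DictReader(csvfile, fieldnames)

# Build the whole structure first, then serialize it exactly once.
result = {"Music": [dict(row) for row in reader]}
json.dump(result, jsonfile, indent=4)

# Because there was a single dump, the text parses back as one JSON document.
parsed = json.loads(jsonfile.getvalue())
print(len(parsed["Music"]))
```

With real files, the two StringIO objects would be replaced by the handles returned from open().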

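"Ordered" JSON

If you really care about the order of the object properties in the JSON (even though you shouldn't), you shouldn't use DictReader. Instead, use the regular reader and create the OrderedDict yourself: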

from collections import OrderedDict

...

reader = csv.reader(csvfile)
music = [OrderedDict(zip(fieldnames, r)) for r in reader]

Or again in one line:

json.dump({'Music': [OrderedDict(zip(fieldnames, r)) for r in reader]}, jsonfile)
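As a self-contained sketch of the ordered variant, again with a hypothetical in-memory sample and assuming comma-separated input:

```python
import csv
import io
import json
from collections import OrderedDict

# Hypothetical in-memory stand-in for music.csv.
csvfile = io.StringIO("1,Empire of the Sun,We Are The People,Walking on a Dream\n")
fieldnames = ("ID", "Artist", "Song", "Album")

# zip() pairs each field name with the value in the same column,
# and OrderedDict remembers the order in which the pairs were added.
music = [OrderedDict(zip(fieldnames, r)) for r in csv.reader(csvfile)]

text = json.dumps({"Music": music})
print(text)
```

On Python 3.7 and later a plain dict(zip(fieldnames, r)) keeps the same order, so OrderedDict mainly matters on older interpreters.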

Other

Also, use a context manager for your files to ensure they are closed properly:

with open('music.csv', 'r') as csvfile, open('file.json', 'w') as jsonfile:
    # Rest of your code inside this block

- jpmc26

0

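"It didn't write to the JSON file in the order I wanted."

The csv.DictReader class returns Python dict objects. Python dictionaries are unordered collections, so you have no control over the order in which they are presented. (Plain dicts only guarantee insertion order from Python 3.7 onward.)

Python does provide an ordered dictionary, OrderedDict, which you can use if you avoid csv.DictReader().

"And it skipped the song name."

That is because the file is not really a CSV file. In particular, every line begins and ends with the field separator. We can use .strip("|") to fix that.

"I need all this data to be output into an array named 'Music'."

Then the program needs to create a dict with "Music" as a key.

"I need it to have a comma after each artist's info."

You get the output you are seeing because you call json.dump() more than once. If you want a valid JSON file, you should call it only once.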
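A small sketch of why repeated calls are the problem, using made-up sample rows: one JSON document per line does not parse as a single document, whereas one call on the enclosing object does.

```python
import json

# Made-up sample rows for illustration.
rows = [
    {"ID": "1", "Artist": "Empire of the Sun"},
    {"ID": "2", "Artist": "M83"},
]

# One json.dumps() call per row, joined by newlines, as the question's loop does.
per_row = "\n".join(json.dumps(r) for r in rows)

try:
    json.loads(per_row)
    per_row_is_valid = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    per_row_is_valid = False

# A single call on the enclosing object yields one valid document.
single = json.dumps({"Music": rows})
single_is_valid = json.loads(single) == {"Music": rows}

print(per_row_is_valid, single_is_valid)
```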

Try the following:

import csv
import json
from collections import OrderedDict


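def MyDictReader(fp, fieldnames):
    # Drop the leading and trailing '|' (and surrounding whitespace) from each line.
    fp = (x.strip().strip('|').strip() for x in fp)
    reader = csv.reader(fp, delimiter="|")
    # Trim the padding around every field.
    reader = ([field.strip() for field in row] for row in reader)
    # Pair the field names with the values, keeping the column order.
    dict_reader = (OrderedDict(zip(fieldnames, row)) for row in reader)
    return dict_reader

fieldnames = ("ID","Artist","Song", "Album")
# The context manager closes both files, even if an error occurs.
with open('music.csv', 'r') as csvfile, open('file.json', 'w') as jsonfile:
    reader = MyDictReader(csvfile, fieldnames)
    json.dump({"Music": list(reader)}, jsonfile, indent=2)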

- Robᵩ

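No, the order has already been destroyed. Each row is already a standard "dict". - Robᵩ

@MichelMüller - That's a good idea. I'll add it to my answer as well. - Robᵩ

@Robᵩ In the CSV file or in the script? - orpheus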

2

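@AlwaysSunny - jpmc's point is that Stack Overflow is not meant to be a code-writing service. It is meant to be a repository of useful information. Any question that merely asks "please write my code for me" has no value to future readers, and actually makes it harder for people to find the real questions and answers. If your goal really is to have someone write your code for you, there are other sites designed and intended for that purpose. - Robᵩ

@Robᵩ Sorry, I went too far. My main objection is to providing fully working code blocks that the OP can simply copy/paste, rather than leaving them to work out some of the details themselves. - jpmc26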

- chthonicdaemon · Accepted Answer

Pandas makes short work of this. First read the file.

import pandas

df = pandas.read_csv('music.csv', names=("id","Artist","Song", "Album"))

Now you have some options. The quickest way to get a proper JSON file is:

df.to_json('file.json', orient='records')

Output:

[{"id":1,"Artist":"Empire of the Sun","Song":"We Are The People","Album":"Walking on a Dream"},{"id":2,"Artist":"M83","Song":"Steve McQueen","Album":"Hurry Up We're Dreaming"}]

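This doesn't handle your requirement of wrapping everything in a "Music" object, nor the field order, but it does have the benefit of brevity.

To wrap the output in a Music object, we can use to_dict: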

import json
with open('file.json', 'w') as f:
    json.dump({'Music': df.to_dict(orient='records')}, f, indent=4)

Output:

{
    "Music": [
        {
            "id": 1,
            "Album": "Walking on a Dream",
            "Artist": "Empire of the Sun",
            "Song": "We Are The People"
        },
        {
            "id": 2,
            "Album": "Hurry Up We're Dreaming",
            "Artist": "M83",
            "Song": "Steve McQueen"
        }
    ]
}
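For comparison, a standard-library-only sketch of the same wrapped structure. pandas inferred integer ids above, while the csv module yields strings, so the conversion here is done by hand. The in-memory sample is a hypothetical stand-in for music.csv and assumes comma-separated input.

```python
import csv
import io
import json

# Hypothetical in-memory stand-in for music.csv.
csvfile = io.StringIO(
    "1,Empire of the Sun,We Are The People,Walking on a Dream\n"
    "2,M83,Steve McQueen,Hurry Up We're Dreaming\n"
)
fieldnames = ("id", "Artist", "Song", "Album")

records = []
for row in csv.DictReader(csvfile, fieldnames):
    record = dict(row)
    # csv yields every value as a string; convert the id explicitly.
    record["id"] = int(record["id"])
    records.append(record)

output = json.dumps({"Music": records}, indent=4)
print(output)
```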

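I would suggest you reconsider insisting on a particular field order, since the JSON specification clearly states that "an object is an unordered set of name/value pairs" (emphasis mine).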
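A short sketch of what that means in practice with Python's json module: two objects that differ only in key order compare equal once parsed. If stable text output is all that is needed, sort_keys=True gives it, and object_pairs_hook can preserve the source order when reading. The sample strings are made up for the illustration.

```python
import json
from collections import OrderedDict

# The same object written with two different key orders.
a = json.loads('{"id": 1, "Artist": "M83"}')
b = json.loads('{"Artist": "M83", "id": 1}')
same_object = (a == b)  # key order does not affect equality

# sort_keys=True produces deterministic text regardless of insertion order.
canonical = json.dumps(b, sort_keys=True)

# object_pairs_hook keeps the order found in the source text when parsing.
ordered = json.loads('{"id": 1, "Artist": "M83"}', object_pairs_hook=OrderedDict)

print(same_object, canonical, list(ordered))
```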