CSV to JSON converter (grouping by the same key value)

4

I'm trying to convert a CSV file to JSON. I searched on Google but couldn't find how to modify my code to get the result I need.

This is my Python code:

import csv
import json

def csv_to_json(csvFilePath, jsonFilePath):
    jsonArray = []

    # read the CSV (the encoding is important)
    with open(csvFilePath, encoding='utf-8') as csvf:
        # DictReader yields one dictionary per row, keyed by the header names
        csvReader = csv.DictReader(csvf)

        # append each row dictionary to the list that becomes the JSON array
        for row in csvReader:
            jsonArray.append(row)

    # serialize the list and write it to the output file
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonString = json.dumps(jsonArray, indent=4)
        jsonf.write(jsonString)

csvFilePath='example.csv'
jsonFilePath='output.json'
csv_to_json(csvFilePath, jsonFilePath)

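This is my CSV file format (posted as an image in the original question, so it is not reproduced here; the output below shows that the columns are Area and Employee):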

My actual JSON output:

[
    {
        "Area": "IT",
        "Employee": "Carl"
    },
    {
        "Area": "IT",
        "Employee": "Walter"
    },
    {
        "Area": "Financial Resources",
        "Employee": "Jennifer"
    }
]

My expected JSON output:

[
    {
        "Area": "IT",
        "Employee": ["Carl", "Walter"]
    },
    {
        "Area": "Financial Resources",
        "Employee": ["Jennifer"]
    }
]

Thanks in advance!

2 Answers

3

Something like this should work.

import csv
import json

def csv_to_json(csvFilePath, jsonFilePath):
    areas = {}  # maps each area name to the list of its employees
    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)
        for row in csvReader:
            area, employee = row["Area"], row["Employee"]  # split values
            if area in areas:  # area already seen: extend its employee list
                areas[area].append(employee)
            else:  # first row for this area: start a new list
                areas[area] = [employee]
    # convert the dictionary to the desired output format
    jsonArray = [{"Area": k, "Employee": v} for k, v in areas.items()]
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonString = json.dumps(jsonArray, indent=4)
        jsonf.write(jsonString)
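
The same grouping can be tried without touching any files. The following is only a sketch: the helper name group_employees_by_area and the inline sample data are assumptions (the sample rows are reconstructed from the JSON shown in the question), and collections.defaultdict is swapped in for the explicit if/else above.

```python
import csv
import io
import json
from collections import defaultdict


def group_employees_by_area(rows):
    """Group row dicts by "Area", collecting "Employee" values in first-seen order."""
    grouped = defaultdict(list)  # a missing area starts out as an empty list
    for row in rows:
        grouped[row["Area"]].append(row["Employee"])
    # Turn the {area: [employees]} mapping into the list-of-objects layout.
    return [{"Area": area, "Employee": employees} for area, employees in grouped.items()]


# Sample data reconstructed from the question's output (an assumption, not the real file).
sample_csv = "Area,Employee\nIT,Carl\nIT,Walter\nFinancial Resources,Jennifer\n"
result = group_employees_by_area(csv.DictReader(io.StringIO(sample_csv)))
print(json.dumps(result, indent=4))
```

Because dictionaries keep insertion order in Python 3.7 and later, the areas come out in the order they first appear in the CSV.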

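Do you know how to add more fields in the for loop? I'm trying the following: jsonArray = [{"Area": k, "Employee": v, "anotherfield": a,} for k,v,a in areas.items()] but that isn't allowed. - Mauricio Reyes
If you want to add extra fields, you need to do it before the list comprehension. But it also depends on why you are adding more fields... Is it to create a brand-new CSV file? Or just to add a blank field that will be updated later? - Alexander
@MauricioReyes You can only add another field this way if the information is the same for every iteration. jsonArray = [{"Area": k, "Employee": v, "anotherfield": "somevalue"} for k, v in areas.items()] - Alexander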
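Yes, I've already added it to the list comprehension. I'm creating a file from scratch, and it will grow according to the responsibilities in the web service registry (the new fields aren't important, because this is the main filter). But the main filter is the "Area" grouping. - Mauricio Reyes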
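
On the question raised in the comments: areas.items() yields (key, value) pairs, so unpacking it into three names (for k, v, a in ...) raises a ValueError. One possible way to carry a per-area field is to store a small dictionary per area instead of a bare list. This is a sketch under assumptions: the "Manager" column, the sample rows, and the helper name group_with_extra_field are all hypothetical, and the extra value is taken from the first row seen for each area.

```python
import csv
import io


def group_with_extra_field(rows, extra_field):
    """Group rows by "Area", keeping one extra column value per area."""
    areas = {}
    for row in rows:
        # setdefault returns the existing entry, or stores and returns the new one.
        entry = areas.setdefault(
            row["Area"],
            {"Area": row["Area"], extra_field: row[extra_field], "Employee": []},
        )
        entry["Employee"].append(row["Employee"])
    return list(areas.values())


# Hypothetical data: the "Manager" column does not exist in the original question.
sample_csv = (
    "Area,Employee,Manager\n"
    "IT,Carl,Dana\n"
    "IT,Walter,Dana\n"
    "Financial Resources,Jennifer,Erin\n"
)
records = group_with_extra_field(csv.DictReader(io.StringIO(sample_csv)), "Manager")
print(records)
```

If the extra value can differ between rows of the same area, it would need its own list (or another aggregation rule) rather than a single value.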

0

The convtools library provides many reduce operations for handling aggregations (I must confess, I'm the author):

from convtools import conversion as c
from convtools.contrib.tables import Table

# generates an ad-hoc function, which aggregates data
converter = (
    c.group_by(c.item("Area"))
    .aggregate(
        {
            "area": c.item("Area"),
            "employees": c.ReduceFuncs.Array(c.item("Employee")),
        }
    )
    .gen_converter()
)

result = converter(
    Table.from_csv("tmp/in.csv", header=True).into_iter_rows(dict)
)
assert result == [
    {"area": "IT", "employees": ["Carl", "Walter"]},
    {"area": "Financial Resources", "employees": ["Jennifer"]},
]
