Convert csv to JSON tree structure?

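I have seen related questions on this topic.

However, I am still unable to convert a CSV file into a JSON hierarchy. The scripts I found on stackoverflow are each specific to a certain problem. Say there are three variables that have to be grouped: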

condition   target  sub
oxygen      tree    G1
oxygen      tree    G2
water       car     G3
water       tree    GZ
fire        car     GTD
oxygen      bomb    GYYS

As far as I understand it, this should result in a JSON file with a hierarchy like this:
oxygen
    - tree  
        - G1
        - G2
    - bomb
        - GYYS
water 
    - car
        - G3
    - tree
        - GZ
fire 
    - car   
        - GTD

These have to be grouped in a nested structure, for example:

{
  "name": "oxygen",
  "children": [
    {
      "name": "tree",
      "children": [
        {"name": "G1"},
        {"name": "G2"}
      ]
    },
    {
      "name": "bomb",
      "children": [
        {"name": "GYYS"}
      ]
    }
  ]
}
etc.

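I have tried every script on this site, but I cannot write a generic function that produces a file like flare.json. I could post my code, but it is just similar to the links provided. So I am asking for simple code (or an example that can help me) to convert this into a flare.json-like structure.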


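Your "three variables" do not appear in the json example, and there is no obvious connection. So please explain in detail how the json structure should be generated from the data in the csv file. - Christian König
I'm not very familiar with JSON, but I tried to give a short example of what I want (see edit) @ChristianKönig - CodeNoob
@CodeNoob I think you provided the wrong JSON. From the CSV I guess the children of tree should be [G1, G2], and [GYYS] should appear only as a child of bomb. Am I right? You need to explain the hierarchy logic correctly (since only you know what you need). - Hett
I updated my question with a simple tree format and updated JSON, hopefully it is clearer now @Hett - CodeNoob
@CodeNoob I created a small repository (https://github.com/hettmett/csv_to_json) and posted an answer below. Feel free to use it, modify it, or contribute. If you have any concerns about the answer, let me know. - Hett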
2 Answers


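Using defaultdict from the collections standard library makes many problems with hierarchical structures much easier to solve. So I have developed a sample solution for your problem. Before running the script, please make sure you have a comma-separated csv file (named test.csv), or change the csv reader logic in the script below.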
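As a minimal illustration of why defaultdict helps with hierarchies, the sketch below builds a small nested tree just by indexing into it. The sample keys are taken from the question's data; the variable names `tree` and `nested` are only illustrative.

```python
import json
from collections import defaultdict


def ctree():
    # every missing key becomes another ctree, so levels appear on demand
    return defaultdict(ctree)


tree = ctree()
# merely indexing creates the whole path, e.g. oxygen -> tree -> G1
tree["oxygen"]["tree"]["G1"]
tree["oxygen"]["tree"]["G2"]
tree["oxygen"]["bomb"]["GYYS"]

# defaultdict is a dict subclass, so the json module can serialise it;
# the round trip turns it into plain nested dicts
nested = json.loads(json.dumps(tree))
print(nested)
```

Leaves end up as empty dicts, which is what lets a later step tell a leaf from an inner node.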

Here is the csv file I tested the script with.

condition, target, sub, dub
oxygen,tree,G1,T1
oxygen,tree,G2,T1
oxygen,tree,G2,T2
water,car,G3,T1
water,tree,GZ,T1
water,tree,GZ,T2
fire,car,GTD,T3
oxygen,bomb,GYYS,T1

Technically, the script should work for csv files with any number of columns, but you need to test it yourself to make sure.

import csv
import json
from collections import defaultdict


def ctree():
    """Return a defaultdict whose missing keys are themselves ctrees.

    This gives a tree that grows on demand: tree[a][b][c] creates every
    intermediate level automatically.
    """
    return defaultdict(ctree)


def build_leaf(name, leaf):
    """Recursively convert a ctree node into the name/children form."""
    res = {"name": name}

    # add a "children" key only if the node actually has children
    if leaf:
        res["children"] = [build_leaf(k, v) for k, v in leaf.items()]

    return res


def main():
    """Parse the csv file into a tree, convert it, and print it as JSON."""
    tree = ctree()
    # NOTE: test.csv must be in the current working directory
    with open('test.csv', newline='') as csvfile:
        reader = csv.reader(csvfile)
        # skip the header row; remove this line if your csv has no header
        next(reader, None)
        for row in reader:
            # skip blank lines
            if not row:
                continue
            # walk down the tree one column at a time; defaultdict creates
            # each missing level, which groups values under their parents
            leaf = tree
            for value in row:
                leaf = leaf[value]

    # convert the nested defaultdicts into the name/children structure
    res = [build_leaf(name, leaf) for name, leaf in tree.items()]

    # print the result to the terminal
    print(json.dumps(res))


if __name__ == "__main__":
    main()

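Here is a segment of the resulting JSON (pretty-printed; the order of siblings can differ depending on your Python version):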

{
    "name": "oxygen",
    "children": [
      {
        "name": "tree",
        "children": [
          {
            "name": "G2",
            "children": [
              {
                "name": "T2"
              },
              {
                "name": "T1"
              }
            ]
          },
          {
            "name": "G1",
            "children": [
              {
                "name": "T1"
              }
            ]
          }
        ]
      },
      {
        "name": "bomb",
        "children": [
          {
            "name": "GYYS",
            "children": [
              {
                "name": "T1"
              }
            ]
          }
        ]
      }
    ]
  }
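To check the same idea against the three-column data from the question, here is a sketch that reads whitespace-separated rows from an in-memory string instead of test.csv. The names `rows_to_tree` and `QUESTION_DATA` are hypothetical, and splitting on whitespace is an assumption about how the question's file is delimited.

```python
import json
from collections import defaultdict

# the sample data from the question, assumed to be whitespace-separated
QUESTION_DATA = """\
condition   target  sub
oxygen      tree    G1
oxygen      tree    G2
water       car     G3
water       tree    GZ
fire        car     GTD
oxygen      bomb    GYYS
"""


def ctree():
    return defaultdict(ctree)


def build_leaf(name, leaf):
    res = {"name": name}
    if leaf:
        res["children"] = [build_leaf(k, v) for k, v in leaf.items()]
    return res


def rows_to_tree(rows):
    """Group rows (lists of column values) into a name/children hierarchy."""
    tree = ctree()
    for row in rows:
        leaf = tree
        for value in row:
            leaf = leaf[value]
    return [build_leaf(name, leaf) for name, leaf in tree.items()]


lines = QUESTION_DATA.splitlines()[1:]  # drop the header line
rows = [line.split() for line in lines if line.strip()]
result = rows_to_tree(rows)
print(json.dumps(result, indent=2))
```

On Python 3.7 or later, where dicts keep insertion order, the groups come out in the order they first appear in the data: oxygen, water, fire.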

If you have any further questions or issues, please let me know. Happy coding ;)


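Thank you very much! Much easier to understand than the other answers on Stack. - CodeNoob
Glad it helped you! - Hett
@Hett, what if the CSV file contains URLs? I mean, I have source URLs and target URLs, and I want to build a tree structure based on the directories in the URLs and draw a graph like this one https://bl.ocks.org/mbostock/1062288, any ideas? - Dany M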


Another solution is to use the convtools code-generation library:

from convtools import conversion as c
from convtools.contrib.tables import Table


table = Table.from_csv(
    "tmp2.csv", header=True, dialect=Table.csv_dialect(delimiter="\t")
)

child = None
for column in reversed(table.columns):
    if child is None:
        # the innermost children
        child = c.iter(c.item(column)).as_type(list)
    else:
        child = c.group_by(c.item(column)).aggregate(
            {
                "name": c.item(column),
                "children": c.ReduceFuncs.Array(c.this()).pipe(child),
            }
        )
# this is where code generation happens
converter = child.gen_converter()

converter(table.into_iter_rows(dict))

Output:

[
    {
        "name": "oxygen",
        "children": [
            {"name": "tree", "children": ["G1", "G2"]},
            {"name": "bomb", "children": ["GYYS"]},
        ],
    },
    {
        "name": "water",
        "children": [
            {"name": "car", "children": ["G3"]},
            {"name": "tree", "children": ["GZ"]},
        ],
    },
    {"name": "fire", "children": [{"name": "car", "children": ["GTD"]}]},
]
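In the output above the innermost children are plain strings, whereas the structure requested in the question wraps every node as {"name": ...}. If that matters, one option is a small post-processing pass in plain Python. `wrap_leaves` is a hypothetical helper and not part of convtools; the `sample` value is copied from the output above.

```python
def wrap_leaves(node):
    """Return a copy of node in which every bare string is {"name": string}."""
    if isinstance(node, str):
        return {"name": node}
    return {
        "name": node["name"],
        "children": [wrap_leaves(child) for child in node["children"]],
    }


sample = [
    {"name": "fire", "children": [{"name": "car", "children": ["GTD"]}]},
]
wrapped = [wrap_leaves(node) for node in sample]
print(wrapped)
```

The function builds new dicts instead of modifying its input, so the converter's original result stays untouched.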

