将嵌套的JSON转换为CSV

9

我正在将一个10多层的嵌套JSON对象转换为C# .NET中的CSV文件。

我一直在使用JavaScriptSerializer().Deserialize<ObjectA>(json)XmlNode xml = (XmlDocument)JsonConvert.DeserializeXmlNode(json)来分解对象。通过这些对象,我可以进一步写入CSV文件。然而,现在JSON对象进一步扩展了。大多数数据并没有真正使用,因此我希望进行原始数据转储。

有没有更简单的方法可以在不声明结构的情况下将数据转储为csv格式?

示例JSON:

{
"F1":1,
"F2":2,
"F3":[
    {
        "E1":3,
        "E2":4
    },
    {
        "E1":5,
        "E2":6
    },  
    {
        "E1":7,
        "E2":8,
        "E3":[
            {
                "D1":9,
                "D2":10
            }
        ]
    },      
]
}   

我的期望CSV输出结果是:
F1,F2,E1,E2,D1,D2
1,2
1,2,3,4
1,2,5,6
1,2,7,8,9,10

你能不能将那些数据转换成 DataTable,然后再从中转换成 CSV 呢? - MethodMan
@MethodMan 这正是我处理前几个级别所做的。毕竟,我发现将它们放入结构中太耗费时间了,因此我正在检查是否有任何自动化方法。 - ydoow
1
我已经尝试过了,可以在这里查看:https://dotnetfiddle.net/7MBHUc 它没有起作用,但我能够遍历。下次有机会时会更新。 - Gaurav Gandhi
@ErrHunter 看起来是一个更接近的解决方案。1)需要摆脱具有子元素的节点;和2)更多的数据类型检查。 - ydoow
@ydoow,是的我知道。但是我有点忙,所以可能要在8-10小时后才能更新。 - Gaurav Gandhi
显示剩余6条评论
3个回答

13
您的请求存在不一致性:您希望为具有子级的根对象生成一行,但您不希望为"F3[2]"对象生成一行,该对象也具有子级。因此,您的规则是“为至少具有一个原始值属性的对象打印一行,只要该对象是根对象或没有具有至少一个原始值属性的后代对象”。这有点棘手,但可以使用LINQ to JSON完成。
        var obj = JObject.Parse(json);

        // Collect column titles: all property names whose values are of type JValue, distinct, in order of encountering them.
        var values = obj.DescendantsAndSelf()
            .OfType<JProperty>()
            .Where(p => p.Value is JValue)
            .GroupBy(p => p.Name)
            .ToList();

        var columns = values.Select(g => g.Key).ToArray();

        // Filter JObjects that have child objects that have values.
        var parentsWithChildren = values.SelectMany(g => g).SelectMany(v => v.AncestorsAndSelf().OfType<JObject>().Skip(1)).ToHashSet();

        // Collect all data rows: for every object, go through the column titles and get the value of that property in the closest ancestor or self that has a value of that name.
        var rows = obj
            .DescendantsAndSelf()
            .OfType<JObject>()
            .Where(o => o.PropertyValues().OfType<JValue>().Any())
            .Where(o => o == obj || !parentsWithChildren.Contains(o)) // Show a row for the root object + objects that have no children.
            .Select(o => columns.Select(c => o.AncestorsAndSelf()
                .OfType<JObject>()
                .Select(parent => parent[c])
                .Where(v => v is JValue)
                .Select(v => (string)v)
                .FirstOrDefault())
                .Reverse() // Trim trailing nulls
                .SkipWhile(s => s == null)
                .Reverse());

        // Convert to CSV
        var csvRows = new[] { columns }.Concat(rows).Select(r => string.Join(",", r));
        var csv = string.Join("\n", csvRows);

        Console.WriteLine(csv);

使用
public static class EnumerableExtensions
{
    // https://dev59.com/R3A75IYBdhLWcg3wGlEa
    public static HashSet<T> ToHashSet<T>(this IEnumerable<T> source)
    {
        return new HashSet<T>(source);
    }
}

输出结果为:

F1,F2,E1,E2,D1,D2
1,2
1,2,3,4
1,2,5,6
1,2,7,8,9,10

真遗憾这个问题没有被标记为解决方案,或者至少没有得到评论。如果我尝试一下的话,我会回来汇报的。 - Jeremy A. West
杰出的解决方案。 - FosterZ

1

我写了这个并且它对我有效。 在这里,我们使用 prop_prop 的格式将对象树的所有面包屑保存在标题中。 并以 prop1 的格式将 jarray 属性对象保存在标题中。

    public Dictionary<string, string> ComplexJsonToDictionary(JObject jObject, Dictionary<string, string> result, string field)
    {
        foreach (var property in jObject.Properties())
        {
            var endField = field + (string.IsNullOrEmpty(field) ? "" : "_") + property.Name;

            var innerDictionary = new Dictionary<string, string>();
            try
            {
                var innerValue = JObject.Parse(Convert.ToString(property.Value));


                result.AddOrOverride(ComplexJsonToDictionary(innerValue, innerDictionary, endField));
            }
            catch (Exception)
            {
                try
                {
                    var innerValues = JArray.Parse(Convert.ToString(property.Value));
                    try
                    {
                        var i = 0;
                        foreach (var token in innerValues)
                        {
                            var innerValue = JObject.Parse(Convert.ToString(token));

                            result.AddOrOverride(ComplexJsonToDictionary(innerValue, innerDictionary, endField+i++));
                        }
                    }
                    catch (Exception)
                    {
                        result.Add(endField, string.Join(",", innerValues.Values<string>()));
                    }
                }
                catch (Exception)
                {
                    result.Add(endField, property.Value.ToString());
                }
            }
        }
        return result;
    }

感谢关注,请在适当的情况下撰写评论。

-2
将已解析的 JSON jobject 传递给静态扩展,它将返回展平后的 jobjects 数组,然后转换为 CSV。代码部分取自其他 Stack Overflow 和其他资源,我没有参考资料。
 public static IEnumerable<string> JsonToCsvRowsWithHierarchyHeaders(string jsonData)
    {
        if (jsonData.Trim().StartsWith("[") && jsonData.Trim().EndsWith("]"))
        {
            var startString = "{\"appendRoot\":";
            var endString = "}";
            jsonData = $"{startString}{jsonData}{endString}";
        }

        var jObject = JsonConvert.DeserializeObject<JObject>(jsonData);
        var flattenJObjects = JsonParserExtensions.FlattenJsonGetJObjects(jObject).ToList();
  
        var csvRows = new List<string>();
        if (flattenJObjects.Any())
        {
            var firstRow = flattenJObjects.First().Children<JProperty>().Select(x => x.Name).ToList();
            var header = string.Join(delimeter, firstRow).Replace("appendRoot_", "").ToLower();
            csvRows.Add(header);
            foreach (var flattenJObject in flattenJObjects)
            {
                var tokens = flattenJObject.Children<JProperty>();

                if (firstRow.Count() != tokens.Count())
                {
                    var missMatchPropertyValues = firstRow.Select(cell => tokens.FirstOrDefault(x => x.Name == cell))
                        .Select(value => value == null
                            ? string.Empty
                            : CheckAndUpdateRowCellValueTextQualifier(JsonConvert.DeserializeObject<string>(value.Value.ToString(Newtonsoft.Json.Formatting.None))))
                        .ToList();
                    var rowString = string.Join(delimeter, missMatchPropertyValues);
                    csvRows.Add(rowString);
                }
                else
                {
                    var rowValue = tokens.Select(token =>
                        CheckAndUpdateRowCellValueTextQualifier(
                            JsonConvert.DeserializeObject<string>(
                                token.Value.ToString(Newtonsoft.Json.Formatting.None))));
                    var rowString = string.Join(delimeter, rowValue);
                    csvRows.Add(rowString);
                }
            }
        }
        return csvRows;
    }

 
    private static string CheckAndUpdateRowCellValueTextQualifier(string value)
    {
        const string q = @"""";
        if (!string.IsNullOrEmpty(value) && value.Contains('\"'))
        {
            return value;
        }

        if (!string.IsNullOrEmpty(value) && (value.Contains(',') ||
                                         value.Contains('"') ||
                                         value.Contains('\n') || value.Contains('\r')))
        {
            return $"{q}{value}{q}";
        }
        
        return value;
    }
}

public static class JsonParserExtensions
{
    public static IEnumerable<JObject> FlattenJsonGetJObjects(JObject jObject, string parentName = null)
    {
        if (!(parentName is null))
            jObject = RenamePropertiesByHierarchyName(jObject, parentName);

        var fields = jObject.Properties().Where(p => p.Value.GetType().Name == "JValue").ToList();
        var objects = jObject.Properties().Where(p => p.Value.GetType().Name == "JObject").ToList();
        var arrays = jObject.Properties().Where(p => p.Value.GetType().Name == "JArray").ToList();
        var objectsArray = arrays.Where(array => array.Value.All(elements => elements.GetType().Name == "JObject")).ToList();
        var valuesArray = arrays.Where(array => array.Value.All(elements => elements.GetType().Name == "JValue")).ToList();

        var nestedObjects = ProcessNestedObjects(objects);

        var joinedInnerObjects = nestedObjects.Any()
            ? nestedObjects.Select(innerObject => JoinJObject(new JObject(fields), innerObject))
            : new List<JObject> { new JObject(fields) };

        var arraysObjectList = GetJObjectsFromArrayOfJProperties(objectsArray);
        var arraysValueList = GetJObjectsFromArrayOfValues(valuesArray);
        var joinedAll = joinedInnerObjects.SelectMany(inner => JoinMultipleJObjects(arraysObjectList, arraysValueList, inner));
        return joinedAll;
    }

    public static List<JObject> ProcessNestedObjects(List<JProperty> jObjects)
    {
        var processNestedObjects = new List<JObject>();
        var renamedJObjects = jObjects?.Select(obj => RenamePropertiesByHierarchyName(obj.Value.ToObject<JObject>(), obj.Name)).ToList();

        if (!(renamedJObjects?.Count > 0)) return processNestedObjects;
        var renamed = renamedJObjects.Aggregate((acc, next) => JoinJObject(acc, next));

        var nestedObjects = renamed.Properties().Where(p => p.Value.GetType().Name == "JObject").ToList();
        var nestedArrays = renamed.Properties().Where(p => p.Value.GetType().Name == "JArray").ToList();
        var nestedObjectsArray = nestedArrays.Where(array => array.Value.All(elements => elements.GetType().Name == "JObject")).ToList();
        var nestedValuesArray = nestedArrays.Where(array => array.Value.All(elements => elements.GetType().Name == "JValue")).ToList();

        nestedArrays.ForEach(p => renamed.Remove(p.Name));
        nestedObjects.ForEach(p => renamed.Remove(p.Name));

        var nestedObjectList = new List<JObject>();
        var nestedMultipleObjectList = new List<JObject>();
        foreach (var listJObjects in nestedObjects.Select(innerObject => FlattenJsonGetJObjects(innerObject.Value.ToObject<JObject>(), innerObject.Name)).ToList())
        {
            if (listJObjects.Count() > 1)
                nestedMultipleObjectList.AddRange(listJObjects);
            else
                nestedObjectList.Add(listJObjects.First());
        }

        var jObjectsFromArrayOfJProperties = GetJObjectsFromArrayOfJProperties(nestedObjectsArray);
        var jObjectsFromArrayOfValues = GetJObjectsFromArrayOfValues(nestedValuesArray);

        var aggregate = nestedObjectList.Aggregate(renamed, (acc, next) => JoinJObject(acc, next));
        var groupedNestedObjects = (nestedMultipleObjectList.Any()) ? nestedMultipleObjectList.Select(nested => JoinJObject(aggregate, nested))
            : new List<JObject> { aggregate };
        var groupedNestedObjectsList = groupedNestedObjects.Select(groupedNested => JoinMultipleJObjects(jObjectsFromArrayOfJProperties, jObjectsFromArrayOfValues, groupedNested));
        processNestedObjects.AddRange(groupedNestedObjectsList.SelectMany(e => e));
        return processNestedObjects;
    }

    public static List<JObject> JoinMultipleJObjects(List<JObject> nestedArraysObjectList, List<JObject> nestedArraysValueList, JObject groupedNestedObjects)
    {
        var result = new List<JObject>();
        var joined = new List<JObject>();
        if (!nestedArraysObjectList.Any())
            joined.Add(groupedNestedObjects);
        else
            nestedArraysObjectList.ForEach(e => joined.Add(JoinJObject(groupedNestedObjects, e)));

        result.AddRange(nestedArraysValueList.Any()
            ? nestedArraysValueList
                .SelectMany(value => joined, (value, joinedItem) => JoinJObject(joinedItem, value)).ToList()
            : joined);
        return result;
    }

    public static List<JObject> GetJObjectsFromArrayOfJProperties(List<JProperty> nestedJProperties)
    {
        var fromArrayOfJProperties = new List<JObject>();
        foreach (var jProperty in nestedJProperties)
        {
            var nestedArraysObjectList = new List<JObject>();
            var name = jProperty.Name;
            var jPropertyValue = jProperty.Value;
            var renamedObjects = jPropertyValue?.Select(obj => RenamePropertiesByHierarchyName(obj.ToObject<JObject>(), name)).ToList();
            foreach (var jObjects in renamedObjects.Select(innerObject => FlattenJsonGetJObjects(innerObject.ToObject<JObject>())))
            {
                nestedArraysObjectList.AddRange(jObjects);
            }

            if (fromArrayOfJProperties.Any() && nestedArraysObjectList.Any())
                fromArrayOfJProperties = nestedArraysObjectList
                    .SelectMany(nested => fromArrayOfJProperties, (current, joined) => JoinJObject(joined, current)).ToList();

            if (!fromArrayOfJProperties.Any())
                fromArrayOfJProperties.AddRange(nestedArraysObjectList);
        }
        return fromArrayOfJProperties;
    }

    public static List<JObject> GetJObjectsFromArrayOfValues(List<JProperty> nestedValuesArray)
    {
        return (from innerArray in nestedValuesArray let name = innerArray.Name let values = innerArray.Value from innerValue in values select new JObject(new JProperty(name, innerValue.ToObject<JValue>()))).ToList();
    }

    public static JObject RenamePropertiesByHierarchyName(JObject jObject, string hierarchyName)
    {
        var properties = jObject.Properties().ToList().Select(p => new JProperty($"{hierarchyName}_{p.Name}", p.Value));
        return new JObject(properties);
    }

    public static JObject JoinJObject(JObject parentJObject, JObject innerObject)
    {
        var joinJObject = new JObject
            {
                parentJObject.Properties(),
                innerObject.Properties()
            };
        return joinJObject;
    }
}

如果您有新的问题,请通过单击提问按钮来提出。如果它有助于提供上下文,请包含此问题的链接。- 来自审核 - cboden

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接