Writing a nested dictionary to a CSV file

5

I have a dictionary:

dic = {"Location1":{"a":1,"b":2,"c":3},"Location2":{"a":4,"b":5,"c":6}}

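I want to turn this dictionary into a CSV table in which the top-level keys go in the leftmost column, the sub-keys form the header row, and each subsequent row holds the sub-key values, like this: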
Location    a   b   c
Location1   1   2   3
Location2   4   5   6

I have managed to do this with the following script:

import csv

dic = {"Location1":{"a":1,"b":2,"c":3},"Location2":{"a":4,"b":5,"c":6}}
fields = ["Location","a","b","c"]

with open(r"C:\Users\tyler.cowan\Desktop\tabulated.csv", "w", newline='') as f:
    w = csv.DictWriter(f, extrasaction='ignore', fieldnames = fields)
    w.writeheader()
    for k in dic:
        w.writerow({field: dic[k].get(field) or k for field in fields})

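Interestingly, when I adapted this test case to a real case, my location keys ended up spread into other columns. At first I thought I must have built the dictionary incorrectly, but on inspection it has exactly the same format, just with more keys and values. Yet the output looks like this: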

Location    a   b   c   d           e   f   g   h
Location1   1   2   3   Location1   7   8   9   10
Location2   4   5   6   Location2   2   3   4   5

Here is my full script.
# -*- coding: utf-8 -*-

import os
import csv


def pretty(d, indent=0):
    #prettify dict for visual Inspection
   for key, value in d.items():
      print('\t' * indent + str(key))
      if isinstance(value, dict):
         pretty(value, indent+1)
      else:
         if value == "":
             print("fubar")
         print('\t' * (indent+1) + str(value))



inFolder = "Folder"
dirList = os.listdir(inFolder)

#print(dirList)
fields = [ 'Lat-Long']
allData = {}
for file in dirList:
    fname, ext = os.path.splitext(file)
    if fname not in fields:
        fields.append(fname)

    #handle .dat in this block
    if ext.lower() == ".dat":
        #print("found dat ext: " + str(ext))
        with open(os.path.join(inFolder,file), "r") as f:
            for row in f:
                try:
                    row1 = row.split(" ")
                    if str(row1[0])+"-"+str(row1[1]) not in allData:
                        allData[str(row1[0])+"-"+str(row1[1])] = {}
                    else:
                        allData[str(row1[0])+"-"+str(row1[1])][fname] = row1[2]

                except IndexError:
                    row2 = row.split("\t")
                    if str(row2[0])+"-"+str(row2[1]) not in allData:
                        allData[str(row2[0])+"-"+str(row2[1])] = {}
                    else:
                        allData[str(row2[0])+"-"+str(row2[1])][fname] = "NA"

    elif ext.lower() == ".csv":
        with open(os.path.join(inFolder,file), "r") as f:
            for row in f:
                row1 = row.split(",")
                if str(row1[0])+"-"+str(row1[1]) not in allData:
                    allData[str(row1[0])+"-"+str(row1[1])] = {}
                else:
                    allData[str(row1[0])+"-"+str(row1[1])][fname] = row1[2]



pretty(allData)

with open("testBS.csv", "w", newline='') as f:
    w = csv.DictWriter(f, extrasaction='ignore', fieldnames = fields)
    w.writeheader()
    for k in allData:
        w.writerow({field: allData[k].get(field) or k for field in fields})

The input data looks like this:

"example.dat"

32.1    101.3   65
32.1    101.3   66
32.1    101.3   67
32.1    101.3   68
32.1    101.3   69
32.1    101.3   70
32.1    101.3   71

I am hoping to find a way to diagnose and fix this behavior, because I cannot figure out what differs between the test case and the real one.


1
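I would recommend pandas, if you have it. - cs95
I do have pandas, but I would only settle for it as a fallback; I want to understand the plain-Python solution. - Tyler Cowan
2 Answers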

5
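One possible solution is to build a CSV header containing the location column plus the full list of keys from all sub-dictionaries. That way, every sub-dictionary value can be written under its correct key column: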
import csv
dic = {"Location1":{"a":1,"b":2,"c":3},"Location2":{"a":4,"b":5,"c":6}, "Location3":{'e':7,'f':8, 'g':9, 'h':10}, "Location4":{'e': 2, 'f': 3, 'g': 4, 'h': 5}}
header = sorted(set(i for b in map(dict.keys, dic.values()) for i in b))
with open('filename.csv', 'w', newline="") as f:
    write = csv.writer(f)
    write.writerow(['location', *header])
    for a, b in dic.items():
        write.writerow([a]+[b.get(i, '') for i in header])

Output:

location,a,b,c,e,f,g,h
Location1,1,2,3,,,,
Location2,4,5,6,,,,
Location3,,,,7,8,9,10
Location4,,,,2,3,4,5
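
The same idea can also be kept with `csv.DictWriter`, as used in the question. The sketch below is one possible variant, not the answer author's code: it assumes the location column is named `Location`, uses a shortened made-up dictionary, writes to an in-memory `io.StringIO` buffer instead of a file, and relies on the `restval` parameter so that missing sub-keys become empty cells rather than falling back to the location key.

```python
import csv
import io

# Shortened, made-up data with different sub-keys per location.
dic = {
    "Location1": {"a": 1, "b": 2, "c": 3},
    "Location3": {"e": 7, "f": 8},
}

# Collect every sub-key that appears in any inner dictionary.
sub_keys = sorted({key for inner in dic.values() for key in inner})
fields = ["Location", *sub_keys]

buf = io.StringIO()  # stand-in for open("filename.csv", "w", newline="")
writer = csv.DictWriter(buf, fieldnames=fields, restval="")
writer.writeheader()
for location, inner in dic.items():
    # The location gets its own column; restval fills any missing sub-key.
    writer.writerow({"Location": location, **inner})

csv_text = buf.getvalue()
print(csv_text)
```

Setting the `Location` value explicitly in each row means no other column can ever receive the location key, whatever the sub-dictionary contains.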

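For clarity, you should also consider using a loop with calls to writerow. - cs95
@cᴏʟᴅsᴘᴇᴇᴅ That is cleaner. Please see my recent edit. - Ajax1234
@cᴏʟᴅsᴘᴇᴇᴅ Thank you for the suggestion! - Ajax1234
Yes, but I am not. @cᴏʟᴅsᴘᴇᴇᴅ - hd1
@TylerCowan The last line creates a list containing the location string and the values for the header keys. dict.get looks up the value stored under the given key and can return an optional default if the key does not exist. Here, the line loops over all header keys and returns the corresponding value when it exists for that row, or an empty string otherwise - Ajax1234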
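
The difference between a `dict.get` default and the `or` fallback used in the question can be seen in isolation. This is a small illustration with made-up values, not the asker's data: `row.get(f) or location` substitutes the location whenever the value is missing or falsy (`None`, `0`, `''`), which is one plausible way a location key can end up in a data column.

```python
row = {"a": 1, "b": 0, "c": ""}  # made-up sub-dictionary
location = "Location1"
fields = ["Location", "a", "b", "c", "d"]

# Pattern from the question: any missing or falsy value becomes the location.
with_or = {f: row.get(f) or location for f in fields}

# Explicit alternative: default missing keys to '', then set the location once.
with_default = {f: row.get(f, "") for f in fields}
with_default["Location"] = location

print(with_or)
print(with_default)
```

In `with_or`, columns `b`, `c` and `d` all receive `"Location1"`, while `with_default` keeps `0` and `''` as they are and leaves the missing `d` empty.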

2
You can use pandas to do this.
import pandas as pd
dic = {"Location1":{"a":1,"b":2,"c":3},"Location2":{"a":4,"b":5,"c":6}, "Location3":{'e':7,'f':8, 'g':9, 'h':10}, "Location4":{'e': 2, 'f': 3, 'g': 4, 'h': 5}}
pd.DataFrame.from_dict(dic, orient='index').to_csv('temp.csv')

Output:

,a,b,c,e,f,g,h
Location1,1.0,2.0,3.0,,,,
Location2,4.0,5.0,6.0,,,,
Location3,,,,7.0,8.0,9.0,10.0
Location4,,,,2.0,3.0,4.0,5.0
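
The values appear as floats because the missing cells are `NaN`, which forces the columns to a float dtype, and the first header cell is empty because the index has no name. A possible refinement, sketched below under the assumption that every value is a whole number, converts the columns to pandas' nullable `Int64` dtype and names the index column with `index_label`. The shortened dictionary and the `io.StringIO` buffer (standing in for a file path) are illustrative choices.

```python
import io

import pandas as pd

# Shortened, made-up data with different sub-keys per location.
dic = {"Location1": {"a": 1, "b": 2}, "Location3": {"e": 7}}

df = pd.DataFrame.from_dict(dic, orient="index")
# Nullable integers keep 1 as 1 instead of 1.0 while still allowing gaps.
df = df.astype("Int64")

buf = io.StringIO()  # stand-in for a path such as 'temp.csv'
df.to_csv(buf, index_label="Location")
csv_text = buf.getvalue()
print(csv_text)
```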
