I have a tree with the following structure:
my_hash_pop = {
    "Europe": {
        "France": {
            "Paris": 2220445,
            "Lille": 225789,
            "Lyon": 506615,
        },
        "Germany": {
            "Berlin": 3520031,
            "Munchen": 1544041,
            "Dresden": 540000,
        },
    },
    "South America": {
        "Brasil": {
            "Sao Paulo": 11895893,
            "Rio de Janeiro": 6093472,
        },
        "Argentina": {
            "Salta": 535303,
            "Buenos Aires": 3090900,
        },
    },
}
I want to convert this structure to CSV format with Python:
Europe;Germany;Berlin;3520031
Europe;Germany;Munchen;1544041
Europe;Germany;Dresden;540000
Europe;France;Paris;2220445
Europe;France;Lyon;506615
Europe;France;Lille;225789
South America;Argentina;Buenos Aires;3090900
South America;Argentina;Salta;535303
South America;Brasil;Sao Paulo;11895893
South America;Brasil;Rio de Janeiro;6093472
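In real life, my tree contains a very large number of leaves (obviously not the case in this example), and the conversion script I am using takes a long time. I am trying to find a more efficient way to do the conversion. Here is what I have tried:
Method 1: append to a single string at every leaf: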
### METHOD 1 ###
import time

start_1 = time.time()
data_to_write = ""
for region in my_hash_pop:
    for country in my_hash_pop[region]:
        for city in my_hash_pop[region][country]:
            # One concatenation onto the full output string per leaf
            data_to_write += region+";"+country+";"+city+";"+str(my_hash_pop[region][country][city])+"\n"
filename = "my_test_1.csv"
with open(filename, "w") as outfile:
    outfile.write(data_to_write)  # the with block closes the file on exit
end_1 = time.time()
print("---> METHOD 1 : Write all took " + str(end_1 - start_1) + "s")
Method 2: concatenate strings using "checkpoints":
### METHOD 2 ###
import time

start_2 = time.time()
data_to_write = ""
for region in my_hash_pop:
    region_to_write = ""
    for country in my_hash_pop[region]:
        country_to_write = ""
        for city in my_hash_pop[region][country]:
            city_to_write = region+";"+country+";"+city+";"+str(my_hash_pop[region][country][city])+"\n"
            country_to_write += city_to_write
        # "Checkpoint": append the finished country block to its region
        region_to_write += country_to_write
    # "Checkpoint": append the finished region block to the full output
    data_to_write += region_to_write
filename = "my_test_2.csv"
with open(filename, "w") as outfile:
    outfile.write(data_to_write)  # the with block closes the file on exit
end_2 = time.time()
print("---> METHOD 2 : Write all took " + str(end_2 - start_2) + "s")
Method 3: use a csv.writer object:
### METHOD 3 ###
import csv
import time

start_3 = time.time()
# newline="" is what the csv module docs recommend when opening the output file
with open("my_test_3.csv", "w", newline="") as outfile:
    del_char = ";"  # Python 3 needs a one-character str here, not bytes
    w = csv.writer(outfile, delimiter=del_char)
    for region in my_hash_pop:
        for country in my_hash_pop[region]:
            for city in my_hash_pop[region][country]:
                w.writerow([region, country, city, str(my_hash_pop[region][country][city])])
end_3 = time.time()
print("---> METHOD 3 : Write all took " + str(end_3 - start_3) + "s")
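Comparing the time the three methods take as the tree grows, I found that Method 1 is quite inefficient. Between Methods 2 and 3, however, the results vary from run to run and are less clear-cut (Method 3 usually seems to be more efficient).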
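For reference, here is a minimal sketch of how such a comparison could be made more repeatable with the standard timeit module, measuring only the string building and not the disk write. The helpers build_tree, iter_rows, with_join and with_csv_writer are hypothetical names made up for this sketch, the tree is synthetic, and the str.join variant is just one more candidate, not something I have measured on my real data.

```python
import csv
import io
import timeit


def build_tree(regions=5, countries=10, cities=50):
    """Build a synthetic three-level tree (hypothetical stand-in for the real data)."""
    return {
        f"region{r}": {
            f"country{c}": {f"city{i}": i for i in range(cities)}
            for c in range(countries)
        }
        for r in range(regions)
    }


def iter_rows(tree):
    """Yield one (region, country, city, population) tuple per leaf."""
    for region, countries in tree.items():
        for country, cities in countries.items():
            for city, population in cities.items():
                yield region, country, city, population


def with_join(tree):
    """Build the whole text with a single str.join call."""
    return "".join(f"{r};{c};{city};{pop}\n" for r, c, city, pop in iter_rows(tree))


def with_csv_writer(tree):
    """Let csv.writer format the rows into an in-memory buffer."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=";", lineterminator="\n").writerows(iter_rows(tree))
    return buffer.getvalue()


if __name__ == "__main__":
    tree = build_tree()
    for func in (with_join, with_csv_writer):
        # repeat() returns one total per run; the minimum is the least noisy figure.
        best = min(timeit.repeat(lambda: func(tree), number=20, repeat=5))
        print(f"{func.__name__}: best of 5 runs, {best:.4f}s for 20 calls")
```

Writing into io.StringIO keeps file system noise out of the measurement; the real write to disk could then be timed separately.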
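So I have two questions:
1. Do you see any other approach I could try?
2. Is there a better way to measure and compare the efficiency of the different methods?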
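And one bonus question:
I noticed that the output files of Methods 1 and 2 are exactly the same size, while the output file of Method 3 is larger than the other two. Why is that?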
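One thing I could check here is the line ending: csv.writer defaults to the "excel" dialect, whose line terminator is "\r\n", whereas Methods 1 and 2 write a bare "\n". The sketch below compares the two on a couple of sample rows in memory; sample_rows and the buffer names are made up for illustration, and I have not confirmed that this accounts for the whole size difference in my real files.

```python
import csv
import io

sample_rows = [
    ("Europe", "France", "Paris", 2220445),
    ("Europe", "France", "Lyon", 506615),
]

# Methods 1 and 2 end every line with a single "\n".
manual_output = "".join(";".join(map(str, row)) + "\n" for row in sample_rows)

# csv.writer uses the "excel" dialect unless told otherwise.
default_buffer = io.StringIO()
csv.writer(default_buffer, delimiter=";").writerows(sample_rows)
default_output = default_buffer.getvalue()

# Passing lineterminator="\n" overrides the dialect's line ending.
unix_buffer = io.StringIO()
csv.writer(unix_buffer, delimiter=";", lineterminator="\n").writerows(sample_rows)
unix_output = unix_buffer.getvalue()

print(repr(csv.excel.lineterminator))            # '\r\n'
print(len(default_output) - len(manual_output))  # 2, one extra character per row
print(unix_output == manual_output)              # True
```

If that is the cause, the size gap should equal the number of rows in the file.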
Thanks for any help!