我在Python中创建了一个字典,并将其转储为pickle。它的大小增加到了300MB。现在,我想加载同样的pickle文件。
output = open('myfile.pkl', 'rb')
mydict = pickle.load(output)
加载pickle文件大约需要15秒的时间。 我该如何缩短这段时间?
硬件规格: Ubuntu 14.04, 4GB RAM
下面的代码展示了使用json、pickle和cPickle来转储或加载文件所需的时间。
转储后,文件大小约为300MB。
import json, pickle, cPickle
import os, timeit
import json
mydict= {all values to be added}
def dump_json():
output = open('myfile1.json', 'wb')
json.dump(mydict, output)
output.close()
def dump_pickle():
output = open('myfile2.pkl', 'wb')
pickle.dump(mydict, output,protocol=cPickle.HIGHEST_PROTOCOL)
output.close()
def dump_cpickle():
output = open('myfile3.pkl', 'wb')
cPickle.dump(mydict, output,protocol=cPickle.HIGHEST_PROTOCOL)
output.close()
def load_json():
output = open('myfile1.json', 'rb')
mydict = json.load(output)
output.close()
def load_pickle():
output = open('myfile2.pkl', 'rb')
mydict = pickle.load(output)
output.close()
def load_cpickle():
output = open('myfile3.pkl', 'rb')
mydict = pickle.load(output)
output.close()
if __name__ == '__main__':
print "Json dump: "
t = timeit.Timer(stmt="pickle_wr.dump_json()", setup="import pickle_wr")
print t.timeit(1),'\n'
print "Pickle dump: "
t = timeit.Timer(stmt="pickle_wr.dump_pickle()", setup="import pickle_wr")
print t.timeit(1),'\n'
print "cPickle dump: "
t = timeit.Timer(stmt="pickle_wr.dump_cpickle()", setup="import pickle_wr")
print t.timeit(1),'\n'
print "Json load: "
t = timeit.Timer(stmt="pickle_wr.load_json()", setup="import pickle_wr")
print t.timeit(1),'\n'
print "pickle load: "
t = timeit.Timer(stmt="pickle_wr.load_pickle()", setup="import pickle_wr")
print t.timeit(1),'\n'
print "cPickle load: "
t = timeit.Timer(stmt="pickle_wr.load_cpickle()", setup="import pickle_wr")
print t.timeit(1),'\n'
输出:
Json dump:
42.5809804916
Pickle dump:
52.87407804489
cPickle dump:
1.1903790187836
Json load:
12.240660209656
pickle load:
24.48748306274
cPickle load:
24.4888298893
我看到cPickle在转储和加载时需要的时间较少,但是加载文件仍然需要很长时间。
cPickle
?如果没有,请尝试一下。你可以将其作为一个即插即用的替代品。 - Carstenpickle.HIGHEST_PROTOCOL
(或-1
)。这将使用比默认的基于ASCII的数据格式更紧凑的二进制模式数据格式。 - martineau