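Background: I'm just getting started with scikit-learn, and I read the comparison of joblib and pickle at the bottom of the page.
"It may be more interesting to use joblib's replacement of pickle (joblib.dump and joblib.load), which is more efficient on big data, but can only pickle to the disk and not to a string."
I also read this post on common use cases for Pickle, and I wonder whether the community can share the differences between joblib and pickle. When should one be used over the other?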
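To make the quoted limitation concrete, here is a minimal sketch. pickle.dumps returns an in-memory bytes object, while the basic joblib.dump call writes to a path on disk. As far as I know, recent joblib versions also accept an open file object, so an io.BytesIO buffer can serve as a workaround; treat that as an assumption and check it against your installed version. The payload dictionary is made up for illustration.

```python
import io
import os
import pickle
import tempfile

import joblib

# Hypothetical payload, just for illustration.
payload = {"weights": [0.1, 0.2, 0.3], "label": "demo"}

# pickle can serialize straight to an in-memory bytes object.
blob = pickle.dumps(payload)
restored_from_bytes = pickle.loads(blob)

# joblib's basic usage goes through a file on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payload.joblib")
    joblib.dump(payload, path)
    restored_from_disk = joblib.load(path)

# Assumption: the installed joblib accepts file objects, so a BytesIO
# buffer can stand in for a "string" target.
buffer = io.BytesIO()
joblib.dump(payload, buffer)
buffer.seek(0)
restored_from_buffer = joblib.load(buffer)

print(restored_from_bytes == restored_from_disk == restored_from_buffer)
```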
mmap_mode="r"
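For context on the mmap_mode="r" fragment: joblib.load takes an mmap_mode argument that memory-maps numpy arrays stored in an uncompressed joblib file instead of reading them fully into memory. A minimal sketch, assuming numpy and joblib are installed; the file name and array size are made up.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical data: a one-million-element float array (about 8 MB).
data = np.arange(1_000_000, dtype=np.float64)

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "big_array.joblib")

# Memory mapping only applies to uncompressed dumps, which is the default.
joblib.dump(data, path)

# mmap_mode="r" opens the array buffer read-only, backed by the file on disk.
mapped = joblib.load(path, mmap_mode="r")

print(type(mapped))            # typically numpy.memmap, an ndarray subclass
print(float(mapped[:5].sum()))  # slicing reads only the part that is touched
```

Because the mapped array is backed by the file, several processes can open the same dump without each holding a private copy of the data.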
Thanks to Gunjan for this script! I modified it for Python 3; the results follow the script.
# Compare pickle loaders (Python 3)
from time import time
import pickle
import os
import _pickle as cPickle  # C accelerator; Python 3's pickle already uses it when available
import joblib  # sklearn.externals.joblib is no longer available in recent scikit-learn releases

file = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'database.clf')
size = os.path.getsize(file)  # getsize returns bytes, although the labels below say KB

t1 = time()
lis = []  # unused in this script
with open(file, "rb") as f:
    d = pickle.load(f)
print("time for loading file size with pickle", size, "KB =>", time() - t1)

t1 = time()
with open(file, "rb") as f:
    cPickle.load(f)
print("time for loading file size with cpickle", size, "KB =>", time() - t1)

t1 = time()
joblib.load(file)
print("time for loading file size joblib", size, "KB =>", time() - t1)
time for loading file size with pickle 79708 KB => 0.16768312454223633
time for loading file size with cpickle 79708 KB => 0.0002372264862060547
time for loading file size joblib 79708 KB => 0.0006849765777587891
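(1) Is the line lis = [] needed? (2) How can the code be reproduced? That is, how should we build the "database" file? Thanks. - RMurphy

I ran into the same question, so I tried this approach (with Python 2.7), since I needed to load a large pickle file.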
# Compare pickle loaders (Python 2.7)
from time import time
import pickle
import os
try:
    import cPickle
except ImportError:
    print "Cannot import cPickle"
import joblib

path = "classi.pickle"
size = os.path.getsize(path)  # getsize returns bytes, although the labels below say KB

t1 = time()
lis = []  # unused in this script
with open(path, "rb") as f:  # pickle files should be opened in binary mode
    d = pickle.load(f)
print "time for loading file size with pickle", size, "KB =>", time() - t1

t1 = time()
with open(path, "rb") as f:
    cPickle.load(f)
print "time for loading file size with cpickle", size, "KB =>", time() - t1

t1 = time()
joblib.load(path)
print "time for loading file size joblib", size, "KB =>", time() - t1
The output for this is:
time for loading file size with pickle 1154320653 KB => 6.75876188278
time for loading file size with cpickle 1154320653 KB => 52.6876490116
time for loading file size joblib 1154320653 KB => 6.27503800392
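Neither script above shows how its input file was built, and the first load in each run may also pay for a cold disk cache, so the timings are hard to compare. Below is a self-contained sketch that builds its own test file, assuming numpy and joblib are installed. The file names, array size, and repeat count are arbitrary choices of mine, not taken from the answers.

```python
import os
import pickle
import tempfile
from time import perf_counter

import joblib
import numpy as np


def best_time(load, repeats=3):
    """Return the fastest of several timed calls to load()."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        load()
        timings.append(perf_counter() - start)
    return min(timings)


def load_with_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical test object: a dict holding one numpy array (about 8 MB).
obj = {"X": np.random.rand(1000, 1000)}

with tempfile.TemporaryDirectory() as tmp:
    pickle_path = os.path.join(tmp, "data.pickle")
    joblib_path = os.path.join(tmp, "data.joblib")

    # Write the same object once with each library.
    with open(pickle_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    joblib.dump(obj, joblib_path)

    results = {
        "pickle": best_time(lambda: load_with_pickle(pickle_path)),
        "joblib": best_time(lambda: joblib.load(joblib_path)),
    }
    sizes = {
        "pickle": os.path.getsize(pickle_path),
        "joblib": os.path.getsize(joblib_path),
    }

for name in results:
    print(f"{name}: {sizes[name]} bytes loaded in {results[name]:.4f} s")
```

Taking the best of several repeats reduces the effect of caching on whichever loader happens to run first. The numbers will still depend on the object being stored, so it is worth swapping in data that resembles your own.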
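Just a humble note... Pickle is better for fitted scikit-learn estimators/trained models. In machine learning applications, trained models are saved and loaded back mainly for making predictions.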
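As a sketch of the save-then-predict workflow described here, the snippet below round-trips a fitted estimator through both pickle and joblib and checks that the restored models give the same predictions. It does not measure which one is "better". The dataset and the choice of LogisticRegression are made up for illustration, and it assumes scikit-learn, numpy, and joblib are installed.

```python
import os
import pickle
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up dataset: one feature, two separable classes.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Option 1: pickle, here to an in-memory bytes object.
model_from_pickle = pickle.loads(pickle.dumps(model))

# Option 2: joblib, to a file on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")
    joblib.dump(model, path)
    model_from_joblib = joblib.load(path)

# Both restored models should predict exactly like the original.
X_new = np.array([[0.5], [4.5]])
expected = model.predict(X_new)
same = (
    np.array_equal(expected, model_from_pickle.predict(X_new))
    and np.array_equal(expected, model_from_joblib.predict(X_new))
)
print("restored models agree with the original:", same)
```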
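Joblib rather than Pickle? Are there any drawbacks of Joblib that we should consider? I only recently heard about Joblib, and it sounds interesting. - Chau Pham
scikit-learn still recommends joblib. There must be a reason, right? - Dr_Zaszuś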