As I mentioned before, the simplest approach is to dump the list to a file and then load that file back as a numpy array.
First, we need the size of the huge list:
huge_list_size = len(huge_list)
Next, we dump it to disk:
with open('huge_array.txt', 'w') as dumpfile:
    for item in huge_list:
        dumpfile.write(str(item) + "\n")
If everything is done in the same session, make sure to free the memory:
del huge_list
Next we define a simple generator that reads the file back:
def read_file_generator(filename):
    with open(filename) as infile:
        for i, line in enumerate(infile):
            yield i, line
Then we create a numpy array of zeros and fill it using the generator we just created:
import numpy as np

huge_array = np.zeros(huge_list_size, dtype='float16')
for i, item in read_file_generator('huge_array.txt'):
    huge_array[i] = item
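The fill loop above can also be collapsed into a single `np.fromiter` call, which builds the array straight from an iterator without materializing an intermediate Python list. This is a minimal self-contained sketch; the small `huge_list` here is just a stand-in for the real data:

```python
import numpy as np

# Hypothetical small stand-in for the original huge_list.
huge_list = [0.5, 1.25, -2.0, 3.75]

# Dump one value per line, as in the answer above.
with open('huge_array.txt', 'w') as dumpfile:
    for item in huge_list:
        dumpfile.write(str(item) + "\n")

# np.fromiter consumes the generator expression lazily, so only the
# final float16 array is held in memory.
with open('huge_array.txt') as infile:
    huge_array = np.fromiter((float(line) for line in infile),
                             dtype='float16', count=len(huge_list))
```

Passing `count` lets numpy preallocate the array instead of growing it as the iterator is consumed.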
My previous answer was wrong. I suggested the following as a solution, but as hpaulj commented, it is not.
You can do this in multiple ways; the easiest is to just dump
the array to a file and then load that file as a numpy array:
dumpfile = open('huge_array.txt', 'w')
for item in huge_array:
    print >> dumpfile, item  # Python 2 print syntax
Then load it as a numpy array
huge_array = numpy.loadtxt('huge_array.txt')
If you want to perform further computations on this data you can also
use the joblib library for memmapping, which is extremely useful when
handling computations on large numpy arrays. Available at
https://pypi.python.org/pypi/joblib
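A sketch of that memmapping idea with plain numpy (the file name `huge_array.npy` is just an example): `np.save` followed by `np.load(..., mmap_mode='r')` gives on-demand, read-only access to the data on disk; `joblib.load` accepts a similar `mmap_mode` argument:

```python
import numpy as np

# Persist an array to disk once.
arr = np.arange(100_000, dtype='float32')
np.save('huge_array.npy', arr)

# mmap_mode='r' maps the file read-only; only the pages actually
# accessed are read into memory.
mapped = np.load('huge_array.npy', mmap_mode='r')
chunk_mean = mapped[:1000].mean()
```

Slicing the memmap touches only the requested region, so computations on chunks never require the whole array in RAM.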
Does the np.array
approach take too long, or does it produce a memory error? - hpaulj