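What is the fastest way to merge a list of numpy arrays into a single array, given that we know the length of the list and the size of every array (they are all the same size)?
I tried two approaches:
merged_array = array(list_of_arrays)
from "Pythonic way to create a numpy array from a list of numpy arrays", and vstack.
As the timings below show, vstack is faster, but for some reason the first run takes three times longer than the second. I assume this is caused by (missing) preallocation. So how would I preallocate an array for vstack? Or do you know a faster method?
Thanks!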
[Update]
I need (25280, 320), not (80, 320, 320), which means merged_array = array(list_of_arrays) won't work for me. Thanks to Joris for pointing that out!
0.547468900681 s merged_array = array(first_list_of_arrays)
0.547191858292 s merged_array = array(second_list_of_arrays)
0.656183958054 s vstack first
0.236850976944 s vstack second
Code:
import time

import numpy

width = 320
height = 320
n_matrices = 80

# Build two independent lists of random float32 matrices with values 0..9.
secondmatrices = list()
for i in range(n_matrices):
    temp = numpy.random.rand(height, width).astype(numpy.float32)
    secondmatrices.append(numpy.round(temp * 9))

firstmatrices = list()
for i in range(n_matrices):
    temp = numpy.random.rand(height, width).astype(numpy.float32)
    firstmatrices.append(numpy.round(temp * 9))

# Approach 1: numpy.array on the whole list (gives a 3-D array).
t1 = time.time()
first1 = numpy.array(firstmatrices)
print(time.time() - t1, "s merged_array = array(first_list_of_arrays)")

t1 = time.time()
second1 = numpy.array(secondmatrices)
print(time.time() - t1, "s merged_array = array(second_list_of_arrays)")

# Approach 2: repeated vstack, one matrix at a time (gives a 2-D array).
# Note that pop() empties the input lists.
t1 = time.time()
first2 = firstmatrices.pop()
for i in range(len(firstmatrices)):
    first2 = numpy.vstack((firstmatrices.pop(), first2))
print(time.time() - t1, "s vstack first")

t1 = time.time()
second2 = secondmatrices.pop()
for i in range(len(secondmatrices)):
    second2 = numpy.vstack((secondmatrices.pop(), second2))
print(time.time() - t1, "s vstack second")
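On the preallocation question: each vstack call in the loop above allocates a new result and copies everything accumulated so far, so the amount of copying grows quadratically with the number of matrices. One way to preallocate is to create the output once with numpy.empty and copy each matrix into its slice. This is a minimal sketch; merge_preallocated is a hypothetical helper name, and small arrays stand in for the 320x320 data.

```python
import numpy as np


def merge_preallocated(arrays):
    """Copy equally shaped 2-D arrays into one preallocated 2-D array."""
    rows, cols = arrays[0].shape
    # Allocate the full output once, using the dtype of the inputs.
    out = np.empty((len(arrays) * rows, cols), dtype=arrays[0].dtype)
    for i, a in enumerate(arrays):
        # Each input is copied exactly once into its own block of rows.
        out[i * rows:(i + 1) * rows] = a
    return out


# Small illustrative inputs: five 4x3 matrices filled with 0, 1, 2, 3, 4.
matrices = [np.full((4, 3), i, dtype=np.float32) for i in range(5)]
merged = merge_preallocated(matrices)
print(merged.shape)  # (20, 3)
```

Unlike the pop-based loop, this leaves the input list intact. Whether it beats a single numpy.vstack or numpy.concatenate call on the whole list would need to be measured.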
Use timeit for simple performance testing in Python. It produces more accurate results. - Björn Pollex
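A minimal sketch of what a timeit-based comparison could look like. The function names with_array and with_vstack, the results dictionary, and the repeat counts are illustrative choices, not part of the original benchmark.

```python
import timeit

import numpy as np

# Same dimensions as in the question: 80 float32 matrices of 320x320.
arrays = [np.random.rand(320, 320).astype(np.float32) for _ in range(80)]


def with_array():
    return np.array(arrays)


def with_vstack():
    return np.vstack(arrays)


number = 10  # calls per timing run
results = {}
for fn in (with_array, with_vstack):
    # repeat() returns one total time per run; the minimum is the least noisy.
    times = timeit.repeat(fn, number=number, repeat=3)
    results[fn.__name__] = min(times) / number
    print(fn.__name__, results[fn.__name__], "s per call")
```

Running each candidate several times and taking the minimum reduces the effect of one-off costs such as the slow first run seen in the timings above.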