I have a simple Python (numpy) matrix multiplication script:
import numpy as np
import time

# Two random dense matrices: (70000 x 3000) times (3000 x 100)
a = np.random.random((70000, 3000))
b = np.random.random((3000, 100))

t1 = time.time()
c = np.dot(a, b)
t2 = time.time()
print('Time passed is %2.2f seconds' % (t2 - t1))
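The multiplication (c = np.dot(a, b)) takes about 16 seconds on a single core. When I run the same multiplication in Matlab, it finishes in about 1 second using 6 cores.
So why is Matlab about 2.6 times faster than numpy per core for matrix multiplication (roughly 6 core-seconds versus 16)? Per-core performance is what matters to me.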
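As a rough sanity check of the 2.6x figure, the arithmetic can be sketched as below. It assumes a dense product costs about 2*m*k*n floating-point operations and that Matlab keeps all 6 cores fully busy; both are assumptions, not measurements, and the helper name gflops_per_core is only illustrative.

```python
# Matrix shapes from the script: (m x k) times (k x n)
m, k, n = 70000, 3000, 100

# A dense matrix product needs about 2*m*k*n floating-point operations
# (one multiply and one add per inner-product term).
flops = 2 * m * k * n

# Timings reported above (approximate wall-clock seconds)
numpy_seconds, numpy_cores = 16.0, 1
matlab_seconds, matlab_cores = 1.0, 6

def gflops_per_core(seconds, cores):
    """Throughput per core in GFLOP/s, assuming all cores are fully busy."""
    return flops / seconds / cores / 1e9

numpy_rate = gflops_per_core(numpy_seconds, numpy_cores)
matlab_rate = gflops_per_core(matlab_seconds, matlab_cores)
ratio = matlab_rate / numpy_rate

print('numpy : %.2f GFLOP/s per core' % numpy_rate)
print('Matlab: %.2f GFLOP/s per core' % matlab_rate)
print('per-core ratio: %.2f' % ratio)
```

With the timings above this works out to about 2.6 GFLOP/s per core for numpy and 7 GFLOP/s per core for Matlab, which is where the 2.6x per-core gap comes from.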
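Update: I have now tried the same thing with Eigen. It performs slightly better than Matlab. Eigen uses the same BLAS implementation as numpy, so the BLAS implementation is probably not the source of the poor performance.
To make sure the installed numpy is using BLAS, I ran np.show_config():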
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib64']
    language = f77
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib64']
    language = f77
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib64']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_blas_threads_info:
  NOT AVAILABLE
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib64']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE
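In the output above, only the generic blas and lapack entries list any libraries; every ATLAS and MKL entry is NOT AVAILABLE. To make that easier to see at a glance, here is a minimal sketch that summarizes this kind of text. The function summarize_config is a hypothetical helper, not part of numpy, and it assumes the older "section:" / "NOT AVAILABLE" text layout shown here (newer numpy versions may print a different format).

```python
def summarize_config(text):
    """Map each section name to True once a detail line is seen.

    A section stays False if it only contains NOT AVAILABLE
    (or nothing at all).
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # A header looks like "blas_info:"; detail lines contain "=".
        if line.endswith(":") and "=" not in line:
            current = line[:-1]
            sections[current] = False
        elif current is not None and line != "NOT AVAILABLE":
            sections[current] = True
    return sections

# A shortened excerpt of the output above, used as sample input.
sample = """\
blas_opt_info:
libraries = ['blas']
library_dirs = ['/usr/lib64']
language = f77
define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
NOT AVAILABLE
blas_mkl_info:
NOT AVAILABLE
"""

summary = summarize_config(sample)
found = sorted(name for name, ok in summary.items() if ok)
missing = sorted(name for name, ok in summary.items() if not ok)
print("found:", found)
print("missing:", missing)
```

This only reports which sections numpy found at build time. It says nothing about how fast the linked blas library in /usr/lib64 actually is.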