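I can usually get good performance out of numpy's einsum function (and I like its syntax). @Ophion's answer to this question shows that, for the cases tested, einsum consistently outperforms the "built-in" functions (sometimes by a little, sometimes by a lot). But I have just run into a case where einsum is much slower. Consider the following equivalent functions: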
import numpy as np

(M, K) = (1000000, 20)
C = np.random.rand(K, K)
X = np.random.rand(M, K)

def func_dot(C, X):
    # BLAS-backed matrix product, then a row-wise sum of products.
    Y = X.dot(C)
    return np.sum(Y * X, axis=1)

def func_einsum(C, X):
    # Single three-operand contraction.
    return np.einsum('ik,km,im->i', X, C, X)

def func_einsum2(C, X):
    # Like func_einsum but break it into two steps.
    A = np.einsum('ik,km', X, C)
    return np.einsum('ik,ik->i', A, X)
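Before comparing timings, it may be worth confirming that the three functions really do compute the same thing. The following is only a minimal sketch: it repeats the definitions so it runs on its own, and the smaller sizes and the fixed seed are my own assumptions chosen to keep the check fast.

```python
import numpy as np

def func_dot(C, X):
    Y = X.dot(C)
    return np.sum(Y * X, axis=1)

def func_einsum(C, X):
    return np.einsum('ik,km,im->i', X, C, X)

def func_einsum2(C, X):
    A = np.einsum('ik,km', X, C)
    return np.einsum('ik,ik->i', A, X)

# Assumed small problem size and seed, purely for a quick check.
rng = np.random.RandomState(0)
M_small, K_small = 1000, 20
C_small = rng.rand(K_small, K_small)
X_small = rng.rand(M_small, K_small)

y1 = func_dot(C_small, X_small)
y2 = func_einsum(C_small, X_small)
y3 = func_einsum2(C_small, X_small)

# All three should agree up to floating-point rounding.
print(np.allclose(y1, y2), np.allclose(y1, y3))
```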
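I expected func_einsum to run fastest, but that is not what I see. Running on a quad-core CPU with hyperthreading, with numpy version 1.9.0.dev-7ae0206 and multithreading through OpenBLAS, I get the following results: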
In [2]: %time y1 = func_dot(C, X)
CPU times: user 320 ms, sys: 312 ms, total: 632 ms
Wall time: 209 ms
In [3]: %time y2 = func_einsum(C, X)
CPU times: user 844 ms, sys: 0 ns, total: 844 ms
Wall time: 842 ms
In [4]: %time y3 = func_einsum2(C, X)
CPU times: user 292 ms, sys: 44 ms, total: 336 ms
Wall time: 334 ms
When I increase K to 200, the differences are more extreme:
In [2]: %time y1= func_dot(C, X)
CPU times: user 4.5 s, sys: 1.02 s, total: 5.52 s
Wall time: 2.3 s
In [3]: %time y2= func_einsum(C, X)
CPU times: user 1min 16s, sys: 44 ms, total: 1min 16s
Wall time: 1min 16s
In [4]: %time y3 = func_einsum2(C, X)
CPU times: user 15.3 s, sys: 312 ms, total: 15.6 s
Wall time: 15.6 s
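As a rough back-of-the-envelope check on these numbers, the sketch below compares how each wall time scales from K = 20 to K = 200 against the scaling one would expect if the cost were dominated by roughly M * K * K multiply-adds. That cost model is an assumption on my part, not a measurement, and the helper name multiply_adds is made up for illustration.

```python
# Wall times in seconds, copied from the %time output above
# ("1min 16s" is taken as 76 s).
wall = {
    "func_dot":     {20: 0.209, 200: 2.3},
    "func_einsum":  {20: 0.842, 200: 76.0},
    "func_einsum2": {20: 0.334, 200: 15.6},
}

M = 1000000

def multiply_adds(M, K):
    # Assumed cost model: one multiply-add per (i, k, m) index triple.
    return M * K * K

# Expected slowdown when K grows from 20 to 200 under that model.
predicted = multiply_adds(M, 200) / float(multiply_adds(M, 20))

observed = {}
for name, t in wall.items():
    observed[name] = t[200] / t[20]
    print("%-13s observed x%.1f, predicted x%.1f"
          % (name, observed[name], predicted))
```

Under this assumption, a function whose observed ratio sits close to the predicted one is scaling with the raw operation count, while a noticeably smaller ratio suggests it makes better use of the hardware as K grows.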
Can someone explain why einsum is so much slower here?
If it matters, here is my numpy config:
In [6]: np.show_config()
lapack_info:
    libraries = ['openblas']
    library_dirs = ['/usr/local/lib']
    language = f77
atlas_threads_info:
    libraries = ['openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('ATLAS_WITHOUT_LAPACK', None)]
    language = c
    include_dirs = ['/usr/local/include']
blas_opt_info:
    libraries = ['openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('ATLAS_INFO', '"\\"None\\""')]
    language = c
    include_dirs = ['/usr/local/include']
atlas_blas_threads_info:
    libraries = ['openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('ATLAS_INFO', '"\\"None\\""')]
    language = c
    include_dirs = ['/usr/local/include']
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('ATLAS_WITHOUT_LAPACK', None)]
    language = f77
    include_dirs = ['/usr/local/include']
lapack_mkl_info:
    NOT AVAILABLE
blas_mkl_info:
    NOT AVAILABLE
mkl_info:
    NOT AVAILABLE
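I noticed the same thing when comparing np.einsum with np.tensordot. I suspect this might just be the price of generality: np.dot calls highly optimized BLAS subroutines (dgemm etc.) for the special case of a dot product between two matrices, whereas np.einsum handles all kinds of cases that may involve multiple input matrices. I'm not sure of the exact details, but I suspect it would be difficult to design np.einsum so that it makes full use of BLAS in all of those cases. - ali_m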
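To illustrate the point about BLAS, here is a hedged sketch that relies on the optimize argument of np.einsum and on np.einsum_path. Both exist in newer NumPy releases (1.12 or later, to my knowledge), not in the 1.9.0.dev build used in the question. With a contraction path, einsum may split the three-operand expression into pairwise steps, some of which can be handed to BLAS-backed routines. The reduced problem size and the seed are assumptions chosen to keep the example quick.

```python
import numpy as np

# Assumed smaller problem size than in the question, for a quick run.
rng = np.random.RandomState(0)
M, K = 10000, 20
C = rng.rand(K, K)
X = rng.rand(M, K)

# Ask NumPy for a pairwise contraction order and a readable report.
path, report = np.einsum_path('ik,km,im->i', X, C, X, optimize='greedy')
print(report)

# Reuse the precomputed path for the actual contraction.
y_opt = np.einsum('ik,km,im->i', X, C, X, optimize=path)

# Reference result using the dot-based formulation from the question.
y_ref = np.sum(X.dot(C) * X, axis=1)
print(np.allclose(y_opt, y_ref))
```

Whether this closes the gap to the dot-based version depends on the NumPy build and the BLAS library, so it is worth timing on the machine in question rather than assuming a speedup.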