我观察到关于
我正在生成大小为N=p*f的带状矩阵,它们具有特定的结构。这些矩阵是对称的三块对角线矩阵,主对角线上有p个大小为fxf的块,副对角线上有p-1个大小为fxf的单位矩阵。
以p=3和f=3为例:
通常这些矩阵的大小为 p = 100,f = 30,N = p * f = 3000,但可以很容易地变得更大。
鉴于这些矩阵的结构,我原本希望 scipy 中的带状特征值求解器比密集特征值求解器要快得多,然而事实似乎并非如此。
我正在使用以下代码对求解器进行基准测试:
我得到的结果表明,带状特征值求解器比密集特征值求解器慢得多。
Scipy配置:
scipy.linalg.eig_banded
特征求解器的奇怪行为。我正在生成大小为N=p*f的带状矩阵,它们具有特定的结构。这些矩阵是对称的三块对角线矩阵,主对角线上有p个大小为fxf的块,副对角线上有p-1个大小为fxf的单位矩阵。
以p=3和f=3为例:
[2 2 2 1 0 0 0 0 0]
[2 2 2 0 1 0 0 0 0]
[2 2 2 0 0 1 0 0 0]
[1 0 0 3 3 3 1 0 0]
[0 1 0 3 3 3 0 1 0]
[0 0 1 3 3 3 0 0 1]
[0 0 0 1 0 0 4 4 4]
[0 0 0 0 1 0 4 4 4]
[0 0 0 0 0 1 4 4 4]
通常这些矩阵的大小为 p = 100,f = 30,N = p * f = 3000,但可以很容易地变得更大。
鉴于这些矩阵的结构,我原本希望 scipy 中的带状特征值求解器比密集特征值求解器要快得多,然而事实似乎并非如此。
我正在使用以下代码对求解器进行基准测试:
# Set dimension of problem
f = 50
p = 80
a = 1
print(f"p={p}, f={f}, size={f*p, f*p}")
print(f"Matrix containing random numbers in {(-a, a)}")
A = generate_matrix(p, f, -a, a)
# Benchmark standard eigensolver
start = time()
D, Q = linalg.eigh(A)
end = time()
# Test correctness
D = np.diag(D)
print(f"Time for dense solver {end - start}")
print(f"||AQ - QD|| = {np.linalg.norm(A@Q - Q@D)}")
# Convert A to banded format
A_banded = banded_format(A, upper = f)
# Benchmark banded eigensolver
start = time()
D, Q = linalg.eig_banded(A_banded)
end = time()
# Test correctness
D = np.diag(D)
print(f"Time for banded solver {end - start}")
print(f"||AQ - QD|| = {np.linalg.norm(A@Q - Q@D)}")
我得到的结果表明,带状特征值求解器比密集特征值求解器慢得多。
p=80, f=50, size=(4000, 4000)
Matrix containing random numbers in (-1, 1)
Time for dense solver 13.475645780563354
||AQ - QD|| = 3.1334336527852233e-12
Time for banded solver 24.427151203155518
||AQ - QD|| = 1.589349711533356e-11
我已经尝试过将矩阵存储在下对角线格式中,并传递overwrite_a_band=True
选项,但性能仍然相同。
Numpy配置:
blas_mkl_info:
NOT AVAILABLE
blis_info:
NOT AVAILABLE
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
runtime_library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
blas_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
runtime_library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
lapack_mkl_info:
NOT AVAILABLE
openblas_lapack_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
runtime_library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
lapack_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
runtime_library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
Scipy配置:
lapack_mkl_info:
NOT AVAILABLE
openblas_lapack_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
runtime_library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
lapack_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
runtime_library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
blas_mkl_info:
NOT AVAILABLE
blis_info:
NOT AVAILABLE
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
runtime_library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
blas_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
runtime_library_dirs = ['/cluster/apps/gcc-8.2.0/openblas-0.2.20-5gatj7a35vypgjekzf3ibbtz54tlbk3m/lib']
我还尝试在另一个集群上使用 MKL 作为后端而不是 OpenBLAS 运行相同的基准测试,并观察到非常相似的结果。同时,使用 OMP_NUM_THREADS
和/或 MKL_NUM_THREADS
设置线程数对性能影响很小。
有没有人对这种情况发生的原因有什么想法?
谢谢