How do I set the maximum number of threads for numpy with OpenBLAS?

Question


Tags: python, multithreading, numpy, blas, intel-mkl

11

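I am using numpy, and my model involves a large number of matrix multiplications. To speed up the computation, I use the multithreaded OpenBLAS library to parallelize the numpy.dot function.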

My setup is as follows:

OS: CentOS 6.2 server, # CPU = 12, # MEM = 96GB
Python version: Python 2.7.6
NumPy: NumPy 1.8.0
OpenBLAS + IntelMKL

$ OMP_NUM_THREADS=8 python test_mul.py

The code is taken from https://gist.github.com/osdf/

test_mul.py:

import numpy
import sys
import timeit

try:
    import numpy.core._dotblas
    print 'FAST BLAS'
except ImportError:
    print 'slow blas'

print "version:", numpy.__version__
print "maxint:", sys.maxint
print

x = numpy.random.random((1000,1000))

setup = "import numpy; x = numpy.random.random((1000,1000))"
count = 5

t = timeit.Timer("numpy.dot(x, x.T)", setup=setup)
print "dot:", t.timeit(count)/count, "sec"

When I run OMP_NUM_THREADS=1 python test_mul.py, the result is:

dot: 0.200172233582 sec

OMP_NUM_THREADS=2

dot: 0.103047609329 sec

OMP_NUM_THREADS=4

dot: 0.0533880233765 sec

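So far everything works well.

However, when I set OMP_NUM_THREADS=8, the code only works "occasionally".

Sometimes it runs fine; sometimes it does not run at all and produces a core dump.

When OMP_NUM_THREADS > 10, the code seems to fail every time. What is happening here? Is there something like a maximum number of threads that each process can use? If so, can I raise that limit, given that my machine has 12 CPUs?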

Thanks

- Jing

1 Answer


- ali_m · Accepted Answer

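First of all, I don't really understand what you mean by "OpenBLAS + IntelMKL". Both are BLAS libraries, and numpy should link to only one of them at runtime. You should check which of the two is actually being used. You can do this by running: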

$ ldd <path-to-site-packages>/numpy/core/_dotblas.so

Update: numpy v1.10 removed numpy/core/_dotblas.so, but you can check the linkage of numpy/core/multiarray.so instead.

For example, I link against OpenBLAS:

...
libopenblas.so.0 => /opt/OpenBLAS/lib/libopenblas.so.0 (0x00007f788c934000)
...
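When the ldd output is long, the check can be scripted. The following is only a minimal sketch: the helper name find_blas_libs and the list of name hints are assumptions made for illustration, not part of numpy or OpenBLAS. It scans ldd-style text for libraries whose names look like a BLAS implementation.

```python
import re

# Substrings that commonly appear in BLAS shared-library names.
# This list is an illustrative assumption and is not exhaustive.
BLAS_HINTS = ("openblas", "mkl", "atlas", "blas")


def find_blas_libs(ldd_output):
    """Return (name, path) pairs for ldd lines whose library name looks like a BLAS."""
    libs = []
    for line in ldd_output.splitlines():
        # A resolved ldd line looks like: "libfoo.so.0 => /path/libfoo.so.0 (0x...)"
        match = re.match(r"\s*(\S+)\s+=>\s+(\S+)", line)
        if match and any(hint in match.group(1).lower() for hint in BLAS_HINTS):
            libs.append((match.group(1), match.group(2)))
    return libs


# The sample line is the one shown in the ldd output above.
sample = "libopenblas.so.0 => /opt/OpenBLAS/lib/libopenblas.so.0 (0x00007f788c934000)"
print(find_blas_libs(sample))
```

If more than one BLAS library shows up in the result, that would be worth investigating before tuning thread counts.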

If you are indeed linking against OpenBLAS, did you build it from source? If so, you should see a commented-out option in Makefile.rule:

...
# You can define maximum number of threads. Basically it should be
# less than actual number of cores. If you don't specify one, it's
# automatically detected by the the script.
# NUM_THREADS = 24
...

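By default, OpenBLAS tries to set the maximum number of threads automatically, but if it is not detecting this correctly, try uncommenting and editing this line.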
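The build-time setting above fixes an upper bound. Within that bound, the thread count is usually requested through environment variables, as the question already does with OMP_NUM_THREADS. Below is a minimal sketch of doing the same from inside Python. The helper name limit_blas_threads is hypothetical. The sketch assumes the variables are read when the BLAS library is loaded, so they have to be set before numpy is first imported, and that a request above the build-time maximum will not raise that maximum.

```python
import os


def limit_blas_threads(n):
    """Request at most n BLAS threads via environment variables.

    Must be called before numpy is imported for the first time, because
    the BLAS library is assumed to read these variables when it loads.
    """
    value = str(int(n))
    # OPENBLAS_NUM_THREADS is read by OpenBLAS, MKL_NUM_THREADS by Intel MKL,
    # and OMP_NUM_THREADS by OpenMP-based builds of either library.
    for var in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = value
    return value


limit_blas_threads(4)
# import numpy  # import numpy only after the variables have been set
```

Setting all three variables is a convenience for the case where it is not yet known which library numpy links against; only the one matching the linked library has any effect.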

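Also, be aware that adding more threads gives diminishing returns. Unless your arrays are very large, using more than 6 threads is unlikely to give much of a performance boost, because of the increased overhead of creating and managing threads.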
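One way to see where the returns start to diminish is to turn measured timings into speedup and parallel efficiency (speedup divided by thread count). The sketch below uses only the three timings reported in the question; the function name scaling is made up for illustration. With those numbers the speedup is roughly 1.9x on 2 threads and 3.7x on 4 threads, so efficiency is already slipping slightly at 4 threads.

```python
# Timings reported in the question (seconds per numpy.dot call),
# keyed by the OMP_NUM_THREADS value used for the run.
timings = {1: 0.200172233582, 2: 0.103047609329, 4: 0.0533880233765}


def scaling(timings):
    """Map thread count -> (speedup vs. 1 thread, parallel efficiency)."""
    base = timings[1]
    return {n: (base / t, base / t / n) for n, t in sorted(timings.items())}


results = scaling(timings)
for n, (speedup, efficiency) in results.items():
    print("%d threads: speedup %.2fx, efficiency %.0f%%" % (n, speedup, efficiency * 100))
```

Extending the dictionary with timings for 6, 8 and more threads on the same machine would show whether the extra threads still pay for themselves.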