当使用scipy.linalg.blas.sgemm时，numpy.dot在处理大数组时会返回无效值。

Question

当使用scipy.linalg.blas.sgemm时，numpy.dot在处理大数组时会返回无效值。

3

我正在尝试计算 A • A^T:

# These are my dummy values for testing
A = np.ones((150000,265),dtype=np.float32, order='F')
A_T = np.ones((265, 150000),dtype=np.float32, order='F')

out = scipy.linalg.blas.sgemm(alpha=1.0, a=A, b=A_T)

两分钟后：

In [7]: out
Out[7]: 
array([[ 265.,  265.,  265., ...,    0.,    0.,    0.],
       [ 265.,  265.,  265., ...,    0.,    0.,    0.],
       [ 265.,  265.,  265., ...,    0.,    0.,    0.],
       ..., 
       [ 265.,  265.,  265., ...,    0.,    0.,    0.],
       [ 265.,  265.,  265., ...,    0.,    0.,    0.],
       [ 265.,  265.,  265., ...,    0.,    0.,    0.]])

In [10]: out.shape
Out[10]: (150000, 150000)

注意到了这些零吗？我迷失了...我尝试使用64位浮点数，但输出结果相同。从35468开始，数组中就全是零。

In [39]: out[0,35468]
Out[39]: 0.0

In [9]: scipy.__version__
Out[9]: '0.12.1'

更新/编辑:

我相当确定，np.dot是在调用*gemm方法本身。

In [1]: A = np.ones((150000,265), dtype=np.float32, order='F')

In [2]: A_T = np.ones((265, 150000),dtype=np.float32, order='F')

In [3]: out = A.dot(A_T)

In [4]: out.shape
Out[4]: (150000, 150000)

In [5]: out
Out[5]: 
array([[ 265.,  265.,  265., ...,  265.,  265.,  265.],
   [ 265.,  265.,  265., ...,  265.,  265.,  265.],
   [ 265.,  265.,  265., ...,  265.,  265.,  265.],
   ..., 
   [   0.,    0.,    0., ...,    0.,    0.,    0.],
   [   0.,    0.,    0., ...,    0.,    0.,    0.],
   [   0.,    0.,    0., ...,    0.,    0.,    0.]], dtype=float32)

- wbg

变量roi和roiT是什么？ - gpcz

2

从scipy.linalg.blas文档中： “__警告__：这些函数几乎不进行错误检查。滥用它们可能会导致崩溃，因此最好使用scipy.linalg中的更高级别例程。”既然np.dot可以进行矩阵乘法，为什么选择使用scipy.linalg.blas.dgemm？ - John1024

1

对我来说，np.dot(A, A_T) 更加准确和更快。由于你的示例数据摧毁了我的计算机，我不得不缩小数据规模 :P - askewchan

我非常确定np.dot正在调用这些确切的BLAS例程。我在np.dot中也遇到了同样的失败。我现在正在重新运行以确认，并将更新原始帖子。 - wbg

@John1024 感谢您的提问。我最初选择sgemm是因为我想通过不需要创建A和A_T来节省内存，因为*gemm具有'trans_a'和'trans_b'参数。现在，我认为这并不重要，因为这两个float32数组不会占用太多空间。 - wbg

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- ev-br · Answer 1

在我的机器上，您的示例会出现内存错误，np.dot(A, A.T)也是如此。

缩小矩阵大小后，它会产生正确的结果。

要进一步调试，请尝试直接使用Fortran BLAS调用，而不使用Python。如果通过，请考虑在scipy跟踪器中提交错误报告。

顺便说一句，从scipy 0.13开始，您可以使用syrk。