How to fix RuntimeError CUDA error CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm`?

Question

5

When training models in an otherwise working CUDA environment, you may run into the error `RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

What does it mean, and how can it be fixed?

- Jeremy Cochoy

Can you provide the full error traceback? - Ivan

2 Answers

3

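I hit this error while using fairseq. I was on Amazon Linux 2 with CUDA 11.5 and torch 1.13.1. Uninstalling that torch version and installing 1.12.1 made the error go away.

Later I also tried installing torch together with a matching CUDA build, and that worked as well: pip install torch==1.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116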
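To see which CUDA build an installed wheel targets, it can help to look at the version string. The sketch below is only an illustration: `split_torch_version` is a hypothetical helper, not part of PyTorch, and it assumes the string follows the `<version>+<build tag>` pattern used in the pip command above.

```python
from typing import Optional, Tuple


def split_torch_version(version: str) -> Tuple[str, Optional[str]]:
    """Split a string like '1.13.1+cu116' into the release and the build tag.

    Hypothetical helper for illustration; the build tag is None when the
    string carries no '+' suffix.
    """
    base, sep, local = version.partition("+")
    return base, (local if sep else None)


# With PyTorch installed, the string would come from torch.__version__.
# Here the value from the pip command above is used directly.
print(split_torch_version("1.13.1+cu116"))  # ('1.13.1', 'cu116')
```

With PyTorch installed, `torch.version.cuda` reports the CUDA version the build was compiled against, which can be compared with the tag.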

- upadrasta84

1

Same issue with https://github.com/openai/CLIP. - starbeamrainbowlabs

- Jeremy Cochoy · Accepted Answer

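This may be a shape mismatch that is not reported properly:

For example, passing an input with x.shape == [a, b] to nn.Linear(c, c, bias=False) when c does not match b, the last dimension of x, is a dimension mismatch, and it can surface as this error message.

See the conversation on the PyTorch forum.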
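One way to catch this kind of mismatch before it reaches cuBLAS is to compare the shapes explicitly. The following is a minimal sketch, not something from the original answer: `check_linear_input` is a hypothetical helper that works on plain shape tuples, so it runs without a GPU.

```python
from typing import Sequence


def check_linear_input(x_shape: Sequence[int], in_features: int) -> None:
    """Raise a readable error if x_shape cannot feed a linear layer.

    Hypothetical helper: a linear layer multiplies along the last
    dimension of its input, so that dimension must equal in_features.
    """
    if len(x_shape) == 0:
        raise ValueError("input must have at least one dimension")
    if x_shape[-1] != in_features:
        raise ValueError(
            f"last input dimension is {x_shape[-1]}, "
            f"but the layer expects in_features={in_features}"
        )


# With PyTorch, the arguments would be tuple(x.shape) and layer.in_features.
check_linear_input((8, 16), 16)  # shapes agree, nothing is raised

try:
    check_linear_input((8, 16), 32)  # b = 16 does not match c = 32
except ValueError as err:
    print(err)
```

Running the failing forward pass on the CPU instead of the GPU is another option, since the CPU path typically reports the mismatched shapes directly.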