Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory

8

I just updated my graphics card drivers with:

sudo apt install nvidia-driver-470
sudo apt install cuda-drivers-470

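I chose to install them this way because they were being held back when I ran sudo apt upgrade. I then mistakenly ran sudo apt autoremove to clean up old packages. After rebooting so the new drivers would be set up properly, I can no longer use GPU acceleration in TensorFlow.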

import tensorflow as tf
tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-12-07 16:52:01.771391: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-07 16:52:01.807283: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 16:52:01.807973: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.808017: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.808048: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.856391: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.856466: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.857601: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
False

2 Answers

9

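Have you installed the cuda-toolkit? The error means that version 11 of the libraries cannot be found. The likely cause is that your cudatoolkit and cudnn versions are missing or are not compatible with your tensorflow version.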
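To see which of the libraries named in the log the dynamic loader can actually open, a minimal sketch along these lines may help. It assumes Linux and uses only the standard `ctypes` module; the helper names `can_load` and `report` are made up for illustration, and the library list is copied from the warnings in the question.

```python
import ctypes

# Library names copied from the "Could not load dynamic library" warnings.
WANTED = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcusolver.so.11",
    "libcusparse.so.11",
]


def can_load(name):
    """Return True if the dynamic loader can open `name`, else False."""
    try:
        ctypes.CDLL(name)  # uses the loader's normal search path
        return True
    except OSError:
        return False


def report(names=WANTED):
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in names}


if __name__ == "__main__":
    for name, ok in report().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Any name reported as missing here is one the loader cannot find through `LD_LIBRARY_PATH` or the system library cache, which is what the steps below try to fix.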

If you have already installed the correct version of the toolkit, skip to step 5. (You can check the version with the command nvcc --version.)

  1. Download the installer from https://developer.nvidia.com/cuda-11-4-4-download-archive?target_os=Linux (this version is compatible with the driver nvidia-470 you installed). The next steps are specific to the runfile option.

  2. As you already installed nvidia-drivers, press Continue if this message appears.


  3. Accept the terms.


  4. Again, as you already installed the drivers, just disable the Driver option and press Install.


  5. Now you need to configure the paths for the binaries and libraries. Use the find command to search for nvcc and libcublas.so.*:

    sudo find / -name 'nvcc'  # Path to binaries
    sudo find / -name 'libcublas.so.*'  # Path to libraries
    
  6. Finally, add the following lines to the end of ~/.profile, adjusted to the paths you found above. CUDA was installed in /usr/local/cuda-11.4 on my system.

    if [ -d "/usr/local/cuda-11.4" ]; then
        PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}
        LD_LIBRARY_PATH=/usr/local/cuda-11.4/targets/x86_64-linux/lib/${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
        # Export so that child processes (e.g. python) inherit both variables
        export PATH LD_LIBRARY_PATH
    fi
    
If updating ~/.profile has no effect, try updating .bashrc, or .zshrc if you use zsh instead of bash.

  7. Restart your computer.
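Steps 5 and 6 can be sketched in Python as follows. This is only an illustration: `first_match_dir` and `prepend` are hypothetical helper names, the search is limited to one CUDA root rather than the whole filesystem, and /usr/local/cuda-11.4 is the install location reported in this answer, so yours may differ.

```python
from pathlib import Path


def first_match_dir(root, pattern):
    """Return the directory of the first file under `root` matching
    `pattern` (sorted by path), or None if there is no match."""
    root = Path(root)
    if not root.is_dir():
        return None
    matches = sorted(p for p in root.rglob(pattern) if p.is_file())
    return matches[0].parent if matches else None


def prepend(directory, current):
    """Prepend `directory` to a colon-separated list, mirroring the
    shell idiom DIR${VAR:+:${VAR}} (no stray colon when VAR is empty)."""
    return f"{directory}:{current}" if current else str(directory)


if __name__ == "__main__":
    import os

    cuda_root = "/usr/local/cuda-11.4"  # assumption: path from this answer
    bin_dir = first_match_dir(cuda_root, "nvcc")
    lib_dir = first_match_dir(cuda_root, "libcublas.so.*")
    if bin_dir and lib_dir:
        print("PATH=" + prepend(bin_dir, os.environ.get("PATH", "")))
        print("LD_LIBRARY_PATH="
              + prepend(lib_dir, os.environ.get("LD_LIBRARY_PATH", "")))
    else:
        print(f"nvcc or libcublas not found under {cuda_root}")
```

The script only prints the values; it does not modify ~/.profile or the environment of the calling shell.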

I don't have sudo on my HPC :( Is there another solution? - Charlie Parker
I found this question. I hope it helps, but you will need sudo at least to install the driver. - Bruno Laporais Pereira

2

You can create symbolic links in the /usr/lib/x86_64-linux-gnu directory. I found that directory with:

$ whereis libcudart
libcudart: /usr/lib/x86_64-linux-gnu/libcudart.so /usr/share/man/man7/libcudart.7.gz

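In that folder you will find other versions of the CUDA libraries. Create symbolic links to them like this. The exact versions you link to may differ slightly.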

$ sudo ln -s libcublas.so.10.2.1.243 libcublas.so.11
$ sudo ln -s libcublasLt.so.10.2.1.243 libcublasLt.so.11
$ sudo ln -s libcusolver.so.10.2.0.243 libcusolver.so.11
$ sudo ln -s libcusparse.so.10.3.0.243 libcusparse.so.11

Your GPU should now be detected.

>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-12-07 17:07:26.914296: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-07 17:07:26.950731: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.029687: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.030421: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.325218: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.325642: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.326022: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.326408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 9280 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:06:00.0, compute capability: 8.6
True
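The deprecation warning in the log above names `tf.config.list_physical_devices('GPU')` as the replacement for `is_gpu_available`. A minimal sketch of the same check with that call follows; the wrapper name `visible_gpus` is made up for illustration, and it returns None when TensorFlow is not importable so the script can run in any environment.

```python
def visible_gpus():
    """Return the list of GPUs TensorFlow can see, or None if
    TensorFlow is not installed in this environment."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Replacement named in the deprecation warning.
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print(f"{len(gpus)} GPU(s) visible: {gpus}")
    else:
        print("TensorFlow is installed but sees no GPU")
```

An empty list corresponds to the `False` result shown in the question.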

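The symlink approach works because these CUDA libraries are so similar across versions that NVIDIA itself often ships them as symbolic links. If tensorflow is looking for libcublas.so.11, you can create a file with that name that simply points to another installed version of libcublas.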
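Before linking anything, it can help to list which links would be needed. The sketch below is a dry run under stated assumptions: `plan_links` is a hypothetical helper, it picks the installed file with the longest (most fully versioned) name for each library, and it only prints `ln -s` commands rather than creating links. Whether a library of a different major version behaves correctly under the requested name is not checked here.

```python
from pathlib import Path


def plan_links(lib_dir, wanted):
    """For each wanted soname that is absent from `lib_dir`, choose an
    installed file of the same library and return (target, link_name)
    pairs. Nothing is created on disk."""
    lib_dir = Path(lib_dir)
    if not lib_dir.is_dir():
        return []
    plan = []
    for soname in wanted:
        link = lib_dir / soname
        if link.exists() or link.is_symlink():
            continue  # name already taken, nothing to plan
        base = soname.split(".so")[0]  # "libcublas.so.11" -> "libcublas"
        candidates = [
            p.name for p in lib_dir.glob(base + ".so.*") if p.is_file()
        ]
        if candidates:
            # Longest name first, ties broken alphabetically.
            target = max(candidates, key=lambda n: (len(n), n))
            plan.append((target, soname))
    return plan


if __name__ == "__main__":
    wanted = ["libcublas.so.11", "libcublasLt.so.11",
              "libcusolver.so.11", "libcusparse.so.11"]
    for target, link_name in plan_links("/usr/lib/x86_64-linux-gnu", wanted):
        print(f"sudo ln -s {target} {link_name}")
```

Run from inside the library directory, the printed commands have the same shape as the ones in this answer.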


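It also fixed the error "Could not load library libcublasLt.so.12. Error: libcublasLt.so.12: cannot open shared object file: No such file or directory". Thank you very much. - Gelberth Amarillo Rojas
@GelberthAmarilloRojas This did not work for me with tensorflow 2.13 and cuda 11.8. - Nick Jagiella
@Nick Jagiella This method resolves the library-related warnings: first find the libraries you have, then create symbolic links to them. - Gelberth Amarillo Rojas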
