TensorFlow: Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory


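I have been trying for many days to run TensorFlow on my GPU, without success.

I know there are many solutions to similar questions, but none of the ones I found worked for me, which is why I am asking this question:

How to install libcusolver.so.11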

https://dev59.com/EFIG5IYBdhLWcg3wtz9F#67642774

I have installed the 460.106.00 driver and CUDA 11.2 for an Nvidia GeForce RTX 3090:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00   Driver Version: 460.106.00   CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    On   | 00000000:08:00.0  On |                  N/A |
| 33%   26C    P8    22W / 350W |    282MiB / 24260MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1264      G   /usr/lib/xorg/Xorg                 59MiB |
|    0   N/A  N/A      3349      G   /usr/lib/xorg/Xorg                124MiB |
|    0   N/A  N/A      3508      G   /usr/bin/gnome-shell               77MiB |
|    0   N/A  N/A      6384      G   /usr/lib/firefox/firefox            4MiB |
+-----------------------------------------------------------------------------+

The cuDNN version is:

cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 1

And the GCC compiler:

gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

I have also added LD_LIBRARY_PATH to my ~/.bashrc:

# Nvidia cuda toolkit
export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64${LD_LIBRARY_PATH+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda

I have tried several versions of tensorflow and tensorflow-gpu, from 2.4 to 2.7, but every version produces the following problem:

2022-01-24 21:28:43.206834: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory

or
2022-01-24 21:28:44.087779: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087827: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087858: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087891: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087921: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087947: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087975: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
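The warnings above come from the dynamic loader, so the same lookups can be reproduced without TensorFlow. The following is a minimal sketch, assuming a Linux system with Python 3; the helper name `try_load` is hypothetical, and the library names are copied from the log. It asks the loader for each library through `ctypes.CDLL` and reports the loader's own error message on failure.

```python
import ctypes

# Library names copied from the TensorFlow warnings above.
CUDA_LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.11",
    "libcusparse.so.11",
]


def try_load(names):
    """Return {name: None if the loader found it, else the error message}."""
    results = {}
    for name in names:
        try:
            # CDLL passes the bare name to the dynamic loader, which searches
            # LD_LIBRARY_PATH and the system library cache.
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in try_load(CUDA_LIBS).items():
        print(f"{name}: {'OK' if error is None else error}")
```

If every library fails here as well, the problem is probably in the library search path of the process rather than in TensorFlow itself.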

Thanks in advance, I don't know what else to try...


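By the way, I have tried PyTorch and it finds the GPU correctly: print(torch.cuda.get_device_name(0)) - David Serrano
Thanks for the quick answer, @Robert Crovella. So how would you solve this? By upgrading or downgrading CUDA or TensorFlow? - David Serrano
CUDA 11.2 should provide a symlink for libcudart.so.11.0 in /usr/local/cuda-11.2/lib64. If you cannot find that symlink there, I would say your CUDA installation is broken. - Robert Crovella
I have searched for it and it is there... - David Serrano
Then your TF is not finding it, probably because you are running a Python environment that does not pick up the environment variables you set (LD_LIBRARY_PATH). - Robert Crovella
OK... Can a Python environment see the variables defined in ~/.bashrc? - David Serrano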
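One way to answer the last question empirically is to inspect the environment from inside the same Python process that imports TensorFlow. Below is a minimal sketch, assuming Linux and Python 3; the helper name `find_library` is hypothetical. It prints the LD_LIBRARY_PATH the process actually received and lists which of its directories contain a given library file.

```python
import os
from pathlib import Path


def find_library(lib_name, search_path=None):
    """Return the directories of an LD_LIBRARY_PATH-style string that contain lib_name."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for entry in search_path.split(":"):
        # exists() follows symlinks, so a dangling symlink counts as missing.
        if entry and (Path(entry) / lib_name).exists():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<not set>"))
    print("libcudart.so.11.0 found in:", find_library("libcudart.so.11.0"))
```

Note that ~/.bashrc is read by interactive bash shells, so a Python process started from a desktop launcher or an IDE may not inherit variables exported there. Running this script the same way the TensorFlow script is run should show whether that is the case.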
2 Answers

After trying many things, I created a new conda environment and installed tensorflow-gpu, since I don't care about the TF version.
conda install tensorflow-gpu -c anaconda

It installed all of the following packages:

    package                    |            build
    ---------------------------|-----------------
    _tflow_select-2.1.0        |              gpu           2 KB  anaconda
    absl-py-0.10.0             |           py38_0         170 KB  anaconda
    aiohttp-3.6.3              |   py38h7b6447c_0         622 KB  anaconda
    astunparse-1.6.3           |             py_0          17 KB  anaconda
    async-timeout-3.0.1        |           py38_0          12 KB  anaconda
    attrs-20.2.0               |             py_0          41 KB  anaconda
    blas-1.0                   |              mkl           6 KB  anaconda
    blinker-1.4                |           py38_0          21 KB  anaconda
    brotlipy-0.7.0             |py38h7b6447c_1000         349 KB  anaconda
    c-ares-1.16.1              |       h7b6447c_0         112 KB  anaconda
    ca-certificates-2020.10.14 |                0         128 KB  anaconda
    cachetools-4.1.1           |             py_0          12 KB  anaconda
    certifi-2020.6.20          |           py38_0         160 KB  anaconda
    cffi-1.14.0                |   py38h2e261b9_0         228 KB  anaconda
    chardet-3.0.4              |        py38_1003         170 KB  anaconda
    click-7.1.2                |             py_0          67 KB  anaconda
    cryptography-3.1.1         |   py38h1ba5d50_0         618 KB  anaconda
    cudatoolkit-10.1.243       |       h6bb024c_0       513.2 MB  anaconda
    cudnn-7.6.5                |       cuda10.1_0       250.6 MB  anaconda
    cupti-10.1.168             |                0         1.7 MB  anaconda
    gast-0.3.3                 |             py_0          14 KB  anaconda
    google-auth-1.22.1         |             py_0          62 KB  anaconda
    google-auth-oauthlib-0.4.1 |             py_2          21 KB  anaconda
    google-pasta-0.2.0         |             py_0          44 KB  anaconda
    grpcio-1.31.0              |   py38hf8bcb03_0         2.3 MB  anaconda
    h5py-2.10.0                |   py38hd6299e0_1         1.1 MB  anaconda
    hdf5-1.10.6                |       hb1b8bf9_0         4.8 MB  anaconda
    idna-2.10                  |             py_0          56 KB  anaconda
    importlib-metadata-2.0.0   |             py_1          35 KB  anaconda
    intel-openmp-2020.2        |              254         947 KB  anaconda
    keras-preprocessing-1.1.0  |             py_1          36 KB  anaconda
    libgfortran-ng-7.3.0       |       hdf63c60_0         1.3 MB  anaconda
    libprotobuf-3.13.0.1       |       hd408876_0         2.3 MB  anaconda
    markdown-3.3.2             |           py38_0         123 KB  anaconda
    mkl-2019.4                 |              243       204.1 MB  anaconda
    mkl-service-2.3.0          |   py38he904b0f_0          68 KB  anaconda
    mkl_fft-1.2.0              |   py38h23d657b_0         173 KB  anaconda
    mkl_random-1.1.0           |   py38h962f231_0         398 KB  anaconda
    multidict-4.7.6            |   py38h7b6447c_1          72 KB  anaconda
    numpy-1.19.1               |   py38hbc911f0_0          20 KB  anaconda
    numpy-base-1.19.1          |   py38hfa32c7d_0         5.3 MB  anaconda
    oauthlib-3.1.0             |             py_0          88 KB  anaconda
    openssl-1.1.1h             |       h7b6447c_0         3.8 MB  anaconda
    opt_einsum-3.1.0           |             py_0          54 KB  anaconda
    protobuf-3.13.0.1          |   py38he6710b0_1         702 KB  anaconda
    pyasn1-0.4.8               |             py_0          58 KB  anaconda
    pyasn1-modules-0.2.8       |             py_0          67 KB  anaconda
    pycparser-2.20             |             py_2          94 KB  anaconda
    pyjwt-1.7.1                |           py38_0          32 KB  anaconda
    pyopenssl-19.1.0           |             py_1          47 KB  anaconda
    pysocks-1.7.1              |           py38_0          27 KB  anaconda
    requests-2.24.0            |             py_0          54 KB  anaconda
    requests-oauthlib-1.3.0    |             py_0          22 KB  anaconda
    rsa-4.6                    |             py_0          26 KB  anaconda
    scipy-1.5.2                |   py38h0b6359f_0        18.7 MB  anaconda
    six-1.15.0                 |             py_0          13 KB  anaconda
    tensorboard-2.2.1          |     pyh532a8cf_0         2.5 MB  anaconda
    tensorboard-plugin-wit-1.6.0|             py_0         663 KB  anaconda
    tensorflow-2.2.0           |gpu_py38hb782248_0           4 KB  anaconda
    tensorflow-base-2.2.0      |gpu_py38h83e3d50_0       421.3 MB  anaconda
    tensorflow-estimator-2.2.0 |     pyh208ff02_0         276 KB  anaconda
    tensorflow-gpu-2.2.0       |       h0d30ee6_0           2 KB  anaconda
    termcolor-1.1.0            |           py38_1           8 KB  anaconda
    urllib3-1.25.11            |             py_0          93 KB  anaconda
    werkzeug-1.0.1             |             py_0         243 KB  anaconda
    wrapt-1.12.1               |   py38h7b6447c_1          50 KB  anaconda
    yarl-1.6.2                 |   py38h7b6447c_0         142 KB  anaconda
    zipp-3.3.1                 |             py_0          11 KB  anaconda
    ------------------------------------------------------------
                                           Total:        1.41 GB


including cudatoolkit and cudnn...
After that, and I don't know why, TF detected the Nvidia GPU:
2022-01-25 09:37:52.865587: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-01-25 09:37:52.902796: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-25 09:37:52.903487: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.69GiB deviceMemoryBandwidth: 871.81GiB/s
2022-01-25 09:37:52.903637: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-01-25 09:37:52.904633: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-01-25 09:37:52.905878: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-01-25 09:37:52.906023: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-01-25 09:37:52.907115: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-01-25 09:37:52.907719: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-01-25 09:37:52.910042: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-01-25 09:37:52.910137: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-25 09:37:52.911078: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-25 09:37:52.911707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
Num GPUs Available:  1

Process finished with exit code 0
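The script that produced the "Num GPUs Available" line is not shown. A check along the following lines gives equivalent output. This is a sketch, not necessarily the exact script that was run; `count_gpus` is a hypothetical helper, and `tf.config.list_physical_devices` is the TensorFlow 2.x call it relies on.

```python
def count_gpus():
    """Return the number of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Each entry is a PhysicalDevice describing one visible GPU.
    return len(tf.config.list_physical_devices("GPU"))


if __name__ == "__main__":
    print("Num GPUs Available: ", count_gpus())
```

A result of 0 together with "Could not load dynamic library" warnings means TensorFlow fell back to the CPU because it could not load the CUDA libraries.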
