Tensorflow 2.2 GPU - which cuDNN library should I install?

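I have successfully installed the CUDA driver, the cuDNN library and TensorFlow. However, when I run a simple test program that imports tensorflow, I get an error. The error seems to indicate that I installed the wrong version of the cuDNN library. I would appreciate some help. If I need to downgrade cuDNN, how should I do it?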
TensorFlow version: 2.2 GPU
OS: Ubuntu 16.04.6 LTS (GNU/Linux 4.4.0-184-generic x86_64)
nvcc -V shows the following:
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

nvidia-smi shows the following:

Fri Jun 12 17:16:38 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 450.36.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 980 Ti  Off  | 00000000:02:00.0 Off |                  N/A |
| 22%   27C    P8    17W / 250W |     74MiB /  6083MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1489      G   /usr/lib/xorg/Xorg                 71MiB |
+-----------------------------------------------------------------------------+
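The two version numbers above answer different questions. nvcc -V reports the CUDA toolkit found on PATH (here 7.5), while "CUDA Version" in the nvidia-smi header is the highest CUDA version the installed driver supports (here 11.0), not an installed toolkit. A minimal Python sketch for comparing them, assuming TensorFlow 2.2 wants CUDA 10.1 (the version named in the answer and in the libcudart.so.10.1 line of the import log). The helper names are hypothetical, and nvcc itself is not used when importing TensorFlow, so a mismatch there is only a hint that the CUDA 10.1 toolkit is not the one on PATH.

```python
import re
from typing import List, Optional, Tuple

Version = Tuple[int, int]

# Assumption: TensorFlow 2.2 wants CUDA 10.1 (from the answer and libcudart.so.10.1 in the log).
REQUIRED_CUDA: Version = (10, 1)


def parse_nvcc_release(nvcc_output: str) -> Optional[Version]:
    """Toolkit version from `nvcc -V`, e.g. 'release 7.5, V7.5.17' -> (7, 5)."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_smi_cuda_version(smi_output: str) -> Optional[Version]:
    """Highest CUDA version the driver supports, read from the nvidia-smi header."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def diagnose(nvcc_output: str, smi_output: str,
             required: Version = REQUIRED_CUDA) -> List[str]:
    """Return human-readable problems; an empty list means no mismatch was found."""
    problems = []
    toolkit = parse_nvcc_release(nvcc_output)
    if toolkit != required:
        problems.append(f"nvcc on PATH reports CUDA {toolkit}, expected {required}")
    driver_max = parse_smi_cuda_version(smi_output)
    if driver_max is None or driver_max < required:
        problems.append(f"driver supports CUDA up to {driver_max}, need at least {required}")
    return problems


if __name__ == "__main__":
    nvcc = "Cuda compilation tools, release 7.5, V7.5.17"
    smi = "| NVIDIA-SMI 450.36.06    Driver Version: 450.36.06    CUDA Version: 11.0     |"
    for problem in diagnose(nvcc, smi):
        print(problem)
```

With the two outputs shown in the question, diagnose reports only the toolkit mismatch: a driver that supports CUDA up to 11.0 is new enough to run a CUDA 10.1 runtime.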

cuDNN was installed successfully following the instructions at https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#download, but I think I installed the version for CUDA 11.0.
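Rather than guessing which cuDNN was installed, the version macros in its header can be read. A minimal sketch, assuming the headers were copied into the CUDA include folder (the /usr/local/cuda/include path in the usage line is a hypothetical example): cuDNN 7 defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL in cudnn.h, while cuDNN 8 and later keep them in cudnn_version.h. The function name is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional, Tuple


def cudnn_version_from_headers(include_dir: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) read from the installed cuDNN headers, or None.

    cuDNN 8+ keeps the macros in cudnn_version.h; cuDNN 7 keeps them in cudnn.h,
    so the newer header is checked first.
    """
    for name in ("cudnn_version.h", "cudnn.h"):
        header = Path(include_dir) / name
        if not header.is_file():
            continue
        text = header.read_text(errors="ignore")
        fields = {}
        for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
            m = re.search(rf"#define\s+CUDNN_{key}\s+(\d+)", text)
            if m:
                fields[key] = int(m.group(1))
        if len(fields) == 3:
            return fields["MAJOR"], fields["MINOR"], fields["PATCHLEVEL"]
    return None


if __name__ == "__main__":
    # Hypothetical location; point this at the include folder of the CUDA install in use.
    print(cudnn_version_from_headers("/usr/local/cuda/include"))
```

TensorFlow 2.2 looks for libcudnn.so.7, so a major version other than 7 here would by itself explain why that library cannot be found.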

Error messages when the program tries to import tensorflow (Python 3.6):

2020-06-12 17:21:38.131160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:02:00.0 name: GeForce GTX 980 Ti computeCapability: 5.2
coreClock: 1.228GHz coreCount: 22 deviceMemorySize: 5.94GiB deviceMemoryBandwidth: 313.37GiB/s
2020-06-12 17:21:38.131384: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-06-12 17:21:38.131498: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory
2020-06-12 17:21:38.133367: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-12 17:21:38.133807: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-06-12 17:21:38.137813: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-06-12 17:21:38.137958: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory
2020-06-12 17:21:38.138063: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2020-06-12 17:21:38.138085: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-06-12 17:21:38.138114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-12 17:21:38.138131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-06-12 17:21:38.138152: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
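The log states exactly which shared libraries TensorFlow 2.2 tried to open: CUDA 10.1 runtime libraries and libcudnn.so.7 (i.e. cuDNN 7.x). A minimal sketch for re-checking them without importing TensorFlow, assuming ctypes.CDLL resolves a bare soname through the same dynamic-loader search path (LD_LIBRARY_PATH, the ldconfig cache) that TensorFlow's dso_loader relies on via dlopen. The sonames are copied from the log; the function and constant names are hypothetical.

```python
import ctypes
from typing import Dict, Iterable

# Sonames TensorFlow 2.2 tried to dlopen, copied from the log above.
TF22_GPU_LIBS = (
    "libcudart.so.10.1",
    "libcublas.so.10",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.10",
    "libcudnn.so.7",
)


def probe_libraries(sonames: Iterable[str] = TF22_GPU_LIBS) -> Dict[str, bool]:
    """Map each soname to True if the dynamic loader can open it, else False."""
    result = {}
    for name in sonames:
        try:
            ctypes.CDLL(name)  # uses dlopen() with the normal library search path
            result[name] = True
        except OSError:
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in probe_libraries().items():
        print(f"{'ok     ' if ok else 'MISSING'} {name}")
```

Every entry has to load before TensorFlow stops printing "Skipping registering GPU devices...". A library that loads here can still be the wrong build, so this is a first check rather than a guarantee.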

Even more trouble: https://stackoverflow.com/questions/67931031/tensorflow-gpu-2-2-works-with-cuda-10-2-but-requires-cudnn-7-6-4-which-doesnt-h - Antti Rytsölä
1 Answer

According to the following, TensorFlow 2.2 requires CUDA 10.1 and cuDNN 7.4:

https://www.tensorflow.org/install/source_windows#tested_build_configurations

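CUDA archive / older versions: https://developer.nvidia.com/cuda-toolkit-archive
cuDNN archive (you need to register an NVIDIA account to access it): https://developer.nvidia.com/rdp/cudnn-archive
Note that there is no cuDNN 7.4 build for CUDA 10.1, so I suggest trying version 7.5.0. Installing cuDNN is just a matter of copying the downloaded files into the folder where CUDA is installed (each into its corresponding subfolder).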
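The copy step described in this answer can be sketched in Python. This assumes a Linux cuDNN tarball that unpacks to a cuda/ folder with include/ and lib64/ subfolders, and a CUDA 10.1 toolkit at /usr/local/cuda-10.1; both paths and the install_cudnn name are assumptions. It copies cudnn*.h into include/ and libcudnn* into lib64/, keeps symlinks such as libcudnn.so.7 as links (like cp -P), and makes the copied files world-readable (like chmod a+r).

```python
import os
import shutil
from pathlib import Path
from typing import List


def install_cudnn(extracted: str, cuda_home: str) -> List[Path]:
    """Copy cuDNN headers and libraries from an extracted archive into a CUDA install.

    extracted: the cuda/ folder the tarball unpacks to (assumed layout).
    cuda_home: e.g. /usr/local/cuda-10.1 (assumed location; writing there needs root).
    Returns the paths that were created.
    """
    src, dst = Path(extracted), Path(cuda_home)
    copied = []
    for sub, pattern in (("include", "cudnn*.h"), ("lib64", "libcudnn*")):
        (dst / sub).mkdir(parents=True, exist_ok=True)
        for f in sorted((src / sub).glob(pattern)):
            target = dst / sub / f.name
            if target.is_symlink() or target.exists():
                target.unlink()  # replace any previously installed cuDNN file
            if f.is_symlink():
                os.symlink(os.readlink(f), target)  # keep the link itself, like `cp -P`
            else:
                shutil.copy2(f, target)
                target.chmod(target.stat().st_mode | 0o444)  # like `chmod a+r`
            copied.append(target)
    return copied
```

For example, run install_cudnn("cuda", "/usr/local/cuda-10.1") as root from the folder where the archive was extracted. TensorFlow only finds the libraries if that lib64 folder is on the loader's search path, e.g. via LD_LIBRARY_PATH or an ldconfig entry.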

The TF page you referenced says cuDNN 7.6 goes with CUDA 10.1, and the cuDNN archive page lists: cuDNN v7.6.5 (November 5th, 2019), for CUDA 10.1. - Joysn

Content provided by Stack Overflow.