I am trying to use a GPU with TensorFlow. My TensorFlow version is 2.4.1, and I am using CUDA version 11.2. Here is the output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce MX110       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   52C    P0    N/A /  N/A |    254MiB /  2004MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1151      G   /usr/lib/xorg/Xorg                 37MiB |
|    0   N/A  N/A      1654      G   /usr/lib/xorg/Xorg                136MiB |
|    0   N/A  N/A      1830      G   /usr/bin/gnome-shell               68MiB |
|    0   N/A  N/A      5443      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A      5659      G   /usr/lib/firefox/firefox            0MiB |
+-----------------------------------------------------------------------------+
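I am running into a strange problem. Earlier, when I listed all physical devices with tf.config.list_physical_devices(), it detected one CPU and one GPU. I then tried a simple matrix multiplication on the GPU, and it failed with an error: failed to synchronize cuda stream CUDA_LAUNCH_ERROR (the error was something like this; I forgot to record it). However, when I tried the same thing again in another terminal, it no longer detected any GPU. This time, listing the physical devices gives: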
>>> tf.config.list_physical_devices()
2021-04-11 18:56:47.504776: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-11 18:56:47.507646: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-04-11 18:56:47.534189: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2021-04-11 18:56:47.534233: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: debadri-HP-Laptop-15g-dr0xxx
2021-04-11 18:56:47.534244: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: debadri-HP-Laptop-15g-dr0xxx
2021-04-11 18:56:47.534356: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.39.0
2021-04-11 18:56:47.534393: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.39.0
2021-04-11 18:56:47.534404: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.39.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
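For reference, the sequence described above (list the devices, then run a small matrix multiplication on the GPU) can be sketched roughly as follows. This is a minimal reconstruction, not the exact code that was run: the function name gpu_smoke_test and the 256x256 matrix size are assumptions made for illustration, and the function simply returns None when TensorFlow is not installed.

```python
import importlib.util


def gpu_smoke_test():
    """List visible GPUs and, if any, run one small matmul on the first one.

    Returns a list of GPU device names (possibly empty), or None when
    TensorFlow is not installed. The name and sizes are illustrative only.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None

    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        # Pin the computation to the first GPU.
        with tf.device("/GPU:0"):
            a = tf.random.uniform((256, 256))
            b = tf.random.uniform((256, 256))
            c = tf.matmul(a, b)
        # Copying the result to host memory forces the GPU work to finish,
        # so a launch or synchronization failure would surface here.
        c.numpy()
    return [d.name for d in gpus]


print(gpu_smoke_test())
```

An empty list corresponds to the situation shown in the log above, where only the CPU is reported.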
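My OS is Ubuntu 20.04, with Python 3.8.5. As mentioned above, the TensorFlow version is 2.4.1 and the CUDA version is 11.2. I installed CUDA by following these instructions. One more piece of information: when I import tensorflow, it prints the following: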
import tensorflow as tf
2021-04-11 18:56:07.716683: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
What am I missing? Why does it fail to detect the GPU now, even though it detected it before?
Try running sudo apt-get install nvidia-modprobe and then reboot. Thanks. - user11530462
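After the reboot, one way to see whether that step had any effect is to check that the NVIDIA character device nodes exist, since nvidia-modprobe is the utility that loads the kernel module and creates them. The sketch below is only a rough check: the node names in NODES are the usual ones on a single-GPU Linux machine and are an assumption, not something taken from the post, and check_nvidia_nodes is a hypothetical helper name.

```python
import os

# Assumed device node names for a single-GPU Linux system; adjust as needed.
NODES = ("/dev/nvidiactl", "/dev/nvidia0", "/dev/nvidia-uvm")


def check_nvidia_nodes(paths=NODES):
    """Return a dict mapping each device node path to whether it exists."""
    return {path: os.path.exists(path) for path in paths}


for path, present in check_nvidia_nodes().items():
    print(f"{path}: {'present' if present else 'missing'}")
```

A missing node does not prove that this is the cause of the cuInit failure, but it is a cheap thing to rule out before reinstalling the driver or CUDA.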