docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]

Even though each GPU has about 20GB of vRAM, docker fails to run with the command below. How can I fix this?
[20:08:28] jalal@echo:~/research/code$ docker run --shm-size 2GB -it --gpus all docurdt/heal
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled 


[20:08:20] jalal@echo:~/research/code$ nvidia-smi
Fri Apr  1 20:08:28 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 31%   41C    P8    23W / 350W |    301MiB / 24576MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:21:00.0 Off |                  N/A |
| 30%   39C    P8    18W / 350W |     14MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:4A:00.0 Off |                  N/A |
| 30%   32C    P8    23W / 350W |     14MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:4B:00.0 Off |                  N/A |
| 30%   40C    P8    18W / 350W |     14MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

I also have:

$ uname -a
Linux echo 5.4.0-99-generic #112-Ubuntu SMP Thu Feb 3 13:50:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
LSB Version:    core-11.1.0ubuntu2-noarch:security-11.1.0ubuntu2-noarch
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.3 LTS
Release:    20.04
Codename:   focal

$ docker -v
Docker version 20.10.7, build 20.10.7-0ubuntu5~20.04.2

Also,

$ df -h | grep /dev/shm
tmpfs                                126G  199M  126G   1% /dev/shm

and

$  cat /boot/config-$(uname -r) | grep -i seccomp
CONFIG_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y

and

[20:33:17] (dpcc) jalal@echo:~$ lspci -vv | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1) (prog-if 00 [VGA controller])
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
01:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
21:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1) (prog-if 00 [VGA controller])
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
21:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
4a:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1) (prog-if 00 [VGA controller])
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
4a:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
4b:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1) (prog-if 00 [VGA controller])
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
4b:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
2 Answers

  1. $ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
       && curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add - \
       && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

  2. $ sudo apt-get update

  3. $ sudo apt-get install -y nvidia-docker2

  4. $ sudo systemctl restart docker

  5. $ docker run --shm-size 2GB -it --gpus all docurdt/heal
     (base) root@9f66ed7b7c1b:/Workspace#

Many thanks to grym for the link.
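
If the error persists after these steps, a quick check of which helper binaries are on PATH may narrow things down. As far as I know, the Docker daemon only registers its GPU device driver when it finds `nvidia-container-runtime-hook` at startup, which would also explain why the restart in step 4 matters. The script below is a minimal sketch, not an official diagnostic: the `have` and `report` helper names are made up for illustration, and the list of tools is my assumption about what the NVIDIA packages install.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command can be found on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Hypothetical helper: prints one "found"/"missing" line per tool name.
report() {
    for tool in "$@"; do
        if have "$tool"; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
        fi
    done
}

# Assumed tool list: the docker CLI, the driver utility, and the two
# helpers that the NVIDIA container packages are expected to provide.
report docker nvidia-smi nvidia-container-runtime-hook nvidia-container-cli
```

A "missing" line for `nvidia-container-runtime-hook` would suggest that step 3 did not install what Docker needs. If everything is reported as found, restarting the daemon again (step 4) is the next thing I would try.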


All of the steps above worked for me except step 1. For more details on step 1, see this link: https://nvidia.github.io/nvidia-docker/. Also, in step 3 you may need to run sudo apt-get install -y nvidia-container-toolkit - Dan
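
The comment does not say why step 1 failed. If `apt-key` is the problem, one possible variant is to store the key in a dedicated keyring and reference it with `signed-by`. This is only a sketch under assumptions: the keyring path and the function names `add_signed_by` and `setup_repo` are illustrative choices, while the URLs are the ones already used in step 1. Nothing is changed on the system unless the script is called with `--apply`.

```shell
#!/bin/sh
# Assumed keyring location; any root-owned path readable by apt should do.
KEYRING=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Rewrite "deb https://..." entries read from stdin so that apt verifies
# them against $KEYRING.
add_signed_by() {
    sed "s#deb https://#deb [signed-by=$KEYRING] https://#g"
}

# Download the key and the repository list (same URLs as step 1),
# then write both into place. Needs network access and sudo.
setup_repo() {
    distribution=$(. /etc/os-release; echo "$ID$VERSION_ID")
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
        | sudo gpg --dearmor -o "$KEYRING" \
    && curl -s -L "https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list" \
        | add_signed_by \
        | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
}

# Only touch the system when explicitly asked to.
if [ "${1:-}" = "--apply" ]; then
    setup_repo
fi
```

After running it with `--apply`, steps 2 to 5 continue unchanged.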