Someone has published a solution for CUDA on Alpine (alpine-cuda):
https://arto.s3.amazonaws.com/notes/cuda
Drivers
https://developer.nvidia.com/vulkan-driver
$ lsmod | fgrep nvidia
$ nvidia-smi
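The module check above can be wrapped in a small helper. This is a minimal sketch, not part of the original notes: the function names are hypothetical, and the helper reads /proc/modules (the table that lsmod formats) so it can also be pointed at a saved copy of that table.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original notes).

# Succeeds if the module table lists a module whose name starts with "nvidia".
# $1: module table to read; defaults to /proc/modules, which lsmod formats.
has_nvidia_module() {
    grep -q '^nvidia' "${1:-/proc/modules}" 2>/dev/null
}

# Prints a one-line status instead of the raw module listing.
report_nvidia_module() {
    if has_nvidia_module "$@"; then
        echo "nvidia kernel module: loaded"
    else
        echo "nvidia kernel module: not loaded"
    fi
}
```

Calling report_nvidia_module with no argument checks the running kernel. If it reports "loaded", nvidia-smi is the next thing to try.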
Driver Installation
https://us.download.nvidia.com/XFree86/Linux-x86_64/390.77/README/
https://github.com/NVIDIA/nvidia-installer
Driver Installation on Alpine Linux
https://github.com/sgerrand/alpine-pkg-glibc
https://github.com/sgerrand/alpine-pkg-glibc/releases
https://wiki.alpinelinux.org/wiki/Running_glibc_programs
$ apk add sudo bash ca-certificates wget xz make gcc linux-headers
$ wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://raw.githubusercontent.com/sgerrand/alpine-pkg-glibc/master/sgerrand.rsa.pub
$ wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.27-r0/glibc-2.27-r0.apk
$ wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.27-r0/glibc-bin-2.27-r0.apk
$ wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.27-r0/glibc-dev-2.27-r0.apk
$ wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.27-r0/glibc-i18n-2.27-r0.apk
$ apk add glibc-2.27-r0.apk glibc-bin-2.27-r0.apk glibc-dev-2.27-r0.apk glibc-i18n-2.27-r0.apk
$ /usr/glibc-compat/bin/localedef -i en_US -f UTF-8 en_US.UTF-8
$ bash NVIDIA-Linux-x86_64-390.77.run --check
$ bash NVIDIA-Linux-x86_64-390.77.run --extract-only
$ cd NVIDIA-Linux-x86_64-390.77 && ./nvidia-installer
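The four glibc download lines above differ only in the package name, so the URLs can be generated from the release tag. A minimal sketch, assuming the release URL layout shown in those wget lines stays the same for other tags. The function name glibc_apk_urls is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the original notes).
# Prints the download URLs of the four glibc compatibility packages
# for one release of sgerrand/alpine-pkg-glibc.
# $1: release tag; defaults to 2.27-r0, the version used in these notes.
glibc_apk_urls() {
    version="${1:-2.27-r0}"
    base="https://github.com/sgerrand/alpine-pkg-glibc/releases/download/$version"
    for pkg in glibc glibc-bin glibc-dev glibc-i18n; do
        echo "$base/$pkg-$version.apk"
    done
}
```

One way to use it: for url in $(glibc_apk_urls); do wget -q "$url"; done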
Driver Uninstallation
$ nvidia-uninstall
Driver Troubleshooting
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 390.77
NVIDIA-Linux-x86_64-390.77.run: line 998: /tmp/makeself.XXX/xz: No such file or directory
Extraction failed.
$ apk add xz # Alpine Linux
bash: ./nvidia-installer: No such file or directory
Install the glibc compatibility layer package for Alpine Linux.
ERROR: You do not appear to have libc header files installed on your system. Please install your distribution's libc development package.
$ apk add musl-dev # Alpine Linux
ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured
$ apk add linux-vanilla-dev # Alpine Linux
ERROR: Failed to execute `/sbin/ldconfig`: The installer has encountered the following error during installation: 'Failed to execute `/sbin/ldconfig`'. Would you like to continue installation anyway?
Continue installation.
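Some of the failures above (the missing xz, for instance) are simply a tool that is not on PATH yet. As a rough preflight sketch, the helper below lists which of the named commands are missing before the installer is started. The function name missing_tools is hypothetical, and the helper only checks commands, not headers or kernel sources.

```shell
#!/bin/sh
# Hypothetical helper (not from the original notes).
# Prints every named command that is not found on PATH, one per line.
# Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, missing_tools bash xz make gcc prints the names that still need an apk add. Each of these four commands ships in the Alpine package of the same name.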
Toolkit
https://developer.nvidia.com/cuda-toolkit
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/
Toolkit Download
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal
$ wget -c https://developer.nvidia.com/compute/cuda/9.2/Prod2/local_installers/cuda_9.2.148_396.37_linux
Toolkit Installation
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/
Toolkit Installation on Alpine Linux
$ apk add sudo bash
$ sudo bash cuda_9.2.148_396.37_linux
# You are attempting to install on an unsupported configuration. Do you wish to continue? y
# Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.37? y
# Do you want to install the OpenGL libraries? y
# Do you want to run nvidia-xconfig? n
# Install the CUDA 9.2 Toolkit? y
# Enter Toolkit Location: /opt/cuda-9.2
# Do you want to install a symbolic link at /usr/local/cuda? y
# Install the CUDA 9.2 Samples? y
# Enter CUDA Samples Location: /opt/cuda-9.2/samples
$ sudo ln -s cuda-9.2 /opt/cuda
$ export PATH="/opt/cuda/bin:$PATH"
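The export above only lasts for the current shell, and repeating it from a profile script keeps adding the same entry. A minimal sketch of an idempotent variant follows. The function name prepend_path is hypothetical, and it works on a string so that it never touches PATH by itself.

```shell
#!/bin/sh
# Hypothetical helper (not from the original notes).
# Prints $2 (a colon-separated PATH-style list) with directory $1 in front,
# unless $1 is already one of its entries, in which case $2 is printed unchanged.
prepend_path() {
    case ":$2:" in
        *":$1:"*) echo "$2" ;;
        *)        echo "$1${2:+:$2}" ;;
    esac
}
```

Usage: PATH=$(prepend_path /opt/cuda/bin "$PATH"); export PATH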
Toolkit Uninstallation
$ sudo /opt/cuda-9.2/bin/uninstall_cuda_9.2.pl
Toolkit Troubleshooting
Cannot find termcap: Can't find a valid termcap file at /usr/share/perl5/core_perl/Term/ReadLine.pm line 377.
$ export PERL_RL="Perl o=0"
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
$ apk add g++ # Alpine Linux
cicc: Relink `/usr/lib/libgcc_s.so.1' with `/usr/glibc-compat/lib/libc.so.6' for IFUNC symbol `memset'
https://github.com/sgerrand/alpine-pkg-glibc/issues/58
$ scp /lib/x86_64-linux-gnu/libgcc_s.so.1 root@alpine:/usr/glibc-compat/lib/libgcc_s.so.1
$ sudo /usr/glibc-compat/sbin/ldconfig /usr/glibc-compat/lib /lib /usr/lib
Compiler
https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/
$ nvcc -V
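To use the compiler version in a script, the release number can be cut out of the nvcc -V output. A minimal sketch, assuming the output contains a line of the form "Cuda compilation tools, release 9.2, V9.2.148". The function name cuda_release is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the original notes).
# Reads `nvcc -V` output on stdin and prints the release number,
# taken from the line that contains "release <number>,".
# Prints nothing if no such line is present.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}
```

Usage: nvcc -V | cuda_release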
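Now, when you put your workload into a separate OS image (for example a chroot or a container), you also have to install the same driver package version inside that image. One of the main reasons for using containers or chroots is to isolate and decouple applications from the host OS (so you no longer have to fit them into the host OS, can upgrade them independently, and can even have container images that are independent of the host OS), and that reason is now void. Host and workload need to match exactly.
In short: if you want a CUDA workload, both the host OS and the workload image (container, chroot, etc.) need to support it, and both need the same driver version installed. Anything else is as dangerous as playing Russian roulette.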
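The "same driver version on both sides" requirement can at least be checked mechanically. The sketch below is an illustration under assumptions, not something from the post: the function names are hypothetical, and it relies on the user-space library being installed under a file name such as libcuda.so.390.77, with the driver version as the suffix.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original post).

# Prints the driver version encoded in a libcuda file name,
# e.g. /usr/lib/libcuda.so.390.77 -> 390.77
# (a symlink name such as libcuda.so.1 would just yield "1").
libcuda_version() {
    name="${1##*/}"              # strip the directory part
    echo "${name#libcuda.so.}"   # strip the fixed prefix
}

# Succeeds only if both version strings are non-empty and identical.
# $1: driver version on the host, $2: driver version in the image.
drivers_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}
```

On the host, the first argument could come from nvidia-smi --query-gpu=driver_version --format=csv,noheader, and the second from libcuda_version applied to the library file found inside the image.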
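Since someone mentioned "nvidia-docker": it breaks the security isolation that Docker was originally used for. (Just look at the source code, which is actually available on GitHub.) It is merely a better chroot. The host and the Docker image still need the same driver stack version installed.
Finally, I would like to ask what your actual use case is here.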
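Warning: this may be fine for playing games on an entirely unimportant home computer, but it is unsuitable for any professional work that needs stability and security. If you are bound by data security or privacy regulations (such as the GDPR), stay away from such drivers: you cannot comply with those regulations while using these proprietary drivers. There are legal risks.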
--mtx
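Addendum: why have proprietary kernel drivers never worked reliably? Short answer: the Linux kernel was never designed for that, so it is not supported.
Longer answer: kernel modules are not external programs executed in some isolated environment (like userland programs). They are, by definition, an integral part of the kernel that is merely loaded lazily when needed. (They are not even like shared libraries/DLLs.) That means they have to fit, at the binary level, exactly the build of the kernel you are running. When the kernel is compiled, many configuration options subtly affect the actual internal binary layout: for example, enabling or disabling a feature may change the layout of certain data structures, and CPU-specific optimizations may change data structures, calling conventions, locking mechanisms, and so on. To clarify, this is only about the driver, not CUDA. That is another matter.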
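A coarse version of this "module must fit the running kernel" requirement is visible in a module's vermagic string, which starts with the kernel release the module was built for. The sketch below only compares that release field, so it cannot catch every configuration difference described above. The function name and the example release string are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post).
# $1: a module's vermagic string, e.g. "4.14.69-0-vanilla SMP mod_unload modversions"
# $2: a kernel release string as printed by `uname -r`
# Succeeds only if the first field of vermagic equals that kernel release.
vermagic_matches() {
    built_for="${1%% *}"   # first space-separated field of vermagic
    [ -n "$built_for" ] && [ "$built_for" = "$2" ]
}
```

Usage: vermagic_matches "$(modinfo -F vermagic nvidia)" "$(uname -r)" && echo "module built for this kernel release"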
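In fact, this turned out to be much easier than expected. I just had not fully understood the state of the nvidia-docker project and how it works.
Basically, download and install the latest release of nvidia-docker from the nvidia-docker project.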
https://github.com/NVIDIA/nvidia-docker/releases
Then create a Dockerfile for Alpine Linux:
FROM alpine:3.5
LABEL com.nvidia.volumes.needed="nvidia_driver"
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
RUN /bin/sh
Build the image:
docker build -t alpine-nvidia .
Run it:
nvidia-docker run -ti --rm alpine-nvidia
nvidia-docker invokes the docker CLI with additional arguments.
... nvidia-smi file, but it does not actually run (I guess the binary is not compiled for Alpine). Isn't that a problem? - ldirer
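To make the "additional arguments" remark more concrete: as far as I understand nvidia-docker 1.0, the extras are mainly --device flags for the GPU device nodes plus a volume holding the host's driver files, mounted at /usr/local/nvidia (the path the Dockerfile's PATH and LD_LIBRARY_PATH refer to). The sketch below only illustrates the device-flag part under that assumption. It is not nvidia-docker's actual code, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch (not nvidia-docker's actual code).
# Prints one --device flag for every entry named nvidia* in a directory.
# $1: directory to scan; defaults to /dev.
# Prints nothing if the directory has no such entries.
nvidia_device_flags() {
    dir="${1:-/dev}"
    for node in "$dir"/nvidia*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$node" ] && echo "--device=$node"
    done
    return 0
}
```

Running nvidia_device_flags on a GPU host shows which device nodes a plain docker run would have to be given by hand.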