Using the GPU in a Python Docker image


I am stuck with a python:3.7.4-slim-buster Docker image that I cannot change, and I would like to know how to use my NVIDIA GPU inside it.

Normally I use the tensorflow/tensorflow:1.14.0-gpu-py3 image, where a simple --runtime=nvidia on the docker run command is enough for everything to work, but now I have this constraint.
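For context, this is roughly what I run with the TensorFlow image (--gpus all is the newer Docker 19.03+ equivalent of --runtime=nvidia; both assume the NVIDIA drivers and the NVIDIA container runtime are installed on the host):

docker run --runtime=nvidia -it tensorflow/tensorflow:1.14.0-gpu-py3 python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
# on Docker 19.03+ with the NVIDIA Container Toolkit, the equivalent is:
docker run --gpus all -it tensorflow/tensorflow:1.14.0-gpu-py3 python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"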

I don't think there is a shortcut for this kind of image, so I have been following this guide https://towardsdatascience.com/how-to-properly-use-the-gpu-within-a-docker-container-4c699c78c6d1 and building the Dockerfile it provides:

FROM python:3.7.4-slim-buster

RUN apt-get update && apt-get install -y build-essential
RUN apt-get --purge remove -y nvidia*
# Get the install files you used to install CUDA and the NVIDIA drivers on your host
ADD ./Downloads/nvidia_installers /tmp/nvidia
# Install the driver
RUN /tmp/nvidia/NVIDIA-Linux-x86_64-331.62.run -s -N --no-kernel-module
# The driver installer leaves temp files behind during a docker build and the
# CUDA installer fails if they are still there, so delete them
RUN rm -rf /tmp/selfgz7
# CUDA installer
RUN /tmp/nvidia/cuda-linux64-rel-6.0.37-18176142.run -noprompt
# CUDA samples (comment this out if you don't want them)
RUN /tmp/nvidia/cuda-samples-linux-6.0.37-18176142.run -noprompt -cudaprefix=/usr/local/cuda-6.0
# Add the CUDA libraries to LD_LIBRARY_PATH
RUN export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
# Update the ld.so.conf.d directory
RUN touch /etc/ld.so.conf.d/cuda.conf
# Delete installer files
RUN rm -rf /temp/*

But the build fails with an error:
ADD failed: stat /var/lib/docker/tmp/docker-builder080208872/Downloads/nvidia_installers: no such file or directory
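As far as I understand, the ADD source path is resolved relative to the Docker build context, so the installer files would need to be inside the directory I pass to docker build first; something along these lines (host paths here are just an example):

mkdir -p ./Downloads/nvidia_installers
cp /path/to/nvidia_installers/* ./Downloads/nvidia_installers/
docker build -t python-gpu .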

How should I adjust my setup so that the Docker image can easily recognize my GPU?


Build your Dockerfile on top of the nvidia-docker images. You need the CUDA drivers installed on your host. - sim
You can't change python:3.7.4-slim-buster because you need this specific Python version, right? - anemyte
@anemyte Yes, or at least I shouldn't use the tensorflow/pytorch images; it's better to maintain this one. Why do you ask, do you have an idea? - Paolo Magnani
I have an Ubuntu-based Dockerfile with Python 3.7 and CUDA that I used to build a customized TensorFlow image for my own needs. If it fits yours, I'll post it as an answer. - anemyte
@sim Thank you. Could you give me an example of a Dockerfile that keeps my image unchanged? - Paolo Magnani
1 Answer


The TensorFlow images are built from several "partial" Dockerfiles. One of them contains all the dependencies TensorFlow needs to run on a GPU. Starting from it, you can easily create a custom image; all you need to do is change the default Python to the version you require. To me this seems much easier than bringing the NVIDIA stack into a Debian image, where (as far as I know) CUDA and/or cuDNN are not officially supported.

Here is the Dockerfile:

# TensorFlow image base written by TensorFlow authors.
# Source: https://github.com/tensorflow/tensorflow/blob/v2.3.0/tensorflow/tools/dockerfiles/partials/ubuntu/nvidia.partial.Dockerfile
# -------------------------------------------------------------------------
ARG ARCH=
ARG CUDA=10.1
# UBUNTU_VERSION is used in the FROM line below; 18.04 matches the CUDA 10.1 base images
ARG UBUNTU_VERSION=18.04
FROM nvidia/cuda${ARCH:+-$ARCH}:${CUDA}-base-ubuntu${UBUNTU_VERSION} as base
# ARCH and CUDA are specified again because the FROM directive resets ARGs
# (but their default value is retained if set previously)
ARG ARCH
ARG CUDA
ARG CUDNN=7.6.4.38-1
ARG CUDNN_MAJOR_VERSION=7
ARG LIB_DIR_PREFIX=x86_64
ARG LIBNVINFER=6.0.1-1
ARG LIBNVINFER_MAJOR_VERSION=6

# Needed for string substitution
SHELL ["/bin/bash", "-c"]
# Pick up some TF dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        cuda-command-line-tools-${CUDA/./-} \
        # There appears to be a regression in libcublas10=10.2.2.89-1 which
        # prevents cublas from initializing in TF. See
        # https://github.com/tensorflow/tensorflow/issues/9489#issuecomment-562394257
        libcublas10=10.2.1.243-1 \ 
        cuda-nvrtc-${CUDA/./-} \
        cuda-cufft-${CUDA/./-} \
        cuda-curand-${CUDA/./-} \
        cuda-cusolver-${CUDA/./-} \
        cuda-cusparse-${CUDA/./-} \
        curl \
        libcudnn7=${CUDNN}+cuda${CUDA} \
        libfreetype6-dev \
        libhdf5-serial-dev \
        libzmq3-dev \
        pkg-config \
        software-properties-common \
        unzip

# Install TensorRT if not building for PowerPC
RUN [[ "${ARCH}" = "ppc64le" ]] || { apt-get update && \
        apt-get install -y --no-install-recommends libnvinfer${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}+cuda${CUDA} \
        libnvinfer-plugin${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}+cuda${CUDA} \
        && apt-get clean \
        && rm -rf /var/lib/apt/lists/*; }

# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Link the libcuda stub to the location where tensorflow is searching for it and reconfigure
# dynamic linker run-time bindings
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1 \
    && echo "/usr/local/cuda/lib64/stubs" > /etc/ld.so.conf.d/z-cuda-stubs.conf \
    && ldconfig
# -------------------------------------------------------------------------
#
# Custom part
FROM base
ARG PYTHON_VERSION=3.7

RUN apt-get update && apt-get install -y --no-install-recommends --no-install-suggests \
          python${PYTHON_VERSION} \
          python3-pip \
          python${PYTHON_VERSION}-dev \
# Change default python
    && cd /usr/bin \
    && ln -sf python${PYTHON_VERSION}         python3 \
    && ln -sf python${PYTHON_VERSION}m        python3m \
    && ln -sf python${PYTHON_VERSION}-config  python3-config \
    && ln -sf python${PYTHON_VERSION}m-config python3m-config \
    && ln -sf python3                         /usr/bin/python \
# Update pip and add common packages
    && python -m pip install --upgrade pip \
    && python -m pip install --upgrade \
        setuptools \
        wheel \
        six \
# Cleanup
    && apt-get clean \
    && rm -rf $HOME/.cache/pip


You can take it from here: change the Python version to the one you need (as long as it is available in the Ubuntu repositories), add packages, code, and so on.
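If it helps, building and a quick sanity check could look something like this (the image name is arbitrary; --gpus all requires the NVIDIA Container Toolkit on the host, while --runtime=nvidia works with the older nvidia-docker2 setup):

docker build --build-arg PYTHON_VERSION=3.7 -t custom-tf-gpu .
docker run --gpus all custom-tf-gpu nvidia-smi
# after installing TensorFlow into the image (e.g. python -m pip install tensorflow==2.3.0):
# docker run --gpus all custom-tf-gpu python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"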

This really helped me with my solution. The only thing I changed was copying the official Python Dockerfile code to handle the Python part. Note that other GPU-accelerated libraries, for example the Implicit library, may require the "devel" version of the TensorFlow Dockerfile (rather than the base one above) to work properly. - ZaxR
