How can I let TensorFlow XLA know the CUDA path?

Question

6

I installed the TensorFlow nightly build with the following command: pip install tf-nightly-gpu --prefix=/tf/install/path

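When I try to run any XLA example, TensorFlow fails with the error "Unable to find libdevice dir. Using '.' Failed to compile ptx to cubin. Will attempt to let GPU driver compile the ptx. Not found: /usr/local/cuda-10.0/bin/ptxas not found".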

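Apparently TensorFlow cannot find my CUDA path. On my system, CUDA is installed in /cm/shared/apps/cuda/toolkit/10.0.130. Because I did not build TensorFlow from source, XLA searches the /usr/local/cuda-* directories by default. Since that directory does not exist on my system, it reports the error.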

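My current workaround is to create a symbolic link. I looked at the TensorFlow source in tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc, and that file contains the comment "// CUDA location explicitly specified by user via --xla_gpu_cuda_data_dir has highest priority." So how do I pass a value to this flag? I tried the following two environment variables, but neither of them works: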

export XLA_FLAGS="--xla_gpu_cuda_data_dir=/cm/shared/apps/cuda10.0/toolkit/10.0.130/"
export TF_XLA_FLAGS="--xla_gpu_cuda_data_dir=/cm/shared/apps/cuda10.0/toolkit/10.0.130/"

So how do I use the "--xla_gpu_cuda_data_dir" flag? Thanks.

- silence_lamb

3 Answers

1

There is a code change for this issue, but it is not clear how to use it. See https://github.com/tensorflow/tensorflow/issues/23783

- Harry Yoo

1

This worked for me.

tensorflow                2.11.0          gpu_py310hf8ff8df_0  
ii  nvidia-dkms-525                 525.105.17-0ubuntu0.22.04.1             amd64        NVIDIA DKMS package
ii  nvidia-driver-525               525.105.17-0ubuntu0.22.04.1             amd64        NVIDIA driver metapackage
nvidia-cuda-toolkit not installed

NVIDIA T4 on GCE, Ubuntu 22.04 LTS minimal

conda install -c nvidia cuda-nvcc

ln -s /path/to/conda-env/lib/libdevice.10.bc .

I could not get XLA_FLAGS to work:

2023-04-21 09:17:00.947644: F tensorflow/compiler/xla/parse_flags_from_env.cc:226] Unknown flags in XLA_FLAGS: -–xla_gpu_cuda_data_dir=/home/rac/fulltf2/fullcuda.env/lib 
 Perhaps you meant to specify these on the TF_XLA_FLAGS envvar?
Aborted (core dumped)

- Antti Rytsölä

In the error message, the two dashes look different from each other ("-–"). Maybe that is the problem? - John
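
A quick way to check for the mismatched dash John points out is to scan the variable before TensorFlow reads it. The following is only a minimal sketch: the helper name suspicious_xla_flags is hypothetical, and it assumes flags are separated by whitespace and that every valid flag name starts with two ASCII hyphens.

```python
import os


def suspicious_xla_flags(value):
    """Return the tokens whose flag name does not look like '--ascii_name'.

    Only the part before '=' is inspected, so a path that contains
    non-ASCII characters is not reported.
    """
    bad = []
    for token in value.split():
        name = token.split("=", 1)[0]
        # A typographic dash (for example U+2013) fails both checks.
        if not name.startswith("--") or not name.isascii():
            bad.append(token)
    return bad


# Report anything odd in the current environment (prints nothing if clean).
for token in suspicious_xla_flags(os.environ.get("XLA_FLAGS", "")):
    print(f"suspicious flag in XLA_FLAGS: {token!r}")
```

Dashes like this often come from copying a command out of a web page or word processor; retyping the two hyphens by hand is usually enough.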

- user14653986 · Accepted Answer

8

You can run export XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda in your terminal.

- user14653986

You are a lifesaver. I spent 10 hours looking for a solution. - Martin Sonesson

1

From Python code on Linux, you can use: os.environ['XLA_FLAGS'] = '--xla_gpu_cuda_data_dir=/usr/lib/cuda/' - Tedo Vrbanec
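
If XLA_FLAGS may already hold other flags, overwriting it from Python drops them. Here is a minimal sketch that keeps them, under these assumptions: the helper name add_cuda_data_dir is hypothetical, /usr/lib/cuda is just the example path from the comment above, and neither the path nor the existing flags contain spaces. Setting the variable before importing TensorFlow is the safe ordering.

```python
import os

FLAG = "--xla_gpu_cuda_data_dir"


def add_cuda_data_dir(existing_flags, cuda_dir):
    """Return a flag string with FLAG set to cuda_dir.

    Other flags are preserved; an earlier FLAG setting is replaced.
    """
    kept = [f for f in existing_flags.split() if not f.startswith(FLAG + "=")]
    kept.append(f"{FLAG}={cuda_dir}")
    return " ".join(kept)


# Example path only; replace it with your own CUDA toolkit directory.
CUDA_DIR = "/usr/lib/cuda"

os.environ["XLA_FLAGS"] = add_cuda_data_dir(
    os.environ.get("XLA_FLAGS", ""), CUDA_DIR
)

# Import TensorFlow only after the variable is set:
# import tensorflow as tf
```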

2

What should /path/to/cuda be? - Logic1

If you are on Linux/Unix, you can add this line to your ~/.profile so that it is applied automatically. - RustyToms

/path/to/cuda is the directory that contains nvvm/libdevice/libdevice.10.bc. For a conda environment, you can use export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX. - John
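
John's rule can be turned into a small check that picks the first candidate directory containing nvvm/libdevice/libdevice.10.bc. This is a sketch, not an official lookup: the helper name find_cuda_data_dir and the candidate list are assumptions (CUDA_HOME is only a common convention, and the two fixed paths are typical install locations), so adjust them to your system.

```python
import os
from pathlib import Path

# Relative location of libdevice inside a CUDA data directory.
LIBDEVICE = Path("nvvm") / "libdevice" / "libdevice.10.bc"


def find_cuda_data_dir(candidates):
    """Return the first candidate that contains LIBDEVICE, or None."""
    for candidate in candidates:
        if candidate and (Path(candidate) / LIBDEVICE).is_file():
            return str(candidate)
    return None


# Assumed candidates; unset environment variables are skipped.
candidates = [
    os.environ.get("CUDA_HOME"),
    os.environ.get("CONDA_PREFIX"),
    "/usr/local/cuda",
    "/usr/lib/cuda",
]

cuda_dir = find_cuda_data_dir(candidates)
if cuda_dir is None:
    print("No candidate contains", LIBDEVICE)
else:
    print(f"export XLA_FLAGS=--xla_gpu_cuda_data_dir={cuda_dir}")
```

The script only prints the export line, so you can review the detected directory before adding it to your shell profile.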