Even though a GPU is available, .predict() runs only on the CPU

Question

14

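I used this script to train a model and make predictions on a machine with GPUs enabled, but it seems that only the CPU is used during the prediction stage.

The device placement log I see during the .predict() part is the following: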

2020-09-01 06:08:19.085400: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op RangeDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.085617: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op RepeatDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.089558: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op MapDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.090003: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op PrefetchDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.097064: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op FlatMapDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.097647: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op TensorDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.097802: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op RepeatDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.097957: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op ZipDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.101284: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op ParallelMapDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.101865: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op ModelDataset in device /job:localhost/replica:0/task:0/device:CPU:0

Even though when I run:

print(tf.config.experimental.list_physical_devices('GPU'))

I get:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU')]

The code I used can be found here. The full output log can be viewed here.

More info:
Python version: 3.7.7
TensorFlow version: 2.1.0
GPU: Nvidia Tesla V100-PCIE-16GB
CPU: Intel Xeon Gold 5218 CPU @ 2.30GHz
RAM: 394851272 KB
OS: Linux

- georgehdd

3

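What happens if you put all of your code inside a with tf.device("gpu:0"): context? - opyate

Then only one operation in the prediction is placed on the GPU. Only this line executes on the GPU: PrefetchDataset in device /job:localhost/replica:0/task:0/device:GPU:0; the rest of the operations still execute on the CPU. - georgehdd

Some operations can only run on the CPU; in particular, operations related to data loading have to run on the CPU, so I don't see any problem here. Your model is very small and will probably not gain any performance from using a GPU. - Dr. Snoopy

Is this behavior documented anywhere? Also, this model takes about 250 ms for a single prediction. Would using the GPU improve that significantly? - georgehdd

That is hard to say. Inference time depends on many factors not mentioned in the question, including the model, the amount of computation it requires, the batch size, and the time needed to move data between system RAM and the GPU. You have no hard evidence that predict() is running on the CPU; you are just assuming it should be faster than it currently is. - Dr. Snoopy

If you want to see how ops are created, look at https://www.tensorflow.org/guide/create_op. As you can see there, an op has to be explicitly assigned to a device and implemented for that device, and some ops cannot be implemented on the GPU, especially those for data loading. - Dr. Snoopy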
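One way to gather the kind of evidence discussed in these comments is to time predict() directly, for example once with the GPU visible and once on the CPU only. The following is a minimal sketch using only the standard library. `time_call` is a hypothetical helper name, not a TensorFlow API, and the warm-up call is an assumption based on the fact that a first call usually pays one-off setup costs.

```python
import statistics
import time


def time_call(fn, repeats=10, warmup=1):
    """Return the median wall-clock time of fn() in milliseconds."""
    # Warm-up calls are not timed: the first call often pays one-off costs.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def workload():
    # Stand-in so the sketch runs anywhere. With a real model you would
    # time something like: lambda: model.predict(batch)
    # where `model` and `batch` are placeholders for your own objects.
    return sum(range(100_000))


elapsed_ms = time_call(workload)
print("median: %.3f ms" % elapsed_ms)
```

The median is used rather than the mean so that a single slow outlier does not dominate the result.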

6 Answers

2

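Your predict function is using the GPU. I re-timed your code on an NVIDIA 1080 GTX, and inference takes 100 ms.

Either restart your system or check whether the GPU is being utilized.

Here is the line from your output showing that inference runs on the GPU: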

2020-09-01 06:19:15.885778: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op __inference_distributed_function_58022 in device /job:localhost/replica:0/task:0/device:GPU:0

- Anchal Gupta

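But that log line is printed before .predict() is called and has nothing to do with the prediction itself. My question is how to make .predict() run on the GPU as well. - georgehdd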
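To see at a glance which ops ran on which device, the placement log can be grouped by device. The following is a minimal sketch using only the standard library. It assumes every relevant line contains the "Executing op ... in device ..." text seen in the logs quoted in this thread. `ops_by_device` is a hypothetical helper, and the sample lines are copied from the posts above.

```python
import re
from collections import defaultdict

# Matches the "Executing op <name> in device <device>" part of a log line.
_PLACEMENT = re.compile(r"Executing op (\S+) in device (\S+)")


def ops_by_device(log_text):
    """Group op names by short device name (e.g. 'CPU:0', 'GPU:0')."""
    grouped = defaultdict(list)
    for match in _PLACEMENT.finditer(log_text):
        op, device = match.groups()
        # Keep only the part after 'device:', e.g. 'GPU:0'.
        grouped[device.rsplit("device:", 1)[-1]].append(op)
    return dict(grouped)


sample = """\
2020-09-01 06:08:19.085400: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op RangeDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.101865: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op ModelDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:19:15.885778: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op __inference_distributed_function_58022 in device /job:localhost/replica:0/task:0/device:GPU:0
"""

for device, ops in ops_by_device(sample).items():
    print(device, ops)
```

Running this over the full log, and comparing timestamps with the moment predict() is called, separates the dataset ops from the compiled model function.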

2

Are you using the correct tensorflow package? Uninstalling tensorflow and installing tensorflow-gpu might help.

For documentation, see: https://www.tensorflow.org/install/gpu

- Y.Ynot

4

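As of TensorFlow 2.1, CPU and GPU support are bundled in the same package and are no longer separate. Besides, if the GPU version were not installed, the graphics cards would not be detected at all. - Timbus Calin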

2

It looks like you need to use a Distributed Strategy, as described in the documentation. Your code would then look something like this:

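import tensorflow as tf
from tensorflow import keras

# train_images, train_labels, test_images and test_labels come from the question's script.
# Log the device each op is placed on.
tf.debugging.set_log_device_placement(True)
strategy = tf.distribute.MirroredStrategy()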

with strategy.scope():
    model = keras.Sequential(
        [
            keras.layers.Flatten(input_shape=(28, 28)),
            keras.layers.Dense(128, activation='relu'),
            keras.layers.Dense(10)
        ]
    )
    model.compile(
        optimizer='adam', 
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), 
        metrics=['accuracy']
    )
    model.fit(train_images, train_labels, epochs=10)

    test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)
    probability_model = tf.keras.Sequential(
        [model, tf.keras.layers.Softmax()]
    )
    probability_model.predict(test_images)

According to the documentation, the best practice for using multiple GPUs is to use tf.distribute.Strategy.

- gold_cy

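I am trying to use just a single GPU for inference. The problem is that inference runs only on the CPU. Training and everything else use the GPU perfectly well. - georgehdd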

2

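Could you try importing Keras from tensorflow.python?

For example:

from tensorflow.python.keras.models import Sequential

Also, check your CUDA and cuDNN versions. They must be compatible with your TensorFlow version. You can check this here. Since your TensorFlow version is 2.1, the CUDA and cuDNN versions must be 10.1 and 7.6, respectively.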

- dasmehdix

CUDA is version 10.1, so it is compatible. TensorFlow recognizes the GPUs but avoids using them for prediction. - georgehdd

0

If you have a GPU, tf.test.is_gpu_available() should return True. This code can force TensorFlow to use a specified device:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

- Amin
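Note that CUDA_VISIBLE_DEVICES limits which GPUs the process can see rather than placing individual ops on a device. It is read when CUDA is initialized, so it has to be set before TensorFlow touches the GPU, which in practice means before the import. The following is a minimal sketch. The try/except around the import is only there so the snippet also runs where TensorFlow is not installed.

```python
import os

# Set this before TensorFlow initializes CUDA; safest is before the import.
# "0" exposes only the first GPU; "-1" is commonly used to hide all GPUs.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow not installed; nothing to inspect.

if tf is not None:
    # On a multi-GPU machine this should now list at most one GPU.
    print(tf.config.experimental.list_physical_devices("GPU"))
```

With three GPUs, as in the question, this restricts the process to the first one but does not by itself change which ops TensorFlow places on it.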


- Rishit Dagli · Accepted Answer

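Since you already have a GPU, I assume tf.test.is_gpu_available() returns True. You can use this piece of code to force TensorFlow to use a specific device:

with tf.device('/gpu:0'):
    # GPU stuff
    pass

This also works if you want to force part of your code to run on the CPU instead:

with tf.device('/cpu:0'):
    # CPU stuff
    pass

An add-on that might be helpful when using tf.device(): you can use this function to list all the devices you have:

from tensorflow.python.client import device_lib

def get_available_devices():
    # Names of all local devices TensorFlow can see (CPUs and GPUs).
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos]

get_available_devices()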

That said, for the use case you describe, I cannot guarantee that using the GPU will make inference faster.