TensorFlow 2 does not run on the GPU

Question

I am using tensorflow-gpu 2.0.0 and have installed the GPU driver, CUDA, and cuDNN (CUDA 10.1.243_426, cuDNN v7.6.5.32; I am on Windows).

When I compile my model, or when I run:

from tensorflow.python.client import device_lib 
print(device_lib.list_local_devices())

it prints:

2020-01-12 19:56:50.961755: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-01-12 19:56:50.974003: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-01-12 19:56:51.628299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce MX150 major: 6 minor: 1 memoryClockRate(GHz): 1.5315
pciBusID: 0000:01:00.0
2020-01-12 19:56:51.636256: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-01-12 19:56:51.642106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-01-12 19:56:52.386608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-01-12 19:56:52.393162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-01-12 19:56:52.396516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-01-12 19:56:52.400632: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 1356 MB memory) -> physical GPU (device: 0, name: GeForce MX150, pci bus id: 0000:01:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1008745203605650029
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 1422723891
locality {
  bus_id: 1
  links {
  }
}
incarnation: 18036547379173389852
physical_device_desc: "device: 0, name: GeForce MX150, pci bus id: 0000:01:00.0, compute capability: 6.1"
]
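
As a quick sanity check on the listing above: the `memory_limit` values are given in bytes, and converting them to MiB reproduces the "1356 MB" figure from the `Created TensorFlow device` log line. A minimal sketch, where `bytes_to_mib` is just an illustrative helper name:

```python
BYTES_PER_MIB = 1024 * 1024

def bytes_to_mib(n_bytes):
    """Convert a byte count to whole MiB, rounding down."""
    return n_bytes // BYTES_PER_MIB

# memory_limit values copied from the device listing above
print(bytes_to_mib(268435456))   # CPU:0 -> 256
print(bytes_to_mib(1422723891))  # GPU:0 -> 1356
```

So the GPU device in the list is the same one that TensorFlow reported creating in the log.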

This output means TensorFlow should definitely be using the GPU device. But when I run my model, the GPU does not seem to be doing anything!

However, part of the GPU memory is in use, and I can see GPU activity that belongs to my program!

What is going on? Am I doing something wrong? I have searched a lot and checked many questions on SO, but nobody seems to have asked about this.

- Hamidreza

Can you show the code of the model you are running? - Shanqing Cai

2 Answers

- Timbus Calin · Answer 1

Taken from the official TensorFlow documentation:

    import tensorflow as tf
    tf.debugging.set_log_device_placement(True)

    # Create some tensors
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)

    print(c)

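Run the code above. If your GPU is visible to TensorFlow, the matmul should be placed on the GPU, and in that case your training will run on the GPU as well.

You should see output like this:

    Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
    tf.Tensor(
    [[22. 28.]
     [49. 64.]], shape=(2, 2), dtype=float32)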
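
With device-placement logging enabled, a full model prints one such line per op, which is tedious to read by eye. The following is a rough sketch, not part of the TensorFlow documentation, that tallies the lines by device. It assumes the log lines have the `Executing op ... in device ...` form; `count_ops_by_device` is a hypothetical helper name, and the second sample line is made up to illustrate a CPU placement.

```python
import re
from collections import Counter

# Matches lines such as:
#   Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
PLACEMENT_RE = re.compile(r"Executing op (\S+) in device \S*/device:(\w+):(\d+)")

def count_ops_by_device(log_text):
    """Count logged ops per device, keyed like 'GPU:0' or 'CPU:0'."""
    counts = Counter()
    for match in PLACEMENT_RE.finditer(log_text):
        _op, dev_type, dev_index = match.groups()
        counts[f"{dev_type}:{dev_index}"] += 1
    return dict(counts)

sample_log = """\
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:CPU:0
"""  # the second line is a made-up example of a CPU placement

print(count_ops_by_device(sample_log))
```

If nearly everything is counted under `CPU:0` while a GPU is listed as available, that would point to a placement problem rather than a monitoring problem.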

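In addition, you can see a spike in dedicated GPU memory usage in Task Manager, so it looks like your GPU is being used. To be sure, though, run the code above.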
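
If the placement log never mentions a GPU, it may be worth checking whether TensorFlow sees the GPU at all through the public `tf.config` API. This is a sketch under assumptions, not something taken from the quoted documentation: `gpu_names` is a hypothetical helper, and the `getattr` fallback is there because TF 2.0 exposes `list_physical_devices` only under `tf.config.experimental`, while later 2.x releases also offer it directly on `tf.config`.

```python
def gpu_names(devices):
    """Return the names of all entries whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]

try:
    import tensorflow as tf
except ImportError:  # TensorFlow is not installed in this environment
    tf = None

if tf is not None:
    # Prefer the non-experimental name when this TensorFlow version has it.
    list_devices = getattr(tf.config, "list_physical_devices",
                           tf.config.experimental.list_physical_devices)
    print("GPUs visible to TensorFlow:", gpu_names(list_devices()))
```

An empty list here would suggest a driver, CUDA, or cuDNN mismatch rather than a problem in the model code.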

- MarkD · Answer 2

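I have also noticed that the Windows Task Manager is not useful for monitoring (dual) GPU activity. Try installing TechPowerUp GPU-Z. (I am running dual NVIDIA cards.) It can monitor CPU and GPU activity, power, and temperature.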