Keras detects my GPU but does not use it when training a neural network.

7

My GPU is not being used by Keras/TensorFlow.

To get my GPU working with TensorFlow, I installed tensorflow-gpu via pip (I'm using Anaconda on Windows).

I have an NVIDIA GTX 1080 Ti.

print(tf.test.is_gpu_available())

True

print(tf.config.experimental.list_physical_devices())

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), 
 PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

I tried

physical_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

but it didn't help.

sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))
print(sess)

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1

<tensorflow.python.client.session.Session object at 0x000001A2A3BBACF8>

TensorFlow only issues a warning:

W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows 

The full log:

2019-10-18 20:06:26.094049: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2019-10-18 20:06:35.078225: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-10-18 20:06:35.090832: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-10-18 20:06:35.180744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2019-10-18 20:06:35.185505: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-10-18 20:06:35.189328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-10-18 20:06:35.898592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-18 20:06:35.901683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-10-18 20:06:35.904235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-10-18 20:06:35.906687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-10-18 20:06:38.694481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2019-10-18 20:06:38.700482: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-10-18 20:06:38.704020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
[I 20:06:47.324 NotebookApp] Saving file at /Untitled.ipynb
2019-10-18 20:07:22.227110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2019-10-18 20:07:22.246012: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-10-18 20:07:22.261643: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-10-18 20:07:22.272150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-18 20:07:22.275457: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-10-18 20:07:22.277980: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-10-18 20:07:22.316260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
2019-10-18 20:07:32.986802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2019-10-18 20:07:32.990509: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-10-18 20:07:32.993763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-10-18 20:07:32.995570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-18 20:07:32.997920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-10-18 20:07:32.999435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-10-18 20:07:33.001380: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-10-18 20:07:36.048204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-10-18 20:07:37.971703: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2019-10-18 20:07:38.576861: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll

I also tried reinstalling tensorflow-gpu with pip.

Why do I think the GPU isn't being used? Because my Python kernel uses 99% of the CPU and 99% of the RAM, while GPU utilization is sometimes around 7% but mostly 0. I use a custom data generator, but right now it only picks batches and resizes them (skimage.io.resize). One epoch takes about 44 seconds. There is also strange behavior: training randomly freezes (for about 10-15 seconds) roughly every 10 samples and on the last one (37/38).

Edit:

I posted my custom data generator here.

train_gen = DataGenerator(x = x_train,
                          y = y_train,
                          batch_size = 128,
                          target_shape = (100, 100, 3),
                          sample_std = False,
                          feature_std = False,
                          proj_parameters = None,
                          blur_parameters = None,
                          nois_parameters = None,
                          flip_parameters = None,
                          gamm_parameters = None)

Same for validation.

Update:

So the problem is in the generator, but how do I fix it?
I only use skimage and numpy operations.


2
7% GPU utilization doesn't mean the GPU isn't being used. - Dr. Snoopy
This might be a bug in Keras (see https://github.com/tensorflow/models/issues/7640). Unfortunately, there doesn't seem to be a solution yet. - Hagbard
1 Answer

4
The log shows that the GPU is in fact being used. You are almost certainly hitting an IO bottleneck: your GPU processes whatever the CPU feeds it much faster than the CPU can load and preprocess the data. This is very common in deep learning, and there are ways to deal with it.
We can't help much without knowing more about your data pipeline (byte size of a batch, preprocessing steps, etc.) and how your data is stored. One typical way to speed things up is to store the data in a binary format such as TFRecords, so the CPU can load it faster; see the official documentation.
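
For illustration, a minimal sketch of writing such a TFRecord file (not your exact data: it assumes x_train is a numpy array of images already resized to a fixed shape and y_train holds integer class labels, as in your snippet; the file name train.tfrecords is arbitrary):

import numpy as np
import tensorflow as tf

def _bytes_feature(value):
    # Wrap raw bytes in a tf.train.Feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    # Wrap an integer label in a tf.train.Feature
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

with tf.io.TFRecordWriter("train.tfrecords") as writer:
    for image, label in zip(x_train, y_train):
        example = tf.train.Example(features=tf.train.Features(feature={
            "image": _bytes_feature(image.astype(np.uint8).tobytes()),
            "label": _int64_feature(int(label)),
        }))
        writer.write(example.SerializeToString())
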
Edit: I had a quick look at your input pipeline. The problem is most likely IO indeed:
  • You should also run the preprocessing steps on the GPU; many of the augmentations you use are implemented in tf.image. If you can, consider moving to TensorFlow 2.0, since it includes Keras and also comes with plenty of helpers.
  • Look at the tf.data.Dataset API; it has many helpers for loading all the data in separate threads, which can roughly speed up the process by a factor of the number of cores you have (see the sketch after this list).
  • You should store the images as TFRecords. If the input images are small, this can speed up loading by an order of magnitude.
  • You could also try a larger batch size; I suspect your images are quite small.
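
As a rough sketch (not your exact pipeline; shapes and augmentation choices are just examples), a tf.data pipeline reading the TFRecord file from the sketch above, augmenting with tf.image ops, and overlapping loading with training could look like this:

import tensorflow as tf

IMG_SHAPE = (100, 100, 3)  # target_shape from your generator

def parse_and_augment(serialized):
    # Parse one serialized tf.train.Example back into an image/label pair
    features = tf.io.parse_single_example(serialized, {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    image = tf.io.decode_raw(features["image"], tf.uint8)
    image = tf.reshape(image, IMG_SHAPE)  # assumes images were stored at the target size
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.random_flip_left_right(image)  # example tf.image augmentation
    return image, features["label"]

dataset = (tf.data.TFRecordDataset("train.tfrecords")
           .map(parse_and_augment, num_parallel_calls=tf.data.experimental.AUTOTUNE)
           .shuffle(1024)
           .batch(128)
           .prefetch(tf.data.experimental.AUTOTUNE))

# Keras models accept tf.data datasets directly, e.g. model.fit(dataset, ...)
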

1
I posted a link to my custom data generator here. - ibvfteh
train_gen = DataGenerator(x=x_train, y=y_train, batch_size=128, target_shape=(100, 100, 3), sample_std=False, feature_std=False, proj_parameters=None, blur_parameters=None, nois_parameters=None, flip_parameters=None, gamm_parameters=None) Same for validation. - ibvfteh
1
I edited my answer to give more complete feedback. - francoisr
Is there a way to apply projective or affine transformations to the images (similar to warp in skimage)? - ibvfteh
1
@ibvfteh There are many similar-looking things in tf.contrib.image: https://www.tensorflow.org/versions/r1.14/api_docs/python/tf/contrib/image/ - francoisr
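
As a tiny illustration of one of those ops (TF 1.x with contrib available): tf.contrib.image.transform applies a projective transform described by 8 parameters, roughly analogous to skimage's warp. The batch and the parameter values below are arbitrary placeholders.

import tensorflow as tf

images = tf.zeros([4, 100, 100, 3])  # placeholder batch, just to show the expected shape
# 8 projective-transform parameters [a0, a1, a2, b0, b1, b2, c0, c1];
# this particular vector (slight shear plus a horizontal shift) is only an example.
transform = [1.0, 0.1, -5.0, 0.0, 1.0, 0.0, 0.0, 0.0]
warped = tf.contrib.image.transform(images, transform, interpolation="BILINEAR")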
