Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Question

Tags: algorithm, cudnn

7

I am trying to build a machine learning model in Python 3. But when I try to compile my code, I get the following error with CUDA 10.0 / cuDNN 7.5.0. Can anyone help me?

RTX 2080

The software versions I am using are: Keras (2.2.4), tf-nightly-gpu (1.14.1.dev20190510)

The error message is: "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR"

Error raised by the code: "tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above."

Here is my code:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(50, 50, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(1, activation='softmax'))

model.summary()

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x, y, epochs=1, batch_size=n_batch)

An OOM (out of memory) error occurred when allocating a tensor with shape [24946, 32, 48, 48] and type float using the GPU_0_bfc allocator.

- 007fred
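For a sense of scale, the allocation named in the OOM message can be sized by hand. The sketch below assumes 4 bytes per float32 element and reads the leading dimension (24946) as the batch size, which is an inference from the tensor shape and not something the question states. `tensor_bytes` is a hypothetical helper name used only for illustration.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Return the memory needed for a dense tensor (float32 by default)."""
    return prod(shape) * bytes_per_element


# Shape reported in the OOM message: [batch, channels, height, width].
# 48x48 matches a 3x3 convolution over a 50x50 input, with 32 filters.
reported = (24946, 32, 48, 48)
print(f"{tensor_bytes(reported) / 2**30:.2f} GiB")  # 6.85 GiB

# Same layer output with an assumed batch size of 32 instead.
small_batch = (32, 32, 48, 48)
print(f"{tensor_bytes(small_batch) / 2**20:.2f} MiB")  # 9.00 MiB
```

At roughly 6.85 GiB, this single activation tensor takes most of the 8 GB on an RTX 2080, so a smaller batch_size in model.fit is worth trying alongside the memory settings suggested in the answers.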

3 Answers

2

Problem allocating GPU memory

There are two possible solutions.

Add the following code:

import tensorflow as tf
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
config = tf.ConfigProto(gpu_options=gpu_options)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

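See this issue.

There is a problem with your NVIDIA driver

As described here, you need to upgrade your NVIDIA driver using the ODE driver.

See the NVIDIA documentation for driver version information.

Note: ODE (Optimal Driver for Enterprise) drivers are high-performance drivers designed specifically for workstations and servers.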

- venergiac

Hi, I got this error (OP_REQUIRES failed at conv_ops.cc:484 : Resource exhausted: OOM when allocating tensor with shape[24946,32,48,48] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc), and then I added your code. - 007fred

Have you tested with a smaller neural network? - venergiac

However, my RTX 2080 can run (LSTM) but cannot run (Conv2D). - 007fred

I am on Ubuntu Server 16.04, not Windows. - 007fred

I am using CUDA 10.0; I have learned that TensorFlow does not support CUDA 10.1. My driver version is 418.39 on Ubuntu. - 007fred

0

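If you are using Tensorflow 2.0, Roko's answer should work.

If you want to set an exact memory limit (for example 1024 MB or 2 GB), there is another way to restrict your GPU memory usage.

Use the following code: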

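import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    # Cap the first GPU at 1024 MB by exposing it as one virtual device.
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
  except RuntimeError as e:
    # Raised if the GPU was already initialized before this call.
    print(e)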

This code limits the memory usage of the first GPU to at most 1024 MB. Just change the index into gpus and the memory_limit value as needed.

- starriet

Content provided by Stack Overflow.

- Roko Mijic · Accepted Answer

With Tensorflow 2.0, CUDA 10.0 and CUDNN 7.5, the following worked for me:

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)

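Some other answers (such as venergiac's answer here) use outdated Tensorflow 1.x syntax. If you are using the latest Tensorflow, you need to use the code I have given here.

If you get the following error:

Physical devices cannot be modified after being initialized

then placing the gpus = tf.config ... lines directly below the line where tensorflow is imported resolves the problem.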

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)
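
A slightly more defensive variant of the snippet above, offered as a sketch and not as part of the accepted answer: it enables memory growth on every visible GPU, does nothing on a machine with no GPU (where gpus[0] would raise an IndexError), and prints the RuntimeError instead of crashing if the devices were already initialized. The function name `enable_gpu_memory_growth` is hypothetical; the tf.config.experimental calls are the same ones used above.

```python
def enable_gpu_memory_growth():
    """Hypothetical helper: turn on memory growth for every visible GPU."""
    # Imported inside the function so the configuration runs right after
    # TensorFlow is loaded, before anything else can initialize the GPUs.
    import tensorflow as tf

    enabled = []
    for gpu in tf.config.experimental.list_physical_devices('GPU'):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            enabled.append(gpu.name)
        except RuntimeError as e:
            # "Physical devices cannot be modified after being initialized"
            print(e)
    return enabled
```

Call enable_gpu_memory_growth() once at the very top of the script, before building any model. It returns the names of the GPUs it configured, which is an empty list when no GPU is visible.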