TensorFlow crashes with a CUBLAS_STATUS_ALLOC_FAILED error

Question

53

I am running tensorflow-gpu on Windows 10 with a simple MNIST neural network program. When it tries to run, it hits a CUBLAS_STATUS_ALLOC_FAILED error. A Google search turned up nothing relevant.

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:0f:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:0f:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _do_call
    return fn(*args)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1003, in _run_fn
    status, run_metadata)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 256), m=100, n=256, k=784
         [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_7, Variable/read)]]
         [[Node: Mean/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_35_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

- Axiverse

10 Answers

28

The location of the "allow_growth" property in the session config seems to have changed. It is explained here: https://www.tensorflow.org/tutorials/using_gpu

So for now you need to set it like this:

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

- Rafal Zajac

2

session = tf.Session(config=config, ...) ^ SyntaxError: positional argument follows keyword argument. The solution does not work. - Space Bear

3

Does not work on TensorFlow 2.1: tf.__version__ is '2.1.0', and module 'tensorflow' has no attribute 'ConfigProto'. - yeeking

@yee The same applies to tensorflow-gpu 2.2.0. - Cadoiz

module 'tensorflow' has no attribute 'ConfigProto'. - Stefan
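The comments above report that tf.ConfigProto is missing in TensorFlow 2.x. One way to handle both generations is to check which API the installed module exposes. The sketch below is only an illustration: enable_gpu_memory_growth is a hypothetical helper name, and it takes the imported module as a parameter so it can be exercised without a GPU.

```python
def enable_gpu_memory_growth(tf):
    """Enable GPU memory growth with whichever API the given TensorFlow module exposes.

    `tf` is the already-imported `tensorflow` module. Returns a (label, session)
    pair: ("tf1", session) on TensorFlow 1.x, ("tf2", None) on TensorFlow 2.x.
    """
    if hasattr(tf, "ConfigProto"):
        # TensorFlow 1.x: allow_growth lives under gpu_options on the session config.
        config = tf.ConfigProto()
        config.gpu_options.allow_growth = True
        return "tf1", tf.Session(config=config)

    # TensorFlow 2.x: tf.ConfigProto is gone, so configure each physical GPU instead.
    # This has to run before any operation has initialized the GPUs.
    for gpu in tf.config.experimental.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)
    return "tf2", None


# Usage (assumes TensorFlow is installed):
#   import tensorflow as tf
#   api, session = enable_gpu_memory_growth(tf)
```

On 1.x you would keep using the returned session; on 2.x there is no session to keep and the second element is None.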

21

tensorflow>=2.0

import tensorflow as tf
config = tf.compat.v1.ConfigProto(
    gpu_options=tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count={'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)

- Welcome_back

1

Works with tf 2.1.0, Windows 10, 16 GB RAM, and an RTX 2070 Max-Q 8 GB, but I changed the value to 0.5. - yeeking

Also works on this machine: https://www.userbenchmark.com/UserRun/30694804 but it gave me a warning: "The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead." I think this should be included in the answer, but that is for the author to decide. - Cadoiz

In November 2020 this worked for me on Windows x64 with Python 3.75, CUDA 10.1, and TensorFlow 2.3, on an RTX 2080 Ti. - Contango

Worked for me with TensorFlow 2.4.0 on an RTX 2060 under Windows 10, with the fraction set to 0.8. - Rahat Zaman

Worked for me, but it ended up not using much GPU memory (<< 0.8). I got better results with the solution from @Anxifer. TensorFlow 2.4.1. - Kenneth Evans

10

I found that this solution works:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto(
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)

- Space Bear

Does not work on TensorFlow 2.1 or 2.2 (both tested); it fails with: AttributeError: module 'tensorflow' has no attribute 'ConfigProto'. - Cadoiz

4

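On Windows, TensorFlow currently does not allocate all available memory as the documentation says. Instead, you can work around this error by allowing dynamic memory growth, as follows: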

tf.Session(config=tf.ConfigProto(allow_growth=True))

- Axiverse

4

It seems ConfigProto does not have this parameter, so this raises ValueError: Protocol message ConfigProto has no "allow_growth" field. - Oleg Melnikov

This may only work on TF1. Versions 2.1 and 2.2 give me the same error, but the answer by Jai Mahesh (https://stackoverflow.com/users/11280106/jai-mahesh) worked for me. Answer link: https://dev59.com/z1gR5IYBdhLWcg3wnORw#59558128 - Cadoiz

4

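None of these workarounds worked for me, because the structure of the tensorflow library seems to have changed. For TensorFlow 2.0, the only fix that worked was applying the "Limiting GPU memory growth" section of this page: https://www.tensorflow.org/guide/gpu

For completeness and future-proofing, here is the solution from the docs. Some people may need to change memory_limit; in my case 1 GB was enough.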

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

- carthurs

Thanks a lot. This is the only solution that worked. If you copy this code, don't forget to add "import tensorflow as tf" if you haven't already. - Sam
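Earlier answers express the cap as a fraction (per_process_gpu_memory_fraction), while memory_limit takes megabytes. A rough conversion might look like the sketch below. fraction_to_memory_limit_mb is a hypothetical helper, and treating 1 GiB as 1024 MB is an assumption made here for simplicity.

```python
def fraction_to_memory_limit_mb(total_gib, fraction):
    """Convert a fraction of total GPU memory into a whole number of megabytes.

    total_gib: total GPU memory in GiB, e.g. the "Total memory" value in the startup log.
    fraction:  share of that memory to allow, in the range (0, 1].
    """
    if total_gib <= 0:
        raise ValueError("total_gib must be positive")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the range (0, 1]")
    # Assumption: 1 GiB == 1024 MB; the result is truncated to an integer.
    return int(total_gib * 1024 * fraction)


# For the 4.00 GiB GTX 970 from the question, a quarter of the card:
print(fraction_to_memory_limit_mb(4.0, 0.25))  # 1024
```

The result can then be passed as memory_limit in the VirtualDeviceConfiguration call shown above.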

2

In my case, a stopped Python process was still holding the memory. I killed it in Task Manager and everything went back to normal.

- winterlight
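If Task Manager does not make it obvious which process is holding GPU memory, nvidia-smi can list compute processes. The sketch below is not part of the answer: it assumes nvidia-smi is on the PATH, and the helper names are made up for illustration. On some Windows driver setups the per-process memory column is reported as not available, so the parser returns None for it.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV rows of the form 'pid, process_name, used_memory'."""
    rows = []
    for line in csv_text.strip().splitlines():
        if line.count(",") < 2:
            continue  # not a data row
        # Split from both ends so a comma inside the process name does not break parsing.
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        pid, name, mem = pid.strip(), name.strip(), mem.strip()
        if not pid.isdigit():
            continue
        rows.append({
            "pid": int(pid),
            "name": name,
            # Memory may be reported as unavailable instead of a number.
            "used_mib": int(mem) if mem.isdigit() else None,
        })
    return rows


def list_gpu_processes():
    """Run nvidia-smi and return the parsed process list."""
    output = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_compute_apps(output)
```

Calling list_gpu_processes() before starting TensorFlow shows whether a leftover python.exe is still on the card.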

2

A bit late, but this solved my problem on tensorflow 2.4.0 with a GTX 980 Ti. Before limiting the memory, I was getting this error:

CUBLAS_STATUS_ALLOC_FAILED

My solution was this code:

import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])

I found the solution here: https://www.tensorflow.org/guide/gpu

- Anxifer

2

Tensorflow 2.0 alpha

Allowing GPU memory growth may fix this issue. For TensorFlow 2.0 alpha / nightly, there are two methods you can try.

1.) Allow GPU memory growth:

import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth()

2.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_fraction(0.4) # adjust this to the % of VRAM you 
                                                   # want to give to tensorflow.

I suggest trying both and seeing whether either helps. Source: https://www.tensorflow.org/alpha/guide/using_gpu

- kett

I think you mean tf.config.gpu.set_per_process_memory_growth(). - Axiverse

in <module> tf.config.gpu.set_per_process_memory_growth() AttributeError: module 'tensorflow_core._api.v2.config' has no attribute 'gpu' - seilgu

1

@seilgu, now that 2.0 is no longer in alpha, you can use the following code to limit GPU memory:

tf.config.experimental.set_virtual_device_configuration(tf.config.experimental.list_physical_devices('GPU')[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])

See: https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth - pentavalentcarbon

2

For Keras:

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)

- Maverick Meerkat

Does not work on TensorFlow 2.1 or 2.2 (both tested); it fails with: AttributeError: module 'tensorflow' has no attribute 'ConfigProto'. - Cadoiz

- Snympi · Accepted Answer

On TensorFlow 2.2, none of the other answers worked when I hit the CUBLAS_STATUS_ALLOC_FAILED problem. I found a solution at https://www.tensorflow.org/guide/gpu:

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

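I ran this code before doing any further computation and found that the same code that previously produced the CUBLAS error now worked in the same session. The sample code above is a specific example that sets memory growth across multiple physical GPUs, but it also solves the memory expansion problem.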
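The same guide also mentions an environment variable, TF_FORCE_GPU_ALLOW_GROWTH, as another way to turn on memory growth. That is not part of this answer, and the guide describes the setting as platform specific, so treat the following as an untested alternative. The key point is the same as above: it has to be set before TensorFlow initializes the GPU, which in practice means before the import.

```python
import os

# Set this before TensorFlow touches the GPU; the simplest place is before the import.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import TensorFlow only after the variable is set
```

This can be handy when you cannot easily edit the code that builds the model, since the variable can also be set in the shell that launches the script.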