TensorFlow: Ran out of memory trying to allocate 3.90GiB. The caller indicates that this is not a failure.

Question

16

There is something I don't quite understand.

Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.90GiB. 
The caller indicates that this is not a failure, 
but may mean that there could be performance gains if more memory is available.

What does this message mean?

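I have read the source code, but I could not work it out on my own.
The GPU has 6GB of memory, yet profiling with tfprof reports about 14GB of memory usage, which exceeds the GPU's memory. Does this message mean that TensorFlow is allocating CPU memory, or that it is using some clever algorithm for managing GPU memory?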

The version of tensorflow that I use is 1.2.

GPU information:

name: GeForce GTX TITAN Z
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
total memory: 5.94GiB
free memory: 5.87GiB

My code:

#!/usr/bin/python3.4

import tensorflow as tf
import tensorflow.examples.tutorials.mnist.input_data as input_data
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '0' 


mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
sess = tf.InteractiveSession(config=tf.ConfigProto(log_device_placement=True))
#sess = tf.InteractiveSession()
def weight_variable(shape):
    init = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(init)

def bias_variable(shape):
    init = tf.constant(0.1, shape=shape)
    return tf.Variable(init)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
x_image = tf.reshape(x, [-1, 28, 28, 1])


W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)


W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)


W_f1 = weight_variable([7*7*64, 1024])
b_f1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_f1) + b_f1)


keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

W_f2 = weight_variable([1024, 10])
b_f2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_f2) + b_f2)

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


test_images = tf.placeholder(tf.float32, [None, 784])
test_labels = tf.placeholder(tf.float32, [None, 10])


tf.global_variables_initializer().run()

run_metadata = tf.RunMetadata()


for i in range(100):
    batch = mnist.train.next_batch(10000)
    if (i%10 == 0):  
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob : 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    sess.run(train_step, feed_dict={x: batch[0], y_: batch[1], keep_prob : 0.5}, options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE), run_metadata=run_metadata)

tf.contrib.tfprof.model_analyzer.print_model_analysis(
    tf.get_default_graph(),
    run_meta=run_metadata,
    tfprof_options=tf.contrib.tfprof.model_analyzer.PRINT_ALL_TIMING_MEMORY)

test_images = mnist.test.images[0:300, :]
test_labels = mnist.test.labels[0:300, :]
print("test accuracy %g" % accuracy.eval({x: test_images, y_: test_labels, keep_prob: 1.0}))

Warnings:

2017-08-10 21:37:44.589635: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.90GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-08-10 21:37:46.208897: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.61GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

tfprof result:

==================Model Analysis Report======================
_TFProfRoot (0B/14854.97MB, 0us/7.00ms)

- qihao

1 Answer

- Engine · Accepted Answer

4

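You are using the GPU and your batch size is 10000, which is far too large for 10 classes! Reduce the batch size to 10 to 20 and increase the range (the number of loop iterations) to 10e4 or even 10e3. This is a known issue. If you really must use a batch size of 10000, tell TensorFlow to use the CPU: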

with tf.device('/cpu:0'):
    ...  # build the model (variables and ops) inside this block
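
For a sense of scale, the sketch below estimates the float32 activation sizes of the two convolutional layers in the question's network for a given batch size. It is only a rough lower bound, assuming 4 bytes per value and ignoring gradients, pooling outputs, optimizer state, and any convolution workspace; the helper names `activation_bytes` and `conv_activation_gib` are made up for illustration.

```python
# Rough activation-size estimate for the conv layers in the question's model.
# Assumption: float32 (4 bytes per value); gradients and workspace are ignored.

BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def activation_bytes(batch_size, height, width, channels):
    """Bytes needed to hold one float32 activation tensor of shape NHWC."""
    return batch_size * height * width * channels * BYTES_PER_FLOAT32


def conv_activation_gib(batch_size):
    """Approximate GiB for h_conv1 (28x28x32) plus h_conv2 (14x14x64)."""
    h_conv1 = activation_bytes(batch_size, 28, 28, 32)
    h_conv2 = activation_bytes(batch_size, 14, 14, 64)
    return (h_conv1 + h_conv2) / GIB


for batch_size in (20, 1000, 10000):
    print("batch %5d: about %.3f GiB" % (batch_size, conv_activation_gib(batch_size)))
```

At a batch size of 10000 these two tensors alone come to about 1.4 GiB before any gradients are stored, and the figure scales linearly with the batch size.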

- Engine

1

It runs without errors, which confuses me. I don't know why. The tfprof result is 14854.97MB, which exceeds my GPU's memory. - qihao
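
One possible explanation, stated here as an assumption and not confirmed anywhere in this thread, is that the profiler total adds up the bytes requested by every op over the step, while the GPU only has to hold the tensors that are alive at the same moment. The toy timeline below (all names and sizes invented) shows how summed requests can exceed peak usage.

```python
# Toy illustration: total bytes requested vs. peak bytes alive at once.
# The event list is invented; it is not output from tfprof or TensorFlow.

def total_and_peak(events):
    """events: list of ("alloc" | "free", name, size_mb) in execution order."""
    live = {}
    total_requested = 0
    peak = 0
    for kind, name, size_mb in events:
        if kind == "alloc":
            live[name] = size_mb
            total_requested += size_mb
        else:
            del live[name]
        peak = max(peak, sum(live.values()))
    return total_requested, peak


events = [
    ("alloc", "a", 3000), ("alloc", "b", 2000), ("free", "a", 3000),
    ("alloc", "c", 3000), ("free", "b", 2000), ("alloc", "d", 2500),
]
total, peak = total_and_peak(events)
print("requested in total: %d MB, peak alive: %d MB" % (total, peak))
```

In this made-up timeline the requests sum to far more than is ever resident at once, so a large summed figure does not by itself show that everything had to fit in GPU memory simultaneously.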

Do you know what TensorFlow does when the GPU runs out of memory? Does it fall back to CPU memory and compute resources? - qihao

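@qihao Sorry to keep you waiting so long. Unfortunately, the only fix I have found is to make the batch smaller, and for your MNIST dataset you can certainly use a smaller batch size. This is a bug, and as far as I know it has not been fixed yet. For larger datasets you will definitely need to use the GPU. - Engine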
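
The advice to use smaller batches also applies to the accuracy check, which in the question feeds a whole 10000-example batch in one call. A minimal sketch of chunked evaluation follows; `evaluate_in_chunks` and `accuracy_fn` are hypothetical names, and `accuracy_fn` stands in for a per-chunk call such as `accuracy.eval` on one slice of the data.

```python
# Sketch: evaluate accuracy in small chunks and combine with a weighted mean.
# accuracy_fn is a placeholder for a per-chunk call such as accuracy.eval(...).

def evaluate_in_chunks(images, labels, accuracy_fn, chunk_size=100):
    """Return overall accuracy without feeding all examples in one call."""
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    if len(images) == 0:
        raise ValueError("nothing to evaluate")
    correct = 0.0
    for start in range(0, len(images), chunk_size):
        x = images[start:start + chunk_size]
        y = labels[start:start + chunk_size]
        # Weight each chunk by its size so a short final chunk is handled.
        correct += accuracy_fn(x, y) * len(x)
    return correct / len(images)


# Stand-in accuracy function so the sketch runs without TensorFlow.
def fake_accuracy(x, y):
    return sum(1 for a, b in zip(x, y) if a == b) / float(len(x))


preds = [0, 1, 1, 0, 1, 0, 1]
truth = [0, 1, 0, 0, 1, 1, 1]
print(evaluate_in_chunks(preds, truth, fake_accuracy, chunk_size=3))
```

Weighting by chunk length keeps the result equal to the accuracy over the full set even when the last chunk is shorter than the others.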

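Hi, I am getting the same message: "Allocator (GPU_0_bfc) ran out of memory trying to allocate 195.25MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available." However, the model is still training normally and the loss decreases at every step. My batch size is 16. Since the warning says it is not a failure and that more GPU RAM would only improve performance, this is not a problem, right? Is it fine to continue training? - Tejaswini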
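
The wording of the warning suggests an optional allocation: the caller asks for extra memory, and if the request cannot be met it carries on some other way. The toy sketch below illustrates that pattern only; `ToyAllocator` and `pick_workspace` are invented for illustration and are not TensorFlow's implementation, and the sizes are arbitrary.

```python
# Toy model of an "optional" allocation with a fallback path.
# ToyAllocator and pick_workspace are hypothetical, not TensorFlow code.

class ToyAllocator:
    def __init__(self, capacity_mb):
        self.free_mb = capacity_mb
        self.warnings = []

    def try_allocate(self, size_mb):
        """Return True on success; on failure record a warning, do not raise."""
        if size_mb > self.free_mb:
            self.warnings.append("ran out of memory trying to allocate %d MB" % size_mb)
            return False
        self.free_mb -= size_mb
        return True


def pick_workspace(allocator, candidates_mb):
    """Try workspace sizes from largest to smallest; 0 means no workspace."""
    for size_mb in sorted(candidates_mb, reverse=True):
        if allocator.try_allocate(size_mb):
            return size_mb
    return 0


allocator = ToyAllocator(capacity_mb=1000)
chosen = pick_workspace(allocator, [3900, 3600, 500])
print("chosen workspace: %d MB, warnings: %d" % (chosen, len(allocator.warnings)))
```

In this toy version two requests fail with a logged warning and the third succeeds, so the computation still completes, only with a smaller workspace than it would have preferred.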