Do TensorFlow or Python have memory cleanup issues when using multiple models in a loop?

Question


python tensorflow memory-leaks garbage-collection

12

I am working on a TensorFlow model that needs a lot of RAM. The model is executed iteratively to process given tasks.

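However, the longer the process runs, the more RAM it consumes, even though it should be releasing that memory. This sounds as if I were keeping data from one graph alive across iterations, but I am almost sure the graphs are cleanly separated.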

The problem

I reduced the code to the following:

import tensorflow as tf
import numpy as np

reps = 30
for i in range(reps):
    with tf.Graph().as_default() as graph:
        with tf.Session(graph=graph) as sess:
            tf.constant(np.random.random((1000,1000,200,1)))

I have 32 GB of RAM available and am working on Ubuntu 17.04 with the CPU build of TensorFlow 1.3. After about the 25th or 27th iteration, the following error message appears:

terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc

Giving the process some time after each iteration does not improve things either:

import tensorflow as tf
import numpy as np
import time

reps = 30
for i in range(reps):
    with tf.Graph().as_default() as graph:
        with tf.Session(graph=graph) as sess:
            tf.constant(np.random.random((1000,1000,200,1)))
    time.sleep(1)

However, it works if I force a garbage collection call after each repetition:

import tensorflow as tf
import numpy as np
import gc

reps = 30
for i in range(reps):
    with tf.Graph().as_default() as graph:
        with tf.Session(graph=graph) as sess:
            tf.constant(np.random.random((1000,1000,200,1)))
    gc.collect()

The question

Now I wonder why I need to force garbage collection, even though TensorFlow should already have closed the session and dereferenced the graph object.
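One plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is that the graph objects take part in reference cycles. CPython frees such objects only when its cyclic garbage collector runs, and that collector is triggered by counts of container-object allocations rather than by the number of bytes held, so a handful of very large objects may not trigger it in time. The sketch below uses a hypothetical `Node` class as a stand-in for such an object and contains no TensorFlow code.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object that is part of a reference cycle."""

    def __init__(self):
        self.payload = bytearray(10_000_000)  # ~10 MB placeholder for a large buffer
        self.me = self  # self-reference creates a reference cycle


def make_and_drop():
    node = Node()
    # Return only a weak reference so nothing outside the cycle keeps node alive.
    return weakref.ref(node)


gc.disable()  # prevent automatic cyclic collection during the demo
ref = make_and_drop()
alive_before = ref() is not None  # the cycle keeps the object alive
gc.collect()                      # explicit collection breaks the cycle
alive_after = ref() is not None
gc.enable()

print(alive_before, alive_after)  # True False
```

If the real graph objects behave like `Node` here, an explicit `gc.collect()` after each iteration would release them earlier than waiting for the automatic collector.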

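Going back to my original model, I am not yet sure whether the gc call actually helps. Memory usage grows very sharply, especially when I am about to persist the model to disk.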

Are there any best practices for working iteratively with large models? Is this an actual memory problem?
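One commonly used workaround, not something the thread itself confirms, is to run each iteration in a fresh interpreter process, so that the operating system reclaims all of its memory when the process exits. The sketch below assumes each iteration can be expressed as a standalone script that prints a small JSON result; `CHILD_SCRIPT` and `run_iteration` are hypothetical names, and the `bytearray` is a placeholder for the real model.

```python
import json
import subprocess
import sys

# Hypothetical per-iteration job. In a real setup this script would build the
# graph, run the session, and print a small JSON result to stdout.
CHILD_SCRIPT = """
import json, sys
i = int(sys.argv[1])
big = bytearray(50_000_000)  # placeholder for the memory-hungry model
print(json.dumps({"iteration": i, "size": len(big)}))
"""


def run_iteration(i):
    """Run one iteration in a fresh interpreter and return its JSON result."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, str(i)],
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    return json.loads(proc.stdout)


# Each child process exits before the next one starts, so its memory is
# returned to the operating system regardless of what the child leaked.
results = [run_iteration(i) for i in range(3)]
print(results)
```

The trade-off is the startup cost of a new interpreter (and of importing the framework) on every iteration, plus the need to pass inputs and results through files, pipes, or similar channels.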

Thanks for any insights.

- jjs

Related: https://stackoverflow.com/questions/63411142/how-to-avoid-oom-errors-in-repeated-training-and-prediction-in-tensorflow (even `gc.collect()` does not always solve the problem). - bers

1 Answer


- quine9997 · Answer 1

Try using tf.reset_default_graph()

TensorFlow memory leak when building a graph in a loop

https://www.tensorflow.org/api_docs/python/tf/compat/v1/reset_default_graph