tensorflow.python.framework.errors_impl.ResourceExhaustedError

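I am using an object detection module to classify images. My setup is as follows:

  • OS: Ubuntu 18.04 LTS
  • Python: 3.6.7
  • VirtualEnv: version 16.4.3
  • pip3 version inside the virtualenv: 19.0.3
  • TensorFlow version: 1.13.1
  • Protoc version: 3.0.0-9

I am working with a Windows virtualenv and Google Colab. This is the error I get: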

python3 legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config


INFO:tensorflow:global step 1: loss = 18.5013 (48.934 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "
Traceback (most recent call last):
  File "legacy/train.py", line 184, in <module>
    tf.app.run()
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "legacy/train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "/home/priyank/venv/models-master/research/object_detection/legacy/trainer.py", line 416, in train
    saver=saver)
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 785, in train
    ignore_live_threads=ignore_live_threads)
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 832, in stop
    ignore_live_threads=ignore_live_threads)
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/priyank/venv/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 257, in _run
    enqueue_callable()
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1257, in _single_operation_run
    self._call_tf_sessionrun(None, {}, [], target_list, None)
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[15,1,1755,2777,3] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
     [[{{node batch}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
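
For a sense of scale, the size of the tensor named in the error can be estimated from its shape alone. This is a minimal sketch, assuming `type float` means 4-byte float32. The variable names are illustrative and are not part of the training script.

```python
from functools import reduce
from operator import mul

# Shape copied from the OOM message above.
shape = (15, 1, 1755, 2777, 3)
# Assumption: "type float" in the message is float32 (4 bytes per value).
BYTES_PER_FLOAT32 = 4

num_elements = reduce(mul, shape, 1)
num_bytes = num_elements * BYTES_PER_FLOAT32

print(f"{num_elements:,} elements")
print(f"{num_bytes / 1024**2:.1f} MiB for this single tensor")
```

That comes to about 877 MB for one allocation, before counting the model, the input queues, or anything else running on the machine.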
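1 Answer

You can try the following:
1. If your images have a very high resolution, try reducing the image size.
2. Try reducing the batch size.
3. Check whether other processes are using up memory.
Please share your config file.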
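
To see how much the first two suggestions could help, here is a rough sketch that compares the memory for one float32 batch at the resolution from the error message with a downscaled version, across a few batch sizes. The helper names `batch_bytes` and `fit_within` and the 600-pixel limit are assumptions chosen for illustration. Treating the leading 15 in the reported shape as the batch size is also an assumption.

```python
def batch_bytes(batch_size, height, width, channels=3, bytes_per_value=4):
    """Bytes needed to hold one batch of images as float32 values."""
    return batch_size * height * width * channels * bytes_per_value


def fit_within(height, width, max_side):
    """Scale (height, width) down so the longer side is at most max_side.

    Images that already fit are returned unchanged.
    """
    scale = min(1.0, max_side / max(height, width))
    return max(1, round(height * scale)), max(1, round(width * scale))


# Image size taken from the shape in the error message.
orig_h, orig_w = 1755, 2777
# 600 is an arbitrary example limit, not a value from the config file.
small_h, small_w = fit_within(orig_h, orig_w, max_side=600)

for batch_size in (15, 8, 4, 1):
    full_mib = batch_bytes(batch_size, orig_h, orig_w) / 1024**2
    small_mib = batch_bytes(batch_size, small_h, small_w) / 1024**2
    print(f"batch {batch_size:2d}: {full_mib:7.1f} MiB at {orig_h}x{orig_w}, "
          f"{small_mib:6.1f} MiB at {small_h}x{small_w}")
```

Batch memory grows linearly with batch size and with pixel count, so shrinking the images usually saves far more than halving the batch.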

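Here is the link to the config file - https://mega.nz/#!iiBiACAZ!lUW89dtFysAyPkfR-umDDx5eEWjM_AhV8GmK03opc8g - Priyank Vashiar
Try reducing the batch size (line 143) to 8 or 4. If neither works, use 1. Also, train on a GPU if you can. If you plan to train on the CPU (not recommended, since it takes a very long time), don't run any other programs, because they slow down the computation and consume memory. - Jitesh Malipeddi
I have completed this project, but it seems the CPU could not handle the load. I ran the project successfully on Google Colab, and I suggest that anyone facing the same issue run their code on Google Colab, since it is free to use. - Priyank Vashiar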
