PyTorch OSError: [Errno 28] No space left on device


I am using an Ubuntu 18 Docker container.

$cat /etc/lsb-release

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.4 LTS"

When I try to train a resnext101 model with torchvision, I get the following error:
Downloading: "https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth" to /home/vmuser/.cache/torch/hub/checkpoints/resnext101_32x8d-8ba56ff5.pth
  0%|                                                          | 0.00/340M [00:00<?, ?B/s]
Traceback (most recent call last):
  File "train_attn_best_config.py", line 377, in <module>
    tabct = TabCT(cnn = model, fc_dim = fd, attn_filters = af, n_attn_layers = nal).to(gpu)
  File "train_attn_best_config.py", line 219, in __init__
    self.ct_cnn = cnn_dict[cnn](pretrained = True)
  File "/home/vmuser/anaconda3/envs/pulmo/lib/python3.7/site-packages/torchvision/models/resnet.py", line 317, in resnext101_32x8d
    pretrained, progress, **kwargs)
  File "/home/vmuser/anaconda3/envs/pulmo/lib/python3.7/site-packages/torchvision/models/resnet.py", line 227, in _resnet
    progress=progress)
  File "/home/vmuser/anaconda3/envs/pulmo/lib/python3.7/site-packages/torch/hub.py", line 481, in load_state_dict_from_url
    download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "/home/vmuser/anaconda3/envs/pulmo/lib/python3.7/site-packages/torch/hub.py", line 404, in download_url_to_file
    f.write(buffer)
  File "/home/vmuser/anaconda3/envs/pulmo/lib/python3.7/tempfile.py", line 481, in func_wrapper
    return func(*args, **kwargs)
OSError: [Errno 28] No space left on device


When I run df, I get the output below, in which one tmpfs is only 65536 1K-blocks (64 MiB). I tried running export TMPDIR=/var/tmp and export TMPDIR=~/Data/tmp.
$df
Filesystem      1K-blocks       Used Available Use% Mounted on
overlay        1797272568 1705953392         0 100% /
tmpfs               65536          0     65536   0% /dev
tmpfs            98346264          0  98346264   0% /sys/fs/cgroup
/dev/sda6      1797272568 1705953392         0 100% /etc/hosts
shm                 65536          0     65536   0% /dev/shm
/dev/sdb1      1845816492 1362932848 389098592  78% /home/vmuser/Data
tmpfs            98346264         12  98346252   1% /proc/driver/nvidia
tmpfs            19669256      93256  19576000   1% /run/nvidia-persistenced/socket
udev             98318592          0  98318592   0% /dev/nvidia1
tmpfs            98346264          0  98346264   0% /proc/acpi
tmpfs            98346264          0  98346264   0% /proc/scsi
tmpfs            98346264          0  98346264   0% /sys/firmware

But the error persists.
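
To see which filesystem each relevant path actually lives on, here is a minimal stdlib sketch using shutil.disk_usage. The paths are assumptions taken from the traceback (the download target under ~/.cache/torch/hub/checkpoints), from the TMPDIR attempts (tempfile.gettempdir()), and from the large /home/vmuser/Data mount in the df output; the helper name report_free_space is hypothetical. Comparing the numbers shows whether the directory being written to in the traceback is on the overlay root filesystem that df reports as 100% full.

```python
import os
import shutil
import tempfile


def report_free_space(paths):
    """Return {path: (total_mib, free_mib)} for each path (hypothetical helper).

    If a path does not exist yet (e.g. an uncreated cache directory), walk up
    to the nearest existing parent so its filesystem is still reported.
    """
    report = {}
    for raw in paths:
        path = os.path.abspath(os.path.expanduser(raw))
        probe = path
        # Walk up until we reach a directory that exists (the root always does).
        while not os.path.exists(probe):
            parent = os.path.dirname(probe)
            if parent == probe:
                break
            probe = parent
        usage = shutil.disk_usage(probe)
        report[path] = (usage.total // 2**20, usage.free // 2**20)
    return report


if __name__ == "__main__":
    candidates = [
        "~/.cache/torch/hub/checkpoints",  # download target from the traceback
        tempfile.gettempdir(),             # reflects TMPDIR (read on first call)
        "~/Data",                          # the /dev/sdb1 mount from df (assumed path)
    ]
    for path, (total, free) in report_free_space(candidates).items():
        print(f"{path}: {free} MiB free of {total} MiB")
```

If the first entry reports 0 MiB free while ~/Data has plenty, the failing write is landing on the full root filesystem regardless of TMPDIR.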

1 Answer


This looks like a shm (shared memory) problem.
Try running Docker with the --ipc=host flag.

See this thread for more details.
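
To check whether the shm suggestion applies, you can read the size of /dev/shm from inside the container before and after adding --ipc=host. This is a minimal stdlib sketch; shm_size_mib is a hypothetical helper. In the df output in the question, shm shows 65536 1K-blocks, i.e. 64 MiB, which matches Docker's default shared-memory size; with --ipc=host the container uses the host's IPC namespace, so the reported size should change.

```python
import shutil


def shm_size_mib(path="/dev/shm"):
    """Return (total_mib, free_mib) for the shared-memory mount, or None.

    /dev/shm is Linux-specific; if the path does not exist, return None.
    """
    try:
        usage = shutil.disk_usage(path)
    except FileNotFoundError:
        return None
    return usage.total / 2**20, usage.free / 2**20


if __name__ == "__main__":
    size = shm_size_mib()
    if size is None:
        print("/dev/shm not found")
    else:
        total, free = size
        # Docker's default /dev/shm is 64 MiB; --ipc=host shares the host's.
        print(f"/dev/shm: {free:.0f} MiB free of {total:.0f} MiB")
```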


Content provided by Stack Overflow.