PyTorch: how do I load weights saved on a CUDA device on a machine without a GPU?

I found this nice PyTorch MobileNet code, but I can't get it to run on the CPU: https://github.com/rdroste/unisal. I'm new to PyTorch and not sure what to do.

The device is set on line 174 of the train.py module:

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

As far as I can tell, that is the correct PyTorch idiom.

Do I need to change torch.load? I tried, but without success.
class BaseModel(nn.Module):
    """Abstract model class with functionality to save and load weights"""

    def forward(self, *input):
        raise NotImplementedError

    def save_weights(self, directory, name):
        torch.save(self.state_dict(), directory / f'weights_{name}.pth')

    def load_weights(self, directory, name):
        self.load_state_dict(torch.load(directory / f'weights_{name}.pth'))

    def load_best_weights(self, directory):
        self.load_state_dict(torch.load(directory / f'weights_best.pth'))

    def load_epoch_checkpoint(self, directory, epoch):
        """Load state_dict from a Trainer checkpoint at a specific epoch"""
        chkpnt = torch.load(directory / f"chkpnt_epoch{epoch:04d}.pth")
        self.load_state_dict(chkpnt['model_state_dict'])

    def load_checkpoint(self, file):
        """Load state_dict from a specific Trainer checkpoint"""
        chkpnt = torch.load(file)
        self.load_state_dict(chkpnt['model_state_dict'])

    def load_last_chkpnt(self, directory):
        """Load state_dict from the last Trainer checkpoint"""
        last_chkpnt = sorted(list(directory.glob('chkpnt_epoch*.pth')))[-1]
        self.load_checkpoint(last_chkpnt)

I don't get it. Where do I tell PyTorch that there is no GPU?

Full error message:

Traceback (most recent call last):
  File "run.py", line 99, in <module>
    fire.Fire()

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
    target=component.__name__)

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/fire/core.py", line 675, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)

  File "run.py", line 95, in predict_examples
    example_folder, is_video, train_id=train_id, source=source)

  File "run.py", line 72, in predictions_from_folder
    folder_path, is_video, source=source, model_domain=model_domain)

  File "/home/b256/Data/saliency_models/unisal-master/unisal/train.py", line 871, in generate_predictions_from_path
    self.model.load_best_weights(self.train_dir)

  File "/home/b256/Data/saliency_models/unisal-master/unisal/train.py", line 1057, in model
    self._model = model_cls(**self.model_cfg)

  File "/home/b256/Data/saliency_models/unisal-master/unisal/model.py", line 190, in __init__
    self.cnn = MobileNetV2(**self.cnn_cfg)

  File "/home/b256/Data/saliency_models/unisal-master/unisal/models/MobileNetV2.py", line 156, in __init__
    Path(__file__).resolve().parent / 'weights/mobilenet_v2.pth.tar')

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/torch/serialization.py", line 367, in load
    return _load(f, map_location, pickle_module)

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/torch/serialization.py", line 538, in _load
    result = unpickler.load()

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/torch/serialization.py", line 504, in persistent_load
    data_type(size), location)

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/torch/serialization.py", line 113, in default_restore_location
    result = fn(storage, location)

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/torch/serialization.py", line 94, in _cuda_deserialize
    device = validate_cuda_device(location)

  File "/home/b256/anaconda3/envs/unisal36/lib/python3.6/site-packages/torch/serialization.py", line 78, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU.
1 Answer

In https://pytorch.org/tutorials/beginner/saving_loading_models.html#save-on-gpu-load-on-cpu you will see that there is a map_location keyword argument for sending the weights to the proper device:

model.load_state_dict(torch.load(PATH, map_location=device))

From the docs at https://pytorch.org/docs/stable/generated/torch.load.html#torch.load:

torch.load() uses Python's unpickling facilities but treats the storages underlying tensors specially. They are first deserialized on the CPU and are then moved to the device they were saved from. If this fails (e.g. because the runtime system doesn't have certain devices), an exception is raised. However, storages can be dynamically remapped to an alternative set of devices using the map_location argument.

If map_location is a callable, it will be called once for each serialized storage with two arguments: storage and location. The storage argument will be the initial deserialization of the storage, residing on the CPU. Each serialized storage has a location tag associated with it which identifies the device it was saved from, and this tag is the second argument passed to map_location. The built-in location tags are 'cpu' for CPU tensors and 'cuda:device_id' (e.g. 'cuda:2') for CUDA tensors. map_location should return either None or a storage. If it returns a storage, that storage will be used as the final deserialized object, already moved to the right device. Otherwise, torch.load() falls back to its default behavior, as if map_location weren't specified.

If map_location is a torch.device object or a string containing a device tag, it indicates the location where all tensors should be loaded. Otherwise, if map_location is a dict, it will be used to remap the location tags appearing in the file (keys) to tags that specify where to put the storages (values).
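To make the documented forms concrete, here is a minimal, self-contained sketch (the checkpoint path and tensor contents are made up purely for illustration) showing the interchangeable ways map_location can force storages onto the CPU:

```python
import os
import tempfile

import torch

# Save a toy checkpoint, then reload it with the different
# map_location forms described above. All four map the
# storages onto the CPU.
path = os.path.join(tempfile.gettempdir(), "demo_weights.pth")
torch.save({"w": torch.randn(2, 2)}, path)

chkpnt = torch.load(path, map_location="cpu")                         # device-tag string
chkpnt = torch.load(path, map_location=torch.device("cpu"))           # torch.device object
chkpnt = torch.load(path, map_location={"cuda:0": "cpu"})             # dict: remap tags
chkpnt = torch.load(path, map_location=lambda storage, loc: storage)  # callable: keep on CPU

print(chkpnt["w"].device)  # cpu
```

On a CPU-only machine, any of these avoids the "Attempting to deserialize object on a CUDA device" error.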

So what should I do in this case? I tried self.load_state_dict(torch.load(directory / f'weights_{name}.pth', map_location=torch.device('cpu'))), but I still get the same error.

What error? You haven't shared it.

What I suggested, and what you describe in your comment, should fix that error. Are you sure the error really comes from the corrected code with map_location?

Yes, I tried several times. To rule out the conda environment, I even tried different conda environments with Python 3.8 and 3.6, but nothing changed. Is this a MobileNet problem? I don't get it; the tool even ships a benchmark feature for comparing CPU and GPU performance, so it should work on the CPU.

It's not a MobileNet problem, it's a problem in your code. I suggest you create a small snippet with a hard-coded value: torch.load(PATH, map_location=torch.device("cpu"))

Great tip, thank you very much, that did it. I found three more places where I had to remove .cuda() or add map_location='cpu' in those modules. The code seems to have been hard-coded for an available GPU.
