How to train a model on multiple GPUs in PyTorch?

Question

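My server has two GPUs. How can I train on both at the same time to make full use of their compute? Is the code below correct, and will it let my model train properly?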

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # pretrained_model is assumed to be a Hugging Face encoder (e.g. BertModel) loaded earlier
        self.bert = pretrained_model
        # Size the head from the encoder's hidden size instead of hard-coding 2048
        self.linear = nn.Linear(self.bert.config.hidden_size, 4)

    def forward(self, input_ids, attention_mask):
        # last_hidden_state: (batch_size, seq_len, hidden_size)
        output = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        # Hidden state of the last token in each sequence: (batch_size, hidden_size)
        output = output[:, -1, :]
        return self.linear(output)  # (batch_size, 4)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Build the model unconditionally so it also exists on a single-GPU or CPU machine
model = MyModel()
if torch.cuda.device_count() > 1:
    print("Using", torch.cuda.device_count(), "GPUs")
    # Replicates the model on each GPU and splits every input batch along dim 0
    model = nn.DataParallel(model)
model = model.to(device)

- user19185238

2 Answers

I used data parallelism. I followed this link, which was a useful reference.
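A minimal sketch of one complete training step with nn.DataParallel, which may help with the "will it train properly" part of the question. It is not the code from the linked reference. Assumptions: a small toy nn.Sequential stands in for MyModel (which needs the pretrained BERT), and the random batch, the 4-class CrossEntropyLoss, AdamW and the learning rate are all hypothetical. With fewer than two GPUs the wrapper is skipped and the step runs on a single device.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for MyModel: 16 input features, 4 output classes
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:
    # Replicates the module on every visible GPU and splits each batch along dim 0
    model = nn.DataParallel(model)
model = model.to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Hypothetical random batch: 64 examples, integer labels in [0, 4)
inputs = torch.randn(64, 16, device=device)
labels = torch.randint(0, 4, (64,), device=device)

model.train()
optimizer.zero_grad()
logits = model(inputs)   # per-GPU outputs are gathered back onto the first device
loss = loss_fn(logits, labels)
loss.backward()          # gradients from all replicas are summed into the original parameters
optimizer.step()
print(logits.shape, loss.item())

# Save the wrapped module's weights so they load later without the DataParallel prefix
state = (model.module if isinstance(model, nn.DataParallel) else model).state_dict()
```

Saving model.module rather than the wrapper keeps the state-dict keys free of the "module." prefix, so the checkpoint loads into a plain, unwrapped model.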

- user19185238

Content provided by Stack Overflow.

- Mazen · Accepted Answer

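There are two different ways to train on multiple GPUs:

Data parallelism = each GPU holds a full copy of the model, and a large batch that would not fit in a single GPU's memory is split into smaller batches, one per GPU, each small enough to fit on its device.
Model parallelism = the model's layers are split across different devices; this is trickier to manage and work with.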
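To make the second approach concrete, here is a minimal sketch of naive model parallelism. TwoDeviceModel and its layer sizes are hypothetical: the first half of the network lives on cuda:0, the second half on cuda:1, and forward() copies the activations between them. When fewer than two GPUs are visible it falls back to the CPU so the sketch still runs, though then nothing is actually parallel. Note that in this naive form only one GPU is busy at a time; splitting each batch into micro-batches and pipelining them is what recovers the overlap.

```python
import torch
import torch.nn as nn


class TwoDeviceModel(nn.Module):
    """Hypothetical network split across two devices (naive model parallelism)."""

    def __init__(self, dev0: torch.device, dev1: torch.device):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # First half of the layers on dev0, second half on dev1
        self.part1 = nn.Sequential(nn.Linear(16, 32), nn.ReLU()).to(dev0)
        self.part2 = nn.Linear(32, 4).to(dev1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.part1(x.to(self.dev0))
        # Copy the intermediate activations to the second device
        return self.part2(x.to(self.dev1))


if torch.cuda.device_count() >= 2:
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")
else:
    # Fallback so the sketch runs without two GPUs (no real parallelism then)
    dev0 = dev1 = torch.device("cpu")

model = TwoDeviceModel(dev0, dev1)
out = model(torch.randn(8, 16))
loss = out.sum()
loss.backward()  # autograd follows the cross-device copies back to both halves
print(out.shape, out.device)
```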

See this article for more details.

For data parallelism in pure PyTorch, see this example I put together earlier, which reflects the latest changes in PyTorch (1.12 as of this writing).
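Since the linked example is not reproduced here, below is a minimal sketch of what data parallelism in pure PyTorch typically looks like with DistributedDataParallel (DDP), one process per GPU. Assumptions: the toy dataset, the nn.Linear model, the hyperparameters, the default port 29500 and the file name train_ddp.py are hypothetical. The script expects to be launched by torchrun, which sets RANK, LOCAL_RANK and WORLD_SIZE; without it, it falls back to a single process (Gloo backend on the CPU when no GPU is present).

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def train(epochs: int = 2) -> float:
    # torchrun sets these; the defaults allow a single-process run without it
    rank = int(os.environ.get("RANK", 0))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")  # hypothetical default port

    use_cuda = torch.cuda.is_available()
    # NCCL is the usual backend for GPUs; Gloo is the CPU fallback
    dist.init_process_group("nccl" if use_cuda else "gloo", rank=rank, world_size=world_size)
    device = torch.device(f"cuda:{local_rank}" if use_cuda else "cpu")
    if use_cuda:
        torch.cuda.set_device(device)

    torch.manual_seed(0)  # every rank builds the same toy dataset
    # Hypothetical regression data standing in for the real dataset
    dataset = TensorDataset(torch.randn(256, 16), torch.randn(256, 4))
    # DistributedSampler gives each rank a disjoint shard of the dataset
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    model = nn.Linear(16, 4).to(device)
    # DDP averages gradients across all ranks during backward()
    ddp_model = DDP(model, device_ids=[local_rank] if use_cuda else None)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    last_loss = float("nan")
    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(x), y)
            loss.backward()
            optimizer.step()
            last_loss = loss.item()

    dist.destroy_process_group()
    return last_loss


if __name__ == "__main__":
    print("final loss on rank", os.environ.get("RANK", "0"), train())
```

On the two-GPU server from the question this would be started with: torchrun --nproc_per_node=2 train_ddp.py. The PyTorch documentation recommends DDP over nn.DataParallel even on a single machine, because each GPU is driven by its own process instead of one Python process fanning work out to all of them.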

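If you would rather use a library that handles multi-GPU training without much extra code, I recommend PyTorch Lightning: its API is simple and clear, and its documentation explains well how to do multi-GPU training with data parallelism.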

Update: 2022/10/25

Here is a video that explains the different types of distributed training in detail: https://youtu.be/BPYOsDCZbno?t=1011