I want to understand how pin_memory in DataLoader works.
According to the documentation:
pin_memory (bool, optional) – If True, the data loader will copy tensors into CUDA pinned memory before returning them.
Below is a self-contained code example.
import torchvision
import torch
print('torch.cuda.is_available()', torch.cuda.is_available())
train_dataset = torchvision.datasets.CIFAR10(root='cifar10_pytorch', download=True, transform=torchvision.transforms.ToTensor())
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=64, pin_memory=True)
x, y = next(iter(train_dataloader))
print('x.device', x.device)
print('y.device', y.device)
It produces the following output:
torch.cuda.is_available() True
x.device cpu
y.device cpu
But since I specified the flag pin_memory=True in the DataLoader, I expected output like this:
torch.cuda.is_available() True
x.device cuda:0
y.device cuda:0
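For reference, pin_memory can be checked directly: the batches stay on the CPU, but (when an accelerator is present) they live in page-locked host memory, which torch.Tensor.is_pinned() reports. A minimal sketch, using a synthetic TensorDataset as a stand-in for CIFAR10 so it runs without a download:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for CIFAR10 (same tensor shapes, no download needed)
data = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, 10, (256,))
loader = DataLoader(TensorDataset(data, labels), batch_size=64, pin_memory=True)

x, y = next(iter(loader))
# pin_memory pins host RAM; it does not move tensors to the GPU
print('x.device', x.device)
if torch.cuda.is_available():
    # Only meaningful when an accelerator exists; otherwise pinning is skipped
    print('x.is_pinned()', x.is_pinned())
```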
I also ran some benchmarks:
import torchvision
import torch
import time
import numpy as np
pin_memory=True
train_dataset = torchvision.datasets.CIFAR10(root='cifar10_pytorch', download=True, transform=torchvision.transforms.ToTensor())
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=64, pin_memory=pin_memory)
print('pin_memory:', pin_memory)
times = []
n_runs = 10
for i in range(n_runs):
    st = time.time()
    for bx, by in train_dataloader:
        bx, by = bx.cuda(), by.cuda()
    times.append(time.time() - st)
print('average time:', np.mean(times))
I got the following results:
pin_memory: False
average time: 6.5701503753662
pin_memory: True
average time: 7.0254474401474
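For comparison, the setting in which pinned memory is usually expected to pay off is an asynchronous host-to-device copy via non_blocking=True, which lets the transfer overlap with other work. A minimal sketch of that pattern (using a synthetic dataset standing in for CIFAR10; the overlap only actually happens when a CUDA device is present):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for CIFAR10 so the sketch is self-contained
dataset = TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,)))
loader = DataLoader(dataset, batch_size=64, pin_memory=True)

for bx, by in loader:
    if torch.cuda.is_available():
        # non_blocking=True allows the copy to run asynchronously;
        # it is only effective because the source batch is pinned
        bx = bx.cuda(non_blocking=True)
        by = by.cuda(non_blocking=True)
    # ... training step would go here ...
```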
So pin_memory=True only made things slower. Can someone explain this behavior?