THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu line=265 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "main.py", line 109, in <module>
    train(loader_train, model, criterion, optimizer)
  File "main.py", line 54, in train
    optimizer.step()
  File "/usr/local/anaconda35/lib/python3.6/site-packages/torch/optim/sgd.py", line 93, in step
    d_p.add_(weight_decay, p.data)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:265
How can I resolve this error?
Run your script with
CUDA_LAUNCH_BLOCKING=1 python your_script.py
to get a more accurate stack trace. - McLawrence
/opt/conda/.../THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
This error appears about 20 times. The traceback then ends with:
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu:116
How can I fix this? - saichand
t >= 0 && t < n_classes
Print out your labels and make sure they are non-negative and less than the number of outputs of your last layer. - McLawrence
return self.apply(lambda x: x.to(device), *keys)
But if I don't use the to(device) option, I get a device mismatch error between CUDA (which x is required to be on) and the CPU (where x actually is in this case). - Kanishk Mair
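The label check suggested in the comments can be sketched roughly as follows. This is only an illustration of the condition t >= 0 && t < n_classes from the assertion message: the helper name find_invalid_labels, the sample labels, and the value n_classes = 10 are all made up for the example and do not come from the original script. It is written in plain Python so it can run on the CPU before anything is sent to the GPU.

```python
def find_invalid_labels(labels, n_classes):
    """Return (position, value) pairs for labels outside the range [0, n_classes)."""
    return [
        (i, int(t))
        for i, t in enumerate(labels)
        if not 0 <= int(t) < n_classes
    ]


# Hypothetical values: assume the last layer of the model has 10 outputs.
n_classes = 10
# Hypothetical labels; with a PyTorch target tensor you could pass
# target.view(-1).tolist() instead of a hand-written list.
labels = [0, 3, 9, 10, -1]

bad = find_invalid_labels(labels, n_classes)
print(bad)  # [(3, 10), (4, -1)]
```

An empty list means every label satisfies the assertion. If the offending values turn out to be exactly n_classes, one possibility worth checking is that the labels are numbered from 1 rather than from 0.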