"invalid configuration argument" error when calling a CUDA kernel?

Question

34

Here is my code:

int threadNum = BLOCKDIM/8;
dim3 dimBlock(threadNum,threadNum);
int blocks1 = nWidth/threadNum + (nWidth%threadNum == 0 ? 0 : 1);
int blocks2 = nHeight/threadNum + (nHeight%threadNum == 0 ? 0 : 1);
dim3 dimGrid;
dimGrid.x = blocks1;
dimGrid.y = blocks2;

//  dim3 numThreads2(BLOCKDIM);
//  dim3 numBlocks2(numPixels/BLOCKDIM + (numPixels%BLOCKDIM == 0 ? 0 : 1) );
perform_scaling<<<dimGrid,dimBlock>>>(imageDevice,imageDevice_new,min,max,nWidth, nHeight);
cudaError_t err = cudaGetLastError();
cudasafe(err,"Kernel2");

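This is the launch of my second kernel, which is completely independent of the first in terms of data usage. BLOCKDIM is 512, nWidth and nHeight are also 512, and cudasafe simply prints the string message corresponding to the error code. This part of the code reports the configuration error right after the kernel call.

What could be causing this error? Any ideas?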

- erogol

2 Answers

2

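Adding to the previous answer: you can also query the maximum allowed number of threads in code, so that the program runs on other devices without hard-coding the number of threads you will use: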

struct cudaDeviceProp properties;
cudaGetDeviceProperties(&properties, device);
cout<<"using "<<properties.multiProcessorCount<<" multiprocessors"<<endl;
cout<<"max threads per processor: "<<properties.maxThreadsPerMultiProcessor<<endl;

- Niko

1

maxThreadsPerMultiProcessor, which is 2048 on most CUDA GPUs, does not indicate the threadblock-level limit, and that limit is what this question is about. - Robert Crovella

2

Where does "device" come from? - BitTickler

2

@BitTickler I found out that it is the index of your device. If you only have one GPU, it is usually 0. - OOM

- Robert Crovella · Accepted Answer

This type of error message often relates to the launch configuration parameters, such as grid/threadblock dimensions in this case, or shared memory, etc. in other cases. When you encounter such a message, it is advisable to print out your actual config parameters before launching the kernel to check if there are any mistakes.
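As a rough illustration of printing the configuration before the launch, here is a minimal host-only sketch. `Dim3` and `describeLaunch` are hypothetical names introduced for this example; `Dim3` is only a stand-in for CUDA's `dim3` so that the snippet builds without the CUDA toolkit.

```cpp
#include <cstdio>
#include <string>

// Stand-in for CUDA's dim3 (an assumption for illustration only);
// in real CUDA code you would read the fields of dim3 directly.
struct Dim3 {
    unsigned int x = 1, y = 1, z = 1;
};

// Format a launch configuration as text so it can be logged
// right before the kernel launch.
std::string describeLaunch(const Dim3& grid, const Dim3& block) {
    char buf[128];
    std::snprintf(buf, sizeof(buf),
                  "grid=(%u,%u,%u) block=(%u,%u,%u) threads/block=%u",
                  grid.x, grid.y, grid.z,
                  block.x, block.y, block.z,
                  block.x * block.y * block.z);
    return std::string(buf);
}
```

In the question's code, the equivalent would be to print the fields of dimGrid and dimBlock just before the perform_scaling launch; a threads-per-block value above the device limit then stands out immediately.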

You mentioned that BLOCKDIM is 512, and threadNum = BLOCKDIM/8, so threadNum is equal to 64. Your threadblock configuration is:

dim3 dimBlock(threadNum,threadNum);

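You are trying to launch blocks of 64 x 64 threads, that is, 4096 threads per block. That is not legal on any generation of CUDA device. All current CUDA devices are limited to a maximum of 1024 threads per block, where the thread count is the product of the 3 block dimensions. The maximum dimensions are listed in Table 14 of the CUDA programming guide, and are also available via the deviceQuery CUDA sample code.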
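The arithmetic can be checked on the host before launching. The sketch below is only an illustration: `kMaxThreadsPerBlock`, `fitsBlockLimit` and `ceilDiv` are hypothetical names, and the limit is hard-coded to the 1024 quoted in this answer. In real code it would be safer to read `maxThreadsPerBlock` from `cudaDeviceProp`. Only the product of the block dimensions is checked here, not the separate per-dimension limits.

```cpp
// Per-block thread limit quoted in the answer (assumed constant here;
// prefer cudaDeviceProp::maxThreadsPerBlock in real code).
const int kMaxThreadsPerBlock = 1024;

// The limit applies to the product of the three block dimensions.
bool fitsBlockLimit(int x, int y, int z = 1) {
    if (x <= 0 || y <= 0 || z <= 0) return false;
    return static_cast<long long>(x) * y * z <= kMaxThreadsPerBlock;
}

// Number of blocks needed to cover n elements, rounding up.
// Same result as the question's n/threadNum + (n%threadNum == 0 ? 0 : 1).
int ceilDiv(int n, int blockSize) {
    return (n + blockSize - 1) / blockSize;
}
```

For the 512 x 512 image in the question, a 64 x 64 block fails this check, while a 32 x 32 block (1024 threads) or a 16 x 16 block (256 threads) passes, giving a 16 x 16 or 32 x 32 grid respectively.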