我有一些代码想要转换成CUDA核函数。请看:
for (r = Y; r < Y + H; r+=2)
{
ch1RowSum = ch2RowSum = ch3RowSum = 0;
for (c = X; c < X + W; c+=2)
{
chan1Value = //some calc'd value
chan3Value = //some calc'd value
chan2Value = //some calc'd value
ch2RowSum += chan2Value;
ch3RowSum += chan3Value;
ch1RowSum += chan1Value;
}
ch1Mean += ch1RowSum / W;
ch2Mean += ch2RowSum / W;
ch3Mean += ch3RowSum / W;
}
这个问题应该分成两个内核,一个用于计算行求和,另一个用于计算均值。但是,由于循环索引不从零开始且不以N结束,我该如何处理呢?