Haar wavelet transform in CUDA

Question

cuda haar-wavelet

I have tried to implement the Haar wavelet transform for a 1D array in CUDA.

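Algorithm

The input array has 8 indices.

With the condition if(x_index>=o_width/2 || y_index>=o_height/2), I will have 4 threads, which should start at input indices 0, 2, 4 and 6, and I plan to process two input indices with each thread.

I calculate the average. For example, if my thread ID is '0', the average is (input[0]+input[1])/2, and I also get the difference, which is input[0]-avg. The same calculation applies to the other threads.

Now the important part is where the output goes. I created a separate thread ID for the output, because using the indices 0, 2, 4, 6 made it difficult to place the output at the correct positions.

My averages should go in the first 4 indices of the output, i.e. 0, 1, 2, 3, so o_thread_id should be 0, 1, 2, 3. Similarly, to place the differences in 4, 5, 6, 7, I add '4' to 0, 1, 2, 3, as shown in the code.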
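For comparison with whatever the kernel returns, here is a minimal host-side sketch of the step described above. The function name haar_step_reference is hypothetical and not part of the original post. It assumes an even-length input, divides by 2.0f so that the integer sum is not truncated, and derives the offset of the differences from the input length rather than hard-coding 4.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for one level of the 1D Haar step described
// in the question: averages fill the first half of the output and
// differences fill the second half.
std::vector<float> haar_step_reference(const std::vector<int>& input)
{
    assert(input.size() % 2 == 0);  // assumption: only even lengths are handled
    const std::size_t half = input.size() / 2;
    std::vector<float> output(input.size());

    // 'pair' plays the role of x_index in the kernel.
    for (std::size_t pair = 0; pair < half; ++pair) {
        const int a = input[2 * pair];
        const int b = input[2 * pair + 1];
        const float avg = (a + b) / 2.0f;  // 2.0f, not 2: int / int would truncate
        output[pair] = avg;                // averages: indices 0 .. half-1
        output[half + pair] = a - avg;     // differences: indices half .. size-1
    }
    return output;
}
```

For the input {1, 2, 3, 4, 5, 6, 7, 8} used in the question, this yields the averages 1.5, 3.5, 5.5, 7.5 followed by four differences of -0.5.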

Problem

My output is all zeros! No matter what I change, it stays that way.

Code

__global__ void cal_haar(int input[],float output [],int i_widthstep,int o_widthstep,int o_width,int o_height)
{

    int x_index=blockIdx.x*blockDim.x+threadIdx.x;
    int y_index=blockIdx.y*blockDim.y+threadIdx.y;

    if(x_index>=o_width/2 || y_index>=o_height/2) return;

    int i_thread_id=y_index*i_widthstep+(2*x_index);
    int o_thread_id=y_index*o_widthstep+x_index;

    float avg=(input[i_thread_id]+input[i_thread_id+1])/2;
    float diff=input[i_thread_id]-avg;
    output[o_thread_id]=avg;
    output[o_thread_id+4]=diff;

}

void haar(int input[],float output [],int i_widthstep,int o_widthstep,int o_width,int o_height)
{

    int * d_input;
    float * d_output;

    cudaMalloc(&d_input,i_widthstep*o_height);
    cudaMalloc(&d_output,o_widthstep*o_height);

    cudaMemcpy(d_input,input,i_widthstep*o_height,cudaMemcpyHostToDevice);

    dim3 blocksize(16,16);
    dim3 gridsize;
    gridsize.x=(o_width+blocksize.x-1)/blocksize.x;
    gridsize.y=(o_height+blocksize.y-1)/blocksize.y;

    cal_haar<<<gridsize,blocksize>>>(d_input,d_output,i_widthstep,o_widthstep,o_width,o_height);


    cudaMemcpy(output,d_output,o_widthstep*o_height,cudaMemcpyDeviceToHost);

    cudaFree(d_input);
    cudaFree(d_output);

}

Here is my main function:

void main()
{
    int in_arr[8]={1,2,3,4,5,6,7,8};
    float out_arr[8];
    int i_widthstep=8*sizeof(int);
    int o_widthstep=8*sizeof(float);
    haar(in_arr,out_arr,i_widthstep,o_widthstep,8,1);

    for(int c=0;c<=7;c++)
    {cout<<out_arr[c]<<endl;}
    cvWaitKey();

}

Can you tell me what is going wrong and causing the output to be 0? Thank you.

- Code_Jamer

Have you tried using * instead of []? - geek

Sorry, I don't follow. Could you point to the specific line of code? - Code_Jamer

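Please include additional relevant information in your question. It is very hard to read and follow in the comments. - Bart

OK. I apologize for the inconvenience. - Code_Jamer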

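OK, so widthstep is in bytes, which works for the malloc and the memcpy. In the kernel, however, you use it to compute an index into a float array, which means you will access memory that was never allocated. Please follow @talonmies' suggestion and add some error checking. Also have a look at the cuda-memcheck tool, which helps find this kind of error. - Peter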

1 Answer

- aland · Accepted Answer

The problem in your code is the following condition:

if(x_index>=o_width/2 || y_index>=o_height/2) return;

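Given o_height = 1, we have o_height/2 = 0 (o_height is an int, so this is integer division, which truncates), so every thread returns early and none does any work. To get what you want, either do the arithmetic in floating point here, or use (o_height+1)/2 and (o_width+1)/2: this performs the division with "arithmetic" rounding, so halves round up (you will have ( x_index >= (8+1)/2 /*= 4*/ || y_index >= (1+1)/2 /*= 1*/ )).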
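The effect of the two guards can be checked on the host with a small sketch that loops over the thread coordinates of one launch and counts how many survive the early return. The function count_active_threads and its parameters are hypothetical, introduced only for this illustration. The loop stands in for the GPU threads.

```cpp
#include <cassert>

// Hypothetical host-side emulation: counts how many (x, y) thread
// coordinates in a launch of launch_w x launch_h threads get past the
// kernel's early-return guard.
int count_active_threads(int launch_w, int launch_h,
                         int o_width, int o_height, bool round_up)
{
    assert(launch_w >= 0 && launch_h >= 0);
    // round_up == false: the guard from the question (truncating division).
    // round_up == true:  the (n + 1) / 2 variant suggested in the answer.
    const int x_limit = round_up ? (o_width + 1) / 2 : o_width / 2;
    const int y_limit = round_up ? (o_height + 1) / 2 : o_height / 2;

    int active = 0;
    for (int y = 0; y < launch_h; ++y) {
        for (int x = 0; x < launch_w; ++x) {
            if (x >= x_limit || y >= y_limit) continue;  // same test as the kernel
            ++active;
        }
    }
    return active;
}
```

With the 16x16 block from the question, o_width = 8 and o_height = 1, the truncating guard leaves 0 active threads, while the rounded-up guard leaves the intended 4.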

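In addition, there is an addressing problem once there is more than one thread in the Y dimension: your i_thread_id and o_thread_id calculations will then be wrong (the _widthstep values are sizes in bytes, but you use them as array indices).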
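To make the byte-versus-element mismatch concrete, the sketch below compares the index the kernel computes with the index obtained after converting the pitch from bytes to elements. Both function names are hypothetical, and the sketch assumes the pitch is a whole multiple of the element size.

```cpp
#include <cassert>
#include <cstddef>

// Index as computed in the question's kernel: the byte pitch is used
// directly as an element stride.
std::size_t index_with_byte_stride(std::size_t y, std::size_t x,
                                   std::size_t widthstep_bytes)
{
    return y * widthstep_bytes + x;
}

// Index after converting the pitch from bytes to elements first.
std::size_t index_with_element_stride(std::size_t y, std::size_t x,
                                      std::size_t widthstep_bytes,
                                      std::size_t elem_size)
{
    // Assumption: each row holds a whole number of elements.
    assert(elem_size > 0 && widthstep_bytes % elem_size == 0);
    return y * (widthstep_bytes / elem_size) + x;
}
```

For a row of 8 four-byte floats the pitch is 32 bytes. The first element of row 1 then lands at index 32 with the byte stride, but at index 8 with the element stride. In a two-row buffer of 16 elements, index 32 is out of bounds. Row 0 is unaffected, which is why the single-row example in the question does not trip over this.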