Converting a color image to greyscale using CUDA parallel processing

Question

image-processing cuda parallel-processing gpu

6

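I am trying to solve a problem in which I need to convert a color image to a greyscale image. For this purpose I am using CUDA parallel processing.

The kernel code that I am invoking on the GPU is as follows.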

__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
                   unsigned char* const greyImage,
                   int numRows, int numCols)
{
    int absolute_image_position_x = blockIdx.x;
    int absolute_image_position_y = blockIdx.y;

    if ( absolute_image_position_x >= numCols ||
         absolute_image_position_y >= numRows )
    {
        return;
    }
    uchar4 rgba = rgbaImage[absolute_image_position_x + absolute_image_position_y];
    float channelSum = .299f * rgba.x + .587f * rgba.y + .114f * rgba.z;
    greyImage[absolute_image_position_x + absolute_image_position_y] = channelSum;
}

void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage,
                            uchar4 * const d_rgbaImage,
                            unsigned char* const d_greyImage,
                            size_t numRows,
                            size_t numCols)
{
  //You must fill in the correct sizes for the blockSize and gridSize
  //currently only one block with one thread is being launched
  const dim3 blockSize(numCols/32, numCols/32 , 1);  //TODO
  const dim3 gridSize(numRows/12, numRows/12 , 1);  //TODO
  rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage,
                                             d_greyImage,
                                             numRows,
                                             numCols);

  cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}

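I see a row of dots in the first pixel row of the output.

The error I get is

libdc1394 error: Failed to initialize libdc1394
Difference at pos 51 exceeds tolerance of 5
Reference: 255
GPU : 0
My input/output images. Can anyone help me with this? Thanks in advance.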

- Ashish Singh

1

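Please give your question a more meaningful title. The current one means nothing to anyone but you. If someone else has a similar image-processing problem, how could they possibly find it by searching? - talonmies

@talonmies: I hope the title makes sense now. - Ashish Singh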

2

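This is an assignment from Udacity's "Intro to Parallel Programming" course. You should solve it yourself rather than using Stack Overflow to have others solve it for you. - RoBiK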

6

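@RoBiK: I was just curious and was trying to solve it at the same time. As for "having others solve it for you", my aim is not to submit the answer to Udacity for a grade, but to discuss it with others in the programming community and learn from their expertise. I hope that makes sense to you. - Ashish Singh

12 Answers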

5

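I have kept working on this problem since posting the question.
A few changes are needed to solve it correctly. I now realize that my initial solution was wrong.
The following changes are required: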

 1. absolute_position_x =(blockIdx.x * blockDim.x) + threadIdx.x;
 2. absolute_position_y = (blockIdx.y * blockDim.y) + threadIdx.y;

Secondly,

 1. const dim3 blockSize(24, 24, 1);
 2. const dim3 gridSize((numCols/16), (numRows/16) , 1);

This solution uses a grid of numCols/16 * numRows/16 blocks and a block size of 24 * 24. The code executed in 0.040576 ms. @datenwolf: thanks for answering above!!!

- Ashish Singh

2

Any idea why blockSize needs to be 24, 24 and gridSize needs to be numCols/16, numRows/16? Is there a reason for this? Would other numbers work? - alvas

2

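Since you do not know the image size in advance, it is best to pick any reasonable two-dimensional thread block size and then check two conditions. First, the pos_x and pos_y indexes in the kernel must not exceed numCols and numRows respectively. Second, the grid should be sized so that the total number of threads across all blocks is just above the number of pixels.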

const dim3 blockSize(16, 16, 1);
const dim3 gridSize((numCols%16) ? numCols/16+1 : numCols/16,
(numRows%16) ? numRows/16+1 : numRows/16, 1);

- MuneshSingh
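The rounding-up rule in this answer can be written as a small host-side helper. The sketch below is plain C++ so that it builds without the CUDA toolkit; `ceilDiv`, `GridDims` and `gridFor` are hypothetical names introduced for illustration, and `GridDims` merely stands in for `dim3`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: smallest number of blocks of edge `block`
// that covers `n` elements (integer ceiling of n / block).
constexpr std::size_t ceilDiv(std::size_t n, std::size_t block)
{
    return (n + block - 1) / block;
}

// Hypothetical stand-in for dim3 so the sketch compiles without CUDA.
struct GridDims {
    std::size_t x;  // blocks along the image width (columns)
    std::size_t y;  // blocks along the image height (rows)
};

// Grid that covers a numRows x numCols image with square blocks.
inline GridDims gridFor(std::size_t numRows, std::size_t numCols, std::size_t block)
{
    assert(block > 0);
    return GridDims{ceilDiv(numCols, block), ceilDiv(numRows, block)};
}
```

`ceilDiv(n, 16)` gives the same value as `(n % 16) ? n / 16 + 1 : n / 16`. Because the grid may overshoot the image, the kernel still needs its bounds check.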

1

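Your way of computing the absolute x and y image positions is perfect. But when you need to access a particular pixel in the color image, shouldn't you use the following code?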

uchar4 rgba = rgbaImage[absolute_image_position_x + (absolute_image_position_y * numCols)];

That is what I concluded when comparing it with serial code that solves the same problem. Please let me know :)

- roynalnaruto
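To see why the `* numCols` term matters, here is a small host-side C++ sketch comparing the two index formulas. The function names `pixelIndex` and `collidingIndex` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Row-major offset of pixel (x, y) in an image that is numCols wide.
inline std::size_t pixelIndex(std::size_t x, std::size_t y, std::size_t numCols)
{
    assert(x < numCols);
    return y * numCols + x;
}

// The indexing used in the question: every pixel with the same x + y
// lands on the same offset, so distinct pixels overwrite each other.
inline std::size_t collidingIndex(std::size_t x, std::size_t y)
{
    return x + y;
}
```

With `x + y`, the largest offset ever written is `numCols + numRows - 2`, so only the start of the output buffer is touched. That is consistent with the row of dots in the first pixel row that the question reports.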

1

__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
                       unsigned char* const greyImage,
                       int numRows, int numCols)
{
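    int rgba_x = blockIdx.x * blockDim.x + threadIdx.x;
    int rgba_y = blockIdx.y * blockDim.y + threadIdx.y;

    // The grid is rounded up, so threads past the image edge must do nothing.
    if (rgba_x >= numCols || rgba_y >= numRows)
        return;

    int pixel_pos = rgba_x + rgba_y * numCols;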

    uchar4 rgba = rgbaImage[pixel_pos];
    unsigned char gray = (unsigned char)(0.299f * rgba.x + 0.587f * rgba.y + 0.114f * rgba.z);
    greyImage[pixel_pos] = gray;
}

void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage,
                            unsigned char* const d_greyImage, size_t numRows, size_t numCols)
{
    //You must fill in the correct sizes for the blockSize and gridSize
    //currently only one block with one thread is being launched
    const dim3 blockSize(24, 24, 1);  //TODO
    const dim3 gridSize( numCols/24+1, numRows/24+1, 1);  //TODO
    rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);

    cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}

- bzhan

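While you may get the right answer, your approach is pretty weird. You are passing cols rather than rows into your gridSize, and your pixel_pos formula is not consistent with the standard way of flattening a 2D array into a 1D array. It should be numRows * y + x or numCols * x + y, but because your grid is set to cols, rows and not rows, cols, it all works out. - labheshr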

1

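The libdc1394 error is not related to FireWire etc. in this case. It is the library that Udacity uses to compare the image your program creates with the reference image. What it is saying is that the difference between your image and the reference image has exceeded a specific threshold at that position, i.e. pixel.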

- metamorphosis
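The message "Difference at pos 51 exceeds tolerance of 5" describes a per-position comparison against a reference image. A minimal sketch of what such a check could look like follows. This is not the course's actual checker; `firstMismatch` and its signature are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical checker: returns the first position where the two images
// differ by more than `tolerance`, or -1 if no such position exists.
inline long firstMismatch(const std::vector<unsigned char>& reference,
                          const std::vector<unsigned char>& gpu,
                          int tolerance)
{
    // Compare only the range both buffers share.
    const std::size_t n = reference.size() < gpu.size() ? reference.size() : gpu.size();
    for (std::size_t i = 0; i < n; ++i) {
        const int diff = std::abs(static_cast<int>(reference[i]) - static_cast<int>(gpu[i]));
        if (diff > tolerance)
            return static_cast<long>(i);
    }
    return -1;
}
```

A reference value of 255 against a GPU value of 0, as in the question's output, means the kernel never wrote a sensible value at that position.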

1

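libdc1394 error: Failed to initialize libdc1394

I don't think this is a CUDA problem. libdc1394 is a library for accessing IEEE 1394 (also known as FireWire or iLink) video devices such as DV camcorders and the Apple iSight camera. That library did not initialize properly, so you won't get useful results. Basically it's NINO: nonsense in, nonsense out.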

- datenwolf

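@datenwolf, please check: I have added a link to the input/output images, which show the result I get. - Ashish Singh

What I see is an error saying the difference at pos 51 exceeds the tolerance of 5, so I am guessing it has something to do with the color mode rather than some linker-type error. - Ashish Singh

@ashish173: It is not a linker problem, it is a runtime problem. The dc1394 library fails to initialize properly when the program starts, and using it to retrieve pictures will likely produce nothing but garbage. You have to fix the initialization problem first (it is a runtime problem, i.e. something you must write code to fix). - datenwolf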

1

You may still run into a runtime problem: the conversion will not give the correct result.

The following lines:

1. uchar4 rgba = rgbaImage[absolute_image_position_x + absolute_image_position_y];
2. greyImage[absolute_image_position_x + absolute_image_position_y] = channelSum;

should be changed to:

1. uchar4 rgba = rgbaImage[absolute_image_position_x + absolute_image_position_y * numCols];
2. greyImage[absolute_image_position_x + absolute_image_position_y * numCols] = channelSum;

- Alex

0

The same code, with the ability to handle input images of non-standard sizes

#include <vector>

__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
                       unsigned char* const greyImage,
                       int numRows, int numCols)
{
    // idx runs over rows and idy over columns, matching the gridSize below
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    int idy = blockDim.y * blockIdx.y + threadIdx.y;

    // the grid is rounded up, so skip threads that fall outside the image
    if (idx >= numRows || idy >= numCols)
        return;

    uchar4 rgbcell = rgbaImage[idx * numCols + idy];
    greyImage[idx * numCols + idy] = 0.299 * rgbcell.x + 0.587 * rgbcell.y + 0.114 * rgbcell.z;
}

void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage,
                            unsigned char* const d_greyImage, size_t numRows, size_t numCols)
{
    //You must fill in the correct sizes for the blockSize and gridSize
    //currently only one block with one thread is being launched

    // pick the largest candidate block edge that divides the pixel count
    int totalpixels = numRows * numCols;
    int factors[] = {2, 4, 8, 16, 24, 32};
    std::vector<int> numbers(factors, factors + sizeof(factors) / sizeof(int));
    int factor = 1;

    while (!numbers.empty())
    {
        if (totalpixels % numbers.back() == 0)
        {
            factor = numbers.back();
            break;
        }
        else
        {
            numbers.pop_back();
        }
    }

    const dim3 blockSize(factor, factor, 1);  //TODO
    const dim3 gridSize(numRows / factor + 1, numCols / factor + 1, 1);  //TODO
    rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);
}

- Creative_Cimmons

0

const dim3 blockSize(16, 16, 1);  //TODO
const dim3 gridSize( (numRows+15)/16, (numCols+15)/16, 1);  //TODO

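int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;

// x runs over rows and y over columns here, matching the gridSize above;
// skip the extra threads created by rounding the grid up
if (x >= numRows || y >= numCols)
    return;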

uchar4 rgba = rgbaImage[y*numRows + x];
float channelSum = .299f * rgba.x + .587f * rgba.y + .114f * rgba.z;
greyImage[y*numRows + x] = channelSum;

- Mo Yi

- Dharmendar Kumar · Accepted Answer

I took this course recently and tried your solution, but it did not work, so I tried my own. You were almost correct; the correct solution is:

__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
                       unsigned char* const greyImage,
                       int numRows, int numCols)
{
    int pos_x = (blockIdx.x * blockDim.x) + threadIdx.x;
    int pos_y = (blockIdx.y * blockDim.y) + threadIdx.y;
    if (pos_x >= numCols || pos_y >= numRows)
        return;

    uchar4 rgba = rgbaImage[pos_x + pos_y * numCols];
    greyImage[pos_x + pos_y * numCols] = (.299f * rgba.x + .587f * rgba.y + .114f * rgba.z);
}

The rest of the code is the same as yours.
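When debugging a kernel like this one, it can help to have a serial version to compare against. The following is a minimal host-side C++ sketch, not the course's reference code. `Pixel` is a hypothetical stand-in for `uchar4`, `greyscaleReference` is an illustrative name, and the sketch assumes that x, y and z hold red, green and blue, as the weights used throughout this thread imply.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's uchar4 so the sketch builds without CUDA.
struct Pixel {
    unsigned char x, y, z, w;  // assumed order: red, green, blue, alpha
};

// Serial greyscale conversion using the same weights and the same
// row-major index (pos_x + pos_y * numCols) as the kernel above.
inline std::vector<unsigned char> greyscaleReference(const std::vector<Pixel>& rgba,
                                                     std::size_t numRows,
                                                     std::size_t numCols)
{
    std::vector<unsigned char> grey(numRows * numCols);
    for (std::size_t pos_y = 0; pos_y < numRows; ++pos_y) {
        for (std::size_t pos_x = 0; pos_x < numCols; ++pos_x) {
            const Pixel& p = rgba[pos_x + pos_y * numCols];
            const float channelSum = .299f * p.x + .587f * p.y + .114f * p.z;
            // Truncates toward zero, like the implicit conversion in the kernel.
            grey[pos_x + pos_y * numCols] = static_cast<unsigned char>(channelSum);
        }
    }
    return grey;
}
```

Each loop iteration corresponds to the work of one GPU thread: the two loop counters play the role of `pos_x` and `pos_y`, and the loop bounds play the role of the kernel's bounds check.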