GDI+中快速生成投影的算法

15

在GDI中,如何高效地为图像添加阴影?

我现在从我的图像开始:

enter image description here

我使用ImageAttributes和ColorMatrix将图像的Alpha蒙版绘制到新图像中:

colorMatrix = (
    ( 0,  0,  0, 0, 0),
    ( 0,  0,  0, 0, 0),
    ( 0,  0,  0, 0, 0),
    (-1, -1, -1, 1, 0),
    ( 1,  1,  1, 0, 1)
    );

图片描述

然后我应用了高斯模糊卷积核,并稍微偏移了一下:

图片描述

然后我将原始图像绘制到顶部:

图片描述

问题是这太慢了,生成带有投影的图像需要约170毫秒,而不带投影只需要2毫秒(慢70倍):

  • 带阴影: 171,332 µs
  • 不带阴影:2,457us

当用户(例如我)在滚动项目列表时,额外的169ms延迟非常明显。


以下代码可以忽略,它对问题或答案没有任何贡献:

class function TImageEffects.GenerateDropShadow(image: TGPImage;
        const radius: Single; const OffsetX, OffsetY: Single; const Opacity: Single): TGPBitmap;
var
    width, height: Integer;
    alphaMask: TGPBitmap;
    shadow: TGPBitmap;
    graphics: TGPGraphics;
    imageAttributes: TGPImageAttributes;
    cm: TColorMatrix;
begin
{
    We generate a drop shadow by first getting the alpha mask. This will be a black
    sillouette on a transparent background. We then blur the black "shadow" by the amounts
    given.
    We then draw the original image on top of it's own shadow.
}

{
    http://msdn.microsoft.com/en-us/library/aa511280.aspx
    Windows Vista User Experience -> Guidelines -> Aesthetics -> Icons

    Basic Flat Icon Shadow Ranges

    Flat icons
    Flat icons are generally used for file icons and flat real-world objects,
    such as a document or a piece of paper.

    Flat icon lighting comes from the upper-left at 130 degrees.

    Smaller icons (for example, 16x16 and 32x32) are simplified for readability.
    However, if they contain a reflection within the icon (often simplified),
    they may have a tight drop shadow. The drop shadow ranges in opacity from
    30-50 percent.
    Layer effects can be used for flat icons, but should be compared with other
    flat icons. The shadows for objects will vary somewhat, according to what
    looks best and is most consistent within the size set and with the other
    icons in Windows Vista. On some occasions, it may even be necessary to
    modify the shadows. This will especially be true when objects are laid over
    others.
    A subtle range of colors may be used to achieve desired outcome. Shadows help
    objects sit in space. Color impacts the perceived weight of the shadow, and
    may distort the image if it is too heavy.

    Blend mode: Multiply
    Opacity: 22% to 50% - depends on color of the item.
    Angle: 130 to 120, use global light
    Distance: 3 (256 thru 48x), Distance = 1 (32x, 24x)
    Spread: 0
    Size: 7 (256x thru 48x), Spread = 2 (32x, 24x)
}
    width := image.GetWidth;
    height := image.GetHeight;

    //Get bitmap to hold final composited image and shadow
    Result := TGPBitmap.Create(width, height, PixelFormat32bppARGB);

    //Use ColorMatrix methods to "draw" the alpha image.
    alphaMask := TImageEffects.GetAlphaMask(image);
    try
        //Blur the black and white shadow image
//      shadow := TImageEffects.BoxBlur(alphaMask, radius);
        shadow := TImageEffects.GaussianBlur(alphaMask, radius); //because Gaussian Blur is linearly-separable into two 1d kernels, it's actually faster than the box blur
    finally
        alphaMask.Free;
    end;

    //Draw
    graphics := TGPGraphics.Create(Result);
    try
        //Draw the "shadow", using the passed in opacity value.
        {
            Color transformations are of the form
        c =  (r, g, b, a)
        c' = (r, g, b, a)
        c' = c*M
            = (r, g, b, a, 1) * (0 0 0 0 0)  //r
                                      (0 0 0 0 0)  //g
                                      (0 0 0 0 0)  //b
                                      (1 1 1 1 0)  //a
                                      (0 0 0 0 1)  //1
        }

        imageAttributes := TGPImageAttributes.Create;
    {   cm := (
                ( 1, 0, 0, 0,   0),
                ( 0, 1, 0, 0,   0),
                ( 0, 0, 1, 0,   0),
                ( 0, 0, 0, 0.5, 0),
                ( 0, 0, 0, 0,   1)
            );}
        cm[0, 0] :=  1; cm[0, 1] :=  0; cm[0, 2] :=  0; cm[0, 3] := 0;       cm[0, 4] := 0;
        cm[1, 0] :=  0; cm[1, 1] :=  1; cm[1, 2] :=  0; cm[1, 3] := 0;       cm[1, 4] := 0;
        cm[2, 0] :=  0; cm[2, 1] :=  0; cm[2, 2] :=  1; cm[2, 3] := 0;       cm[2, 4] := 0;
        cm[3, 0] :=  0; cm[3, 1] :=  0; cm[3, 2] :=  0; cm[3, 3] := Opacity; cm[3, 4] := 0;
        cm[4, 0] :=  0; cm[4, 1] :=  0; cm[4, 2] :=  0; cm[4, 3] := 0;       cm[4, 4] := 1;


        imageAttributes.SetColorMatrix(
                cm,
                ColorMatrixFlagsDefault,
                ColorAdjustTypeBitmap);
        try
            graphics.DrawImage(shadow,
                        MakeRectF(OffsetX, OffsetY, width, height), //destination rectangle
                        0, 0, //source (x,y)
                        width, height, //source width, height
                        UnitPixel,
                        ImageAttributes);

            //Draw original image over-top of it's shadow
            graphics.DrawImage(image, 0, 0);
        finally
            imageAttributes.Free;
        end;
    finally
        graphics.Free;
    end;
end;
使用该函数来获取灰度透明掩码:
class function TImageEffects.GetAlphaMask(image: TGPImage): TGPBitmap;
var
    imageAttributes: TGPImageAttributes;
    cm: TColorMatrix;
    graphics: TGPGraphics;
    Width, Height: UINT;
begin
    {
        Color transformations are of the form
    c =  (r, g, b, a)
    c' = (r, g, b, a)
    c' = c*M
        = (r, g, b, a, 1) * (0 0 0 0 0)
                            (0 0 0 0 0)
                            (0 0 0 0 0)
                            (1 1 1 1 0)
                            (0 0 0 0 1)
    }

    imageAttributes := TGPImageAttributes.Create;

{   cm := (
            ( 0,  0,  0, 0, 0),
            ( 0,  0,  0, 0, 0),
            ( 0,  0,  0, 0, 0),
            (-1, -1, -1, 1, 0),
            ( 1,  1,  1, 0, 1)
        );}
    cm[0, 0] :=  0; cm[0, 1] :=  0; cm[0, 2] :=  0; cm[0, 3] := 0; cm[0, 4] := 0;
    cm[1, 0] :=  0; cm[1, 1] :=  0; cm[1, 2] :=  0; cm[1, 3] := 0; cm[1, 4] := 0;
    cm[2, 0] :=  0; cm[2, 1] :=  0; cm[2, 2] :=  0; cm[2, 3] := 0; cm[2, 4] := 0;
    cm[3, 0] := -1; cm[3, 1] := -1; cm[3, 2] := -1; cm[3, 3] := 1; cm[3, 4] := 0;
    cm[4, 0] :=  1; cm[4, 1] :=  1; cm[4, 2] :=  1; cm[4, 3] := 0; cm[4, 4] := 1;


    imageAttributes.SetColorMatrix(
            cm,
            ColorMatrixFlagsDefault,
            ColorAdjustTypeBitmap);

    width := image.GetWidth;
    height := image.GetHeight;

    Result := TGPBitmap.Create(Integer(width), Integer(height));
    graphics := TGPGraphics.Create(Result);
   try
        graphics.DrawImage(
                image,
                MakeRect(0, 0, width, height), //destination rectangle
             0, 0, //source (x,y)
             width, height,
             UnitPixel,
                ImageAttributes);
   finally
        graphics.Free;
    end;
end;

核心是高斯模糊:

class function TImageEffects.GaussianBlur(const bitmap: TGPBitmap;
  radius: Single): TGPBitmap;
var
    width, height: Integer;
    tempBitmap: TGPBitmap;
    bdSource: TBitmapData;
    bdTemp: TBitmapData;
    bdDest: TBitmapData;
    pSrc: PARGBArray;
    pTemp: PARGBArray;
    pDest: PARGBArray;
    stride: Integer;
    kernel: TKernel;
begin
//  kernel := MakeGaussianKernel2d(radius);
    kernel := MakeGaussianKernel1d(radius);
    try
//      Result := ConvolveBitmap(bitmap, kernel); brute 2d kernel

        width := bitmap.GetWidth;
        height := bitmap.GetHeight;

        // GDI+ still lies to us - the return format is BGR, NOT RGB.
        bitmap.LockBits(MakeRect(0, 0, width, height),
                ImageLockModeRead,
                PixelFormat32bppPARGB, bdSource);

        //intermediate bitmap
        tempBitmap := TGPBitmap.Create(width, height, PixelFormat32bppPARGB);
        tempBitmap.LockBits(MakeRect(0, 0, width, height),
                    ImageLockModeWrite,
                    PixelFormat32bppPARGB, bdTemp);

        //target bitmap
        Result := TGPBitmap.Create(width, height, PixelFormat32bppARGB);
        Result.LockBits(MakeRect(0, 0, width, height),
                    ImageLockModeWrite,
                    PixelFormat32bppPARGB, bdDest);

        pSrc := PARGBArray(bdSource.Scan0);
        pTemp := PARGBArray(bdTemp.Scan0);
        pDest := PARGBArray(bdDest.Scan0);
        stride := bdSource.Stride;

        ConvolveAndTranspose(kernel, pSrc^, pTemp^, width, height, stride, True, EdgeActionClampEdges);
        ConvolveAndTranspose(kernel, pTemp^, pDest^, height, width, stride, True, EdgeActionClampEdges);

        //Unlock source
       bitmap.UnlockBits(bdSource);
        tempBitmap.UnlockBits(bdTemp);
        Result.UnlockBits(bdDest);

        //get rid of temp
        tempBitmap.Free;
    finally
        kernel.Free;
    end;
end;
需要一个1-D内核:
class function TImageEffects.MakeGaussianKernel1d(radius: Single): TKernel;
var
    r: Integer;
    rows: Integer;
    matrix: TSingleDynArray;
    sigma: Single;
    sigma22: Single;
    sigmaPi2: Single;
    sqrtSigmaPi2: Single;
    radius2: Single;
    total: Single;
    index: Integer;
    row: Integer;
    distance: Single;
    i: Integer;
begin
    r := Ceil(radius);
    rows := r*2+1;

    SetLength(matrix, rows);
    sigma := radius/3.0;
    sigma22 := 2*sigma*sigma;
    sigmaPi2 := 2*pi*sigma;
    sqrtSigmaPi2 := Sqrt(sigmaPi2);
    radius2 := radius*radius;
    total := 0;

    Index := 0;
    for row := -r to r do
    begin
        distance := row*row;
        if (distance > radius2) then
            matrix[index] := 0
        else
        begin
            matrix[index] := Exp((-distance)/sigma22) / sqrtSigmaPi2;
            total := total + matrix[index];
            Inc(index);
        end;
    end;

    //Normalize the values
    for i := 0 to rows-1 do
        matrix[i] := matrix[i] / total;


    Result := TKernel.Create(rows, 1, matrix);
end;

高斯函数的魔力在于它可以分解为两个一维卷积。

class procedure TImageEffects.convolveAndTranspose(kernel: TKernel;
  const inPixels: array of ARGB; var outPixels: array of ARGB; width,
  height, stride: Integer; alpha: Boolean; edgeAction: TEdgeAction);
var
    index: Integer;
    matrix: TSingleDynArray;
    rows: Integer; //number of rows in the kernel
    cols: Integer; //number of columns in the kernel
    rows2: Integer; //half row count
    cols2: Integer; //half column count

    x, y: Integer; //
    r, g, b, a: Single; //summed red, green, blue, alpha values
    row, col: Integer;
    ix, iy, ioffset: Integer;
    moffset: Integer;
    f: Single;
    rgb: ARGB;
    ir, ig, ib, ia: Integer;

   function ClampPixel(value: Single): Integer;
    begin
        Result := Trunc(value+0.5);
        if Result < 0 then
            Result := 0
        else if Result > 255 then
            Result := 255;
    end;
begin
    matrix := kernel.KernelData;
    cols := kernel.Width;
    cols2 := cols div 2;

    for y := 0 to height-1 do
    begin
        index := y;
        ioffset := y*width;
        for x := 0 to width-1 do
        begin
            r := 0;
            g := 0;
            b := 0;
            a := 0;

            moffset := cols2;
            for col := -cols2 to cols2 do
            begin
                f := matrix[moffset+col];

                if (f <> 0) then
                begin
                    ix := x+col;
                    if ( ix < 0 ) then
                    begin
                        if ( edgeAction = EdgeActionClampEdges ) then
                            ix := 0
                        else if ( edgeAction = EdgeActionWrapEdges ) then
                            ix := (x+width) mod width;
                    end
                    else if ( ix >= width) then
                    begin
                        if ( edgeAction = EdgeActionClampEdges ) then
                            ix := width-1
                        else if ( edgeAction = EdgeActionWrapEdges ) then
                            ix := (x+width) mod width;
                    end;
                    rgb := inPixels[ioffset+ix];
                    a := a + f * ((rgb shr 24) and $FF);
                    r := r + f * ((rgb shr 16) and $FF);
                    g := g + f * ((rgb shr  8) and $FF);
                    b := b + f * ((rgb       ) and $FF);
                end;
            end;
            if alpha then
                ia := ClampPixel(a)
         else
                ia := $FF;
            ir := ClampPixel(r);
            ig := ClampPixel(g);
            ib := ClampPixel(b);
            outPixels[index] := MakeARGB(ia, ir, ig, ib);

            Inc(index, height);
        end;
    end;
end;

在我的 256x256 像素的原始图像上,附带使用示例:

image := TImageEffects.GenerateDropShadow(localImage, 14, 2.12132, 2.12132, 1.0);

分析显示88.62%的时间花费在以下行中:

a := a + f * ((rgb shr 24) and $FF);
r := r + f * ((rgb shr 16) and $FF);
g := g + f * ((rgb shr  8) and $FF);
b := b + f * ((rgb       ) and $FF);

这是每像素透明度混合。

这让我想到,应该有一种更好的方法来应用软阴影而不是应用模糊效果,毕竟Windows和OSX在实时中为窗口应用了阴影。


1
你是否考虑过遮盖票下方(相当大的区域),这样你就不必计算投影了? - user2056327
你应该使用指针和不安全代码,这将显著提高速度。 - Gábor
@Gábor,你应该将它发布为答案,让每个人都受益。 - Ian Boyd
是的,但我需要先得到一个令人满意的结果。 :-) 最终,我找到了一种基于网络博客的解决方案,虽然不是那么简单,但似乎运行良好。我会立即撰写答案。 - Gábor
Ian,你试过了吗?我现在使用它感到非常满意。 :-) - Gábor
显示剩余7条评论
5个回答

8
该算法来自于这篇博客文章:http://blog.ivank.net/fastest-gaussian-blur.html。当然,它实现的是最后和最快的版本。 :-)
这段代码直接从我的工作代码中复制而来,因此外部假设可能反映出这一点。该函数返回一个较大的位图以适应大小的增加。在您的代码中,您需要相应地处理这个问题。它假定为32位alpha图片,但可以很容易地修改以处理仅为24位(CHANNELS常量和PixelFormat值)。
public static class DropShadow {
  const int CHANNELS = 4;

  public static Bitmap CreateShadow(Bitmap bitmap, int radius, float opacity) {
    // Alpha mask with opacity
    var matrix = new ColorMatrix(new float[][] {
            new float[] {  0F,  0F,  0F, 0F,      0F }, 
            new float[] {  0F,  0F,  0F, 0F,      0F }, 
            new float[] {  0F,  0F,  0F, 0F,      0F }, 
            new float[] { -1F, -1F, -1F, opacity, 0F },
            new float[] {  1F,  1F,  1F, 0F,      1F }
        });

    var imageAttributes = new ImageAttributes();
    imageAttributes.SetColorMatrix(matrix, ColorMatrixFlag.Default, ColorAdjustType.Bitmap);
    var shadow = new Bitmap(bitmap.Width + 4 * radius, bitmap.Height + 4 * radius);
    using (var graphics = Graphics.FromImage(shadow))
      graphics.DrawImage(bitmap, new Rectangle(2 * radius, 2 * radius, bitmap.Width, bitmap.Height), 0, 0, bitmap.Width, bitmap.Height, GraphicsUnit.Pixel, imageAttributes);

    // Gaussian blur
    var clone = shadow.Clone() as Bitmap;
    var shadowData = shadow.LockBits(new Rectangle(0, 0, shadow.Width, shadow.Height), ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
    var cloneData = clone.LockBits(new Rectangle(0, 0, clone.Width, clone.Height), ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);

    var boxes = DetermineBoxes(radius, 3);
    BoxBlur(shadowData, cloneData, shadow.Width, shadow.Height, (boxes[0] - 1) / 2);
    BoxBlur(shadowData, cloneData, shadow.Width, shadow.Height, (boxes[1] - 1) / 2);
    BoxBlur(shadowData, cloneData, shadow.Width, shadow.Height, (boxes[2] - 1) / 2);

    shadow.UnlockBits(shadowData);
    clone.UnlockBits(cloneData);
    return shadow;
  }

  private static unsafe void BoxBlur(BitmapData data1, BitmapData data2, int width, int height, int radius) {
    byte* p1 = (byte*)(void*)data1.Scan0;
    byte* p2 = (byte*)(void*)data2.Scan0;

    int radius2 = 2 * radius + 1;
    int[] sum = new int[CHANNELS];
    int[] FirstValue = new int[CHANNELS];
    int[] LastValue = new int[CHANNELS];

    // Horizontal
    int stride = data1.Stride;
    for (var row = 0; row < height; row++) {
      int start = row * stride;
      int left = start;
      int right = start + radius * CHANNELS;

      for (int channel = 0; channel < CHANNELS; channel++) {
        FirstValue[channel] = p1[start + channel];
        LastValue[channel] = p1[start + (width - 1) * CHANNELS + channel];
        sum[channel] = (radius + 1) * FirstValue[channel];
      }
      for (var column = 0; column < radius; column++)
        for (int channel = 0; channel < CHANNELS; channel++)
          sum[channel] += p1[start + column * CHANNELS + channel];
      for (var column = 0; column <= radius; column++, right += CHANNELS, start += CHANNELS)
        for (int channel = 0; channel < CHANNELS; channel++) {
          sum[channel] += p1[right + channel] - FirstValue[channel];
          p2[start + channel] = (byte)(sum[channel] / radius2);
        }
      for (var column = radius + 1; column < width - radius; column++, left += CHANNELS, right += CHANNELS, start += CHANNELS)
        for (int channel = 0; channel < CHANNELS; channel++) {
          sum[channel] += p1[right + channel] - p1[left + channel];
          p2[start + channel] = (byte)(sum[channel] / radius2);
        }
      for (var column = width - radius; column < width; column++, left += CHANNELS, start += CHANNELS)
        for (int channel = 0; channel < CHANNELS; channel++) {
          sum[channel] += LastValue[channel] - p1[left + channel];
          p2[start + channel] = (byte)(sum[channel] / radius2);
        }
    }

    // Vertical
    stride = data2.Stride;
    for (int column = 0; column < width; column++) {
      int start = column * CHANNELS;
      int top = start;
      int bottom = start + radius * stride;

      for (int channel = 0; channel < CHANNELS; channel++) {
        FirstValue[channel] = p2[start + channel];
        LastValue[channel] = p2[start + (height - 1) * stride + channel];
        sum[channel] = (radius + 1) * FirstValue[channel];
      }
      for (int row = 0; row < radius; row++)
        for (int channel = 0; channel < CHANNELS; channel++)
          sum[channel] += p2[start + row * stride + channel];
      for (int row = 0; row <= radius; row++, bottom += stride, start += stride)
        for (int channel = 0; channel < CHANNELS; channel++) {
          sum[channel] += p2[bottom + channel] - FirstValue[channel];
          p1[start + channel] = (byte)(sum[channel] / radius2);
        }
      for (int row = radius + 1; row < height - radius; row++, top += stride, bottom += stride, start += stride)
        for (int channel = 0; channel < CHANNELS; channel++) {
          sum[channel] += p2[bottom + channel] - p2[top + channel];
          p1[start + channel] = (byte)(sum[channel] / radius2);
        }
      for (int row = height - radius; row < height; row++, top += stride, start += stride)
        for (int channel = 0; channel < CHANNELS; channel++) {
          sum[channel] += LastValue[channel] - p2[top + channel];
          p1[start + channel] = (byte)(sum[channel] / radius2);
        }
    }
  }

  private static int[] DetermineBoxes(double Sigma, int BoxCount) {
    double IdealWidth = Math.Sqrt((12 * Sigma * Sigma / BoxCount) + 1);
    int Lower = (int)Math.Floor(IdealWidth);
    if (Lower % 2 == 0)
      Lower--;
    int Upper = Lower + 2;

    double MedianWidth = (12 * Sigma * Sigma - BoxCount * Lower * Lower - 4 * BoxCount * Lower - 3 * BoxCount) / (-4 * Lower - 4);
    int Median = (int)Math.Round(MedianWidth);

    int[] BoxSizes = new int[BoxCount];
    for (int i = 0; i < BoxCount; i++)
      BoxSizes[i] = (i < Median) ? Lower : Upper;
    return BoxSizes;
  }

}

我想你是希望将其转换为Delphi代码,这涉及到IT技术方面。如果你有半径和三个方框的整数值,根据那篇博客的评论,可以忽略DetermineBoxes()函数并使用以下代码:
BoxBlur(shadowData, cloneData, shadow.Width, shadow.Height, radius - 1);
BoxBlur(shadowData, cloneData, shadow.Width, shadow.Height, radius - 1);
BoxBlur(shadowData, cloneData, shadow.Width, shadow.Height, radius);

相比位图本身,其执行时间可以忽略不计,但仍需优化...


多年过去了,我注意到原始博客收到了一条评论,指向了一个可能更快的C#实现,该实现在很多地方使用了Parallel.For。如果有人仍然需要的话,将这两者结合起来可能会很好... (https://github.com/mdymel/superfastblur) - undefined

2
我询问代码的原因是想看看您是否使用了“快速位图”方法或GetPixel(),SetPixel() 方法。既然您已经覆盖了这一点,我怀疑您在性能优化方面无法做更多事情。 GDI +并不适用于此类每像素操作场景。实际上,您应该考虑实现一个简单的阴影生成器,它看起来可能不那么花哨,但不会占用太多处理器资源。
这非常取决于您的使用场景(您没有真正描述):
  • 图像是否都相似(都是票证或只是样本)?如果是,则可以生成一次阴影并重复使用该位图。
  • 当用户正在做其他事情时,您可以作为后台进程生成和缓存带阴影的图像版本(或仅带阴影的缩略图)。
您还可以尝试在Paint.NET中使用高斯模糊(它大多使用GDI+),并测量其速度。我怀疑您无法使其比Paint.NET更快,因此这是一个很好的基准。

2
当涉及到问题时,问题不在于GDI+。我在内存中有一个ARGB图像,并且必须对每个像素执行乘法。我注意到较新版本的GDI+具有模糊图像效果(msdn.microsoft.com/en-us/library/ms534422(v=vs.85).aspx)。它可能比自己编写的更快;缺点是它不能在Windows XP上工作。假设GDI+的“ApplyEffect”更快:为什么更快?应用模糊的更好算法是什么? - Ian Boyd

1
我测试了一些算法,最好的是Gábor实现的高斯模糊。在我的测试中,该算法的延迟约为20毫秒。
以下是Delphi中该算法的实现,有一些更改(它使用免费的Bilsen GDI+库):
function CreateBlurShadow(ABitmap: IGPBitmap; ARadius: Integer; AOpacity: Double; AColor: TColor = clNone): IGPBitmap;

  procedure BoxBlur(const AData1, AData2: TGPBitmapData; AWidth, AHeight, ARadius: Integer);
  const
    CHANNELS = 4;
  var
    LScan1, LScan2: PByte;
    LSum, LFirstValue, LLastValue: array [0..CHANNELS-1] of Integer;
    LRadius2, LStride, LStart, LChannel, LLeft, LRight, LBottom, LTop, LRow, LColumn: Integer;
  begin
    LScan1 := AData1.Scan0;
    LScan2 := AData2.Scan0;
    LRadius2 := (2 * ARadius) + 1;
    LStride := AData1.Stride;
    for LRow := 0 to AHeight-1 do
    begin
      LStart := LRow * LStride;
      LLeft := LStart;
      LRight := LStart + ARadius * CHANNELS;
      for LChannel := 0 to CHANNELS-1 do
      begin
        LFirstValue[LChannel] := LScan1[LStart + LChannel];
        LLastValue[LChannel] := LScan1[LStart + ((AWidth - 1) * CHANNELS) + LChannel];
        LSum[LChannel] := (ARadius + 1) * LFirstValue[LChannel];
      end;
      for LColumn := 0 to ARadius-1 do
        for LChannel := 0 to CHANNELS-1 do
          LSum[LChannel] := LSum[LChannel] + LScan1[LStart + (LColumn * CHANNELS) + LChannel];
      for LColumn := 0 to ARadius do
      begin
        for LChannel := 0 to CHANNELS-1 do
        begin
          LSum[LChannel] := LSum[LChannel] + LScan1[LRight + LChannel] - LFirstValue[LChannel];
          LScan2[LStart + LChannel] := Byte(LSum[LChannel] div LRadius2);
        end;
        Inc(LRight, CHANNELS);
        Inc(LStart, CHANNELS);
      end;
      for LColumn := ARadius + 1 to AWidth-ARadius-1 do
      begin
        for LChannel := 0 to CHANNELS-1 do
        begin
          LSum[LChannel] := LSum[LChannel] + LScan1[LRight + LChannel] - LScan1[LLeft + LChannel];
          LScan2[LStart + LChannel] := Byte(LSum[LChannel] div LRadius2);
        end;
        Inc(LLeft, CHANNELS);
        Inc(LRight, CHANNELS);
        Inc(LStart, CHANNELS);
      end;
      for LColumn := AWidth-ARadius to AWidth-1 do
      begin
        for LChannel := 0 to CHANNELS-1 do
        begin
          LSum[LChannel] := LSum[LChannel] + LLastValue[LChannel] - LScan1[LLeft + LChannel];
          LScan2[LStart + LChannel] := Byte(LSum[LChannel] div LRadius2);
        end;
        Inc(LLeft, CHANNELS);
        Inc(LStart, CHANNELS);
      end;
    end;
    LStride := AData2.Stride;
    for LColumn := 0 to AWidth-1 do
    begin
      LStart := LColumn * CHANNELS;
      LTop := LStart;
      LBottom := LStart + (ARadius * LStride);
      for LChannel := 0 to CHANNELS-1 do
      begin
        LFirstValue[LChannel] := LScan2[LStart + LChannel];
        LLastValue[LChannel] := LScan2[LStart + ((AHeight - 1) * LStride) + LChannel];
        LSum[LChannel] := (ARadius + 1) * LFirstValue[LChannel];
      end;
      for LRow := 0 to ARadius-1 do
        for LChannel := 0 to CHANNELS-1 do
          LSum[LChannel] := LSum[LChannel] + LScan2[LStart + (LRow * LStride) + LChannel];
      for LRow := 0 to ARadius do
      begin
        for LChannel := 0 to CHANNELS-1 do
        begin
          LSum[LChannel] := LSum[LChannel] + LScan2[LBottom + LChannel] - LFirstValue[LChannel];
          LScan1[LStart + LChannel] := Byte(LSum[LChannel] div LRadius2);
        end;
        Inc(LBottom, LStride);
        Inc(LStart, LStride);
      end;
      for LRow := ARadius + 1 to AHeight - ARadius - 1 do
      begin
        for LChannel := 0 to CHANNELS-1 do
        begin
          LSum[LChannel] := LSum[LChannel] + LScan2[LBottom + LChannel] - LScan2[LTop + LChannel];
          LScan1[LStart + LChannel] := Byte(LSum[LChannel] div LRadius2);
        end;
        Inc(LTop, LStride);
        Inc(LBottom, LStride);
        Inc(LStart, LStride);
      end;
      for LRow := AHeight - ARadius to AHeight-1 do
      begin
        for LChannel := 0 to CHANNELS-1 do
        begin
          LSum[LChannel] := LSum[LChannel] + LLastValue[LChannel] - LScan2[LTop + LChannel];
          LScan1[LStart + LChannel] := Byte(LSum[LChannel] div LRadius2);
        end;
        Inc(LTop, LStride);
        Inc(LStart, LStride);
      end;
    end;
  end;

const
  INITIAL_MATRIX: array [0..4, 0..4] of Single =
   ((0.5,   0,   0, 0, 0),
    (0,   0.5,   0, 0, 0),
    (0,     0, 0.5, 0, 0),
    (0,     0,   0, 1, 0),
    (0,     0,   0, 0, 1));
var
  LMatrix: TGPColorMatrix;
  LImageAttributes: IGPImageAttributes;
  LShadow, LClone: IGPBitmap;
  LGraphics: IGPGraphics;
  LShadowData, LCloneData: TGPBitmapData;
  LColor: TGPColor;
begin
  ARadius := Max(ARadius, 0);
  LShadow := TGPBitmap.Create(ABitmap.Width + (4 * Cardinal(ARadius)),
    ABitmap.Height + (4 * Cardinal(ARadius)), PixelFormat32bppARGB);
  LGraphics := TGPGraphics.FromImage(LShadow);
  LGraphics.DrawImage(ABitmap, TGPRect.Create(2 * ARadius, 2 * ARadius,
    ABitmap.Width, ABitmap.Height), 0, 0, ABitmap.Width, ABitmap.Height,
    TGPUnit.UnitPixel);
  LClone := LShadow.Clone;
  LShadowData := LShadow.LockBits(TGPRect.Create(0, 0, LShadow.Width, LShadow.Height),
    [ImageLockModeRead, ImageLockModeWrite], PixelFormat32bppARGB);
  LCloneData := LClone.LockBits(TGPRect.Create(0, 0, LClone.Width, LClone.Height),
    [ImageLockModeRead, ImageLockModeWrite], PixelFormat32bppARGB);
  try
    BoxBlur(LShadowData, LCloneData, LShadow.Width, LShadow.Height, ARadius - 1);
    BoxBlur(LShadowData, LCloneData, LShadow.Width, LShadow.Height, ARadius - 1);
    BoxBlur(LShadowData, LCloneData, LShadow.Width, LShadow.Height, ARadius);
  finally
    LShadow.UnlockBits(LShadowData);
    LClone.UnlockBits(LCloneData);
  end;
  if (AColor = clNone) and (AOpacity = 1.0) then
    Result := LShadow
  else
  begin
    LColor := TGPColor.CreateFromColorRef(ColorToRGB(AColor));
    Move(INITIAL_MATRIX[0, 0], LMatrix.M[0, 0], SizeOf(INITIAL_MATRIX));
    LMatrix.M[4, 0] := Min((Integer(LColor.R) - 127) / 127, 1.0);
    LMatrix.M[4, 1] := Min((Integer(LColor.G) - 127) / 127, 1.0);
    LMatrix.M[4, 2] := Min((Integer(LColor.B) - 127) / 127, 1.0);
    LMatrix.M[4, 3] := AOpacity-1;
    LImageAttributes := TGPImageAttributes.Create;
    LImageAttributes.SetColorMatrix(LMatrix, TGPColorMatrixFlags.ColorMatrixFlagsDefault,
      TGPColorAdjustType.ColorAdjustTypeBitmap);
    Result := TGPBitmap.Create(LShadow.Width, LShadow.Height, PixelFormat32bppARGB);
    LGraphics := TGPGraphics.FromImage(Result);
    LGraphics.DrawImage(LShadow, TGPRect.Create(0, 0, LShadow.Width, LShadow.Height),
      0, 0, Result.Width, Result.Height, TGPUnit.UnitPixel, LImageAttributes);
  end;
end;

1
如果您只关注纯性能,也可以考虑仅卷积源图像的矩形边缘条。这样,您就不会花费时间在图像的中心(隐藏)部分上进行卷积,而只会在有机会在屏幕上绘制的部分上进行卷积。

0

我知道逐像素操作会比较慢,但从未进行过基准测试;70倍似乎很多,超出了我的预期。也许你使用的是托管语言,这可能会导致VM开销最大化。你尝试过将程序的这一部分编写为本地代码吗?这个链接提供了一个本地实现,你可以用来进行快速测试:

http://www.codeproject.com/KB/GDI/Glow_and_Shadow_effects.aspx

不幸的是,它们唯一的区别就是使用能够生成本地代码的语言,但它们仍然使用双层循环来访问像素。如果您可以假设应用程序运行的机器具有CUDA硬件,则最好使用CUDA。但在这种情况下,您将不再使用GDI+。无论如何,也许这个其他的SO问题会有所帮助:

使用图形卡而不是GDI+进行图像处理


我不在托管代码中:它是GDI+(即本地代码)。此外,70倍的速度差异是每像素乘法的表现 - 而不是什么都不做(没有像素效果)。 - Ian Boyd
抱歉,由于您的代码是在 Delphi 中编写的,我以为它是针对 .net 的 Delphi。 - Fabio Ceconello
1
关于像素级别的操作,我认为问题部分原因也在于频繁访问非连续的内存位置,每次都迫使处理器清空其缓存。也许重新排列会有帮助。 - Fabio Ceconello
无论如何,GDI+都是本地API。.NET具有管理的类,可以包装未经管理的平面C GDI+ API。 - Ian Boyd
我理解了,但这是否意味着当您获得有关像素操作的基准时,其中包括对GDI + API的调用时间?我的理解是瓶颈仅在像素转换中,不涉及GDI调用。 - Fabio Ceconello
不,模糊算法中没有GDI或GDI+调用。我有一块内存,以32位ARGB布局。由于这是一个本地应用程序,我直接访问内存(即通过指针)。最终我的算法并不重要,我正在寻找好的算法。它甚至不需要涉及获取alpha通道和应用模糊... - Ian Boyd

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接