A more efficient way to blend pixels (semi-transparency)?


I'm drawing semi-transparent images on top of other images for a small 2D game. Currently I blend the images with the formula described here: https://en.wikipedia.org/wiki/Alpha_compositing#Alpha_blending

My implementation looks like this:

private static int blend(int source, int dest, int trans)
{
    double alpha = ((double) trans / 255.0);
    int sourceRed = (source >> 16 & 0xff);
    int sourceGreen = (source >> 8 & 0xff);
    int sourceBlue = (source & 0xff);
    int destRed = (dest >> 16 & 0xff);
    int destGreen = (dest >> 8 & 0xff);
    int destBlue = (dest & 0xff);

    int blendedRed = (int) (alpha * sourceRed + (1.0 - alpha) * destRed);
    int blendedGreen = (int) (alpha * sourceGreen + (1.0 - alpha) * destGreen);
    int blendedBlue = (int) (alpha * sourceBlue + (1.0 - alpha) * destBlue);

    return (blendedRed << 16) + (blendedGreen << 8) + blendedBlue;
}

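It works, but because it has to be called for every pixel in every frame, it's quite expensive: compared with rendering without blending, I lose about 30% of my FPS.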

I'd like to know whether there is a better way to optimize this code, since I may be doing too many bit operations.

1 Answer


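I'm not a Java programmer (so take this with a grain of salt), but from my C++ and low-level graphics point of view, you are doing a few things quite wrong: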

  1. mixing integers and floating point

    That requires int/float conversions, which are sometimes really costly. It's much better to use integer weights (alpha) in the range 0..255 and then divide by 255, or shift right by 8 (which divides by 256, a close approximation). That would most likely be much faster.

  2. bitshifting/masking to obtain bytes

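    Yes, that works, but there is a simpler and potentially faster method: read and write the bytes through a union (DWORD and BYTE are the Windows typedefs for 32-bit and 8-bit unsigned integers):

    // Channel indices into db[]; this order assumes a little-endian CPU
    // and colors stored as 0xAARRGGBB.
    enum{
        _b=0,
        _g=1,
        _r=2,
        _a=3,
        };
    
    // Union type punning is valid C; in C++ it is formally undefined
    // behavior, though mainstream compilers support it.
    union color
        {
        DWORD dd;    // 1x32 bit unsigned int
        BYTE db[4];  // 4x8 bit unsigned int
        };
    
    color col;
    col.dd=some_rgba_color;
    r = col.db[_r]; // get red channel
    col.db[_b]=5;   // set blue channel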
    

    A decent compiler may optimize parts of your code into this on its own, but I doubt it can do so everywhere.

    You can also use a pointer (e.g. an unsigned char pointer to the pixel) instead of a union in the same way.

  3. function overhead

    Your function blends a single pixel, so it is called an enormous number of times. It's usually much faster to blend a whole region (rectangle) per call than to call a function per pixel, because every call has its own overhead (passing operands, setting up and tearing down a stack frame, returning a value). To limit this, for functions that are called massively, you can try the following:

    Restructure your app so it blends regions instead of single pixels, which means far fewer function calls.

    Reduce the per-call stack traffic by cutting down the operands, return values and local variables of the called function, so less memory is allocated, freed, overwritten or copied on each call. For example, use static or global variables for values that rarely change (the alpha will most likely not change much), or use the alpha encoded in the color itself instead of passing it as an operand.

    Use inline functions or macros (#define) so the code is placed directly at the call site instead of being called as a function.
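In Java, items 1 and 3 could be combined roughly as follows: one call blends a whole rectangle of pixels using integer weights only, and the alpha is passed once per call rather than once per pixel. This is a sketch under stated assumptions, not the answer's code: `RegionBlend` and `blendRegion` are hypothetical names, both buffers are assumed to be row-major `int[]` arrays of 0xRRGGBB pixels, and no clipping is done.

```java
final class RegionBlend {
    private RegionBlend() {}

    /**
     * Hypothetical helper: blends a w x h block of src (row-major, row width
     * srcWidth, starting at its top-left corner) onto dst at (dstX, dstY).
     * Pixels are 0xRRGGBB ints; alpha is 0..255 (255 = fully src) and applies
     * to the whole block. No clipping: the caller keeps the region in bounds.
     */
    static void blendRegion(int[] src, int srcWidth, int[] dst, int dstWidth,
                            int dstX, int dstY, int w, int h, int alpha) {
        int inv = 255 - alpha;
        for (int y = 0; y < h; y++) {
            int si = y * srcWidth;                 // first source pixel of this row
            int di = (dstY + y) * dstWidth + dstX; // first destination pixel of this row
            for (int x = 0; x < w; x++, si++, di++) {
                int s = src[si];
                int d = dst[di];
                // Integer weights only; "/ 255" truncates. Using ">> 8" instead
                // would divide by 256: slightly cheaper, slightly darker.
                int r = (((s >> 16) & 0xff) * alpha + ((d >> 16) & 0xff) * inv) / 255;
                int g = (((s >> 8) & 0xff) * alpha + ((d >> 8) & 0xff) * inv) / 255;
                int b = ((s & 0xff) * alpha + (d & 0xff) * inv) / 255;
                dst[di] = (r << 16) | (g << 8) | b;
            }
        }
    }
}
```

For a `BufferedImage` of type `TYPE_INT_RGB`, such an array can be obtained with `((DataBufferInt) image.getRaster().getDataBuffer()).getData()`. Java has no `inline` keyword or `#define`; the HotSpot JIT generally inlines small, hot methods by itself, so the larger gains here are likely to come from avoiding the double conversions and making fewer calls.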

As a start, I would try refactoring the function body into something like this (C++):

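// Channel indices into db[] (assumes a little-endian CPU and 0xAARRGGBB colors)
enum{
    _b=0,
    _g=1,
    _r=2,
    _a=3,
    };

union color
    {
    unsigned int dd;      // 1x32 bit unsigned int
    unsigned char db[4];  // 4x8 bit unsigned int
    };

// alpha is an integer weight in 0..255 (255 = fully src)
static inline unsigned int blend(unsigned int src, unsigned int dst, unsigned int alpha)
    {
    unsigned int i,a,_alpha=255-alpha;
    color s,d;
    s.dd=src;
    d.dd=dst;
    for (i=0;i<3;i++)   // B, G, R only; dst's alpha byte is kept as-is
        {
        a=(((unsigned int)(s.db[i]))*alpha) + (((unsigned int)(d.db[i]))*_alpha);
        a>>=8;          // divide by 256 (approximates /255, slightly darker)
        d.db[i]=(unsigned char)a;
        }
    return d.dd;
    }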
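Java has no unions, so for the question's code a rough translation of the same integer-weight idea might look like the sketch below. `IntBlend`, `div255`, `blend` and `blendShift` are hypothetical names. `div255` uses a well-known add-and-shift trick (not from the answer) that returns round(t / 255) exactly for 0 ≤ t ≤ 255·255, while `blendShift` mirrors the answer's `>> 8` shortcut for comparison.

```java
final class IntBlend {
    private IntBlend() {}

    // Rounded division by 255 for 0 <= t <= 255*255, using only adds and shifts.
    static int div255(int t) {
        t += 128;
        return (t + (t >> 8)) >> 8;
    }

    // channel = round((s * alpha + d * (255 - alpha)) / 255), alpha in 0..255
    static int blend(int src, int dst, int alpha) {
        int inv = 255 - alpha;
        int r = div255(((src >> 16) & 0xff) * alpha + ((dst >> 16) & 0xff) * inv);
        int g = div255(((src >> 8) & 0xff) * alpha + ((dst >> 8) & 0xff) * inv);
        int b = div255((src & 0xff) * alpha + (dst & 0xff) * inv);
        return (r << 16) | (g << 8) | b;
    }

    // The answer's shortcut: ">> 8" divides by 256, so results come out slightly
    // darker (e.g. alpha = 255 turns a 0xFF channel of src into 0xFE).
    static int blendShift(int src, int dst, int alpha) {
        int inv = 255 - alpha;
        int r = (((src >> 16) & 0xff) * alpha + ((dst >> 16) & 0xff) * inv) >> 8;
        int g = (((src >> 8) & 0xff) * alpha + ((dst >> 8) & 0xff) * inv) >> 8;
        int b = ((src & 0xff) * alpha + (dst & 0xff) * inv) >> 8;
        return (r << 16) | (g << 8) | b;
    }
}
```

Like the question's version, both methods return 0xRRGGBB with the top (alpha) byte cleared.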

However, if you want real speed, use the GPU (OpenGL blending).
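A related option within standard Java, not mentioned in the answer, is Java2D's own compositing through `java.awt.AlphaComposite`, sketched below with a hypothetical helper `drawTranslucent`. Whether Java2D actually runs this on the GPU depends on the rendering pipeline and the destination surface; drawing into a `BufferedImage` as here is normally done in software, so treat it as a convenience rather than a guaranteed speedup.

```java
import java.awt.AlphaComposite;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class ComposeDemo {
    private ComposeDemo() {}

    /** Hypothetical helper: draws overlay onto base at (x, y) with opacity 0.0f..1.0f. */
    static void drawTranslucent(BufferedImage base, BufferedImage overlay,
                                int x, int y, float opacity) {
        Graphics2D g = base.createGraphics();
        try {
            // SRC_OVER with an extra alpha scales the overlay's alpha by opacity;
            // for opaque pixels: result ~ opacity * src + (1 - opacity) * dst.
            g.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, opacity));
            g.drawImage(overlay, x, y, null);
        } finally {
            g.dispose(); // release the Graphics2D's resources
        }
    }
}
```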


Page content provided by Stack Overflow.