Efficient neighbourhood texture access for convolution in GLSL ES 1.1

Question

iphone opengl-es glsl

7

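I'm performing a 3x3 kernel convolution in a shader on the iPhone (GLSL ES 1.1). At the moment I simply do nine texture lookups. Is there a faster way? Here are some ideas: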

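Pass the input image as a buffer rather than a texture, to avoid invoking texture interpolation.
Pass nine varying vec2 coordinates from the vertex shader (rather than just one, as I currently do) to encourage the processor to prefetch the texture efficiently.
Look into the various Apple extensions that might be suitable for this.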
(Added) Look into an ES equivalent of the GLSL textureOffset call (it is not available under ES, but there may be an equivalent).

In terms of hardware, I'm particularly focused on the iPhone 4S.

- Alex Flint

This question is very hardware-specific. In theory, every iPhone version could have a different answer. - Nicol Bolas

I'd like to focus on the iPhone 4S. I've updated the question accordingly. - Alex Flint

2 Answers

0

Why not run a separable Gaussian blur in two passes? Take 3 taps vertically in the first pass, then 3 taps horizontally in the second.

- Bart Hender

2

In my experiments, I found that the cost of the extra pass outweighed the cost of the 3x3 lookups. - Alex Flint
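For readers weighing the two-pass idea, here is a minimal CPU-side sketch in Python (not shader code) of what "separable" means: a vertical 3-tap pass followed by a horizontal 3-tap pass gives the same interior pixels as one 3x3 pass whose kernel is the outer product of the two tap vectors. The function names, the unnormalized 1-2-1 taps, and the tiny grayscale image are all illustrative assumptions. It says nothing about GPU cost, which, as the comment above notes, can favor the single pass.

```python
def outer(col, row):
    """3x3 kernel equivalent to a vertical pass (col) followed by a horizontal pass (row)."""
    return [[c * r for r in row] for c in col]


def convolve_3x3(image, kernel):
    """Single pass: 9 lookups per interior pixel. Border pixels are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(kernel[j][i] * image[y + j - 1][x + i - 1]
                            for j in range(3) for i in range(3))
    return out


def convolve_two_pass(image, col, row):
    """Two passes: 3 vertical taps, then 3 horizontal taps on the intermediate image."""
    h, w = len(image), len(image[0])
    vertical = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(w):
            vertical[y][x] = sum(col[j] * image[y + j - 1][x] for j in range(3))
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(row[i] * vertical[y][x + i - 1] for i in range(3))
    return out


# Hypothetical test data: unnormalized 1-2-1 blur taps and a 4x5 grayscale image.
taps = [1, 2, 1]
image = [[3, 1, 4, 1, 5],
         [9, 2, 6, 5, 3],
         [5, 8, 9, 7, 9],
         [3, 2, 3, 8, 4]]

single = convolve_3x3(image, outer(taps, taps))
double = convolve_two_pass(image, taps, taps)
print(single == double)
```

The two-pass version does 6 lookups per pixel instead of 9, but it needs an intermediate image, which is the extra cost being discussed here. A kernel that cannot be written as such an outer product is not separable and needs the full 3x3 pass.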

- Brad Larson · Accepted Answer

Are you sure you don't mean OpenGL ES 2.0? You can't use shaders of any kind with OpenGL ES 1.1, so I'll assume you mean OpenGL ES 2.0.

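In my experience, the second approach you list is the fastest. My GPUImage framework performs several kinds of 3x3 convolution (you could use it directly rather than rolling your own). For these, I feed the horizontal and vertical texel offsets into the vertex shader and calculate the nine required texture coordinates there. From there, I pass them to the fragment shader as varyings.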

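Doing this avoids dependent texture reads in the fragment shader (for the most part), which works very well on the iOS PowerVR GPUs. I say "for the most part" because on older devices like the iPhone 4, only eight varyings can be used this way without triggering a dependent texture read. As I learned last week, the ninth varying triggers a dependent texture read on those older devices, which slows things down slightly. The iPhone 4S doesn't have this problem, because it supports more varyings used in this manner.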

I use the following vertex shader:

 attribute vec4 position;
 attribute vec4 inputTextureCoordinate;

 uniform highp float texelWidth; 
 uniform highp float texelHeight; 

 varying vec2 textureCoordinate;
 varying vec2 leftTextureCoordinate;
 varying vec2 rightTextureCoordinate;

 varying vec2 topTextureCoordinate;
 varying vec2 topLeftTextureCoordinate;
 varying vec2 topRightTextureCoordinate;

 varying vec2 bottomTextureCoordinate;
 varying vec2 bottomLeftTextureCoordinate;
 varying vec2 bottomRightTextureCoordinate;

 void main()
 {
     gl_Position = position;

     vec2 widthStep = vec2(texelWidth, 0.0);
     vec2 heightStep = vec2(0.0, texelHeight);
     vec2 widthHeightStep = vec2(texelWidth, texelHeight);
     vec2 widthNegativeHeightStep = vec2(texelWidth, -texelHeight);

     textureCoordinate = inputTextureCoordinate.xy;
     leftTextureCoordinate = inputTextureCoordinate.xy - widthStep;
     rightTextureCoordinate = inputTextureCoordinate.xy + widthStep;

     topTextureCoordinate = inputTextureCoordinate.xy - heightStep;
     topLeftTextureCoordinate = inputTextureCoordinate.xy - widthHeightStep;
     topRightTextureCoordinate = inputTextureCoordinate.xy + widthNegativeHeightStep;

     bottomTextureCoordinate = inputTextureCoordinate.xy + heightStep;
     bottomLeftTextureCoordinate = inputTextureCoordinate.xy - widthNegativeHeightStep;
     bottomRightTextureCoordinate = inputTextureCoordinate.xy + widthHeightStep;
 }
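
As a rough illustration of what the vertex shader above computes, the following Python sketch reproduces the nine coordinates on the CPU. It assumes that texelWidth and texelHeight are set to 1.0 / width and 1.0 / height of the input texture; the answer does not state how the uniforms are set, so treat that as an assumption. The function name and dictionary keys are hypothetical.

```python
def neighbor_coordinates(u, v, width, height):
    """Return the nine texture coordinates around (u, v), mirroring the vertex shader."""
    texel_width = 1.0 / width    # assumed value for the texelWidth uniform
    texel_height = 1.0 / height  # assumed value for the texelHeight uniform
    return {
        "topLeft": (u - texel_width, v - texel_height),
        "top": (u, v - texel_height),
        "topRight": (u + texel_width, v - texel_height),
        "left": (u - texel_width, v),
        "center": (u, v),
        "right": (u + texel_width, v),
        "bottomLeft": (u - texel_width, v + texel_height),
        "bottom": (u, v + texel_height),
        "bottomRight": (u + texel_width, v + texel_height),
    }


# Example: the centre of a 640x480 frame.
coords = neighbor_coordinates(0.5, 0.5, 640, 480)
print(coords["left"], coords["bottomRight"])
```

Because the offsets are added per vertex and interpolated across the quad, the fragment shader receives ready-made coordinates and does no arithmetic on them before sampling.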

and the following fragment shader:

 precision highp float;

 uniform sampler2D inputImageTexture;

 uniform mediump mat3 convolutionMatrix;

 varying vec2 textureCoordinate;
 varying vec2 leftTextureCoordinate;
 varying vec2 rightTextureCoordinate;

 varying vec2 topTextureCoordinate;
 varying vec2 topLeftTextureCoordinate;
 varying vec2 topRightTextureCoordinate;

 varying vec2 bottomTextureCoordinate;
 varying vec2 bottomLeftTextureCoordinate;
 varying vec2 bottomRightTextureCoordinate;

 void main()
 {
     mediump vec4 bottomColor = texture2D(inputImageTexture, bottomTextureCoordinate);
     mediump vec4 bottomLeftColor = texture2D(inputImageTexture, bottomLeftTextureCoordinate);
     mediump vec4 bottomRightColor = texture2D(inputImageTexture, bottomRightTextureCoordinate);
     mediump vec4 centerColor = texture2D(inputImageTexture, textureCoordinate);
     mediump vec4 leftColor = texture2D(inputImageTexture, leftTextureCoordinate);
     mediump vec4 rightColor = texture2D(inputImageTexture, rightTextureCoordinate);
     mediump vec4 topColor = texture2D(inputImageTexture, topTextureCoordinate);
     mediump vec4 topRightColor = texture2D(inputImageTexture, topRightTextureCoordinate);
     mediump vec4 topLeftColor = texture2D(inputImageTexture, topLeftTextureCoordinate);

     mediump vec4 resultColor = topLeftColor * convolutionMatrix[0][0] + topColor * convolutionMatrix[0][1] + topRightColor * convolutionMatrix[0][2];
     resultColor += leftColor * convolutionMatrix[1][0] + centerColor * convolutionMatrix[1][1] + rightColor * convolutionMatrix[1][2];
     resultColor += bottomLeftColor * convolutionMatrix[2][0] + bottomColor * convolutionMatrix[2][1] + bottomRightColor * convolutionMatrix[2][2];

     gl_FragColor = resultColor;
 }
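
To sanity-check output from a shader like the one above, the weighted sum can be mirrored on the CPU. The Python sketch below is only a reference under stated assumptions: `convolve_pixel` is a hypothetical helper, the samples are given as a 3x3 grid of RGBA tuples with row 0 at the top, and the weights are indexed exactly as the shader indexes convolutionMatrix (first index 0 for the top row of samples). Like the shader, it applies the weights to all four channels, alpha included.

```python
def convolve_pixel(samples, weights):
    """Weighted sum of a 3x3 RGBA neighbourhood, indexed as in the fragment shader."""
    result = [0.0, 0.0, 0.0, 0.0]
    for r in range(3):          # 0 = top, 1 = middle, 2 = bottom
        for c in range(3):      # 0 = left, 1 = centre, 2 = right
            for channel in range(4):
                result[channel] += samples[r][c][channel] * weights[r][c]
    return tuple(result)


# Illustrative kernels (not taken from the answer).
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
sharpen = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]

# A flat neighbourhood: every sample has the same colour.
flat = [[(0.5, 0.25, 1.0, 1.0)] * 3 for _ in range(3)]

print(convolve_pixel(flat, identity))
print(convolve_pixel(flat, sharpen))
```

A kernel whose weights sum to 1 leaves a flat region unchanged, which makes a uniform test image a quick first check before comparing against real frames.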

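Even with the caveat about the ninth varying, this shader takes only about 2 ms to process a 640x480 video frame on an iPhone 4, and a 4S can easily handle 1080p video at 30 FPS with a shader like this.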
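
To put those figures in perspective, this short Python sketch counts the texture lookups implied by a nine-tap shader at the two resolutions mentioned. It is plain arithmetic on the numbers quoted above, not a measurement, and the helper names are hypothetical.

```python
def lookups_per_frame(width, height, taps=9):
    """Texture lookups needed to filter one frame with a `taps`-sample kernel."""
    return width * height * taps


def lookups_per_second(width, height, fps, taps=9):
    """Texture lookups per second at a given frame rate."""
    return lookups_per_frame(width, height, taps) * fps


vga_frame = lookups_per_frame(640, 480)          # one 640x480 frame
hd_second = lookups_per_second(1920, 1080, 30)   # 1080p at 30 FPS
pixel_ratio = (1920 * 1080) / (640 * 480)        # 1080p pixels per 640x480 pixel
frame_budget_ms = 1000.0 / 30                    # time available per frame at 30 FPS

print(vga_frame, hd_second, pixel_ratio, round(frame_budget_ms, 1))
```

A 1080p frame has 6.75 times as many pixels as a 640x480 frame, and at 30 FPS each frame has a budget of roughly 33 ms.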