How to reduce draw calls in OpenGL/WebGL

6

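Whenever I read about OpenGL/WebGL performance, I almost always hear that I should reduce draw calls. My problem is that I use only 4 vertices to draw a textured quad, which usually means my VBO contains only 4 vertices.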

gl.bindBuffer(gl.ARRAY_BUFFER,vbo);
gl.uniformMatrix4fv(matrixLocation, false, modelMatrix);
gl.drawArrays(gl.TRIANGLE_FAN,0, vertices.length/3);

The problem I see is that before drawing I need to update the model matrix of the current quad, for example to move it 5 units along the y axis.

So what I have to do is:

gl.bindBuffer(gl.ARRAY_BUFFER,vbo);
gl.uniformMatrix4fv(matrixLocation, false, modelMatrix);
gl.drawArrays(gl.TRIANGLE_FAN, 0, vertices.length/3);

gl.uniformMatrix4fv(matrixLocation, false, anotherModelMatrix);
gl.drawArrays(gl.TRIANGLE_FAN,0, vertices.length/3);
// ... repeat until all textured quads are rendered

How can I reduce the draw calls? Ideally even down to a single draw call.


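Bake the world-space positions of all the quads you want to render into a single buffer (or more, depending on the visibility of the scene). - pleluron
1 Answer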

16

The first question is, does it matter?

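If you're making fewer than 1000, maybe even 2000, draw calls, it probably doesn't matter. Being easy to use is more important than most other solutions.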
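To see how close you are to those numbers, you can count the draw calls you make per frame. This is a minimal sketch, assuming a hypothetical helper `countDrawCalls` (it is not part of WebGL or twgl) that wraps the draw functions on the context:

```javascript
// Hypothetical helper: wraps the draw functions of a WebGL context (or any
// object with the same method names) so that calls can be counted.
function countDrawCalls(gl) {
  const counter = { count: 0 };
  for (const name of ["drawArrays", "drawElements"]) {
    const original = gl[name];
    if (typeof original !== "function") continue;
    gl[name] = function (...args) {
      counter.count++;                      // one more draw call this frame
      return original.apply(this, args);    // forward to the real function
    };
  }
  return counter;
}
```

Call it once after creating the context, read `counter.count` at the end of each frame, then reset it to 0.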

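If you really need a lot of quads, there are many solutions. One is to put N quads into a single buffer. See this presentation. Then put position, rotation, and scale into other buffers or into a texture, and compute the matrices inside your shader.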

In other words, for a textured quad, people usually put vertex positions and texture coordinates into buffers in the following order.

p0, p1, p2, p3, p4, p5,   // buffer for positions for 1 quad
t0, t1, t2, t3, t4, t5,   // buffer for texcoord for 1 quad

Instead, you'd do this

p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, ...  // positions for N quads
t0, t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, ...  // texcoords for N quads

p0 - p5 are unit quad values, p6 - p11 are the same values, and p12 - p17 are again the same values. t0 - t5 are unit texcoord values, t6 - t11 are the same texcoord values, and so on.

Then you add more buffers. Let's say all we want is world position and scale. So we add 2 more buffers.

s0, s0, s0, s0, s0, s0, s1, s1, s1, s1, s1, s1, s2, ...  // scales for N quads
w0, w0, w0, w0, w0, w0, w1, w1, w1, w1, w1, w1, w2, ...  // world positions for N quads

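Notice that the scale is repeated 6 times, once for each vertex of the first quad. Then it is repeated 6 times again for the next quad, and so on. The same goes for the world position. That way all 6 vertices of a single quad share the same world position and the same scale.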
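The repeated layout above can be generated from one value per quad. A minimal sketch, assuming a hypothetical helper `expandPerQuad` that is not part of any library:

```javascript
// Hypothetical helper: repeats each quad's value once per vertex.
// perQuad holds numComponents floats per quad; the result holds
// numComponents floats per vertex, with verticesPerQuad vertices per quad.
function expandPerQuad(perQuad, numComponents, verticesPerQuad = 6) {
  const numQuads = perQuad.length / numComponents;
  const out = new Float32Array(numQuads * verticesPerQuad * numComponents);
  for (let q = 0; q < numQuads; ++q) {
    for (let v = 0; v < verticesPerQuad; ++v) {
      const dst = (q * verticesPerQuad + v) * numComponents;
      for (let c = 0; c < numComponents; ++c) {
        out[dst + c] = perQuad[q * numComponents + c];
      }
    }
  }
  return out;
}

// two quads with scales s0 = (1, 2) and s1 = (3, 4)
const expandedScales = expandPerQuad([1, 2, 3, 4], 2);
// expandedScales holds s0 six times, then s1 six times (24 floats in total)
```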
Now in the shader we can use them like this.
attribute vec3 position;
attribute vec2 texcoord;
attribute vec3 worldPosition;
attribute vec3 scale;

uniform mat4 view;    // inverse of camera
uniform mat4 camera;  // inverse of view
uniform mat4 projection;

varying vec2 v_texcoord;

void main() {
   // Assuming we want billboards (quads that always face the camera)
   vec3 localPosition = (camera * vec4(position * scale, 0)).xyz;

   // make quad points at the worldPosition
   vec3 worldPos = worldPosition + localPosition;

   gl_Position = projection * view * vec4(worldPos, 1);

   v_texcoord = texcoord; // pass on texcoord to fragment shader
}

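Now any time we want to set the position of a quad, we need to set 6 world positions (one per vertex) in the corresponding buffer.
Generally you would update all the world positions and then make one call to gl.bufferData to upload all of them.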
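If only a few quads move in a frame, an alternative is to write just that quad's 6 slots and upload only that byte range with `gl.bufferSubData`. This is a sketch under that assumption; the helper names `setQuadWorldPosition` and `uploadQuadWorldPosition` are hypothetical, and `buffer` is assumed to be the WebGLBuffer that already holds all the world positions.

```javascript
// Hypothetical helper: writes the same world position into the 6 vertex
// slots of one quad and returns the index of the first float it changed.
function setQuadWorldPosition(worldPositions, quadIndex, x, y, z) {
  const start = quadIndex * 6 * 3;   // 6 vertices, 3 floats each
  for (let j = 0; j < 6; ++j) {
    const off = start + j * 3;
    worldPositions[off + 0] = x;
    worldPositions[off + 1] = y;
    worldPositions[off + 2] = z;
  }
  return start;
}

// Hypothetical helper: uploads only the part of the buffer for one quad.
function uploadQuadWorldPosition(gl, buffer, worldPositions, quadIndex, x, y, z) {
  const start = setQuadWorldPosition(worldPositions, quadIndex, x, y, z);
  const changed = worldPositions.subarray(start, start + 6 * 3);
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  // the offset is given in bytes from the start of the buffer
  gl.bufferSubData(gl.ARRAY_BUFFER, start * Float32Array.BYTES_PER_ELEMENT, changed);
}
```

When most quads move every frame, a single gl.bufferData call for the whole array is likely the simpler choice.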
Here are 100,000 quads.

const vs = `
attribute vec3 position;
attribute vec2 texcoord;
attribute vec3 worldPosition;
attribute vec2 scale;

uniform mat4 view;    // inverse of camera
uniform mat4 camera;  // inverse of view
uniform mat4 projection;

varying vec2 v_texcoord;

void main() {
   // Assuming we want billboards (quads that always face the camera)
   vec3 localPosition = (camera * vec4(position * vec3(scale, 1), 0)).xyz;

   // make quad points at the worldPosition
   vec3 worldPos = worldPosition + localPosition;

   gl_Position = projection * view * vec4(worldPos, 1);

   v_texcoord = texcoord; // pass on texcoord to fragment shader
}
`;

const fs = `
precision mediump float;
varying vec2 v_texcoord;
uniform sampler2D texture;
void main() {
  gl_FragColor = texture2D(texture, v_texcoord);
}
`;

const m4 = twgl.m4;
const gl = document.querySelector("canvas").getContext("webgl");

// compiles and links shaders and looks up locations
const programInfo = twgl.createProgramInfo(gl, [vs, fs]);

const numQuads = 100000;
const positions = new Float32Array(numQuads * 6 * 2);
const texcoords = new Float32Array(numQuads * 6 * 2);
const worldPositions = new Float32Array(numQuads * 6 * 3);
const basePositions = new Float32Array(numQuads * 3); // for JS
const scales = new Float32Array(numQuads * 6 * 2);
const unitQuadPositions = [
   -.5, -.5, 
    .5, -.5,
   -.5,  .5,
   -.5,  .5,
    .5, -.5,
    .5,  .5,
];
const unitQuadTexcoords = [
    0, 0,
    1, 0,
    0, 1,
    0, 1,
    1, 0,
    1, 1,
];

for (var i = 0; i < numQuads; ++i) {
  const off3 = i * 6 * 3;
  const off2 = i * 6 * 2;
  
  positions.set(unitQuadPositions, off2);
  texcoords.set(unitQuadTexcoords, off2);
  const worldPos = [rand(-100, 100), rand(-100, 100), rand(-100, 100)];
  const scale = [rand(1, 2), rand(1, 2)];
  basePositions.set(worldPos, i * 3);
  for (var j = 0; j < 6; ++j) {
    worldPositions.set(worldPos, off3 + j * 3);
    scales.set(scale, off2 + j * 2);
  }
}

const tex = twgl.createTexture(gl, {
  src: "http://i.imgur.com/weklTat.gif",
  crossOrigin: "",
  flipY: true,
});

// calls gl.createBuffer, gl.bufferData
const bufferInfo = twgl.createBufferInfoFromArrays(gl, {
  position: { numComponents: 2, data: positions, },
  texcoord: { numComponents: 2, data: texcoords, },
  worldPosition: { numComponents: 3, data: worldPositions, },
  scale: { numComponents: 2, data: scales, },
});

function render(time) {
   time *= 0.001; // seconds
   
   twgl.resizeCanvasToDisplaySize(gl.canvas);
   
   gl.viewport(0, 0, gl.canvas.width, gl.canvas.height);
   gl.enable(gl.DEPTH_TEST);
   
   gl.useProgram(programInfo.program);
   
   // calls gl.bindBuffer, gl.enableVertexAttribArray, gl.vertexAttribPointer
   twgl.setBuffersAndAttributes(gl, programInfo, bufferInfo);
   
   const fov = Math.PI * .25;
   const aspect = gl.canvas.clientWidth / gl.canvas.clientHeight;
   const zNear = .1;
   const zFar = 200;
   const projection = m4.perspective(fov, aspect, zNear, zFar);
   
   const radius = 100;
   const tm = time * .1;
   const eye = [Math.sin(tm) * radius, Math.sin(tm * .9) * radius, Math.cos(tm) * radius];
   const target = [0, 0, 0];
   const up = [0, 1, 0];
   const camera = m4.lookAt(eye, target, up);
   const view = m4.inverse(camera);
   
   // calls gl.uniformXXX
   twgl.setUniforms(programInfo, { 
     texture: tex,
     view: view,
     camera: camera,
     projection: projection,
   });
   
   // update all the worldPositions
   for (var i = 0; i < numQuads; ++i) {
     const src = i * 3;
     const dst = i * 6 * 3;
     for (var j = 0; j < 6; ++j) {
       const off = dst + j * 3;
       worldPositions[off + 0] = basePositions[src + 0] + Math.sin(time + i) * 10;
       worldPositions[off + 1] = basePositions[src + 1] + Math.cos(time + i) * 10;
       worldPositions[off + 2] = basePositions[src + 2];
     }
   }
   
   // upload them to the GPU
   gl.bindBuffer(gl.ARRAY_BUFFER, bufferInfo.attribs.worldPosition.buffer);
   gl.bufferData(gl.ARRAY_BUFFER, worldPositions, gl.DYNAMIC_DRAW);
   
   // calls gl.drawXXX
   twgl.drawBufferInfo(gl, bufferInfo);
   
   requestAnimationFrame(render);
}
requestAnimationFrame(render);

function rand(min, max) {
  if (max === undefined) {
     max = min;
     min = 0;
  }
  return Math.random() * (max - min) + min;
}
body { margin: 0; }
canvas { width: 100vw; height: 100vh; display: block; }
<script src="https://twgljs.org/dist/3.x/twgl-full.min.js"></script>
<canvas />

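You can reduce the number of repeated vertices from 6 to 1 by using the ANGLE_instanced_arrays extension. It's not quite as fast as the technique above, but it's pretty close.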
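A minimal sketch of the instanced draw, assuming `ext` came from `gl.getExtension("ANGLE_instanced_arrays")`, that the attribute locations were already looked up, and that `gl.vertexAttribPointer` was already called for each attribute. The helper name `drawQuadsInstanced` is hypothetical.

```javascript
// Hypothetical helper: draws numQuads instances of one 6-vertex unit quad.
// perVertexLocs:   attribute locations that advance once per vertex
//                  (for example position and texcoord)
// perInstanceLocs: attribute locations that advance once per quad
//                  (for example worldPosition and scale)
function drawQuadsInstanced(gl, ext, perVertexLocs, perInstanceLocs, numQuads) {
  for (const loc of perVertexLocs) {
    ext.vertexAttribDivisorANGLE(loc, 0);  // 0 = advance for every vertex
  }
  for (const loc of perInstanceLocs) {
    ext.vertexAttribDivisorANGLE(loc, 1);  // 1 = advance once per instance
  }
  // 6 vertices for the unit quad, drawn numQuads times
  ext.drawArraysInstancedANGLE(gl.TRIANGLES, 0, 6, numQuads);
}
```

With this setup the position and texcoord buffers hold a single quad, and the worldPosition and scale buffers hold one entry per quad instead of six.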
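You can also reduce the amount of data from 6 to 1 by storing the world positions and scales in a texture. In that case, instead of the 2 extra buffers you add just 1 extra buffer that contains only a repeating id.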
// id buffer
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3 ....

The id repeats 6 times, once for each of the 6 vertices of each quad.

You then use that id to compute a texture coordinate to look up the world position and scale.

attribute float id;
...

uniform sampler2D worldPositionTexture;  // texture with world positions
uniform vec2 textureSize;               // pass in the texture size

...

  // compute the texel that contains our world position
  vec2 texel = vec2(
     mod(id, textureSize.x),
     floor(id / textureSize.x));

  // compute the UV coordinate to access that texel
  vec2 uv = (texel + .5) / textureSize;

  vec3 worldPosition = texture2D(worldPositionTexture, uv).xyz;

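Now you need to put your world positions in a texture. You probably want a floating point texture to make that easier. You can do similar things for scale and so on, either storing each value in a separate texture or putting them all in the same texture and changing your uv calculation accordingly.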
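A sketch of the JavaScript side of that layout, assuming RGBA float texels (4 floats each, alpha unused) and a fixed texture width. The helper names `packPositionsForTexture` and `idToUV` are hypothetical; `idToUV` only mirrors the shader math so it can be checked on the CPU.

```javascript
// Hypothetical helper: packs xyz triples into RGBA texels (4 floats each),
// padded so the data fills a whole number of rows of a width-wide texture.
function packPositionsForTexture(xyz, width) {
  const numTexels = xyz.length / 3;
  const height = Math.max(1, Math.ceil(numTexels / width));
  const data = new Float32Array(width * height * 4);
  for (let i = 0; i < numTexels; ++i) {
    data[i * 4 + 0] = xyz[i * 3 + 0];
    data[i * 4 + 1] = xyz[i * 3 + 1];
    data[i * 4 + 2] = xyz[i * 3 + 2];
    // data[i * 4 + 3] (alpha) stays 0 and is not used
  }
  return { data, width, height };
}

// JavaScript mirror of the shader lookup: id -> texel -> uv
function idToUV(id, width, height) {
  const texelX = id % width;
  const texelY = Math.floor(id / width);
  // + 0.5 targets the center of the texel
  return [(texelX + 0.5) / width, (texelY + 0.5) / height];
}
```

The point to notice is that quad i lives at float offset i * 4, not i * 3, because every texel carries 4 channels.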

const vs = `
attribute vec3 position;
attribute vec2 texcoord;
attribute float id;

uniform sampler2D worldPositionTexture;  
uniform sampler2D scaleTexture;          
uniform vec2 textureSize;  // texture are same size so only one size needed
uniform mat4 view;    // inverse of camera
uniform mat4 camera;  // inverse of view
uniform mat4 projection;

varying vec2 v_texcoord;

void main() {
  // compute the texel that contains our world position
  vec2 texel = vec2(
     mod(id, textureSize.x),
     floor(id / textureSize.x));

  // compute the UV coordinate to access that texel
  vec2 uv = (texel + .5) / textureSize;

  vec3 worldPosition = texture2D(worldPositionTexture, uv).xyz;
  vec2 scale = texture2D(scaleTexture, uv).xy;

  // Assuming we want billboards (quads that always face the camera)
  vec3 localPosition = (camera * vec4(position * vec3(scale, 1), 0)).xyz;

  // make quad points at the worldPosition
  vec3 worldPos = worldPosition + localPosition;

  gl_Position = projection * view * vec4(worldPos, 1);

  v_texcoord = texcoord; // pass on texcoord to fragment shader
}
`;

const fs = `
precision mediump float;
varying vec2 v_texcoord;
uniform sampler2D texture;
void main() {
  gl_FragColor = texture2D(texture, v_texcoord);
}
`;

const m4 = twgl.m4;
const gl = document.querySelector("canvas").getContext("webgl");
const ext = gl.getExtension("OES_texture_float");
if (!ext) {
  alert("Doh! requires OES_texture_float extension");
}
if (gl.getParameter(gl.MAX_VERTEX_TEXTURE_IMAGE_UNITS) < 2) {
  alert("Doh! need at least 2 vertex texture image units");
}

// compiles and links shaders and looks up locations
const programInfo = twgl.createProgramInfo(gl, [vs, fs]);

const numQuads = 50000;
const positions = new Float32Array(numQuads * 6 * 2);
const texcoords = new Float32Array(numQuads * 6 * 2);
const ids = new Float32Array(numQuads * 6);
const basePositions = new Float32Array(numQuads * 3); // for JS
// we need to pad these because textures have to be rectangles
const size = roundUpToNearest(numQuads * 4, 1024 * 4);
const worldPositions = new Float32Array(size);
const scales = new Float32Array(size);
const unitQuadPositions = [
   -.5, -.5, 
    .5, -.5,
   -.5,  .5,
   -.5,  .5,
    .5, -.5,
    .5,  .5,
];
const unitQuadTexcoords = [
    0, 0,
    1, 0,
    0, 1,
    0, 1,
    1, 0,
    1, 1,
];

for (var i = 0; i < numQuads; ++i) {
  const off2 = i * 6 * 2;
  const off4 = i * 4;
  
  // you could even put these in a texture OR you can even generate
  // them inside the shader based on the id. See vertexshaderart.com for
  // examples of generating positions in the shader based on id
  positions.set(unitQuadPositions, off2);
  texcoords.set(unitQuadTexcoords, off2);
  ids.set([i, i, i, i, i, i], i * 6);

  const worldPos = [rand(-100, 100), rand(-100, 100), rand(-100, 100)];
  const scale = [rand(1, 2), rand(1, 2)];
  basePositions.set(worldPos, i * 3);
    
  // one RGBA texel (4 floats) per quad; the id attribute selects it,
  // so nothing is repeated per vertex here
  worldPositions.set(worldPos, off4);
  scales.set(scale, off4);
}

const tex = twgl.createTexture(gl, {
  src: "http://i.imgur.com/weklTat.gif",
  crossOrigin: "",
  flipY: true,
});

const worldPositionTex = twgl.createTexture(gl, {
  type: gl.FLOAT,
  src: worldPositions,
  width: 1024,
  minMag: gl.NEAREST,
  wrap: gl.CLAMP_TO_EDGE,
});

const scaleTex = twgl.createTexture(gl, {
  type: gl.FLOAT,
  src: scales,
  width: 1024,
  minMag: gl.NEAREST,
  wrap: gl.CLAMP_TO_EDGE,
});

// calls gl.createBuffer, gl.bufferData
const bufferInfo = twgl.createBufferInfoFromArrays(gl, {
  position: { numComponents: 2, data: positions, },
  texcoord: { numComponents: 2, data: texcoords, },
  id: { numComponents: 1, data: ids, },
});

function render(time) {
   time *= 0.001; // seconds
   
   twgl.resizeCanvasToDisplaySize(gl.canvas);
   
   gl.viewport(0, 0, gl.canvas.width, gl.canvas.height);
   gl.enable(gl.DEPTH_TEST);
   
   gl.useProgram(programInfo.program);
   
   // calls gl.bindBuffer, gl.enableVertexAttribArray, gl.vertexAttribPointer
   twgl.setBuffersAndAttributes(gl, programInfo, bufferInfo);
   
   const fov = Math.PI * .25;
   const aspect = gl.canvas.clientWidth / gl.canvas.clientHeight;
   const zNear = .1;
   const zFar = 200;
   const projection = m4.perspective(fov, aspect, zNear, zFar);
   
   const radius = 100;
   const tm = time * .1;
   const eye = [Math.sin(tm) * radius, Math.sin(tm * .9) * radius, Math.cos(tm) * radius];
   const target = [0, 0, 0];
   const up = [0, 1, 0];
   const camera = m4.lookAt(eye, target, up);
   const view = m4.inverse(camera);
   
   // update all the worldPositions
   for (var i = 0; i < numQuads; ++i) {
     const src = i * 3;
     const dst = i * 4; // 4 floats (RGBA) per texel in the texture
     worldPositions[dst + 0] = basePositions[src + 0] + Math.sin(time + i) * 10;
     worldPositions[dst + 1] = basePositions[src + 1] + Math.cos(time + i) * 10;
     worldPositions[dst + 2] = basePositions[src + 2];
   }
   
   // upload them to the GPU
   const width = 1024;
   const height = worldPositions.length / width / 4;
   gl.bindTexture(gl.TEXTURE_2D, worldPositionTex);
   gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0, gl.RGBA, gl.FLOAT, worldPositions); 
   
   // calls gl.uniformXXX, gl.activeTexture, gl.bindTexture
   twgl.setUniforms(programInfo, { 
     texture: tex,
     scaleTexture: scaleTex,
     worldPositionTexture: worldPositionTex,
     textureSize: [width, height],
     view: view,
     camera: camera,
     projection: projection,
   });
   
   // calls gl.drawXXX
   twgl.drawBufferInfo(gl, bufferInfo);
   
   requestAnimationFrame(render);
}
requestAnimationFrame(render);

function rand(min, max) {
  if (max === undefined) {
     max = min;
     min = 0;
  }
  return Math.random() * (max - min) + min;
}

function roundUpToNearest(v, round) {
  return ((v + round - 1) / round | 0) * round;
}
body { margin: 0; }
canvas { width: 100vw; height: 100vh; display: block; }
<script src="https://twgljs.org/dist/3.x/twgl-full.min.js"></script>
<canvas />

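Note that, at least on my machine, doing it through textures is slower than doing it through buffers. So while it is less work for JavaScript (only one worldPosition to update per quad), it is apparently more work for the GPU (again, at least on my machine). The buffer version runs at 60fps for me with 100k quads, whereas the texture version ran at about 40fps with 100k quads, so I lowered it to 50k. Of course those numbers are for my machine only. Other machines will vary.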
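Techniques like this let you have many more quads, but at the expense of flexibility. You can only manipulate them in the ways the shader provides. For example, if you want to be able to scale from different origins (center, top-left, bottom-right, etc.), you'd need to add another piece of data or set the positions accordingly. If you want rotation, you'd need to add rotation data, and so on.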
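You could even pass in a whole matrix per quad, but then you'd be uploading 16 floats per quad. You are already doing that when calling gl.uniformMatrix4fv, though, so it might still be faster, since you'd make only 2 calls: gl.bufferData or gl.texImage2D to upload the new matrices, then gl.drawXXX to draw.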
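A sketch of filling one array with a matrix per quad so that it can be uploaded in a single call. It assumes column-major 4x4 matrices (the layout gl.uniformMatrix4fv expects with transpose set to false) and only handles scale plus translation; the helper `writeQuadMatrix` is hypothetical.

```javascript
// Hypothetical helper: writes a column-major 4x4 matrix that scales by
// (sx, sy, 1) and then translates by (tx, ty, tz) into slot quadIndex.
function writeQuadMatrix(matrices, quadIndex, tx, ty, tz, sx, sy) {
  const o = quadIndex * 16;          // 16 floats per matrix
  matrices.fill(0, o, o + 16);
  matrices[o + 0] = sx;              // column 0: x axis
  matrices[o + 5] = sy;              // column 1: y axis
  matrices[o + 10] = 1;              // column 2: z axis
  matrices[o + 12] = tx;             // column 3: translation
  matrices[o + 13] = ty;
  matrices[o + 14] = tz;
  matrices[o + 15] = 1;
}

const numQuads = 3;
const matrices = new Float32Array(numQuads * 16);
for (let i = 0; i < numQuads; ++i) {
  writeQuadMatrix(matrices, i, i * 10, 5, 0, 2, 2);
}
// matrices can now be uploaded with one gl.bufferData or gl.texImage2D
// call instead of numQuads separate gl.uniformMatrix4fv calls
```

With plain buffers and no instancing, each matrix would still have to be repeated for all 6 vertices of its quad; one matrix per quad fits the texture or instancing approaches.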
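Another issue is that you mentioned textures. If you're using a different texture for each quad, you need to figure out how to turn them into a texture atlas (all the images in one texture), in which case your UV coordinates would not repeat as they do above.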
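A sketch of per-quad texture coordinates for an atlas, assuming the atlas is a simple grid of equally sized cells. The helper `atlasTexcoords` is hypothetical, and whether cell 0 ends up at the top or the bottom of the image depends on how the texture was uploaded (for example the flipY setting).

```javascript
// Hypothetical helper: texcoords for the 6 vertices of one quad that shows
// cell `index` of an atlas laid out as a grid of cols x rows equal cells.
function atlasTexcoords(index, cols, rows) {
  const u0 = (index % cols) / cols;
  const v0 = Math.floor(index / cols) / rows;
  const u1 = u0 + 1 / cols;
  const v1 = v0 + 1 / rows;
  // same vertex order as the unit quad: 2 triangles
  return [
    u0, v0,
    u1, v0,
    u0, v1,
    u0, v1,
    u1, v0,
    u1, v1,
  ];
}
```

Each quad then writes its own 12 values into the texcoord buffer instead of the repeated unit texcoords.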
