RenderScript rendering is much slower than OpenGL rendering on Android

Question

Background:

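I want to add real-time filters based on the code of the Android Camera app. However, the Camera app's architecture is built on OpenGL ES 1.x, and I need shaders to implement our custom filters. Updating the Camera app to OpenGL ES 2.0 is too difficult, so I have to find a way other than OpenGL to implement real-time filters. After some research, I decided to use RenderScript.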

Problem:

I have written a simple filter demo using RenderScript. It runs at a much lower frame rate than the OpenGL implementation: about 5 fps versus 15 fps.

My questions are:

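The official Android site says: "The RenderScript runtime will parallelize work across all processors available on a device, such as multi-core CPUs, GPUs, or DSPs, allowing you to focus on expressing algorithms rather than scheduling work or load balancing." So why is the RenderScript implementation slower?
If RenderScript cannot meet my needs, is there a better way?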

Code details:

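Hi, I'm on the same team as the asker. We want to write a real-time filter camera based on RenderScript. In our test demo project we use a simple filter: a YuvToRGB intrinsic script plus an overlay-filter ScriptC script. In the OpenGL version, we load the camera data into textures and do the image filtering in shaders, like this: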

    GLES20.glActiveTexture(GLES20.GL_TEXTURE0);
    GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, textureYHandle);
    GLES20.glUniform1i(shader.uniforms.get("uTextureY"), 0);
    GLES20.glTexSubImage2D(GLES20.GL_TEXTURE_2D, 0, 0, 0, mTextureWidth,
            mTextureHeight, GLES20.GL_LUMINANCE, GLES20.GL_UNSIGNED_BYTE,
            mPixelsYBuffer.position(0));
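For reference, the Y-plane upload above assumes the luma bytes have already been separated from the camera frame. A minimal sketch of that split, assuming the preview format is NV21 (the Camera default) with even width and height; `Nv21Planes` is a hypothetical helper, not code from this project, and the GL calls are left out so it runs without a GL context:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical helper (assumes NV21 input): splits a camera preview frame into
 * a full-resolution Y plane and the interleaved V/U plane that follows it.
 */
final class Nv21Planes {
    final ByteBuffer y;   // width * height luma bytes, e.g. for a GL_LUMINANCE texture
    final ByteBuffer vu;  // width * height / 2 bytes of interleaved V,U samples (2x2 subsampled)

    private Nv21Planes(ByteBuffer y, ByteBuffer vu) {
        this.y = y;
        this.vu = vu;
    }

    /** Allocates direct buffers once, so no allocation happens per frame. */
    static Nv21Planes allocate(int width, int height) {
        int ySize = width * height;
        return new Nv21Planes(
                ByteBuffer.allocateDirect(ySize).order(ByteOrder.nativeOrder()),
                ByteBuffer.allocateDirect(ySize / 2).order(ByteOrder.nativeOrder()));
    }

    /** Copies one NV21 frame into the buffers and rewinds them, ready for upload. */
    void fill(byte[] nv21, int width, int height) {
        int ySize = width * height;
        if (nv21.length < ySize + ySize / 2) {
            throw new IllegalArgumentException("frame too small for " + width + "x" + height);
        }
        y.clear();
        y.put(nv21, 0, ySize);        // Y plane comes first in NV21
        y.position(0);
        vu.clear();
        vu.put(nv21, ySize, ySize / 2); // interleaved V/U plane follows
        vu.position(0);
    }
}
```

A buffer like `y` would presumably play the role of `mPixelsYBuffer` in the `glTexSubImage2D` call; refilling preallocated direct buffers keeps per-frame garbage out of the preview loop.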

In the RenderScript version, we copy the camera data into an Allocation and do the image filtering in script kernels, like this:

    // The following code is from onPreviewFrame(byte[] data, Camera camera), which delivers the camera frame data
    byte[] imageData = datas[0];
    long timeBegin = System.currentTimeMillis();
    mYUVInAllocation.copyFrom(imageData);

    mYuv.setInput(mYUVInAllocation);
    mYuv.forEach(mRGBAAllocationA);
    // Copying out blocks until the YUV-to-RGBA kernel has finished
    mRGBAAllocationA.copyTo(mOutBitmap);
    Log.e(TAG, "RS time: YUV to RGBA : " + String.valueOf((System.currentTimeMillis() - timeBegin)));

    mLayerScript.forEach_overlay(mRGBAAllocationA, mRGBAAllocationB);
    mRGBAAllocationB.copyTo(mOutBitmap);
    // Note: still measured from timeBegin, so this figure includes the YUV-to-RGBA step
    Log.e(TAG, "RS time: overlay : " + String.valueOf((System.currentTimeMillis() - timeBegin)));

    mCameraSurPreview.refresh(mOutBitmap, mCameraDisplayOrientation, timeBegin);

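There are two problems here: (1) the RenderScript pipeline seems slower than the OpenGL one; (2) according to our timing logs, the YUV-to-RGBA step using the intrinsic script is very fast, about 6 ms, but the overlay step using ScriptC is very slow, about 180 ms. Why is this?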
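One thing worth checking in the timing code: both `Log.e` calls subtract the same `timeBegin`, so the "overlay" figure is cumulative and includes the YUV-to-RGBA step and both bitmap copies. A sketch of timing each stage on its own, using a hypothetical `StageTimer` helper (not part of the original code) built on `System.nanoTime()`:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical per-stage timer. Each mark() records the time since the
 * previous mark (or start()), so stages are measured individually rather
 * than cumulatively from the start of the frame.
 */
final class StageTimer {
    private final Map<String, Long> stageNanos = new LinkedHashMap<>();
    private long last = System.nanoTime();

    /** Resets all stages and restarts the clock, e.g. at the top of onPreviewFrame. */
    void start() {
        stageNanos.clear();
        last = System.nanoTime();
    }

    /** Records the duration of the stage that just finished. */
    void mark(String stage) {
        long now = System.nanoTime();
        stageNanos.merge(stage, now - last, Long::sum);
        last = now;
    }

    long millis(String stage) {
        return stageNanos.getOrDefault(stage, 0L) / 1_000_000L;
    }

    long totalMillis() {
        long sum = 0;
        for (long n : stageNanos.values()) sum += n;
        return sum / 1_000_000L;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : stageNanos.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue() / 1_000_000L).append("ms ");
        }
        return sb.toString().trim();
    }
}
```

In `onPreviewFrame` one would call `start()` first, `mark("yuvToRgba")` after the first blocking `copyTo`, and `mark("overlay")` after the second, then log `toString()`.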

Here is the RenderScript kernel code of the ScriptC we use (mLayerScript):

    #pragma version(1)
    #pragma rs java_package_name(**.renderscript)
    #pragma stateFragment(parent)

    #include "rs_graphics.rsh"

    static rs_allocation layer;
    static uint32_t dimX;
    static uint32_t dimY;

    void setLayer(rs_allocation layer1) {
        layer = layer1;
    }

    void setBitmapDim(uint32_t dimX1, uint32_t dimY1) {
        dimX = dimX1;
        dimY = dimY1;
    }

    static float BlendOverlayf(float base, float blend) {
        return (base < 0.5 ? (2.0 * base * blend) : (1.0 - 2.0 * (1.0 - base) * (1.0 - blend)));
    }

    static float3 BlendOverlay(float3 base, float3 blend) {
        float3 blendOverLayPixel = {BlendOverlayf(base.r, blend.r), BlendOverlayf(base.g, blend.g), BlendOverlayf(base.b, blend.b)};
        return blendOverLayPixel;
    }

    uchar4 __attribute__((kernel)) overlay(uchar4 in, uint32_t x, uint32_t y) {
        float4 inPixel = rsUnpackColor8888(in);

        uint32_t layerDimX = rsAllocationGetDimX(layer);
        uint32_t layerDimY = rsAllocationGetDimY(layer);

        uint32_t layerX = x * layerDimX / dimX;
        uint32_t layerY = y * layerDimY / dimY;

        uchar4* p = (uchar4*)rsGetElementAt(layer, layerX, layerY);
        float4 layerPixel = rsUnpackColor8888(*p);

        float3 color = BlendOverlay(inPixel.rgb, layerPixel.rgb);

        float4 outf = {color.r, color.g, color.b, inPixel.a};
        uchar4 outc = rsPackColorTo8888(outf.r, outf.g, outf.b, outf.a);

        return outc;
    }

- James Zhao

Can you share how the code differs between the two versions? I suspect the problem lies in transferring the camera data into RS. - R. Jason Sams

@R.JasonSams Thanks for your reply. I have edited my question and added some code. - James Zhao

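Don't call rsAllocationGetDimX in the kernel; pass the value in as a global (like dimX and dimY).
Don't forget the f suffix on constants. Right now you are using doubles.
Use rsGetElementAt_uchar4 instead of rsGetElementAt.
Don't include rs_graphics.rsh; it isn't necessary.
Consider caching layerDimX / dimX as a global (same for Y).
Try #pragma rs_fp_relaxed if you don't need strict IEEE-754 compliance; it enables some additional optimizations (NEON and some GPUs require relaxed precision).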

- Tim Murray
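To make the constant-suffix and caching points concrete, here is a plain-Java reference of the kernel's per-pixel math with them applied: float literals only, and the layer/frame scale computed once instead of per pixel. It is a sketch for checking results on the JVM, not RenderScript code; `OverlayReference` and the nearest-neighbour clamping are assumptions of this example. Java promotes `float * double` to `double` just as C99 does, which is why the `f` suffix matters in the kernel.

```java
/**
 * Hypothetical JVM reference for the overlay kernel's per-pixel math.
 * Channels are 0..255 bytes, as in the uchar4 allocations.
 */
final class OverlayReference {
    private final int layerW, layerH;
    private final float scaleX, scaleY;   // cached layerDim / frameDim, computed once per frame size

    OverlayReference(int frameW, int frameH, int layerW, int layerH) {
        this.layerW = layerW;
        this.layerH = layerH;
        this.scaleX = (float) layerW / frameW;
        this.scaleY = (float) layerH / frameH;
    }

    /** Overlay blend on 0..1 values; every literal is a float, so nothing is promoted to double. */
    static float blendOverlay(float base, float blend) {
        return base < 0.5f ? 2.0f * base * blend
                           : 1.0f - 2.0f * (1.0f - base) * (1.0f - blend);
    }

    /** Nearest-neighbour index of the layer pixel covering frame pixel (x, y), clamped to the layer. */
    int layerIndex(int x, int y) {
        int lx = Math.min((int) (x * scaleX), layerW - 1);
        int ly = Math.min((int) (y * scaleY), layerH - 1);
        return ly * layerW + lx;
    }

    /** Blends one 0..255 channel, approximating the unpack / blend / pack round trip. */
    static int blendChannel(int base, int blend) {
        float r = blendOverlay(base / 255.0f, blend / 255.0f);
        return Math.round(r * 255.0f);
    }
}
```

Because the cached scale is a float, `layerIndex` can differ from the original integer expression `x * layerDimX / dimX` at a few coordinates; the clamp keeps every lookup inside the layer.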

Tim hit most of the high points. If you don't need the range rescaling (0-255 vs. 0-1), you can also use convert_uchar4() and convert_float4() instead of rsPackColorTo8888(). - R. Jason Sams
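Along the lines of Jason's comment, the overlay can also be computed without the 0..1 rescaling at all. A sketch of one such variant, in plain Java so it can be checked on the JVM, that works directly on 0..255 channel values with integer arithmetic; `Overlay8`, the integer formulation and the `(n + 127) / 255` rounding are choices made for this example, not something from the thread:

```java
/**
 * Hypothetical overlay blend on 0..255 channel values using integers only.
 * Intended to equal round(255 * overlay(base / 255, blend / 255)).
 */
final class Overlay8 {
    /** Rounds n / 255 to the nearest integer for n >= 0. */
    private static int div255(int n) {
        return (n + 127) / 255;
    }

    static int overlay(int base, int blend) {
        if (base < 128) {                    // same as base / 255 < 0.5
            return div255(2 * base * blend);
        }
        return 255 - div255(2 * (255 - base) * (255 - blend));
    }

    /** Blends R, G and B of two RGBA pixels; alpha is taken from the base pixel. */
    static int[] overlayRgba(int[] base, int[] layer) {
        return new int[] {
            overlay(base[0], layer[0]),
            overlay(base[1], layer[1]),
            overlay(base[2], layer[2]),
            base[3]
        };
    }
}
```

Because 255 is odd, `n / 255` never lands exactly on .5, so `(n + 127) / 255` is unambiguous round-to-nearest and the integer result agrees with rounding the float formula to the nearest byte.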

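Thanks, Tim and Jason. We'll try to modify our code according to your suggestions. Do you have any articles on optimizing RenderScript code? We've had a hard time finding any on Google. - James Zhao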

1 Answer

- ClayMontgomery · Accepted Answer

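Renderscript does not use any GPU or DSP cores. That is a common misconception, encouraged by Google's deliberately vague documentation. Renderscript used to have an interface to OpenGL ES, but it has been deprecated and was never used for much beyond live wallpapers. Renderscript will use multiple CPU cores if they are available, but I suspect Renderscript will be replaced by OpenCL.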

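Look at the Effects classes (android.media.effect) and the Effects demo in the Android SDK. They show how to apply effects to images with OpenGL ES 2.0 shaders without writing any OpenGL ES code yourself.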

http://software.intel.com/en-us/articles/porting-opengl-games-to-android-on-intel-atom-processors-part-1

Update:

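It's a good thing when I learn more from answering a question than the asker does, and that is the case here. As the lack of answers shows, Renderscript is hardly used outside Google: it ignores industry standards like OpenCL, there is almost no documentation on how it actually works, and its architecture is very strange. Nevertheless, my answer did draw a rare response from the Renderscript development team, which included the only link I know of that contains any useful information about Renderscript: an article by Alexandru Voica of IMG, the vendor of the PowerVR GPUs.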

http://withimagination.imgtec.com/index.php/powervr/running-renderscript-efficiently-with-powervr-gpus-on-android

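That article contains some useful information that was new to me. Several more people have posted comments there about the difficulties they've had getting Renderscript code to run on the GPU.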

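However, I was wrong to think that Renderscript is no longer being developed at Google. Although my statement that "Renderscript does not use any GPU or DSP cores" was true until fairly recently, I have learned that this changed with the Jelly Bean release. It would be nice if one of the Renderscript developers could explain this. It would be even better if they had a public web page that explained it, listed which GPUs are actually supported, and described how to tell whether your code is actually running on the GPU.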

My view is that Google will eventually replace Renderscript with OpenCL, and I would not invest time in developing with it.