CUDA and OpenGL interop

Question

21

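I've been reading through the CUDA documentation, and it seems that every buffer that needs to interface with OpenGL has to be created as an OpenGL buffer object.

According to the NVIDIA programming guide, it has to be done like this: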

GLuint positionsVBO;
struct cudaGraphicsResource* positionsVBO_CUDA;

int main() {

    // Explicitly set device
    cudaGLSetGLDevice(0);
    // Initialize OpenGL and GLUT
    ...
    glutDisplayFunc(display);
    // Create buffer object and register it with CUDA
    glGenBuffers(1, &positionsVBO);  // glGenBuffers takes a pointer to the name
    glBindBuffer(GL_ARRAY_BUFFER, positionsVBO);  // bind by name, not by address
    unsigned int size = width * height * 4 * sizeof(float);
    glBufferData(GL_ARRAY_BUFFER, size, 0, GL_DYNAMIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    cudaGraphicsGLRegisterBuffer(&positionsVBO_CUDA, positionsVBO, cudaGraphicsMapFlagsWriteDiscard);

    // Launch rendering loop
    glutMainLoop();
}
void display() {
    // Map buffer object for writing from CUDA
    float4* positions;
    cudaGraphicsMapResources(1, &positionsVBO_CUDA, 0);
    size_t num_bytes;
    cudaGraphicsResourceGetMappedPointer((void**)&positions, &num_bytes, positionsVBO_CUDA);
    // Execute kernel
    dim3 dimBlock(16, 16, 1);
    dim3 dimGrid(width / dimBlock.x, height / dimBlock.y, 1);
    createVertices<<<dimGrid, dimBlock>>>(positions, time, width, height);
    // Unmap buffer object
    cudaGraphicsUnmapResources(1, &positionsVBO_CUDA, 0);
    // Render from buffer object
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glBindBuffer(GL_ARRAY_BUFFER, positionsVBO);
    glVertexPointer(4, GL_FLOAT, 0, 0);
    glEnableClientState(GL_VERTEX_ARRAY);
    glDrawArrays(GL_POINTS, 0, width * height);
    glDisableClientState(GL_VERTEX_ARRAY);
    // Swap buffers
    glutSwapBuffers();
    glutPostRedisplay();
}
void deleteVBO() {
    cudaGraphicsUnregisterResource(positionsVBO_CUDA);
    glDeleteBuffers(1, &positionsVBO);
}

__global__ void createVertices(float4* positions, float time, unsigned int width, unsigned int height) { 
    // [....]
}

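Is there a way to hand memory allocated with cudaMalloc directly to OpenGL? I already have working CUDA code, and I would like to put my float4 array directly into OpenGL.

Say you already have code like this: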

float4 *cd;
cudaMalloc((void**)&cd, elements * sizeof(float4));  // cudaMalloc returns an error code, not the pointer
do_something<<<16,1>>>(cd);

I would like to display the output of do_something through OpenGL.

Side note: why is cudaGraphicsResourceGetMappedPointer called on every timestep?

- Pascal

1 Answer

- harrism · Accepted Answer

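As of CUDA 4.0, OpenGL interop is one-way. That means that to do what you want (run a CUDA kernel that writes data to a GL buffer or texture image), you have to map the buffer to a device pointer and pass that pointer to your kernel, as shown in your example.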
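If the data already lives in a cudaMalloc allocation and the kernel cannot easily be changed to write into the mapped pointer, one possible workaround is a device-to-device copy into the mapped GL buffer. The following is only a sketch under assumptions: `copyIntoMapped` and `uploadToGLBuffer` are hypothetical helper names, and `resource` is assumed to have been registered with cudaGraphicsGLRegisterBuffer as in the question.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: copy `count` float4 elements from `src` (device
// memory obtained from cudaMalloc) into `dst`, refusing to overrun a
// destination that is only `dstBytes` bytes long.
cudaError_t copyIntoMapped(float4* dst, size_t dstBytes,
                           const float4* src, size_t count) {
    const size_t bytes = count * sizeof(float4);
    if (bytes > dstBytes) return cudaErrorInvalidValue;
    if (bytes == 0) return cudaSuccess;
    return cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
}

// Hypothetical helper: map the registered GL buffer, copy into it, unmap.
// Assumes `resource` came from cudaGraphicsGLRegisterBuffer.
cudaError_t uploadToGLBuffer(cudaGraphicsResource* resource,
                             const float4* src, size_t count) {
    cudaError_t err = cudaGraphicsMapResources(1, &resource, 0);
    if (err != cudaSuccess) return err;

    float4* mapped = nullptr;
    size_t mappedBytes = 0;
    err = cudaGraphicsResourceGetMappedPointer(
        reinterpret_cast<void**>(&mapped), &mappedBytes, resource);
    if (err == cudaSuccess)
        err = copyIntoMapped(mapped, mappedBytes, src, count);

    // Always unmap so that OpenGL can use the buffer again.
    cudaError_t unmapErr = cudaGraphicsUnmapResources(1, &resource, 0);
    return err != cudaSuccess ? err : unmapErr;
}
```

With the names from the question, this would be called once per frame after the kernel has run, for example `uploadToGLBuffer(positionsVBO_CUDA, cd, elements)`. The copy stays on the GPU, but writing directly into the mapped pointer avoids it entirely and is usually preferable.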

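As for your side note: cudaGraphicsResourceGetMappedPointer is called every time display() runs because cudaGraphicsMapResources is called every frame. Any time you re-map a resource you should re-fetch the mapped pointer, because it may have changed. Why re-map every frame? For performance reasons (especially in memory-intensive GL applications), OpenGL sometimes moves buffer objects around in memory. If you leave the resource mapped all the time, it cannot do this, and performance may suffer. I believe GL's ability and need to virtualize memory objects is also one of the reasons the current GL interop API is one-way (GL is not allowed to move CUDA allocations around, so you cannot map a CUDA-allocated device pointer into a GL buffer object).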
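One way to make the "map, fetch pointer, use, unmap" discipline hard to get wrong is a small scope guard. This is a sketch, not part of the CUDA API: `ScopedMapping` is a hypothetical class name, and it assumes the resource was registered beforehand. Because the pointer is only reachable through the guard, it cannot be cached across frames by accident.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical RAII guard: maps a registered graphics resource on
// construction and unmaps it on destruction, so the device pointer
// never outlives the mapping it belongs to.
class ScopedMapping {
public:
    explicit ScopedMapping(cudaGraphicsResource* resource,
                           cudaStream_t stream = 0)
        : resource_(resource), stream_(stream) {
        status_ = cudaGraphicsMapResources(1, &resource_, stream_);
        mapped_ = (status_ == cudaSuccess);
        if (mapped_) {
            // Re-query on every mapping: the address may differ from
            // the one returned for the previous frame.
            status_ = cudaGraphicsResourceGetMappedPointer(
                &ptr_, &bytes_, resource_);
        }
    }

    ~ScopedMapping() {
        // Unmapping hands the buffer back to OpenGL.
        if (mapped_) cudaGraphicsUnmapResources(1, &resource_, stream_);
    }

    // Non-copyable: a copy would unmap the resource twice.
    ScopedMapping(const ScopedMapping&) = delete;
    ScopedMapping& operator=(const ScopedMapping&) = delete;

    bool ok() const { return status_ == cudaSuccess; }
    cudaError_t status() const { return status_; }
    float4* positions() const { return static_cast<float4*>(ptr_); }
    size_t bytes() const { return bytes_; }

private:
    cudaGraphicsResource* resource_;
    cudaStream_t stream_;
    cudaError_t status_ = cudaSuccess;
    bool mapped_ = false;
    void* ptr_ = nullptr;
    size_t bytes_ = 0;
};
```

Inside display(), the kernel launch from the question would then sit in a block such as `{ ScopedMapping m(positionsVBO_CUDA); if (m.ok()) createVertices<<<dimGrid, dimBlock>>>(m.positions(), time, width, height); }`, with the GL draw calls placed after the closing brace, once the resource is unmapped.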