How do I synchronize the CPU and GPU with a fence in DirectX / Direct3D 12?


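I've started learning Direct3D 12 and I'm having trouble understanding CPU-GPU synchronization. As far as I know, a fence (ID3D12Fence) is nothing more than a UINT64 (unsigned long long) value used as a counter, but its methods confuse me. Below is part of the D3D12 sample source code (https://github.com/d3dcoder/d3d12book).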

void D3DApp::FlushCommandQueue()
{
    // Advance the fence value to mark commands up to this fence point.
    mCurrentFence++;

    // Add an instruction to the command queue to set a new fence point.  Because we 
    // are on the GPU timeline, the new fence point won't be set until the GPU finishes
    // processing all the commands prior to this Signal().
    ThrowIfFailed(mCommandQueue->Signal(mFence.Get(), mCurrentFence));

    // Wait until the GPU has completed commands up to this fence point.
    if(mFence->GetCompletedValue() < mCurrentFence)
    {
        HANDLE eventHandle = CreateEventEx(nullptr, nullptr, 0, EVENT_ALL_ACCESS);

        // Fire event when GPU hits current fence.  
        ThrowIfFailed(mFence->SetEventOnCompletion(mCurrentFence, eventHandle));

        // Wait until the GPU hits the current fence and the event is fired.
        WaitForSingleObject(eventHandle, INFINITE);
        CloseHandle(eventHandle);
    }
}

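As I understand it, the purpose of this function is to "flush" the command queue: the CPU waits for the GPU until it reaches the given "fence value", so that the CPU and GPU end up with the same fence value.
Q: If Signal() is a function that makes the GPU update the fence value inside the given ID3D12Fence, why is the mCurrentFence value needed?
The Microsoft documentation says it "Updates a fence to a specified value." What specified value? What I need is to "get the value of the last completed command list", not to set or specify one. What is this specified value?
It seems to me it should look like this: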
// Suppose mCurrentFence is 1 after submitting 1 command list (index 0), and the thread reaches this point for the FIRST time
ThrowIfFailed(mCommandQueue->Signal(mFence.Get()));
// At this point Fence value inside mFence is updated
if (m_Fence->GetCompletedValue() < mCurrentFence)
{
...
}

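If m_Fence->GetCompletedValue() returns 0:

if (0 < 1)

The GPU has not executed command list (index 0) yet, so the CPU must wait for the GPU to catch up. Only then does it make sense to call SetEventOnCompletion, WaitForSingleObject, and so on.

If it returns 1:

if (1 < 1)

The condition is false: the GPU has already finished command list (index 0), so the CPU does not need to wait.

mCurrentFence would be incremented somewhere around where the command lists are executed: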

mCommandQueue->ExecuteCommandLists(_countof(cmdsLists), cmdsLists);
mCurrentFence++;
2 Answers


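mCommandQueue->Signal(mFence.Get(), mCurrentFence) sets the fence to mCurrentFence once all previous commands queued on the command queue have been executed. In this case, the "specified value" is mCurrentFence.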
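To make the "specified value" concrete, here is a CPU-only toy model, not D3D12 code: SimFence, SimQueue, Execute, Step and ObserveSignal are hypothetical names invented for this sketch. Step() stands in for the GPU finishing one queued command, so the model captures ordering only, not real concurrency. It illustrates that the value written by Signal is whatever the caller passes, and that it lands only after the work queued before it.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for ID3D12Fence: nothing but a 64-bit counter.
struct SimFence {
    std::uint64_t completed = 0;  // what GetCompletedValue() would report
};

// Hypothetical stand-in for ID3D12CommandQueue. Commands run in FIFO order,
// one per Step() call; Step() plays the role of the GPU making progress.
struct SimQueue {
    std::deque<std::function<void()>> cmds;

    void Execute(std::function<void()> work) { cmds.push_back(std::move(work)); }

    // Like Signal(fence, value): enqueue the write instead of doing it now.
    void Signal(SimFence& fence, std::uint64_t value) {
        cmds.push_back([&fence, value] { fence.completed = value; });
    }

    // Run the oldest queued command; returns false if the queue was empty.
    bool Step() {
        if (cmds.empty()) return false;
        auto cmd = std::move(cmds.front());
        cmds.pop_front();
        cmd();
        return true;
    }
};

// Queue one command list, then Signal(fence, mCurrentFence), and record the
// fence value before the GPU starts and after each simulated GPU step.
std::vector<std::uint64_t> ObserveSignal(std::uint64_t mCurrentFence) {
    SimFence fence;
    SimQueue queue;
    queue.Execute([] { /* the command list's work */ });
    queue.Signal(fence, mCurrentFence);  // the "specified value" is chosen here
    std::vector<std::uint64_t> seen = {fence.completed};
    while (queue.Step()) seen.push_back(fence.completed);
    return seen;
}
```

ObserveSignal(1) yields {0, 0, 1}: the fence keeps its old value while the command list runs and only becomes 1 when the Signal itself is reached.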

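At the start, the fence and mCurrentFence are both 0. Next, mCurrentFence is set to 1. Then we call mCommandQueue->Signal(mFence.Get(), 1), which sets the fence to 1 once everything queued before it on that queue has finished executing. Finally, we call mFence->SetEventOnCompletion(1, eventHandle) and wait until the fence has been set to 1.

The next iteration replaces 1 with 2, and so on.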
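The iteration described above can be traced with the same kind of toy model. SimFence, SimQueue, BlockUntil, App and RunFrames are hypothetical; BlockUntil stands in for SetEventOnCompletion plus WaitForSingleObject by letting the simulated GPU run until the fence reaches the value, whereas a real CPU thread would sleep through that interval.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for ID3D12Fence: nothing but a 64-bit counter.
struct SimFence {
    std::uint64_t completed = 0;  // what GetCompletedValue() would report
};

// Hypothetical stand-in for ID3D12CommandQueue: FIFO commands, one per Step().
struct SimQueue {
    std::deque<std::function<void()>> cmds;
    void Execute(std::function<void()> work) { cmds.push_back(std::move(work)); }
    void Signal(SimFence& fence, std::uint64_t value) {
        cmds.push_back([&fence, value] { fence.completed = value; });
    }
    bool Step() {
        if (cmds.empty()) return false;
        auto cmd = std::move(cmds.front());
        cmds.pop_front();
        cmd();
        return true;
    }
};

// Stand-in for SetEventOnCompletion + WaitForSingleObject: the CPU stays
// "blocked" while the simulated GPU advances until the fence reaches value.
void BlockUntil(SimQueue& queue, const SimFence& fence, std::uint64_t value) {
    while (fence.completed < value && queue.Step()) {
    }
    // A real wait on a value no queued Signal will ever write would hang.
    assert(fence.completed >= value && "queue drained before reaching value");
}

// Mirrors D3DApp::FlushCommandQueue: bump the CPU-side counter, enqueue the
// Signal, then wait only if the GPU has not reached it yet.
struct App {
    SimFence mFence;
    SimQueue mCommandQueue;
    std::uint64_t mCurrentFence = 0;

    void FlushCommandQueue() {
        ++mCurrentFence;
        mCommandQueue.Signal(mFence, mCurrentFence);
        if (mFence.completed < mCurrentFence) BlockUntil(mCommandQueue, mFence, mCurrentFence);
    }
};

// Submit one command list per frame and flush after each one, recording the
// completed fence value observed after every flush.
std::vector<std::uint64_t> RunFrames(int frames) {
    App app;
    std::vector<std::uint64_t> completedAfterFlush;
    for (int i = 0; i < frames; ++i) {
        app.mCommandQueue.Execute([] { /* this frame's command lists */ });
        app.FlushCommandQueue();
        completedAfterFlush.push_back(app.mFence.completed);
    }
    return completedAfterFlush;
}
```

RunFrames(3) returns {1, 2, 3}: each flush targets the next value, and the wait ends exactly when the GPU reaches that value's Signal.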

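Note that mCommandQueue->Signal is a non-blocking call: it does not set the fence value immediately, but only once all preceding GPU commands have executed. In this example you can therefore assume that m_Fence->GetCompletedValue() < mCurrentFence is almost always true; the check only guards against the case where the GPU has already caught up.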
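A small, hypothetical model of that non-blocking behavior: SimFence, SimQueue, WaitNeeded and its gpuStepsBeforeCheck parameter are invented for illustration. The parameter counts how many queued commands the simulated GPU finishes between Signal() returning and the GetCompletedValue() check. The sketch suggests why the check is usually true, and why it is still worth making: an idle GPU may already have passed the Signal.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for ID3D12Fence: nothing but a 64-bit counter.
struct SimFence {
    std::uint64_t completed = 0;  // what GetCompletedValue() would report
};

// Hypothetical stand-in for ID3D12CommandQueue: FIFO commands, one per Step().
struct SimQueue {
    std::deque<std::function<void()>> cmds;
    void Execute(std::function<void()> work) { cmds.push_back(std::move(work)); }
    void Signal(SimFence& fence, std::uint64_t value) {
        cmds.push_back([&fence, value] { fence.completed = value; });
    }
    bool Step() {
        if (cmds.empty()) return false;
        auto cmd = std::move(cmds.front());
        cmds.pop_front();
        cmd();
        return true;
    }
};

// Returns true if FlushCommandQueue would take the wait branch.
// queuedWork: command lists still ahead of the Signal in the queue.
// gpuStepsBeforeCheck (hypothetical): commands the GPU finishes before the CPU checks.
bool WaitNeeded(int queuedWork, int gpuStepsBeforeCheck) {
    SimFence fence;
    SimQueue queue;
    for (int i = 0; i < queuedWork; ++i) queue.Execute([] {});
    const std::uint64_t mCurrentFence = 1;
    queue.Signal(fence, mCurrentFence);  // returns at once; only enqueues the write
    for (int i = 0; i < gpuStepsBeforeCheck; ++i) queue.Step();
    return fence.completed < mCurrentFence;
}
```

WaitNeeded(3, 0) is true, while WaitNeeded(0, 1) is false: if the GPU reaches the Signal before the CPU checks, no wait is needed.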

Why is the mCurrentFence value needed?

I suppose it isn't strictly required, but tracking the fence value this way avoids an extra API call. In this case you could also do:

// retrieve last value of the fence and increment by one (Additional API call)
auto nextFence = mFence->GetCompletedValue() + 1;
ThrowIfFailed(mCommandQueue->Signal(mFence.Get(), nextFence));

// Wait until the GPU has completed commands up to this fence point.
if(mFence->GetCompletedValue() < nextFence)
{
    HANDLE eventHandle = CreateEventEx(nullptr, nullptr, 0, EVENT_ALL_ACCESS);
    ThrowIfFailed(mFence->SetEventOnCompletion(nextFence, eventHandle));
    WaitForSingleObject(eventHandle, INFINITE);
    CloseHandle(eventHandle);
}
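The counter-free variant behaves like the tracked counter in the toy model below (SimFence, SimQueue, BlockUntil, FlushWithoutCounter and RunFramesWithoutCounter are hypothetical names). It relies on one assumption worth stating: when the flush starts, no other Signal is still pending on the queue. That holds if every Signal is issued by this flush function, because each flush drains the queue before returning.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for ID3D12Fence: nothing but a 64-bit counter.
struct SimFence {
    std::uint64_t completed = 0;  // what GetCompletedValue() would report
};

// Hypothetical stand-in for ID3D12CommandQueue: FIFO commands, one per Step().
struct SimQueue {
    std::deque<std::function<void()>> cmds;
    void Execute(std::function<void()> work) { cmds.push_back(std::move(work)); }
    void Signal(SimFence& fence, std::uint64_t value) {
        cmds.push_back([&fence, value] { fence.completed = value; });
    }
    bool Step() {
        if (cmds.empty()) return false;
        auto cmd = std::move(cmds.front());
        cmds.pop_front();
        cmd();
        return true;
    }
};

// Stand-in for SetEventOnCompletion + WaitForSingleObject.
void BlockUntil(SimQueue& queue, const SimFence& fence, std::uint64_t value) {
    while (fence.completed < value && queue.Step()) {
    }
    assert(fence.completed >= value && "queue drained before reaching value");
}

// Derive the next value from the fence itself (the extra GetCompletedValue()
// call) instead of keeping mCurrentFence on the CPU side.
void FlushWithoutCounter(SimQueue& queue, SimFence& fence) {
    const std::uint64_t nextFence = fence.completed + 1;
    queue.Signal(fence, nextFence);
    if (fence.completed < nextFence) BlockUntil(queue, fence, nextFence);
}

std::vector<std::uint64_t> RunFramesWithoutCounter(int frames) {
    SimFence fence;
    SimQueue queue;
    std::vector<std::uint64_t> completedAfterFlush;
    for (int i = 0; i < frames; ++i) {
        queue.Execute([] { /* this frame's command lists */ });
        FlushWithoutCounter(queue, fence);
        assert(queue.cmds.empty());  // nothing pending when the next flush starts
        completedAfterFlush.push_back(fence.completed);
    }
    return completedAfterFlush;
}
```

RunFramesWithoutCounter(3) gives the same {1, 2, 3} sequence as a tracked counter would, at the cost of reading the fence once more per flush.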

As a way to separate the submission part from the waiting part, would it work to write something like this?

void SynchronizeWithGPU()
{
    if (mFence->GetCompletedValue() < m_nextFence)
    {
        HANDLE eventHandle = CreateEventEx(nullptr, nullptr, 0, EVENT_ALL_ACCESS);
        ThrowIfFailed(mFence->SetEventOnCompletion(m_nextFence, eventHandle));
        WaitForSingleObject(eventHandle, INFINITE);
        CloseHandle(eventHandle);
    }
}

- YoonSeok OH
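Because Signal is not processed immediately, doing it this way seems to leave more time for the GPU to reach the Signal command before the CPU waits. Is it OK to put the signaling part right next to mCommandQueue->ExecuteCommandLists()? - YoonSeok OH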
Looks fine to me. - Felix Brüll
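The split discussed in these comments, sketched with the same hypothetical toy model: SimFence, SimQueue, BlockUntil and the Renderer struct are invented, and Submit stands in for ExecuteCommandLists followed immediately by Signal. Signaling right after submission gives the GPU the whole CPU-side gap to reach the Signal, and SynchronizeWithGPU only waits if it has not.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for ID3D12Fence: nothing but a 64-bit counter.
struct SimFence {
    std::uint64_t completed = 0;  // what GetCompletedValue() would report
};

// Hypothetical stand-in for ID3D12CommandQueue: FIFO commands, one per Step().
struct SimQueue {
    std::deque<std::function<void()>> cmds;
    void Execute(std::function<void()> work) { cmds.push_back(std::move(work)); }
    void Signal(SimFence& fence, std::uint64_t value) {
        cmds.push_back([&fence, value] { fence.completed = value; });
    }
    bool Step() {
        if (cmds.empty()) return false;
        auto cmd = std::move(cmds.front());
        cmds.pop_front();
        cmd();
        return true;
    }
};

// Stand-in for SetEventOnCompletion + WaitForSingleObject.
void BlockUntil(SimQueue& queue, const SimFence& fence, std::uint64_t value) {
    while (fence.completed < value && queue.Step()) {
    }
    assert(fence.completed >= value && "queue drained before reaching value");
}

// Hypothetical renderer that separates submission from waiting.
struct Renderer {
    SimFence mFence;
    SimQueue mCommandQueue;
    std::uint64_t m_nextFence = 0;

    // Submission half: ExecuteCommandLists() followed immediately by Signal(),
    // so the Signal is queued as early as possible.
    void Submit(std::function<void()> work) {
        mCommandQueue.Execute(std::move(work));
        mCommandQueue.Signal(mFence, ++m_nextFence);
    }

    // Waiting half, shaped like SynchronizeWithGPU() from the comment.
    void SynchronizeWithGPU() {
        if (mFence.completed < m_nextFence) BlockUntil(mCommandQueue, mFence, m_nextFence);
    }
};
```

For example, after Submit(...) the CPU can do other work; if the GPU has only executed the command list by the time SynchronizeWithGPU() is called, the call waits just until the Signal for m_nextFence is reached, and a second call returns immediately.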


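As a supplement to Felix's answer:

Tracking the fence value (e.g., mCurrentFence) lets you wait for a more specific point in the command queue.

For example, suppose we use the following setup: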

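ComPtr<ID3D12CommandQueue> queue;
ComPtr<ID3D12Fence> queueFence;
HANDLE fenceEv = nullptr; // auto-reset event created once at startup, e.g. CreateEventEx(nullptr, nullptr, 0, EVENT_ALL_ACCESS)
UINT64 fenceVal = 0;      // last value this code asked the queue to signal

// Enqueue a Signal for the next fence value and return that value.
UINT64 incrementFence()
{
    fenceVal++;
    queue->Signal(queueFence.Get(), fenceVal); // CHECK HRESULT
    return fenceVal;
}

// Block the CPU until the GPU has passed the Signal for `value`.
void waitFor(UINT64 value, DWORD timeout = INFINITE)
{
    if (queueFence->GetCompletedValue() < value)
    {
        queueFence->SetEventOnCompletion(value, fenceEv); // CHECK HRESULT
        WaitForSingleObject(fenceEv, timeout);
    }
}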

Then we can do the following (pseudocode):

SUBMIT COMMANDS 1
cmds1Complete = incrementFence();
    .
    . <- CPU STUFF
    .
SUBMIT COMMANDS 2
cmds2Complete = incrementFence();
    .
    . <- CPU STUFF
    .
waitFor(cmds1Complete)
    .
    . <- CPU STUFF (that needs COMMANDS 1 to be complete,
      but COMMANDS 2 is NOT required to be completed [but also could be])
    .
waitFor(cmds2Complete)
    .
    . <- EVERYTHING COMPLETE
    .
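The pseudocode can be run against a hypothetical toy model: SimFence, SimQueue, BlockUntil, Gpu, Snapshot and RunPseudocode are invented names, while incrementFence and waitFor mirror the functions above. It checks that waitFor(cmds1Complete) can return while COMMANDS 2 is still pending.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for ID3D12Fence: nothing but a 64-bit counter.
struct SimFence {
    std::uint64_t completed = 0;  // what GetCompletedValue() would report
};

// Hypothetical stand-in for ID3D12CommandQueue: FIFO commands, one per Step().
struct SimQueue {
    std::deque<std::function<void()>> cmds;
    void Execute(std::function<void()> work) { cmds.push_back(std::move(work)); }
    void Signal(SimFence& fence, std::uint64_t value) {
        cmds.push_back([&fence, value] { fence.completed = value; });
    }
    bool Step() {
        if (cmds.empty()) return false;
        auto cmd = std::move(cmds.front());
        cmds.pop_front();
        cmd();
        return true;
    }
};

// Stand-in for SetEventOnCompletion + WaitForSingleObject.
void BlockUntil(SimQueue& queue, const SimFence& fence, std::uint64_t value) {
    while (fence.completed < value && queue.Step()) {
    }
    assert(fence.completed >= value && "queue drained before reaching value");
}

// Hypothetical: mirrors the incrementFence()/waitFor() setup on the toy model.
struct Gpu {
    SimFence queueFence;
    SimQueue queue;
    std::uint64_t fenceVal = 0;

    std::uint64_t incrementFence() {
        ++fenceVal;
        queue.Signal(queueFence, fenceVal);
        return fenceVal;
    }
    void waitFor(std::uint64_t value) {
        if (queueFence.completed < value) BlockUntil(queue, queueFence, value);
    }
};

struct Snapshot { bool cmds1Done; bool cmds2Done; };

// Follows the pseudocode and records what the CPU observes after each wait.
std::pair<Snapshot, Snapshot> RunPseudocode() {
    Gpu gpu;
    bool cmds1Done = false, cmds2Done = false;
    gpu.queue.Execute([&cmds1Done] { cmds1Done = true; });  // SUBMIT COMMANDS 1
    const std::uint64_t cmds1Complete = gpu.incrementFence();
    gpu.queue.Execute([&cmds2Done] { cmds2Done = true; });  // SUBMIT COMMANDS 2
    const std::uint64_t cmds2Complete = gpu.incrementFence();

    gpu.waitFor(cmds1Complete);
    const Snapshot afterFirst{cmds1Done, cmds2Done};
    gpu.waitFor(cmds2Complete);
    const Snapshot afterSecond{cmds1Done, cmds2Done};
    return {afterFirst, afterSecond};
}
```

In the model the first Snapshot has cmds1Done set and cmds2Done clear. On real hardware COMMANDS 2 may or may not have finished at that point, which is the "[but also could be]" in the pseudocode; the model simply shows the tightest case.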

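Because we track fenceVal, we can also write a flush function that simply waits for the latest tracked fenceVal (rather than an earlier value saved from incrementFence). This is essentially what FlushCommandQueue does: since it inlines the Signal, it always waits for the latest value (which is why Felix says tracking only saves an API call):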

void flushCmdQueue()
{
    waitFor(incrementFence());
}
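flushCmdQueue on the same hypothetical toy model (SimFence, SimQueue, BlockUntil and Gpu are invented for this sketch): work that was signaled but never waited on is drained as well, because the flush waits for the newest value, which the queue reaches only after everything submitted earlier.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for ID3D12Fence: nothing but a 64-bit counter.
struct SimFence {
    std::uint64_t completed = 0;  // what GetCompletedValue() would report
};

// Hypothetical stand-in for ID3D12CommandQueue: FIFO commands, one per Step().
struct SimQueue {
    std::deque<std::function<void()>> cmds;
    void Execute(std::function<void()> work) { cmds.push_back(std::move(work)); }
    void Signal(SimFence& fence, std::uint64_t value) {
        cmds.push_back([&fence, value] { fence.completed = value; });
    }
    bool Step() {
        if (cmds.empty()) return false;
        auto cmd = std::move(cmds.front());
        cmds.pop_front();
        cmd();
        return true;
    }
};

// Stand-in for SetEventOnCompletion + WaitForSingleObject.
void BlockUntil(SimQueue& queue, const SimFence& fence, std::uint64_t value) {
    while (fence.completed < value && queue.Step()) {
    }
    assert(fence.completed >= value && "queue drained before reaching value");
}

// Hypothetical: the incrementFence()/waitFor() setup plus flushCmdQueue().
struct Gpu {
    SimFence queueFence;
    SimQueue queue;
    std::uint64_t fenceVal = 0;

    std::uint64_t incrementFence() {
        ++fenceVal;
        queue.Signal(queueFence, fenceVal);
        return fenceVal;
    }
    void waitFor(std::uint64_t value) {
        if (queueFence.completed < value) BlockUntil(queue, queueFence, value);
    }
    // Flush: signal the next value, then wait for exactly that value.
    void flushCmdQueue() { waitFor(incrementFence()); }
};
```

For example, submitting two batches with incrementFence() after each but never waiting, then calling flushCmdQueue(), leaves both batches finished and the fence at 3.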

This example goes a bit beyond the original question, but I think it matters when asking why mCurrentFence is tracked.


Content provided by Stack Overflow.