Is this possible?

No. In such a highly parallel environment it is hard even to define it precisely. However, you can approximate it with the ARB_timer_query extension.
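```cpp
#include <deque>
#include <iomanip>
#include <string>
#include <utility>

// One issued GL_TIME_ELAPSED query plus a label used when printing it.
struct TimerQuery
{
    std::string description;
    GLuint timer;
};
typedef std::deque<TimerQuery> TimerQueryQueue;
...
TimerQueryQueue timerQueryQueue;
...
// Starts a timer query on the render thread. Skipped unless
// limiter.frame60 == 0 (presumably one frame in 60, about once per second).
void GlfwThread::beginTimerQuery(std::string description)
{
    if (limiter.frame60 != 0)
        return;
    enqueue([this](std::string const& description) {
        GLuint id;
        glGenQueries(1, &id);
        timerQueryQueue.push_back({ description, id });
        glBeginQuery(GL_TIME_ELAPSED, id);
    }, std::move(description));
}

// Ends the currently active GL_TIME_ELAPSED query on the render thread.
void GlfwThread::endTimerQuery()
{
    if (limiter.frame60 != 0)
        return;
    enqueue([this]{
        glEndQuery(GL_TIME_ELAPSED);
    });
}

// Must run on the thread that owns the GL context. Prints finished results
// in issue order and stops at the first query that is not ready yet.
void GlfwThread::dumpTimerQueries()
{
    while (!timerQueryQueue.empty())
    {
        TimerQuery& next = timerQueryQueue.front();
        GLint isAvailable = GL_FALSE;
        glGetQueryObjectiv(next.timer,
                           GL_QUERY_RESULT_AVAILABLE,
                           &isAvailable);
        if (!isAvailable)
            return;  // not ready; try again on the next call
        GLuint64 ns;  // GL_TIME_ELAPSED results are in nanoseconds
        glGetQueryObjectui64v(next.timer, GL_QUERY_RESULT, &ns);
        DebugMessage("timer: ",
                     next.description, " ",
                     std::fixed,
                     std::setprecision(3), std::setw(8),
                     ns / 1000.0, Stopwatch::microsecText);
        glDeleteQueries(1, &next.timer);
        timerQueryQueue.pop_front();
    }
}
```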
Sample output:

```text
Framerate t=5.14 fps=59.94 fps_err=-0.00 aet=2850.67μs adt=13832.33μs alt=0.00μs cpu_usage=17%
instanceCount=20301 parallel_μs=2809
timer: text upload range 0.000μs
timer: clear and bind 95.200μs
timer: upload 1.056μs
timer: draw setup 1.056μs
timer: draw 281.568μs
timer: draw cleanup 1.024μs
timer: renderGlyphs 1.056μs
Framerate t=6.14 fps=59.94 fps_err=0.00 aet=2984.55μs adt=13698.45μs alt=0.00μs cpu_usage=17%
instanceCount=20361 parallel_μs=2731
timer: text upload range 0.000μs
timer: clear and bind 95.232μs
timer: upload 1.056μs
timer: draw setup 1.024μs
timer: draw 277.536μs
timer: draw cleanup 1.056μs
timer: renderGlyphs 1.024μs
Framerate t=7.14 fps=59.94 fps_err=-0.00 aet=3007.05μs adt=13675.95μs alt=0.00μs cpu_usage=18%
instanceCount=20421 parallel_μs=2800
timer: text upload range 0.000μs
timer: clear and bind 95.232μs
timer: upload 1.056μs
timer: draw setup 1.056μs
timer: draw 281.632μs
timer: draw cleanup 1.024μs
timer: renderGlyphs 1.056μs
```
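To measure GPU execution time, call renderThread->beginTimerQuery("draw some text"); right before the section you want to measure and renderThread->endTimerQuery(); immediately after it. The idea is that glBeginQuery(GL_TIME_ELAPSED, id) places a command in the GPU command queue ahead of the measured section, and that command records the value of some implementation-defined counter. glEndQuery(GL_TIME_ELAPSED) issues another GPU command that writes the difference between the current count and the count recorded when the TIME_ELAPSED query began into the query object. The GPU stores this result in the query object, and it becomes "available" at some later, asynchronous point in time. My code keeps a queue of issued timer queries and checks once per second for completed measurements. dumpTimerQueries keeps printing results as long as the query at the head of the queue is available; eventually it reaches a timer whose result is not available yet and stops printing.

enqueue hands its lambda to the render thread that owns the GL context; if you render on a single thread, calling the code inside the enqueue lambdas directly would be equivalent.

I have never seen such a thing. Usually you render a frame as fast as possible, do some CPU pre- or post-processing, and then render the next frame, so GPU usage swings between 0% and 100%. Only in the rare case where the frame rate is capped at a fixed maximum does this become a meaningful number.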