Why is cachegrind not fully deterministic?

Question

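Inspired by SQLite, I'm looking at using valgrind's "cachegrind" tool for reproducible performance benchmarking. The numbers it outputs are much more stable than any other timing method I've found, but they still aren't deterministic. As an example, here's a simple C program: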

int main(void) {
  volatile int x = 0;  /* must be initialized: reading it uninitialized is undefined behavior */
  while (x < 1000000) {
    x++;
  }
}

If I compile it and run it under cachegrind twice, I get these results:

$ gcc -O2 x.c -o x
$ valgrind --tool=cachegrind ./x
==11949== Cachegrind, a cache and branch-prediction profiler
==11949== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.
==11949== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
==11949== Command: ./x
==11949==
--11949-- warning: L3 cache found, using its data for the LL simulation.
==11949==
==11949== I   refs:      11,158,333
==11949== I1  misses:         3,565
==11949== LLi misses:         2,611
==11949== I1  miss rate:       0.03%
==11949== LLi miss rate:       0.02%
==11949==
==11949== D   refs:       4,116,700  (3,552,970 rd   + 563,730 wr)
==11949== D1  misses:        21,119  (   19,041 rd   +   2,078 wr)
==11949== LLd misses:         7,487  (    6,148 rd   +   1,339 wr)
==11949== D1  miss rate:        0.5% (      0.5%     +     0.4%  )
==11949== LLd miss rate:        0.2% (      0.2%     +     0.2%  )
==11949==
==11949== LL refs:           24,684  (   22,606 rd   +   2,078 wr)
==11949== LL misses:         10,098  (    8,759 rd   +   1,339 wr)
==11949== LL miss rate:         0.1% (      0.1%     +     0.2%  )
$ valgrind --tool=cachegrind ./x
==11982== Cachegrind, a cache and branch-prediction profiler
==11982== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.
==11982== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
==11982== Command: ./x
==11982==
--11982-- warning: L3 cache found, using its data for the LL simulation.
==11982==
==11982== I   refs:      11,159,225
==11982== I1  misses:         3,611
==11982== LLi misses:         2,611
==11982== I1  miss rate:       0.03%
==11982== LLi miss rate:       0.02%
==11982==
==11982== D   refs:       4,117,029  (3,553,176 rd   + 563,853 wr)
==11982== D1  misses:        21,174  (   19,090 rd   +   2,084 wr)
==11982== LLd misses:         7,496  (    6,154 rd   +   1,342 wr)
==11982== D1  miss rate:        0.5% (      0.5%     +     0.4%  )
==11982== LLd miss rate:        0.2% (      0.2%     +     0.2%  )
==11982==
==11982== LL refs:           24,785  (   22,701 rd   +   2,084 wr)
==11982== LL misses:         10,107  (    8,765 rd   +   1,342 wr)
==11982== LL miss rate:         0.1% (      0.1%     +     0.2%  )
$

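In this case, "I refs" differs by only 0.008% between the two runs, but I'd still like to know why the numbers differ at all. In more complex programs (running for tens of milliseconds), they can vary by more. Is there any way to make the runs completely reproducible?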

- Sophie Alpert

Use a less sophisticated CPU, e.g. one that doesn't do branch prediction. - David Schwartz

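If I understand correctly, valgrind simulates its own CPU and doesn't do branch prediction at all unless you pass --branch-sim=yes. And even then, why couldn't branch prediction be deterministic on a simulated CPU? - Sophie Alpert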

1 Answer

- Alexis Clarembeau · Accepted Answer

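At the end of a thread on gmane.comp.debugging.valgrind, Nicholas Nethercote (a Mozilla developer and member of the Valgrind development team) says that small variations are common when using Cachegrind (from which I infer that they don't cause significant problems). The Cachegrind manual also notes that its results are very sensitive. For example, on Linux, address space randomization (used to improve security) can be a source of non-determinism: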

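"Another thing worth noting is that results are very sensitive. Changing the size of the executable being profiled, or the sizes of any of the shared libraries it uses, or even the length of their file names, can perturb the results. Variations will be small, but don't expect perfectly repeatable results if your program changes at all. More recent GNU/Linux distributions do address space randomisation, in which identical runs of the same program have their shared libraries loaded at different locations, as a security measure. This also perturbs the results. While these factors mean you shouldn't trust the results to be super-accurate, they should be close enough to be useful."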