Is it possible to generate perf-stat results from a perf.data file?

Question

linux performance profiling performancecounter perf

10

When I want to generate performance reports with perf-stat and perf-report from the Linux perf tool suite, I run:

$ perf record -o my.perf.data myCmd
$ perf report -i my.perf.data

and:

$ perf stat myCmd

But that means running 'myCmd' a second time, which takes several minutes. Instead, I was hoping for:

$ perf stat -i my.perf.data

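But unlike most of the tools in the perf suite, perf-stat does not appear to have a -i option. Is there another tool that does this, or a way to get perf-report to generate output similar to perf-stat?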

- garious

3

Have you found a solution to this problem? - Mohamad Ibrahim

3 Answers

3

perf stat cannot parse a perf.data file, but you can run "perf report --header | egrep Event\|Samples" to print a header with estimated event counts. Only events that were recorded in the perf.data file are estimated.

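perf stat uses the hardware performance monitoring unit in counting mode, while perf record and perf report (via perf.data) use the same hardware unit in periodic overflow mode (sampling profiling). In both modes, each hardware performance counter is programmed through its control register to count some performance event (for example, CPU cycles or instructions executed), and the hardware increments the counter on every occurrence of that event.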

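In counting mode, perf stat uses counters that are initially set to zero at program start; the hardware increments them, and perf reads the final counter values at program exit (in practice, the OS splits the counting into several segments, with the same end result: a single value for the whole program run).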

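In profiling mode, perf record sets each hardware counter in use to some negative value, for example -200000, and registers and enables an overflow handler (the actual value is autotuned by the OS kernel to hit a target frequency). Every 200000 events counted, the counter overflows from -1 to zero and raises an overflow interrupt. The perf_events interrupt handler records a "sample" (current time, pid, instruction pointer, and optionally the call stack in -g mode) into a ring buffer (mmap'ed by perf), and the data read from that buffer is saved to perf.data. The handler also resets the counter to -200000. So after a long enough run, many samples are stored in perf.data.

This sample set can be used to build a statistical profile of the program (which parts of the program ran more often). But if each sample was generated after 200000 events, we can also get some estimate of the total event count. Because the kernel autotunes the period (it tries to generate samples at 4000 Hz), the estimate is harder to make; use something like -c 1000000 to disable autotuning of the sample period.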
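The overflow mechanism and the resulting estimate can be illustrated with a toy simulation. This is only a sketch of the idea, not how the kernel implements it: `simulate_sampling` and `estimate_events` are hypothetical helper names, and the event total is passed in as a plain number instead of coming from a hardware counter.

```shell
# Toy model of sampling mode: the counter starts at -period, every event
# increments it, and each time it reaches zero one sample is recorded and
# the counter is re-armed to -period.
# Usage: simulate_sampling TOTAL_EVENTS PERIOD   (prints the number of samples)
simulate_sampling() {
    remaining=$1
    period=$2
    counter=$(( 0 - period ))
    samples=0
    while [ "$remaining" -gt 0 ]; do
        # Number of events still needed before the counter reaches zero.
        to_overflow=$(( 0 - counter ))
        if [ "$remaining" -lt "$to_overflow" ]; then
            # The program ends before the next overflow: no sample is taken.
            counter=$(( counter + remaining ))
            remaining=0
        else
            remaining=$(( remaining - to_overflow ))
            samples=$(( samples + 1 ))
            counter=$(( 0 - period ))
        fi
    done
    echo "$samples"
}

# Estimate of the total event count: samples * period.
# Usage: estimate_events SAMPLES PERIOD
estimate_events() {
    echo $(( $1 * $2 ))
}
```

For example, `simulate_sampling 1150000 200000` prints 5 and `estimate_events 5 200000` prints 1000000, so with a fixed period the estimate can fall short of the true total by up to one period.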

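What does perf stat show in default mode? On some x86_64 CPUs: the program's running time (task-clock and elapsed time), 3 software events (context switches, CPU migrations, page faults), and 4 hardware counters: cycles, instructions, branches, and branch-misses.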

$ echo '3^123456%3' | perf stat bc
0
 Performance counter stats for 'bc':
        325.604672      task-clock (msec)         #    0.998 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               181      page-faults               #    0.556 K/sec                  
       828,234,675      cycles                    #    2.544 GHz                    
     1,840,146,399      instructions              #    2.22  insn per cycle         
       348,965,282      branches                  # 1071.745 M/sec                  
        15,385,371      branch-misses             #    4.41% of all branches        
       0.326152702 seconds time elapsed
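The right-hand column of this output is derived from the raw counts on the left. The sketch below recomputes three of those ratios with awk; `derived_metrics` is a hypothetical helper name, and the formulas are inferred from the labels in the output above, not taken from the perf source.

```shell
# Recompute part of perf stat's derived column from the raw counts.
# Arguments: task-clock (msec), cycles, instructions, branches, branch-misses
derived_metrics() {
    LC_ALL=C awk -v ms="$1" -v cyc="$2" -v ins="$3" -v br="$4" -v miss="$5" 'BEGIN {
        # cycles per second of task-clock, expressed in GHz
        printf "%.3f GHz\n", cyc / (ms / 1000) / 1e9
        # instructions retired per cycle
        printf "%.2f insn per cycle\n", ins / cyc
        # share of branches that were mispredicted
        printf "%.2f%% of all branches\n", 100 * miss / br
    }'
}
```

Calling `derived_metrics 325.604672 828234675 1840146399 348965282 15385371` with the counts from the run above prints 2.544 GHz, 2.22 insn per cycle, and 4.41% of all branches, matching the perf stat output.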

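What does perf record record in default mode? When hardware events are available, it is the cycles event. In a single wakeup (ring buffer overflow), perf saved 1293 samples into perf.data.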

$ echo '3^123456%3' | perf record bc
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.049 MB perf.data (1293 samples) ]

With perf report --header | less, perf script, and perf script -D you can look at the contents of perf.data:

$ perf report --header |grep event
# event : name = cycles:uppp, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD ...
# Samples: 1K of event 'cycles:uppp'
$ perf script 2>/dev/null |grep cycles|wc -l 
1293

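perf.data contains some timestamps and some additional events for program start and exit (perf script -D | egrep exec\|EXIT), but by default there is not enough information in perf.data to fully reconstruct the perf stat output. Run time is recorded only as the start and exit timestamps plus the timestamp of each event sample; software events are not recorded; and only a single hardware event was used (cycles, but no instructions, branches, or branch-misses). The hardware counter that was used can be approximated, but not exactly (the real cycle count is about 820-825 million).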

$ perf report --header |grep Event
# Event count (approx.): 836622729
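To put a number on the approximation, the relative difference between this estimate and the cycles count from the earlier perf stat run (828,234,675) can be computed directly. `relative_error_pct` is a hypothetical helper name; note that the two figures come from separate runs of bc, so part of the gap is ordinary run-to-run variation.

```shell
# Percentage difference between an estimated count and a reference count.
# Usage: relative_error_pct ESTIMATE REFERENCE
relative_error_pct() {
    LC_ALL=C awk -v est="$1" -v ref="$2" 'BEGIN {
        printf "%.2f\n", 100 * (est - ref) / ref
    }'
}
```

`relative_error_pct 836622729 828234675` prints 1.01, i.e. the estimate from perf.data is about 1% above the counted value in this case.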

With a non-default recording, perf report can estimate more events from perf.data:

$ echo '3^123456%3' | perf record -e cycles,instructions,branches,branch-misses bc
[ perf record: Captured and wrote 0.238 MB perf.data (5164 samples) ]
$ perf report --header |egrep Event\|Samples
# Samples: 1K of event 'cycles'
# Event count (approx.): 834809036
# Samples: 1K of event 'instructions'
# Event count (approx.): 1834083643
# Samples: 1K of event 'branches'
# Event count (approx.): 347750459
# Samples: 1K of event 'branch-misses'
# Event count (approx.): 15382047
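If a layout closer to perf stat is wanted, the header lines can be paired up with a small awk filter. This is a sketch that assumes the header format shown above ("# Samples: ... of event 'NAME'" followed by "# Event count (approx.): N"); `report_to_stat` is a hypothetical helper name, and the counts remain estimates.

```shell
# Turn "perf report --header" comment lines into "count  event" rows,
# loosely resembling perf stat output. Reads the report text on stdin.
report_to_stat() {
    LC_ALL=C awk '
        # Remember the event name from the most recent "# Samples:" line.
        /^# Samples:/ {
            event = $NF
            gsub("\047", "", event)   # strip the surrounding single quotes
        }
        # Print the estimated count next to the remembered event name.
        /^# Event count \(approx\.\):/ {
            printf "%15s      %s\n", $NF, event
        }
    '
}
```

It would be used as `perf report --header | report_to_stat`, printing one right-aligned count per recorded event.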

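A fixed period can be used, but the kernel may throttle some events if the value of the -c option is too low (samples should not be generated more often than 1000-4000 times per second).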

$ echo '3^123456%3' | perf record -e cycles,instructions,branches,branch-misses -c 1000000 bc
[ perf record: Captured and wrote 0.118 MB perf.data (3029 samples) ]
$ perf report --header |egrep Event\|Samples
# Samples: 823  of event 'cycles'
# Event count (approx.): 823000000
# Samples: 1K of event 'instructions'
# Event count (approx.): 1842000000
# Samples: 349  of event 'branches'
# Event count (approx.): 349000000
# Samples: 15  of event 'branch-misses'
# Event count (approx.): 15000000
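A quick consistency check on the fixed-period numbers: dividing each approximate event count by the period should give that event's sample count, and the four sample counts should add up to the total reported by perf record. `samples_from_counts` is a hypothetical helper name; it also recovers the instruction sample count that the header rounds to 1K.

```shell
# With a fixed period (-c), an event's sample count is its approximate
# event count divided by the period.
# Usage: samples_from_counts PERIOD COUNT...
# Prints one sample count per event, then "total N".
samples_from_counts() {
    period=$1
    shift
    total=0
    for count in "$@"; do
        n=$(( count / period ))
        echo "$n"
        total=$(( total + n ))
    done
    echo "total $total"
}
```

`samples_from_counts 1000000 823000000 1842000000 349000000 15000000` prints 823, 1842, 349, 15, and then "total 3029", which agrees with the "(3029 samples)" line printed by perf record.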

- osgx

1

Thanks to Peter Cordes.

You can run perf record on top of perf stat on top of your command:

perf record -g -o my.perf.data perf stat -o my.stat.report  myCmd

- nyttxy

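I would do it the other way around: perf record ... perf stat ./command. perf record knows which samples came from perf stat and which came from ./command, so you can still profile the actual command. And perf stat only sees events in its child process, which is the one you want to profile, so its totals will be correct. But can two different processes receive hardware events from the same process, with the PMU counters in different modes? Apparently yes: at least I got similar total counts from perf stat and could still get profiling data from record. - Peter Cordes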

1

You are right: stat cannot exclude the counters coming from record, but record with -g makes it easy to exclude the stack info coming from stat. - nyttxy

Content provided by Stack Overflow.

- Mike Sandford · Accepted Answer

I looked through the source on kernel.org, and there does not appear to be a way to get perf stat to parse perf.data.

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=tools/perf/builtin-stat.c;h=c70d72003557f17f29345b0f219dc5ca9f572d75;hb=refs/heads/linux-2.6.33.y

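If you look at line 245, you will see the function "run_perf_stat", and the code around lines 308-320 appears to be what actually records and collates the data.

I have not dug into whether the functionality you need could be enabled.

It also looks like perf report does not have many additional formatting capabilities. You can look further here: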

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=tools/perf/builtin-report.c;h=860f1eeeea7dbf8e43779308eaaffb1dbcf79d10;hb=refs/heads/linux-2.6.33.y