Ideally, I need an application that can attach to a process and record periodic snapshots of:
- memory usage
- number of threads
- CPU usage
You can use top in batch mode. In batch mode it runs until it is killed or until the number of iterations given with -n is done:
top -b -p `pidof a.out`
or
top -b -p `pidof a.out` -n 100
and you will get this:
$ top -b -p `pidof a.out`
top - 10:31:50 up 12 days, 19:08, 5 users, load average: 0.02, 0.01, 0.02
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16330584k total, 2335024k used, 13995560k free, 241348k buffers
Swap: 4194296k total, 0k used, 4194296k free, 1631880k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24402 SK 20 0 98.7m 1056 860 S 43.9 0.0 0:11.87 a.out
top - 10:31:53 up 12 days, 19:08, 5 users, load average: 0.02, 0.01, 0.02
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.9%us, 3.7%sy, 0.0%ni, 95.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16330584k total, 2335148k used, 13995436k free, 241348k buffers
Swap: 4194296k total, 0k used, 4194296k free, 1631880k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24402 SK 20 0 98.7m 1072 860 S 19.0 0.0 0:12.44 a.out
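Roughly speaking, %CPU is the CPU time the process used during the refresh interval divided by the length of that interval. The two snapshots above are 3 seconds apart (10:31:50 to 10:31:53), and TIME+ grows from 0:11.87 to 0:12.44, i.e. 0.57 s of CPU time. 0.57 / 3 = 19.0%, which matches the second sample's %CPU. Here is a minimal sketch of that arithmetic, assuming the inputs are utime + stime in clock ticks (as found in /proc/<pid>/stat) and a tick rate of 100 per second. cpu_percent is a hypothetical helper, not part of top:

```cpp
#include <cassert>

// Hypothetical helper: CPU usage over one sampling interval.
// ticks_before / ticks_after: utime + stime of the process, in clock ticks
//   (assumed to come from /proc/<pid>/stat at the start and end of the interval)
// interval_s: wall-clock seconds between the two samples
// clk_tck: clock ticks per second (sysconf(_SC_CLK_TCK), often 100)
double cpu_percent(unsigned long long ticks_before,
                   unsigned long long ticks_after,
                   double interval_s, long clk_tck)
{
    assert(ticks_after >= ticks_before && interval_s > 0 && clk_tck > 0);
    double used_ticks = static_cast<double>(ticks_after - ticks_before);
    // CPU ticks consumed / wall-clock ticks elapsed, as a percentage
    return 100.0 * used_ticks / (interval_s * clk_tck);
}
```

With TIME+ read as hundredths of a second, cpu_percent(1187, 1244, 3.0, 100) reproduces the 19.0 shown above.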
You can also use ps (for instance, in a shell script):
ps --format pid,pcpu,cputime,etime,size,vsz,cmd -p `pidof a.out`
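Both top and ps read their numbers from /proc. If you would rather record the snapshots from your own program, here is a minimal C++ sketch, assuming the Linux /proc/<pid>/status format, where VmRSS is the resident set size in kB and Threads is the number of threads. ProcSnapshot, parse_status and read_status are hypothetical names, not an existing library:

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical snapshot of two of the values the question asks for.
struct ProcSnapshot {
    long vm_rss_kb = -1;  // resident memory in kB, from the "VmRSS:" line
    long threads = -1;    // number of threads, from the "Threads:" line
};

// Parse the text of /proc/<pid>/status, where each line is "Key:<tab>value ...".
// Fields that are not found stay at -1.
ProcSnapshot parse_status(std::istream& in)
{
    ProcSnapshot s;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        fields >> key;
        if (key == "VmRSS:")
            fields >> s.vm_rss_kb;
        else if (key == "Threads:")
            fields >> s.threads;
    }
    return s;
}

// Take one snapshot of a live process. If the process is gone and the
// file cannot be opened, both fields are left at -1.
ProcSnapshot read_status(int pid)
{
    std::ifstream f("/proc/" + std::to_string(pid) + "/status");
    return parse_status(f);
}
```

For periodic snapshots, call read_status(pid) in a loop and sleep between calls (for example with std::this_thread::sleep_for). CPU time is not in this file; it is in the utime and stime fields of /proc/<pid>/stat.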
I need some means of recording the performance of an application on a Linux machine
To do this, use perf if your Linux kernel is 2.6.32 or newer, or OProfile if it is older. Neither tool requires you to instrument your program (as gprof does). However, to get correct call graphs with perf you need to build your program with -fno-omit-frame-pointer. For example: g++ -fno-omit-frame-pointer -O2 main.cpp
On RHEL 6.3, /boot/System.map-2.6.32-279.el6.x86_64 is readable, so I usually add --kallsyms=/boot/System.map-2.6.32-279.el6.x86_64 when running perf report:
perf report --stdio -g --kallsyms=/boot/System.map-2.6.32-279.el6.x86_64
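A symbol map such as System.map has one `<hex address> <type> <name>` entry per line, and a sampled kernel address is attributed to the closest symbol at or below it. Below is a simplified C++ sketch of that lookup, only to show what the file provides. load_system_map and resolve are hypothetical names, and this is not perf's implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <sstream>
#include <string>

// Read "<hex address> <type> <name>" lines into an address-ordered map.
// std::stoull throws std::invalid_argument on a malformed address.
std::map<std::uint64_t, std::string> load_system_map(std::istream& in)
{
    std::map<std::uint64_t, std::string> syms;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string addr, type, name;
        if (fields >> addr >> type >> name)
            syms[std::stoull(addr, nullptr, 16)] = name;
    }
    return syms;
}

// Attribute an address to the nearest symbol at or below it.
std::string resolve(const std::map<std::uint64_t, std::string>& syms,
                    std::uint64_t addr)
{
    auto it = syms.upper_bound(addr);  // first symbol strictly above addr
    if (it == syms.begin())
        return "[unknown]";            // addr lies below every symbol
    return std::prev(it)->second;
}
```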
First of all, this is a tutorial about Linux profiling with perf.
You can see a "live" analysis of your application with perf top:
sudo perf top -p `pidof a.out` -K
Or you can record performance data while the application is running and analyze it afterwards:
To record performance data:
perf record -p `pidof a.out`
or to record for 10 seconds:
perf record -p `pidof a.out` sleep 10
or to record with a call graph:
perf record -g -p `pidof a.out`
To analyze the recorded data:
perf report --stdio
perf report --stdio --sort=dso -g none
perf report --stdio -g none
perf report --stdio -g
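In these reports the Overhead column is the share of samples that fall into each row, and --sort chooses what a row is (comm, dso, symbol, ...). Here is a simplified sketch of that aggregation, assuming every sample has equal weight (perf can also weight samples by their period). Sample and overhead are hypothetical names:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sample: the shared object and function the sampled
// instruction pointer fell in.
struct Sample {
    std::string dso;
    std::string symbol;
};

// Percentage of samples per sort key; key_of plays the role of --sort.
template <typename KeyFn>
std::map<std::string, double> overhead(const std::vector<Sample>& samples,
                                       KeyFn key_of)
{
    assert(!samples.empty());
    std::map<std::string, double> pct;
    for (const Sample& s : samples)
        pct[key_of(s)] += 1.0;                    // count samples per key
    for (auto& kv : pct)
        kv.second = 100.0 * kv.second / samples.size();  // count -> percent
    return pct;
}
```

For example, overhead(samples, [](const Sample& s) { return s.dso; }) corresponds to --sort=dso, and returning s.symbol gives a per-function view.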
Alternatively, you can start the application under perf like this, wait for it to exit, and then analyze the recorded data:
perf record ./a.out
Here is an example of profiling a test program.
The test program is in the file main.cpp (main.cpp is at the bottom of this answer).
I compile it like this:
g++ -m64 -fno-omit-frame-pointer -g main.cpp -L. -ltcmalloc_minimal -o my_test
I use libtcmalloc_minimal.so because it is compiled with -fno-omit-frame-pointer, while libc malloc seems to be compiled without this option. Then I run my test program:
./my_test 100000000
Then I record performance data for the running process:
perf record -g -p `pidof my_test` -o ./my_test.perf.data sleep 30
Then I analyze the load per module:
perf report --stdio -g none --sort comm,dso -i ./my_test.perf.data
# Overhead Command Shared Object
# ........ ....... ............................
#
70.06% my_test my_test
28.33% my_test libtcmalloc_minimal.so.0.1.0
1.61% my_test [kernel.kallsyms]
Then I analyze the load per function:
perf report --stdio -g none -i ./my_test.perf.data | c++filt
# Overhead Command Shared Object Symbol
# ........ ....... ............................ ...........................
#
29.30% my_test my_test [.] f2(long)
29.14% my_test my_test [.] f1(long)
15.17% my_test libtcmalloc_minimal.so.0.1.0 [.] operator new(unsigned long)
13.16% my_test libtcmalloc_minimal.so.0.1.0 [.] operator delete(void*)
9.44% my_test my_test [.] process_request(long)
1.01% my_test my_test [.] operator delete(void*)@plt
0.97% my_test my_test [.] operator new(unsigned long)@plt
0.20% my_test my_test [.] main
0.19% my_test [kernel.kallsyms] [k] apic_timer_interrupt
0.16% my_test [kernel.kallsyms] [k] _spin_lock
0.13% my_test [kernel.kallsyms] [k] native_write_msr_safe
and so on ...
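The per-module and per-function reports agree: summing the per-function rows of each shared object reproduces its per-module total. The kernel rows shown (0.19% + 0.16% + 0.13%) are only part of the 1.61% for [kernel.kallsyms]; the remainder is presumably in the rows cut off as "and so on".

```latex
\begin{aligned}
\texttt{my\_test}: &\quad 29.30 + 29.14 + 9.44 + 1.01 + 0.97 + 0.20 = 70.06\,\% \\
\texttt{libtcmalloc\_minimal.so.0.1.0}: &\quad 15.17 + 13.16 = 28.33\,\% \\
\text{all modules}: &\quad 70.06 + 28.33 + 1.61 = 100.00\,\%
\end{aligned}
```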
Then I analyze the call chains:
perf report --stdio -g graph -i ./my_test.perf.data | c++filt
# Overhead Command Shared Object Symbol
# ........ ....... ............................ ...........................
#
29.30% my_test my_test [.] f2(long)
|
--- f2(long)
|
--29.01%-- process_request(long)
main
__libc_start_main
29.14% my_test my_test [.] f1(long)
|
--- f1(long)
|
|--15.05%-- process_request(long)
| main
| __libc_start_main
|
--13.79%-- f2(long)
process_request(long)
main
__libc_start_main
15.17% my_test libtcmalloc_minimal.so.0.1.0 [.] operator new(unsigned long)
|
--- operator new(unsigned long)
|
|--11.44%-- f1(long)
| |
| |--5.75%-- process_request(long)
| | main
| | __libc_start_main
| |
| --5.69%-- f2(long)
| process_request(long)
| main
| __libc_start_main
|
--3.01%-- process_request(long)
main
__libc_start_main
13.16% my_test libtcmalloc_minimal.so.0.1.0 [.] operator delete(void*)
|
--- operator delete(void*)
|
|--9.13%-- f1(long)
| |
| |--4.63%-- f2(long)
| | process_request(long)
| | main
| | __libc_start_main
| |
| --4.51%-- process_request(long)
| main
| __libc_start_main
|
|--3.05%-- process_request(long)
| main
| __libc_start_main
|
--0.80%-- f2(long)
process_request(long)
main
__libc_start_main
9.44% my_test my_test [.] process_request(long)
|
--- process_request(long)
|
--9.39%-- main
__libc_start_main
1.01% my_test my_test [.] operator delete(void*)@plt
|
--- operator delete(void*)@plt
0.97% my_test my_test [.] operator new(unsigned long)@plt
|
--- operator new(unsigned long)@plt
0.20% my_test my_test [.] main
0.19% my_test [kernel.kallsyms] [k] apic_timer_interrupt
0.16% my_test [kernel.kallsyms] [k] _spin_lock
and so on ...
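In graph mode each symbol's overhead is broken down by the call chains that reached it. For example, f1's samples split into 15.05% reached directly from process_request and 13.79% reached through f2. Here is a simplified sketch of the first level of that breakdown, assuming each sample is a call chain with the sampled function first and that every sample has equal weight. Chain and caller_breakdown are hypothetical names, and perf builds the full tree rather than only the immediate callers:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A sampled call chain, sampled function first, e.g.
// {"f1", "f2", "process_request", "main"}.
using Chain = std::vector<std::string>;

// Share of all samples for each (sampled function, immediate caller) pair.
std::map<std::pair<std::string, std::string>, double>
caller_breakdown(const std::vector<Chain>& chains)
{
    assert(!chains.empty());
    std::map<std::pair<std::string, std::string>, double> pct;
    for (const Chain& c : chains) {
        assert(!c.empty());
        std::string caller = c.size() > 1 ? c[1] : std::string("[root]");
        pct[{c[0], caller}] += 1.0;
    }
    for (auto& kv : pct)
        kv.second = 100.0 * kv.second / chains.size();  // percent of all samples
    return pct;
}
```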
Now you know where your program spends its time.
This is main.cpp for the test:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// f1: a little arithmetic plus two small heap allocations (at j == 0 and j == 5)
time_t f1(time_t time_value)
{
    for (int j = 0; j < 10; ++j) {
        ++time_value;
        if (j % 5 == 0) {
            double *p = new double;
            delete p;
        }
    }
    return time_value;
}

// f2: more arithmetic, then one call to f1
time_t f2(time_t time_value)
{
    for (int j = 0; j < 40; ++j) {
        ++time_value;
    }
    time_value = f1(time_value);
    return time_value;
}

// process_request: ten small allocations, then ten rounds of f1 and f2
time_t process_request(time_t time_value)
{
    for (int j = 0; j < 10; ++j) {
        int *p = new int;
        delete p;
        for (int m = 0; m < 10; ++m) {
            ++time_value;
        }
    }
    for (int i = 0; i < 10; ++i) {
        time_value = f1(time_value);
        time_value = f2(time_value);
    }
    return time_value;
}

int main(int argc, char* argv[])
{
    // Number of iterations from the command line, e.g. ./my_test 100000000
    int number_loops = argc > 1 ? atoi(argv[1]) : 1;
    time_t time_value = time(0);
    printf("number loops %d\n", number_loops);
    // time_t is not necessarily an int, so cast explicitly and print with %ld
    printf("time_value: %ld\n", (long)time_value);
    for (int i = 0; i < number_loops; ++i) {
        time_value = process_request(time_value);
    }
    printf("time_value: %ld\n", (long)time_value);
    return 0;
}
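The call graph can be cross-checked against the source by counting heap allocations per call of process_request. f1 allocates when j == 0 and j == 5, so twice per call. process_request allocates 10 times itself and calls f1 ten times directly and ten more times through f2. Here is a small C++ sketch that mirrors that control flow but only counts. AllocCount, f1_allocs and process_request_allocs are hypothetical names:

```cpp
#include <cassert>

// Hypothetical allocation counts for one call of process_request(),
// split by which function performs the new/delete.
struct AllocCount {
    long in_process_request = 0;  // the "new int" in the first loop
    long in_f1_direct = 0;        // f1 called directly from process_request
    long in_f1_via_f2 = 0;        // f1 called from inside f2
};

// Allocations made by one call of f1: same loop and condition as main.cpp.
long f1_allocs()
{
    long n = 0;
    for (int j = 0; j < 10; ++j) {
        if (j % 5 == 0) ++n;  // true for j == 0 and j == 5
    }
    return n;
}

// Mirrors process_request() but counts allocations instead of doing work.
AllocCount process_request_allocs()
{
    AllocCount c;
    for (int j = 0; j < 10; ++j) ++c.in_process_request;
    for (int i = 0; i < 10; ++i) {
        c.in_f1_direct += f1_allocs();  // time_value = f1(time_value);
        c.in_f1_via_f2 += f1_allocs();  // f2 allocates nothing itself but calls f1 once
    }
    return c;
}
```

This gives 10 + 20 + 20 = 50 allocations per request. 40 of them are in f1, split evenly between the two call paths. That is roughly in line with the operator new breakdown in the call graph: 11.44% from f1 (split 5.75% / 5.69%) versus 3.01% from process_request, assuming each allocation costs about the same.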
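To quote Linus Torvalds himself:
Don't use gprof. You're much better off using the newish Linux 'perf' tool.
And later:
I can pretty much guarantee that once you start using it, you'll never use gprof or oprofile again.
See Re: [PATCH] grep: do not do external grep on skip-worktree entries (2010-01-04).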
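Here is an example using Valgrind's callgrind tool:
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes your_program
This produces a file named callgrind.out.xxx, where xxx is the PID of the program.
Unlike gprof, Valgrind works with many different languages, including Java, with some limitations.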