难点在于要确切知道要跟踪什么,以便只看到您想要的结果,但跟踪本身非常容易:
确定您要跟踪的事件
cat /sys/kernel/debug/tracing/available_events
将您选择的事件写入set_event
echo sys_enter_write > /sys/kernel/debug/tracing/set_event
确定要跟踪的跟踪类型
cat /sys/kernel/debug/tracing/available_tracers
在current_tracer文件中写入所需的跟踪类型
echo function_graph > /sys/kernel/debug/tracing/current_tracer
启用跟踪:
echo 1 > /sys/kernel/debug/tracing/tracing_on
现在您可以运行“iperf -c...”并在完成后禁用跟踪。
echo 0 > /sys/kernel/debug/tracing/tracing_on
要查看结果:
vi /sys/kernel/debug/tracing/trace
echo function_graph > /sys/kernel/debug/tracing/current_tracer
命令会跟踪所有函数并生成大量输出,你可能需要在这里使用 nop
命令。参见:https://dev59.com/KHzaa4cB1Zd3GeqPTLK3#52776478 - Ciro Santilli OurBigBook.comstrace
如果你只想跟踪一个进程,为什么不使用strace
呢?它甚至可以连接到正在运行的进程:如何让strace连接到已经运行的进程?
对于ftrace,请使用echo none > /sys/kernel/debug/tracing/current_tracer
来跟踪系统调用
至少在Linux 4.15中,如果您使用function_graph
,则除了系统调用外,还会显示大量函数。
您可以通过过滤解决这个问题,但使用nop
更简单:如何在不显示Linux内核中的其他任何函数的情况下使用ftrace跟踪系统调用事件?
以sudo
身份运行:
#!/bin/sh
set -eux
d=debug/tracing
mkdir -p debug
if ! mountpoint -q debug; then
mount -t debugfs nodev debug
fi
# Stop tracing.
echo 0 > "${d}/tracing_on"
# Clear previous traces.
echo > "${d}/trace"
# Find the tracer name.
cat "${d}/available_tracers"
# Disable tracing functions, show only system call events.
echo nop > "${d}/current_tracer"
# Find the event name with.
grep write "${d}/available_events"
# Enable tracing write.
# Both statements below seem to do the exact same thing,
# just with different interfaces.
# https://www.kernel.org/doc/html/v4.18/trace/events.html
echo sys_enter_write > "${d}/set_event"
# echo 1 > "${d}/events/syscalls/sys_enter_write/enable"
# Start tracing.
echo 1 > "${d}/tracing_on"
# Generate two write calls by two different processes.
rm -rf /tmp/a /tmp/b
printf a > /tmp/a
printf b > /tmp/b
# View the trace.
cat "${d}/trace"
# Stop tracing.
echo 0 > "${d}/tracing_on"
umount debug
和一个样本输出:
# tracer: nop
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
a.sh-18135 [004] .... 11152.454767: sys_write(fd: 2, buf: 5558769acc00, count: 2)
a.sh-18135 [004] .... 11152.454777: sys_write(fd: 2, buf: 555877f6a968, count: 2)
a.sh-18135 [004] .... 11152.454785: sys_write(fd: 2, buf: 555877f6a968, count: 4)
a.sh-18135 [004] .... 11152.454793: sys_write(fd: 2, buf: 555877f6a968, count: 7)
a.sh-18135 [004] .... 11152.454801: sys_write(fd: 2, buf: 555877f6a968, count: 7)
a.sh-18135 [004] .... 11152.454807: sys_write(fd: 2, buf: 7ffc5f3f2be7, count: 1)
gnome-terminal--3419 [005] .... 11152.454833: sys_write(fd: 4, buf: 7ffe4f3a61c0, count: 8)
gnome-terminal--3419 [005] .... 11152.454862: sys_write(fd: 4, buf: 7ffe4f3a61c0, count: 8)
gnome-terminal--3419 [005] .... 11152.454887: sys_write(fd: 4, buf: 7ffe4f3a61c0, count: 8)
gnome-terminal--3419 [005] .... 11152.454894: sys_write(fd: 4, buf: 7ffe4f3a61c0, count: 8)
gmain-3193 [002] .... 11152.456141: sys_write(fd: 4, buf: 7fbfe5a93c40, count: 8)
gmain-3193 [002] .... 11152.456168: sys_write(fd: 4, buf: 7fbfe5a92bd0, count: 8)
gmain-3193 [002] .... 11152.456172: sys_write(fd: 4, buf: 7fbfe5a93c40, count: 8)
a.sh-18135 [004] .... 11152.456534: sys_write(fd: 2, buf: 5558769acbd8, count: 2)
a.sh-18135 [004] .... 11152.456547: sys_write(fd: 2, buf: 555877f6a968, count: 6)
a.sh-18135 [004] .... 11152.456555: sys_write(fd: 2, buf: 555877f6a968, count: 2)
a.sh-18135 [004] .... 11152.456561: sys_write(fd: 2, buf: 7ffc5f3f2be7, count: 1)
a.sh-18135 [004] .... 11152.456578: sys_write(fd: 1, buf: 555877f6af00, count: 1)
gnome-terminal--3419 [005] .... 11152.456651: sys_write(fd: 4, buf: 7ffe4f3a61c0, count: 8)
gnome-terminal--3419 [005] .... 11152.456660: sys_write(fd: 4, buf: 7ffe4f3a61c0, count: 8)
a.sh-18135 [004] .... 11152.456674: sys_write(fd: 2, buf: 5558769acbd8, count: 2)
a.sh-18135 [004] .... 11152.456683: sys_write(fd: 2, buf: 555877f6a968, count: 6)
a.sh-18135 [004] .... 11152.456690: sys_write(fd: 2, buf: 555877f6a968, count: 2)
a.sh-18135 [004] .... 11152.456694: sys_write(fd: 2, buf: 7ffc5f3f2be7, count: 1)
a.sh-18135 [004] .... 11152.456703: sys_write(fd: 1, buf: 555877f6af00, count: 1)
a.sh-18135 [004] .... 11152.456738: sys_write(fd: 2, buf: 5558769acb88, count: 2)
a.sh-18135 [004] .... 11152.456745: sys_write(fd: 2, buf: 555877f6a968, count: 3)
a.sh-18135 [004] .... 11152.456750: sys_write(fd: 2, buf: 555877f6a968, count: 14)
a.sh-18135 [004] .... 11152.456756: sys_write(fd: 2, buf: 7ffc5f3f2be7, count: 1)
gnome-terminal--3419 [005] .... 11152.456816: sys_write(fd: 4, buf: 7ffe4f3a61c0, count: 8)
gnome-terminal--3419 [005] .... 11152.456845: sys_write(fd: 4, buf: 7ffe4f3a61c0, count: 8)
正如您所看到的,write
是一个非常常见的系统调用,因此很难隔离我们的两个写入调用。如果使用一个不太常见的调用,例如 mkdir
,情况会更容易理解: 我如何在Linux中跟踪系统调用?
在Ubuntu 18.04、Linux内核4.15上进行测试。
要获取write系统调用的调用图,您可以执行以下操作:
sudo trace-cmd record -p function_graph -g vfs_write your-prog
你的程序可以是简单的编写程序,例如:
#include <stdio.h>
#include <sys/file.h>
int main()
{
int fd = open("a.txt", O_WRONLY | O_CREAT);
write(fd, "abcdefg", 6);
close(fd);
}
trace-cmd report | grep -v trace-cmd | less
sudo perf record --call-graph fp ./fio --filename=aa --size=4G --direct=0 --rw=write --bs=1M --iodepth=512 --numjobs=1 --name=job_name
然后:
sudo perf report --call-graph
并扩展vfs_write:
Children Self Command Shared Object Symbol
+ 80.17% 0.00% fio [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe ◆
+ 80.15% 0.02% fio [kernel.kallsyms] [k] do_syscall_64 ▒
+ 78.57% 0.07% fio libpthread-2.31.so [.] __libc_pwrite64 ▒
- 78.49% 0.04% fio [kernel.kallsyms] [k] vfs_write ▒
- 78.46% vfs_write ▒
- 78.40% new_sync_write ▒
- 78.37% ext4_file_write_iter ▒
- 78.33% ext4_buffered_write_iter ▒
- 77.73% generic_perform_write ▒
- 50.15% ext4_da_write_begin ▒
- 38.72% grab_cache_page_write_begin ▒
- 38.61% pagecache_get_page ▒
- 23.52% __page_cache_alloc ▒
- 23.16% alloc_pages ▒
- 22.63% __alloc_pages ▒
- 22.00% get_page_from_freelist ▒
- 18.16% prep_new_page ▒
17.84% clear_page_rep ▒
1.80% rmqueue ▒
1.28% node_dirty_ok ▒
- 12.84% add_to_page_cache_lru ▒
- 9.77% __add_to_page_cache_locked ▒
- 1.89% __mem_cgroup_charge ▒
1.01% charge_memcg ▒
0.77% get_mem_cgroup_from_mm ▒
- 1.09% xa_get_order ▒
- 0.89% xas_load ▒
0.53% xas_descend ▒
- 2.69% lru_cache_add ▒
- 2.37% __pagevec_lru_add ▒
2.01% __lock_text_start ▒
- 1.21% xas_load ▒
0.66% xas_descend ▒
- 10.38% ext4_block_write_begin ▒
- 7.01% create_empty_buffers ▒
- 5.98% alloc_page_buffers ▒
- 5.59% alloc_buffer_head ▒
- 4.24% kmem_cache_alloc ▒
- 1.03% ___slab_alloc ▒
- new_slab ▒
- 0.87% allocate_slab ▒
- 0.57% alloc_pages ▒
- 0.55% __alloc_pages ▒
get_page_from_freelist ▒
0.84% get_obj_cgroup_from_current ▒
0.53% memset ▒
- 2.42% ext4_da_get_block_prep ▒
- 1.26% ext4_da_map_blocks.constprop.0 ▒
0.53% ext4_es_lookup_extent ▒
- 16.27% copy_page_from_iter_atomic ▒
copy_user_generic_string ▒
- 8.74% ext4_da_write_end ▒
- 8.23% generic_write_end ▒
- 7.83% block_write_end ▒
- __block_commit_write.isra.0 ▒
- 6.89% mark_buffer_dirty ▒
- 5.66% __set_page_dirty ▒
5.54% __lock_text_start ▒
1.35% __get_user_nocheck_1 ▒
+ 78.48% 0.00% fio [kernel.kallsyms] [k] __x64_sys_pwrite64 ▒
+ 78.48% 0.00% fio [kernel.kallsyms] [k] ksys_pwrite64 ▒
+ 78.40% 0.02% fio [kernel.kallsyms] [k] new_sync_write ▒
+ 78.37% 0.02% fio [kernel.kallsyms] [k] ext4_file_write_iter ▒
+ 78.33% 0.00% fio [kernel.kallsyms] [k] ext4_buffered_write_iter ▒
+ 77.73% 0.27% fio [kernel.kallsyms] [k] generic_perform_write ▒
+ 50.28% 0.45% fio [kernel.kallsyms] [k] ext4_da_write_begin ▒
+ 38.76% 0.07% fio [kernel.kallsyms] [k] grab_cache_page_write_begin ▒
+ 38.74% 0.91% fio [kernel.kallsyms] [k] pagecache_get_page ▒
+ 25.19% 0.66% fio [kernel.kallsyms] [k] __alloc_pages ▒
+ 24.44% 0.64% fio [kernel.kallsyms] [k] get_page_from_freelist