How does ptrace work in Linux?


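The ptrace system call allows a parent process to inspect an attached child. For example, on Linux, strace (which is implemented with the ptrace system call) can inspect the system calls invoked by the child process.

When the attached child process invokes a system call, the ptracing parent process can be notified. But how exactly does that happen? I want to know the technical details behind this mechanism.

Thank you in advance.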


http://man7.org/linux/man-pages/man2/ptrace.2.html is helpful. - tristan
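As far as I understand, the OP wants to know the mechanism that makes it happen, not just the usage. - Blagovest Buyukliev
@Blagovest Buyukliev: Thanks. That is exactly what I want. I know how to use ptrace. I want to know its internals. - daehee
1 Answer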

When the attached child process invokes a system call, the ptracing parent process can be notified. But how exactly does that happen?
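Either the tracer calls ptrace with PTRACE_ATTACH on the target, or the child itself calls ptrace with the PTRACE_TRACEME option. Either way, the two processes get connected by filling in some fields of their task_struct (kernel/ptrace.c: sys_ptrace): the child gets the PT_PTRACED flag in the ptrace field of its struct task_struct, has the ptracer recorded as its parent, and is linked in through its ptrace_entry list node (see __ptrace_link); the parent in turn keeps the child on its ptraced list.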
Then strace calls ptrace with the PTRACE_SYSCALL request to register itself as a syscall debugger, which sets the thread flag TIF_SYSCALL_TRACE in the child's struct thread_info (by something like set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);). See arch/x86/include/asm/thread_info.h:
/*
 * thread information flags
 * - these are process state flags that various assembly files
 *   may need to access   ...*/

#define TIF_SYSCALL_TRACE       0       /* syscall trace active */
...
#define _TIF_SYSCALL_TRACE      (1 << TIF_SYSCALL_TRACE)

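On every syscall entry or exit, the architecture-specific syscall entry code checks this _TIF_SYSCALL_TRACE flag (directly in the assembly implementation of the syscall path, for example on x86 in arch/x86/kernel/entry_32.S: jnz syscall_trace_entry in ENTRY(system_call), with similar code in syscall_exit_work). If the flag is set, the ptracer is notified with a signal (SIGTRAP) and the child is temporarily stopped. This is usually done in syscall_trace_enter and syscall_trace_leave: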

long syscall_trace_enter(struct pt_regs *regs)
...
        if ((ret || test_thread_flag(TIF_SYSCALL_TRACE)) &&
            tracehook_report_syscall_entry(regs))
                ret = -1L;

void syscall_trace_leave(struct pt_regs *regs)
...
        if (step || test_thread_flag(TIF_SYSCALL_TRACE))
                tracehook_report_syscall_exit(regs, step);

tracehook_report_syscall_* are the actual workers here; they call ptrace_report_syscall. See include/linux/tracehook.h:

/**
 * tracehook_report_syscall_entry - task is about to attempt a system call
 * @regs:               user register state of current task
 *
 * This will be called if %TIF_SYSCALL_TRACE has been set, when the
 * current task has just entered the kernel for a system call.
 * Full user register state is available here.  Changing the values
 * in @regs can affect the system call number and arguments to be tried.
 * It is safe to block here, preventing the system call from beginning.
 *
 * Returns zero normally, or nonzero if the calling arch code should abort
 * the system call.  That must prevent normal entry so no system call is
 * made.  If @task ever returns to user mode after this, its register state
 * is unspecified, but should be something harmless like an %ENOSYS error
 * return.  It should preserve enough information so that syscall_rollback()
 * can work (see asm-generic/syscall.h).
 *
 * Called without locks, just after entering kernel mode.
 */
static inline __must_check int tracehook_report_syscall_entry(
        struct pt_regs *regs)
{
        return ptrace_report_syscall(regs);
}

/**
 * tracehook_report_syscall_exit - task has just finished a system call
 * @regs:               user register state of current task
 * @step:               nonzero if simulating single-step or block-step
 *
 * This will be called if %TIF_SYSCALL_TRACE has been set, when the
 * current task has just finished an attempted system call.  Full
 * user register state is available here.  It is safe to block here,
 * preventing signals from being processed.
 *
 * If @step is nonzero, this report is also in lieu of the normal
 * trap that would follow the system call instruction because
 * user_enable_block_step() or user_enable_single_step() was used.
 * In this case, %TIF_SYSCALL_TRACE might not be set.
 *
 * Called without locks, just before checking for pending signals.
 */
static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step)
{
...

        ptrace_report_syscall(regs);
}

ptrace_report_syscall in turn generates the SIGTRAP for the debugger or strace via ptrace_notify/ptrace_do_notify:

/*
 * ptrace report for syscall entry and exit looks identical.
 */
static inline int ptrace_report_syscall(struct pt_regs *regs)
{
        int ptrace = current->ptrace;

        if (!(ptrace & PT_PTRACED))
                return 0;

        ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0));

        /*
         * this isn't the same as continuing with a signal, but it will do
         * for normal use.  strace only continues with a signal if the
         * stopping signal is not SIGTRAP.  -brl
         */
        if (current->exit_code) {
                send_sig(current->exit_code, current, 1);
                current->exit_code = 0;
        }

        return fatal_signal_pending(current);
}

ptrace_notify is implemented in kernel/signal.c; it stops the child and passes the siginfo to the ptracer:

static void ptrace_do_notify(int signr, int exit_code, int why)
{
        siginfo_t info;

        memset(&info, 0, sizeof info);
        info.si_signo = signr;
        info.si_code = exit_code;
        info.si_pid = task_pid_vnr(current);
        info.si_uid = from_kuid_munged(current_user_ns(), current_uid());

        /* Let the debugger run.  */
        ptrace_stop(exit_code, why, 1, &info);
}

void ptrace_notify(int exit_code)
{
        BUG_ON((exit_code & (0x7f | ~0xffff)) != SIGTRAP);
        if (unlikely(current->task_works))
                task_work_run();

        spin_lock_irq(&current->sighand->siglock);
        ptrace_do_notify(SIGTRAP, exit_code, CLD_TRAPPED);
        spin_unlock_irq(&current->sighand->siglock);
}

ptrace_stop is in the same signal.c file, at line 1839 in version 3.13.


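Awesome :) This is exactly the kind of answer I was looking for! - daehee
What is the significance of current->exit_code? What is it used for? I am currently looking at a scenario in which a signal is raised while ptrace is attached to the process. Also, what does ptrace_stop() do? I see that it both sets and clears exit_code, with the result that the signal is not delivered. Let me know if you would rather I post this as a separate question. I really need your help. Thanks. - Sandeep
ptrace_stop is here: http://lxr.free-electrons.com/source/kernel/signal.c?v=3.13#L1828. It just changes the state of "current" to TASK_TRACED (shown as "T" in ps and top) and sends the prepared signal to the parent/ptracer. As far as I understand, the exit_code field of struct task_struct (in sched.h) is used to temporarily hold the signal, so that the ptracer can change or cancel it. - osgx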
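The ptrace system call uses access_process_vm to read data from the other process. However, the address spaces of different processes are isolated, so how is this achieved? - choxsword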
