How can I trace I/O operations per file in Linux?

15

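I need to trace read system calls for specific files, and I currently do this by parsing the output of strace. Because read operates on file descriptors, I have to keep track of the current mapping between fd and path. In addition, seek has to be monitored so that the current position in the trace stays up to date.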

Is there a better way to get per-application, per-file-path I/O traces in Linux?
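
For reference, the bookkeeping described above can be sketched roughly as follows. This is only an illustration, not the asker's actual code: the class name FdTracker and its on_* methods are hypothetical, and it assumes events for a single process arrive in order. It deliberately ignores dup, O_APPEND, and positional reads such as pread64, which do not move the file offset.

```python
class FdTracker:
    """Track fd -> (path, offset) for one process from ordered syscall events.

    Hypothetical helper for illustration; it does not handle dup(), O_APPEND,
    or pread64().
    """

    def __init__(self):
        self.files = {}  # fd -> [path, current offset]

    def on_open(self, fd, path):
        # A freshly opened descriptor starts at offset 0.
        self.files[fd] = [path, 0]

    def on_read(self, fd, nbytes):
        """Return (path, start offset, length) for a successful read, else None."""
        entry = self.files.get(fd)
        if entry is None or nbytes <= 0:
            return None
        path, offset = entry
        entry[1] = offset + nbytes  # a read advances the offset by the bytes returned
        return (path, offset, nbytes)

    def on_lseek(self, fd, result):
        # lseek returns the resulting absolute offset, so store it directly.
        if fd in self.files and result >= 0:
            self.files[fd][1] = result

    def on_close(self, fd):
        # The fd number may be reused by a later open, so forget it here.
        self.files.pop(fd, None)
```

Each syscall parsed from the trace would be fed to the matching method; on_read then yields the path and byte range that was actually read.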

7 Answers

12

You could wait for the file to be opened so that you can learn the fd, and then attach strace after the process has started, like this:

strace -p pid -e trace=file -e read=fd


(Note: here fd is the file descriptor and pid is the process ID.)

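This can be difficult if you have a multithreaded program, e.g. most things in Java. - Dan Pritts
To learn which fd corresponds to a file path, look in /proc/$PID/fd/ - Ruslan
The -y option prints the paths associated with file descriptor arguments. - sparrowt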
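
The /proc/$PID/fd/ lookup mentioned in the comments can also be done from a script. A minimal sketch, assuming a Linux system with /proc mounted and permission to inspect the target process; the function name fd_paths is made up for this example:

```python
import os


def fd_paths(pid):
    """Return {fd: target} by reading the symlinks in /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    mapping = {}
    for name in os.listdir(fd_dir):
        try:
            mapping[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return mapping


if __name__ == "__main__":
    # Show the descriptors of the current process as a quick demonstration.
    for fd, target in sorted(fd_paths(os.getpid()).items()):
        print(fd, target)
```

Note that this is only a snapshot: a descriptor number can be closed and reused for a different file right after it is read.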

6

SystemTap, a Linux reimplementation of something like DTrace, may be helpful here.

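As with strace, you only have the file descriptor (fd), but thanks to the scripting ability it is easy to maintain the filename for each fd (unless fancy things like dup are used). The example script iotime illustrates this.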

#! /usr/bin/env stap

/*
 * Copyright (C) 2006-2007 Red Hat Inc.
 * 
 * This copyrighted material is made available to anyone wishing to use,
 * modify, copy, or redistribute it subject to the terms and conditions
 * of the GNU General Public License v.2.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 *
 * Print out the amount of time spent in the read and write system calls
 * when each file opened by the process is closed. Note that the systemtap 
 * script needs to be running before the open operations occur for
 * the script to record data.
 *
 * This script could be used to find out which files are slow to load
 * on a machine. e.g.
 *
 * stap iotime.stp -c 'firefox'
 *
 * Output format is:
 * timestamp pid (executable) info_type path ...
 *
 * 200283135 2573 (cupsd) access /etc/printcap read: 0 write: 7063
 * 200283143 2573 (cupsd) iotime /etc/printcap time: 69
 *
 */

global start
global time_io

function timestamp:long() { return gettimeofday_us() - start }

function proc:string() { return sprintf("%d (%s)", pid(), execname()) }

probe begin { start = gettimeofday_us() }

global filehandles, fileread, filewrite

probe syscall.open.return {
  filename = user_string($filename)
  if ($return != -1) {
    filehandles[pid(), $return] = filename
  } else {
    printf("%d %s access %s fail\n", timestamp(), proc(), filename)
  }
}

probe syscall.read.return {
  p = pid()
  fd = $fd
  bytes = $return
  time = gettimeofday_us() - @entry(gettimeofday_us())
  if (bytes > 0)
    fileread[p, fd] += bytes
  time_io[p, fd] <<< time
}

probe syscall.write.return {
  p = pid()
  fd = $fd
  bytes = $return
  time = gettimeofday_us() - @entry(gettimeofday_us())
  if (bytes > 0)
    filewrite[p, fd] += bytes
  time_io[p, fd] <<< time
}

probe syscall.close {
  if ([pid(), $fd] in filehandles) {
    printf("%d %s access %s read: %d write: %d\n",
           timestamp(), proc(), filehandles[pid(), $fd],
           fileread[pid(), $fd], filewrite[pid(), $fd])
    if (@count(time_io[pid(), $fd]))
      printf("%d %s iotime %s time: %d\n",  timestamp(), proc(),
             filehandles[pid(), $fd], @sum(time_io[pid(), $fd]))
   }
  delete fileread[pid(), $fd]
  delete filewrite[pid(), $fd]
  delete filehandles[pid(), $fd]
  delete time_io[pid(),$fd]
}

Because of the size limit on the hash maps, it only works up to a certain number of files.


6

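First, you may not need to track the mapping between fd and path, because it is available in /proc/PID/fd/.

Second, maybe you should use the LD_PRELOAD trick and override the open, seek, and read system calls in C. There are some articles here and there about how to override malloc/free.

I would guess that applying the same trick to those system calls is not much different. It needs to be implemented in C, but it should take less code and be more precise than parsing strace output.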


Agreed. I am now using LD_PRELOAD as an alternative, but I was hoping for something that works out of the box. Thanks. - Noah Watkins

2

strace now has new options to track file descriptors:

--decode-fds=set
                   Decode various information associated with file descriptors.  The default is decode-fds=none.  set can include the following elements:

                   path    Print file paths.
                   socket  Print socket protocol-specific information.
                   dev     Print character/block device numbers.
                   pidfd   Print PIDs associated with pidfd file descriptors.

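This is useful because file descriptors are reused after being closed, and /proc/$PID/fd only provides a snapshot, which is of no use when debugging in real time.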

Sample output. Note that the file name is shown in angle brackets and that FD 3 is used in turn for /etc/ld.so.cache, /lib/x86_64-linux-gnu/libc.so.6, /usr/lib/locale/locale-archive, and /home/florian/hello:

$ strace -e trace=desc --decode-fds=all cat hello 1>/dev/null
execve("/usr/bin/cat", ["cat", "hello"], 0x7fff42e20710 /* 102 vars */) = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3</etc/ld.so.cache>
newfstatat(3</etc/ld.so.cache>, "", {st_mode=S_IFREG|0644, st_size=167234, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 167234, PROT_READ, MAP_PRIVATE, 3</etc/ld.so.cache>, 0) = 0x7f22edeee000
close(3</etc/ld.so.cache>)              = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3</usr/lib/x86_64-linux-gnu/libc-2.33.so>
read(3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\206\2\0\0\0\0\0"..., 832) = 832
pread64(3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, "\4\0\0\0 \0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0"..., 48, 848) = 48
pread64(3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0+H)\227\201T\214\233\304R\352\306\3379\220%"..., 68, 896) = 68
newfstatat(3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, "", {st_mode=S_IFREG|0755, st_size=1983576, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f22edeec000
pread64(3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
mmap(NULL, 2012056, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, 0) = 0x7f22edd00000
mmap(0x7f22edd26000, 1486848, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, 0x26000) = 0x7f22edd26000
mmap(0x7f22ede91000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, 0x191000) = 0x7f22ede91000
mmap(0x7f22ededd000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</usr/lib/x86_64-linux-gnu/libc-2.33.so>, 0x1dc000) = 0x7f22ededd000
mmap(0x7f22edee3000, 33688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f22edee3000
close(3</usr/lib/x86_64-linux-gnu/libc-2.33.so>) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f22edcfe000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3</usr/lib/locale/locale-archive>
newfstatat(3</usr/lib/locale/locale-archive>, "", {st_mode=S_IFREG|0644, st_size=6055600, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 6055600, PROT_READ, MAP_PRIVATE, 3</usr/lib/locale/locale-archive>, 0) = 0x7f22ed737000
close(3</usr/lib/locale/locale-archive>) = 0
fstat(1</dev/null<char 1:3>>, {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}) = 0
openat(AT_FDCWD, "hello", O_RDONLY)     = 3</home/florian/hello>
fstat(3</home/florian/hello>, {st_mode=S_IFREG|0664, st_size=6, ...}) = 0
fadvise64(3</home/florian/hello>, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f22edef5000
read(3</home/florian/hello>, "world\n", 131072) = 6
write(1</dev/null<char 1:3>>, "world\n", 6) = 6
read(3</home/florian/hello>, "", 131072) = 0
close(3</home/florian/hello>)           = 0
close(1</dev/null<char 1:3>>)           = 0
close(2</dev/pts/5<char 136:5>>)        = 0
+++ exited with 0 +++
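
Output in this form is fairly easy to post-process, since the path is attached to every descriptor. A rough sketch that sums the bytes returned by read and pread64 per path; the regular expression and the function name bytes_read_per_path are my own, and it assumes a single-process trace (no -f pid prefixes) and paths that contain no angle brackets, so device descriptors such as 1</dev/null<char 1:3>> are skipped:

```python
import re
from collections import defaultdict

# Matches e.g.  read(3</home/florian/hello>, "world\n", 131072) = 6
# Simplification: the decoded path must not contain '<' or '>'.
READ_RE = re.compile(r'^(read|pread64)\((\d+)<([^<>]+)>.*\)\s+=\s+(-?\d+)')


def bytes_read_per_path(lines):
    """Sum the positive return values of read/pread64 calls, keyed by path."""
    totals = defaultdict(int)
    for line in lines:
        m = READ_RE.match(line)
        if m is None:
            continue  # not a read on a decoded regular path
        nbytes = int(m.group(4))
        if nbytes > 0:  # skip EOF (0) and errors (-1)
            totals[m.group(3)] += nbytes
    return dict(totals)
```

It could be fed with a file produced by something like strace -e trace=read,pread64 --decode-fds=path -o trace.txt cat hello, passing the lines of trace.txt to the function.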

1

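I think overriding open, seek, and read is a good solution. But if you want to parse and analyze strace output programmatically, I have done something similar before and put my code on GitHub: https://github.com/johnlcf/Stana/wiki

(I did this because I needed to analyze strace results from programs run by other people, and asking them to use LD_PRELOAD is not easy.)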


0

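Probably the best way to do this is to use fanotify. Fanotify is a Linux kernel facility that lets you watch filesystem events cheaply. I am not sure whether it allows filtering by PID, but it does pass the PID to your program, so you can check whether it is the one you are interested in.

Here is a nice code sample: http://bazaar.launchpad.net/~pitti/fatrace/trunk/view/head:/fatrace.c

However, it seems to be under-documented at the moment. All the documentation I could find is at http://www.spinics.net/lists/linux-man/msg02302.html and http://lkml.indiana.edu/hypermail/linux/kernel/0811.1/01668.html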


-1
Parsing the output of command-line tools (such as strace) is a hassle; you can use the ptrace() system call instead. See man ptrace for details.

So parsing the output of a utility that uses ptrace is a hassle, but writing another utility from scratch is not?.. - Ruslan
