Finding where relocations originate

3
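Using Ulrich Drepper's relinfo.pl script, one can easily count the number of relocations of a DSO, but it does not work on .o files.
Say I have a large shared library and I am not happy about its number of relocations. Is there a way to find out where they come from (the symbol, or at least the .o), to check whether they are of the easily fixable type (e.g. const char * str = "Hello World"; -> const char str[] = "Hello World";)?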

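What do you want to know: the object file that requires the symbol, or the object file or shared library that contains it? Do you have many object files or only one shared library? - Martin Rosenau
@MartinRosenau: The object file containing the symbol. I have the shared library and the .o files that were linked into it (I also have the sources, but grepping for static const char \* .*[] only gets me so far...). - Marc Mutz - mmutz
2 Answers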

13
Short answer: use objdump or readelf instead.
Long answer: let's look at an actual example, example.c:
#include <stdio.h>

static const char global1[] = "static const char []";
static const char *global2 = "static const char *";
static const char *const global3 = "static const char *const";
const char global4[] = "const char []";
const char *global5 = "const char *";
const char *const global6 = "const char *const";
char global7[] = "char []";
char *global8 = "char *";
char *const global9 = "char *const";

int main(void)
{
    static const char local1[] = "static const char []";
    static const char *local2 = "static const char *";
    static const char *const local3 = "static const char *const";
    const char local4[] = "const char []";
    const char *local5 = "const char *";
    const char *const local6 = "const char *const";
    char local7[] = "char []";
    char *local8 = "char *";
    char *const local9 = "char *const";

    printf("Global:\n");
    printf("\t%s\n", global1);
    printf("\t%s\n", global2);
    printf("\t%s\n", global3);
    printf("\t%s\n", global4);
    printf("\t%s\n", global5);
    printf("\t%s\n", global6);
    printf("\t%s\n", global7);
    printf("\t%s\n", global8);
    printf("\t%s\n", global9);
    printf("\n");
    printf("Local:\n");
    printf("\t%s\n", local1);
    printf("\t%s\n", local2);
    printf("\t%s\n", local3);
    printf("\t%s\n", local4);
    printf("\t%s\n", local5);
    printf("\t%s\n", local6);
    printf("\t%s\n", local7);
    printf("\t%s\n", local8);
    printf("\t%s\n", local9);

    return 0;
}

You can compile it to an object file using, for example,
gcc -W -Wall -c example.c

and to an executable using

gcc -W -Wall example.c -o example

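You can dump the symbols and relocations of a (non-dynamic) object file with objdump -tr example.o, or the same information for an executable (or a dynamic object file) with objdump -TtRr example. Running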
objdump -t example.o

on x86-64, I get
example.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 example.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l    d  .rodata    0000000000000000 .rodata
0000000000000000 l     O .rodata    0000000000000015 global1
0000000000000000 l     O .data  0000000000000008 global2
0000000000000048 l     O .rodata    0000000000000008 global3
00000000000000c0 l     O .rodata    0000000000000015 local1.2053
0000000000000020 l     O .data  0000000000000008 local2.2054
00000000000000d8 l     O .rodata    0000000000000008 local3.2055
0000000000000000 l    d  .note.GNU-stack    0000000000000000 .note.GNU-stack
0000000000000000 l    d  .eh_frame  0000000000000000 .eh_frame
0000000000000000 l    d  .comment   0000000000000000 .comment
0000000000000050 g     O .rodata    000000000000000e global4
0000000000000008 g     O .data  0000000000000008 global5
0000000000000080 g     O .rodata    0000000000000008 global6
0000000000000010 g     O .data  0000000000000008 global7
0000000000000018 g     O .data  0000000000000008 global8
00000000000000a0 g     O .rodata    0000000000000008 global9
0000000000000000 g     F .text  000000000000027a main
0000000000000000         *UND*  0000000000000000 puts
0000000000000000         *UND*  0000000000000000 printf
0000000000000000         *UND*  0000000000000000 putchar
0000000000000000         *UND*  0000000000000000 __stack_chk_fail

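The output is described in man 1 objdump, under the -t heading. Note that the second "column" is actually fixed-width: it is seven characters wide and describes the type of the object. The third column is the section name: *UND* for undefined symbols, .text for code, .rodata for read-only (immutable) data, .data for initialized mutable data, .bss for uninitialized mutable data, and so on.
As the symbol table above shows, the variables local4, local5, local6, local7, local8 and local9 did not get entries in the symbol table at all. This is because they are local to main(). The contents of the strings they refer to are stored in .data or .rodata (or constructed on the fly, whichever the compiler considers best).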
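To make the fixed-width flags column concrete, here is a minimal sketch that pulls the data objects (type O) out of objdump -t output. The function name list_data_objects is made up for illustration, and the sketch assumes the layout shown above: an address, one space, seven flag characters, one space, then section, size and name.

```shell
# Hypothetical helper: reads `objdump -t` output on stdin and prints
# "<section> <size> <name>" for every symbol whose type flag is O (object).
list_data_objects() {
    awk '
        /^[0-9A-Fa-f]+ / {
            w = length($1)                  # address width (8 or 16 digits)
            # The flags column is 7 characters wide; its last character
            # is the symbol type (F, f, O or a space).
            if (substr($0, w + 8, 1) != "O")
                next
            n = split(substr($0, w + 10), f, /[ \t]+/)
            if (n >= 3)
                print f[1], f[2], f[3]
        }
    '
}
```

Usage would be along the lines of objdump -t example.o | list_data_objects. Names containing spaces (for example demangled C++ names) would be cut at the first space, so this is only meant for a quick look.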
Next, let's look at the relocation records. Running
objdump -r example.o

I get
example.o:     file format elf64-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000037 R_X86_64_32S      .rodata+0x000000000000005e
0000000000000040 R_X86_64_32S      .rodata+0x000000000000006b
0000000000000059 R_X86_64_32S      .rodata+0x0000000000000088
0000000000000062 R_X86_64_32S      .rodata+0x000000000000008f
0000000000000067 R_X86_64_32       .rodata+0x00000000000000a8
000000000000006c R_X86_64_PC32     puts-0x0000000000000004
0000000000000071 R_X86_64_32       .rodata+0x00000000000000b0
0000000000000076 R_X86_64_32       .rodata
0000000000000083 R_X86_64_PC32     printf-0x0000000000000004
000000000000008a R_X86_64_PC32     .data-0x0000000000000004
000000000000008f R_X86_64_32       .rodata+0x00000000000000b0
000000000000009f R_X86_64_PC32     printf-0x0000000000000004
00000000000000a6 R_X86_64_PC32     .rodata+0x0000000000000044
00000000000000ab R_X86_64_32       .rodata+0x00000000000000b0
00000000000000bb R_X86_64_PC32     printf-0x0000000000000004
00000000000000c0 R_X86_64_32       .rodata+0x00000000000000b0
00000000000000c5 R_X86_64_32       global4
00000000000000d2 R_X86_64_PC32     printf-0x0000000000000004
00000000000000d9 R_X86_64_PC32     global5-0x0000000000000004
00000000000000de R_X86_64_32       .rodata+0x00000000000000b0
00000000000000ee R_X86_64_PC32     printf-0x0000000000000004
00000000000000f5 R_X86_64_PC32     global6-0x0000000000000004
00000000000000fa R_X86_64_32       .rodata+0x00000000000000b0
000000000000010a R_X86_64_PC32     printf-0x0000000000000004
000000000000010f R_X86_64_32       .rodata+0x00000000000000b0
0000000000000114 R_X86_64_32       global7
0000000000000121 R_X86_64_PC32     printf-0x0000000000000004
0000000000000128 R_X86_64_PC32     global8-0x0000000000000004
000000000000012d R_X86_64_32       .rodata+0x00000000000000b0
000000000000013d R_X86_64_PC32     printf-0x0000000000000004
0000000000000144 R_X86_64_PC32     global9-0x0000000000000004
0000000000000149 R_X86_64_32       .rodata+0x00000000000000b0
0000000000000159 R_X86_64_PC32     printf-0x0000000000000004
0000000000000163 R_X86_64_PC32     putchar-0x0000000000000004
0000000000000168 R_X86_64_32       .rodata+0x00000000000000b5
000000000000016d R_X86_64_PC32     puts-0x0000000000000004
0000000000000172 R_X86_64_32       .rodata+0x00000000000000b0
0000000000000177 R_X86_64_32       .rodata+0x00000000000000c0
0000000000000184 R_X86_64_PC32     printf-0x0000000000000004
000000000000018b R_X86_64_PC32     .data+0x000000000000001c
0000000000000190 R_X86_64_32       .rodata+0x00000000000000b0
00000000000001a0 R_X86_64_PC32     printf-0x0000000000000004
00000000000001a7 R_X86_64_PC32     .rodata+0x00000000000000d4
00000000000001ac R_X86_64_32       .rodata+0x00000000000000b0
00000000000001bc R_X86_64_PC32     printf-0x0000000000000004
00000000000001c1 R_X86_64_32       .rodata+0x00000000000000b0
00000000000001d6 R_X86_64_PC32     printf-0x0000000000000004
00000000000001db R_X86_64_32       .rodata+0x00000000000000b0
00000000000001ef R_X86_64_PC32     printf-0x0000000000000004
00000000000001f4 R_X86_64_32       .rodata+0x00000000000000b0
0000000000000209 R_X86_64_PC32     printf-0x0000000000000004
000000000000020e R_X86_64_32       .rodata+0x00000000000000b0
0000000000000223 R_X86_64_PC32     printf-0x0000000000000004
0000000000000228 R_X86_64_32       .rodata+0x00000000000000b0
000000000000023d R_X86_64_PC32     printf-0x0000000000000004
0000000000000242 R_X86_64_32       .rodata+0x00000000000000b0
0000000000000257 R_X86_64_PC32     printf-0x0000000000000004
0000000000000271 R_X86_64_PC32     __stack_chk_fail-0x0000000000000004


RELOCATION RECORDS FOR [.data]:
OFFSET           TYPE              VALUE 
0000000000000000 R_X86_64_64       .rodata+0x0000000000000015
0000000000000008 R_X86_64_64       .rodata+0x000000000000005e
0000000000000018 R_X86_64_64       .rodata+0x0000000000000088
0000000000000020 R_X86_64_64       .rodata+0x0000000000000015


RELOCATION RECORDS FOR [.rodata]:
OFFSET           TYPE              VALUE 
0000000000000048 R_X86_64_64       .rodata+0x0000000000000029
0000000000000080 R_X86_64_64       .rodata+0x000000000000006b
00000000000000a0 R_X86_64_64       .rodata+0x000000000000008f
00000000000000d8 R_X86_64_64       .rodata+0x0000000000000029


RELOCATION RECORDS FOR [.eh_frame]:
OFFSET           TYPE              VALUE 
0000000000000020 R_X86_64_PC32     .text

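The relocation records are grouped by the section in which the relocation resides. Because the string contents are in the .data or .rodata sections, we can restrict ourselves to the relocations whose VALUE starts with .data or .rodata. (Mutable strings, such as char global7[] = "char []";, are stored in .data; immutable strings and string literals are stored in .rodata.)
If we compiled the code with debugging symbols enabled, it would be easier to determine which variable refers to which string, but I would probably just look at the actual contents at each relocation value (target) to see which references to immutable strings need fixing.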
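Since the records are grouped by the section being relocated, a quick per-section tally can show where most of them sit before drilling into individual targets. The following is a rough sketch, not part of the original answer; the function name count_relocs_per_section is hypothetical and it assumes the objdump -r layout shown above.

```shell
# Hypothetical helper: reads `objdump -r` output on stdin and prints
# "<count> <section>" for each section that has relocation records.
count_relocs_per_section() {
    awk '
        /^RELOCATION RECORDS FOR \[/ {
            sec = $4                    # e.g. "[.text]:"
            sub(/^\[/, "", sec)
            sub(/\]:$/, "", sec)
            next
        }
        # A record line starts with a hex offset followed by the type.
        /^[0-9A-Fa-f]+[ \t]+R_/ { n[sec]++ }
        END { for (s in n) printf "%d %s\n", n[s], s }
    ' | sort -k1,1n -k2
}
```

It would be used as objdump -r example.o | count_relocs_per_section.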
The command combination
objdump -r example.o | awk '($3 ~ /^\..*\+/) { t = $3; sub(/\+/, " ", t); n[t]++ } END { for (r in n) printf "%d %s\n", n[r], r }' | sort -g

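outputs the number of relocations per target, followed by the target section and then the target offset within that section, sorted so that the targets occurring in the most relocations come last. In other words, the last lines of the output are the ones to concentrate on. For me, this gives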

1 .rodata
1 .rodata 0x0000000000000044
1 .rodata 0x00000000000000a8
1 .rodata 0x00000000000000b5
1 .rodata 0x00000000000000c0
1 .rodata 0x00000000000000d4
2 .rodata 0x0000000000000015
2 .rodata 0x0000000000000029
2 .rodata 0x000000000000005e
2 .rodata 0x000000000000006b
2 .rodata 0x0000000000000088
2 .rodata 0x000000000000008f
18 .rodata 0x00000000000000b0

If I add optimization (gcc -W -Wall -O3 -fomit-frame-pointer -c example.c), the result is
1 .rodata 0x0000000000000020
1 .rodata 0x0000000000000040
1 .rodata.str1.1
1 .rodata.str1.1 0x0000000000000058
2 .rodata.str1.1 0x000000000000000d
2 .rodata.str1.1 0x0000000000000021
2 .rodata.str1.1 0x000000000000005f
2 .rodata.str1.1 0x000000000000006c
3 .rodata.str1.1 0x000000000000003a
3 .rodata.str1.1 0x000000000000004c
18 .rodata.str1.1 0x0000000000000008

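This shows that compiler options do have a big effect, but that one target is used 18 times either way: section .rodata at offset 0xb0 (.rodata.str1.1 at offset 0x8 if optimization is enabled at compile time).
That is the "\t%s\n" string literal.
Modifying the original program to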
    char *local8 = "char *";
    char *const local9 = "char *const";

    const char *const fmt = "\t%s\n";

    printf("Global:\n");
    printf(fmt, global1);
    printf(fmt, global2);

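Replacing the format string with an immutable string pointer, fmt, eliminates those 18 relocations altogether (you could of course also use the equivalent const char fmt[] = "\t%s\n";).
The above analysis indicates that, at least with GCC 4.6.3, most of the avoidable relocations are caused by (repeated use of) string literals. Replacing them with an array of const char (const char fmt[] = "\t%s\n";) or a const pointer to const char (const char *const fmt = "\t%s\n";) seems to me an effective and safe strategy: in both cases the contents go into the read-only .rodata section, and the pointer or array reference itself is immutable too.
Furthermore, converting string literals to immutable string pointers or char arrays is entirely a source-level task. That is, if you convert all string literals using the above method, you can eliminate at least one relocation per string literal.
In fact, I do not see how object-level analysis will help you much here. It will, of course, tell you whether your modifications reduce the number of relocations needed.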
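To check whether a source change actually reduced the count in the linked library, one option is to tally the dynamic relocation records by type before and after. This is a minimal sketch under the assumption that objdump -R prints one record per line as "offset type value"; the function name count_dynamic_relocs and the library name libfoo.so are placeholders, not from the original answer.

```shell
# Hypothetical helper: reads `objdump -R` output on stdin and prints
# "<count> <relocation type>", smallest count first.
count_dynamic_relocs() {
    awk '
        # Record lines start with a hex offset followed by the type name.
        /^[0-9A-Fa-f]+[ \t]+R_/ { n[$2]++ }
        END { for (t in n) printf "%d %s\n", n[t], t }
    ' | sort -k1,1n -k2
}
```

Running objdump -R libfoo.so | count_dynamic_relocs on the old and the new build and comparing the two listings should show whether the change paid off.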
The awk stanza above can be expanded into a script that also outputs the string constants for references with positive offsets:
#!/bin/bash
if [ $# -ne 1 ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
    exec >&2
    echo ""
    printf 'Usage: %s [ -h | --help ]\n' "$0"
    printf '       %s object.o\n' "$0"
    echo ""
    exit 1
fi

export LANG=C LC_ALL=C

objdump -wr "$1" | awk '
    BEGIN {
        RS = "[\t\v\f ]*[\r\n][\t\n\v\f\r ]*"
        FS = "[\t\v\f ]+"
    }

    $1 ~ /^[0-9A-Fa-f]+/ {
        n[$3]++
    }

    END {
        for (s in n)
            printf "%d %s\n", n[s], s
    }
' | sort -g | gawk -v filename="$1" '
    BEGIN {
        RS = "[\t\v\f ]*[\r\n][\t\n\v\f\r ]*"
        FS = "[\t\v\f ]+"

        cmd = "objdump --file-offsets -ws " filename
        while ((cmd | getline) > 0)
            if ($3 == "section") {
                s = $4
                sub(/:$/, "", s)
                o = $NF
                sub(/\)$/, "", o)
                start[s] = strtonum(o)
            }
        close(cmd)
    }

    {
        if ($2 ~ /\..*\+/) {
            s = $2
            o = $2
            sub(/\+.*$/, "", s)
            sub(/^[^\+]*\+/, "", o)
            o = strtonum(o) + start[s]
            cmd = "dd if=\"" filename "\" of=/dev/stdout bs=1 skip=" o " count=256"
            OLDRS = RS
            RS = "\0"
            cmd | getline hex
            close(cmd)
            RS = OLDRS
            gsub(/\\/, "\\\\", hex)
            gsub(/\t/, "\\t", hex)
            gsub(/\n/, "\\n", hex)
            gsub(/\r/, "\\r", hex)
            gsub(/\"/, "\\\"", hex)
            if (hex ~ /[\x00-\x1F\x7F-\x9F\xFE\xFF]/ || length(hex) < 1)
                printf "%s\n", $0
            else
                printf "%s = \"%s\"\n", $0, hex
        } else
            print $0
    }
'

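This is a bit crude, just slapped together, so I do not know how portable it is. On my machine it does seem to find the string literals for the few test cases I tried; you will probably want to rewrite it to suit your own needs, or use a programming language with ELF support to examine the object files directly.
For the example program shown above (before the modification I suggested to reduce the number of relocations), compiled without optimization, the script yields the output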
1 .data+0x000000000000001c = ""
1 .data-0x0000000000000004
1 .rodata
1 .rodata+0x0000000000000044 = ""
1 .rodata+0x00000000000000a8 = "Global:"
1 .rodata+0x00000000000000b5 = "Local:"
1 .rodata+0x00000000000000c0 = "static const char []"
1 .rodata+0x00000000000000d4 = ""
1 .text
1 __stack_chk_fail-0x0000000000000004
1 format
1 global4
1 global5-0x0000000000000004
1 global6-0x0000000000000004
1 global7
1 global8-0x0000000000000004
1 global9-0x0000000000000004
1 putchar-0x0000000000000004
2 .rodata+0x0000000000000015 = "static const char *"
2 .rodata+0x0000000000000029 = "static const char *const"
2 .rodata+0x000000000000005e = "const char *"
2 .rodata+0x000000000000006b = "const char *const"
2 .rodata+0x0000000000000088 = "char *"
2 .rodata+0x000000000000008f = "char *const"
2 puts-0x0000000000000004
18 .rodata+0x00000000000000b0 = "\t%s\n"
18 printf-0x0000000000000004

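Finally, you might notice that calling printf() through a function pointer instead of directly would remove another 18 relocations from the example code, but I would consider that a mistake.
For code, you want relocations, because indirect function calls (calls via function pointers) are much slower than direct calls. Simply put, those relocations make function and subroutine calls much faster, so you most definitely want to keep them.
Apologies for the long answer; I hope you find it useful. Questions?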

2

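Based on Nominal Animal's answer, which I still need to digest fully, I have come up with the following simple shell script that seems to find what I called the "easily fixed" variety: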

for i in path/to/*.o ; do
    REL="$(objdump -TtRr "$i" 2>/dev/null | grep '.data.rel.ro.local[^]+-]')"
    if [ -n "$REL" ]; then
        echo "$(basename "$i"):"
        echo "$REL" | c++filt
        echo
    fi
done

Sample output for the QtGui library:

qimagereader.o:
0000000000000000 l     O .data.rel.ro.local     00000000000000c0 _qt_BuiltInFormats
0000000000000000 l    d  .data.rel.ro.local     0000000000000000 .data.rel.ro.local

qopenglengineshadermanager.o:
0000000000000000 l     O .data.rel.ro.local     0000000000000090 QOpenGLEngineShaderManager::getUniformLocation(QOpenGLEngineShaderManager::Uniform)::uniformNames
0000000000000000 l    d  .data.rel.ro.local     0000000000000000 .data.rel.ro.local

qopenglpaintengine.o:
0000000000000000 l     O .data.rel.ro.local     0000000000000020 vtable for (anonymous namespace)::QOpenGLStaticTextUserData
0000000000000000 l    d  .data.rel.ro.local     0000000000000000 .data.rel.ro.local

qtexthtmlparser.o:
0000000000000000 l     O .data.rel.ro.local     00000000000003b0 elements
0000000000000000 l    d  .data.rel.ro.local     0000000000000000 .data.rel.ro.local

Looking these symbols up in the source files usually leads quickly to a fix, or to the discovery that they cannot be fixed easily.
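When the list gets long, it may help to look at the largest objects first. The sketch below is an assumption-laden convenience, not part of the original script: the name largest_relro_local is made up, it expects raw objdump -t lines (before c++filt, since demangled names contain spaces), and it assumes the simple "l O" / "g O" flag layout seen in the sample output.

```shell
# Hypothetical helper: reads `objdump -t` output on stdin and prints
# "<size in bytes> <symbol>" for data objects in .data.rel.ro.local,
# largest first.
largest_relro_local() {
    awk '$3 == "O" && $4 == ".data.rel.ro.local" { print $5, $6 }' |
    while read -r hex name; do
        # The size column is hexadecimal without a prefix; printf accepts
        # a 0x-prefixed operand for %d.
        printf '%d %s\n' "0x$hex" "$name"
    done |
    sort -rn
}
```

For example, objdump -t path/to/qtexthtmlparser.o | largest_relro_local. If an object is a plain array of pointers, its size divided by the pointer width gives a rough idea of how many relocations it accounts for, but that is only a guess without checking the source.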

But I guess that once I run out of .data.rel.ro.local entries to fix, I will have to revisit Nominal Animal's answer...

