我在思考函数式编程技术、递归、常量性和链表,所以我进行了两个实验来测试我的编译器。理论上,编译器应该能够优化掉尾递归(我现在意识到这里提供的函数并不是尾递归,并为试图将它们偷偷摸摸地传递得像尾递归一样而道歉,实际的尾递归变体可以在我下面发布的答案中找到),并在编译时产生结果,甚至不需要在运行时首先构建结构。
如下所示的数组变体按预期工作:
/**
* @file test1.c
*/
static inline int my_array_sum(const int n, const int * const xs) {
if (n == 0) {
return n;
} else {
return n + my_array_sum(n - 1, xs + 1);
}
}
int main(int argc, char **argv)
{
const int xs[] = {1, 2, 3, 4, 5, 6, 7, 8};
const int n = 8;
const int sum = my_array_sum(n, xs);
return sum;
}
在MinGW/MSys 64位环境下,通过以下命令(
gcc test1.c -o test1.obj -c -O3
, objdump -D test1.obj
)生成如下目标代码:0000000000000000 <main>:
0: 48 83 ec 28 sub $0x28,%rsp
4: e8 00 00 00 00 callq 9 <main+0x9>
9: b8 24 00 00 00 mov $0x24,%eax
e: 48 83 c4 28 add $0x28,%rsp
12: c3 retq
在 Linux Mint 64 位系统中:
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: b8 24 00 00 00 mov $0x24,%eax
9: c3 retq
以下是链表的版本(虽然这是个不好的词汇,但它没有使用 malloc,且编译器可以访问关于链表的所有信息):
/**
* @file test2.c
*/
typedef struct Cons {
const int x;
const struct Cons * const next;
} Cons;
static inline int cons_sum(const Cons c) {
if (c.next == 0) {
return c.x;
} else {
return c.x + cons_sum(*(c.next));
}
}
int main(int argc, char **argv)
{
// | To build the stack-local linked list, with tail s0 and head h.
const Cons s0 = {8, 0};
const Cons s1 = {7, &s0};
const Cons s2 = {6, &s1};
const Cons s3 = {5, &s2};
const Cons s4 = {4, &s3};
const Cons s5 = {3, &s4};
const Cons s6 = {2, &s5};
const Cons h = {1, &s6};
// | To produce a return value (expect 36).
const int sum = cons_sum(h);
return sum;
}
执行以下命令(gcc test2.c -o test2.obj -c -O3
, objdump -D test2.obj
),将产生如下目标代码:
在MinGW/MSys 64位操作系统中:
0000000000000000 <main>:
0: 48 81 ec 88 00 00 00 sub $0x88,%rsp
7: e8 00 00 00 00 callq c <main+0xc>
c: 48 8d 44 24 20 lea 0x20(%rsp),%rax
11: 48 8d 54 24 70 lea 0x70(%rsp),%rdx
16: b9 02 00 00 00 mov $0x2,%ecx
1b: 48 89 44 24 38 mov %rax,0x38(%rsp)
20: 48 8d 44 24 30 lea 0x30(%rsp),%rax
25: 48 89 44 24 48 mov %rax,0x48(%rsp)
2a: 48 8d 44 24 40 lea 0x40(%rsp),%rax
2f: 48 89 44 24 58 mov %rax,0x58(%rsp)
34: 48 8d 44 24 50 lea 0x50(%rsp),%rax
39: 48 89 44 24 68 mov %rax,0x68(%rsp)
3e: 48 8d 44 24 60 lea 0x60(%rsp),%rax
43: c7 44 24 20 08 00 00 movl $0x8,0x20(%rsp)
4a: 00
4b: 48 c7 44 24 28 00 00 movq $0x0,0x28(%rsp)
52: 00 00
54: c7 44 24 30 07 00 00 movl $0x7,0x30(%rsp)
5b: 00
5c: c7 44 24 40 06 00 00 movl $0x6,0x40(%rsp)
63: 00
64: c7 44 24 50 05 00 00 movl $0x5,0x50(%rsp)
6b: 00
6c: c7 44 24 60 04 00 00 movl $0x4,0x60(%rsp)
73: 00
74: c7 44 24 70 03 00 00 movl $0x3,0x70(%rsp)
7b: 00
7c: 48 89 44 24 78 mov %rax,0x78(%rsp)
81: e8 00 00 00 00 callq 86 <main+0x86>
86: 83 c0 01 add $0x1,%eax
89: 48 81 c4 88 00 00 00 add $0x88,%rsp
90: c3 retq
在Linux Mint 64位操作系统中:
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 48 83 ec 78 sub $0x78,%rsp
8: b9 01 00 00 00 mov $0x1,%ecx
d: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
14: 00 00
16: 48 89 44 24 68 mov %rax,0x68(%rsp)
1b: 31 c0 xor %eax,%eax
1d: 48 89 e0 mov %rsp,%rax
20: 48 8d 54 24 50 lea 0x50(%rsp),%rdx
25: c7 04 24 08 00 00 00 movl $0x8,(%rsp)
2c: 48 89 44 24 18 mov %rax,0x18(%rsp)
31: 48 8d 44 24 10 lea 0x10(%rsp),%rax
36: 48 89 44 24 28 mov %rax,0x28(%rsp)
3b: 48 8d 44 24 20 lea 0x20(%rsp),%rax
40: 48 89 44 24 38 mov %rax,0x38(%rsp)
45: 48 8d 44 24 30 lea 0x30(%rsp),%rax
4a: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp)
51: 00 00
53: c7 44 24 10 07 00 00 movl $0x7,0x10(%rsp)
5a: 00
5b: c7 44 24 20 06 00 00 movl $0x6,0x20(%rsp)
62: 00
63: c7 44 24 30 05 00 00 movl $0x5,0x30(%rsp)
6a: 00
6b: c7 44 24 40 04 00 00 movl $0x4,0x40(%rsp)
72: 00
73: c7 44 24 50 03 00 00 movl $0x3,0x50(%rsp)
7a: 00
7b: 48 89 44 24 48 mov %rax,0x48(%rsp)
80: 48 8d 44 24 40 lea 0x40(%rsp),%rax
85: 48 89 44 24 58 mov %rax,0x58(%rsp)
8a: b8 02 00 00 00 mov $0x2,%eax
8f: 90 nop
90: 89 c6 mov %eax,%esi
92: 8b 02 mov (%rdx),%eax
94: 48 8b 52 08 mov 0x8(%rdx),%rdx
98: 01 f1 add %esi,%ecx
9a: 48 85 d2 test %rdx,%rdx
9d: 75 f1 jne 90 <main+0x90>
9f: 01 c8 add %ecx,%eax
a1: 48 8b 7c 24 68 mov 0x68(%rsp),%rdi
a6: 64 48 33 3c 25 28 00 xor %fs:0x28,%rdi
ad: 00 00
af: 75 05 jne b6 <main+0xb6>
b1: 48 83 c4 78 add $0x78,%rsp
b5: c3 retq
b6: e8 00 00 00 00 callq bb <main+0xbb>
我尝试在C和C++中编译上述代码,两者都给出了大致相同的反汇编结果。
标准是否保证即使是严格常量的链表变体(在堆栈上,因为我可以看到全局链表不会被优化)也不会像简单的数组那样得到优化,还是说这只是编译器的当前状态,并且在将来可能会被优化?或者我是否忘记了启用此类优化的某种标志?
constexpr
/consteval
关键字,以请求编译器在编译时进行计算。例如:https://godbolt.org/z/z1c3P33q1 - user17732522test
/jne 90
部分所看到的那样。尽管缺少尾递归。将递归转换为迭代是编译器通常非常有用的优化,因为递归对于小函数特别是具有潜在高递归深度的函数来说效率较低。 - Peter Cordes