How do I stop Clang from over-unrolling nested loops when using templates?

Question

c++ clang compiler-optimization template-meta-programming loop-unrolling

21

Consider this code:

#include <iostream>
typedef long xint;
template<int N>
struct foz {
    template<int i=0>
    static void foo(xint t) {
        for (int j=0; j<10; ++j) {
            foo<i+1> (t+j);
        }
    }
    template<>
    static void foo<N>(xint t) {
        std::cout << t;
    }

};

int main() {
    foz<8>::foo<0>(0);
}

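When built with clang++ -O0, it compiles in a few seconds and then runs for 4 seconds.

With clang++ -O2, however, compilation takes a very long time and a lot of memory. On Compiler Explorer you can see that, once 8 is changed to a smaller value, the loops are fully unrolled.

I don't want to turn optimization off entirely; I want the generated code to be non-recursive, the way nested loops should behave. Is there anything I should do?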

- l4m2

2

It might be worth filing a bug report; clearly some inlining heuristics need tuning. - Quimby

3 Answers

1

To make it non-recursive, you can use an array for the indexes:

static bool increase(std::array<int, N>& a)
{
    for (auto rit = std::rbegin(a); rit != std::rend(a); ++rit) {
        if (++*rit == 10) {
            *rit = 0;
        } else {
            return true;
        }
    }
    return false;
}

static void foo(xint t) {
    std::array<int, N> indexes{};

    do {
        std::cout << std::accumulate(std::begin(indexes), std::end(indexes), 0);
    } while (increase(indexes));
}

Demo
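For reference, here is a self-contained sketch of the same odometer idea. It makes a few assumptions that are not in the answer above: `increase` and `foo` are free function templates rather than members of `foz`, the hypothetical `visit` callback replaces the direct write to `std::cout` so the result can be inspected, and `t` is added to the sum so that the argument is not ignored.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

typedef long xint;

// Advance the index array like a base-10 odometer, last digit fastest.
// Returns false once every digit has wrapped back to zero.
template <std::size_t N>
bool increase(std::array<int, N>& a) {
    for (auto rit = a.rbegin(); rit != a.rend(); ++rit) {
        if (++*rit == 10) {
            *rit = 0;  // carry into the next more significant digit
        } else {
            return true;
        }
    }
    return false;
}

// Iterative stand-in for N nested loops of 10 iterations each.
// `visit` is a hypothetical callback invoked once per innermost iteration.
template <std::size_t N, typename F>
void foo(xint t, F visit) {
    std::array<int, N> indexes{};
    do {
        visit(t + std::accumulate(indexes.begin(), indexes.end(), xint{0}));
    } while (increase(indexes));
}
```

Calling `foo<8>(0, [](xint v) { std::cout << v; })` (with `<iostream>` included) should visit the same values, in the same order, as the recursive version in the question, using one flat loop instead of template recursion.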

- Jarod42

0

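The simplest solution is to mark the offending function with the noinline function attribute, which is also supported by several other C++ compilers (e.g. GNU g++):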

    template<int i=0>
    static void foo(xint t)  __attribute__((__noinline__)) {

This tells the compiler's optimizer not to inline calls to that function.

- Dan Bonachea

1

noinline keeps the recursion, but I would prefer nested loops. - l4m2

- Iwa · Accepted Answer

Loop unrolling can be disabled; see Compiler Explorer. The generated code is non-recursive and is expressed as nested loops.

#pragma nounroll
for (int j=0; j<10; ++j) {
    foo<i+1> (t+j);
}

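Alternatively, you can tune the unrolling manually instead of disabling it. Unrolling 8 times generates code similar to looping 8 times. (Compiler Explorer)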

#pragma unroll 8
for (int j=0; j<10; ++j) {
    foo<i+1> (t+j);
}