Tell a C or C++ compiler that the loop count is a multiple of 8

Question

I would like to write the following function in C++, using gcc 11.1 with the options -O3 -mavx -std=c++17.

void f( float * __restrict__ a, float * __restrict__ b, float * __restrict__ c, int64_t n) {
    for (int64_t i = 0; i != n; ++i) {
        a[i] = b[i] + c[i];
    }
}

This generates about 60 lines of assembly, much of which handles the case where n is not a multiple of 8. https://godbolt.org/z/61MYPG7an

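I know that n is always a multiple of 8. One way I can change this code is to replace for (int64_t i = 0; i != n; ++i) with for (int64_t i = 0; i != (n / 8 * 8); ++i). This generates only about 20 assembly instructions. https://godbolt.org/z/vhvdKMfE9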
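For reference, a minimal sketch of that rounded-bound variant as a complete function. The name f_rounded and the local m are only illustrative. Note that if n were not a multiple of 8, this version would silently skip the last n % 8 elements.

```cpp
#include <cstdint>

// Illustrative name; same loop as above, but the bound is rounded down
// to a multiple of 8 so the compiler needs no scalar tail loop.
void f_rounded(float* __restrict__ a, float* __restrict__ b,
               float* __restrict__ c, std::int64_t n) {
    const std::int64_t m = n / 8 * 8;  // n rounded toward zero to a multiple of 8
    for (std::int64_t i = 0; i != m; ++i) {
        a[i] = b[i] + c[i];
    }
}
```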

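However, on line 5 of the second godbolt link there is an instruction that zeroes the lowest three bits of n. If there were a way to tell the compiler that n will always be a multiple of 8, this instruction could be omitted without changing behavior. Does anyone know how to do this on any C or C++ compiler (especially gcc or clang)? In my case it doesn't actually matter, but I'm curious and not sure where to look for an answer.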

- Henry Heffan

You already have #include <immintrin.h>, so why not use those intrinsics directly? - Vlad Feinstein

Hi Vlad, that was left over from other experiments. I don't need it here. Sorry! - Henry Heffan

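The answer below is more concise, but what I meant is that you can process 8 elements per loop iteration and handle them all at once with the AVX intrinsics. - Vlad Feinstein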
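A rough sketch of what this comment suggests, not code from the thread: step the loop by 8 and do one 256-bit add per iteration. The name f_avx is hypothetical, the unaligned load/store intrinsics are chosen because nothing is known about pointer alignment, and the scalar fallback is an addition so the snippet still compiles when AVX is not enabled. It relies on the caller's guarantee that n is a multiple of 8.

```cpp
#include <cstdint>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical name. Assumes n is a multiple of 8; otherwise the AVX path
// would read and write past the end of the arrays.
void f_avx(float* __restrict__ a, const float* __restrict__ b,
           const float* __restrict__ c, std::int64_t n) {
#if defined(__AVX__)
    for (std::int64_t i = 0; i < n; i += 8) {
        __m256 vb = _mm256_loadu_ps(b + i);  // unaligned load of 8 floats
        __m256 vc = _mm256_loadu_ps(c + i);
        _mm256_storeu_ps(a + i, _mm256_add_ps(vb, vc));  // a[i..i+7] = b + c
    }
#else
    // Fallback when the translation unit is not compiled with AVX enabled.
    for (std::int64_t i = 0; i < n; ++i) {
        a[i] = b[i] + c[i];
    }
#endif
}
```

Compared with the hint-based approach, this spells out the vector width by hand, so it gives up portability to other instruction sets in exchange for predictable code generation.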

1 Answer

- HTNW · Accepted Answer

Use __builtin_unreachable to state the assumption:

void f(float *__restrict__ a, float *__restrict__ b, float *__restrict__ c, int64_t n) {
    if(n % 8 != 0) __builtin_unreachable(); // control flow cannot reach this branch so the condition is not necessary and is optimized out
    for (int64_t i = 0; i != n; ++i) { // if control flow reaches this point n is a multiple of 8
        a[i] = b[i] + c[i];
    }
}

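This generates shorter code, because the compiler may now assume that n is a multiple of 8 whenever the loop is reached.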
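Since the question asks about other compilers as well, here is a hedged sketch of how the same hint could be wrapped in a macro. The names ASSUME and f_assume are hypothetical. __builtin_assume is the Clang builtin and __assume is the MSVC intrinsic; C++23 also adds a standard [[assume(expr)]] attribute. Whether a given compiler version actually drops the tail handling is best checked in the generated assembly. In every variant, calling the function with an n that violates the assumption is undefined behavior.

```cpp
#include <cstdint>

// Hypothetical wrapper macro: pick whichever assumption hint the compiler offers.
#if defined(__clang__)
#  define ASSUME(cond) __builtin_assume(cond)
#elif defined(__GNUC__)
#  define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#elif defined(_MSC_VER)
#  define ASSUME(cond) __assume(cond)
#else
#  define ASSUME(cond) ((void)0)  // unknown compiler: no hint, still correct
#endif

// __restrict is used instead of __restrict__ because GCC, Clang and MSVC
// all accept that spelling.
void f_assume(float* __restrict a, float* __restrict b,
              float* __restrict c, std::int64_t n) {
    ASSUME(n % 8 == 0);  // promise to the compiler; UB if it is ever false
    for (std::int64_t i = 0; i != n; ++i) {
        a[i] = b[i] + c[i];
    }
}
```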