Strict aliasing, -ffast-math, and SSE

Question

Consider the following program:

#include <iostream>
#include <cmath>
#include <cstring>
#include <xmmintrin.h>

using namespace std;

int main()
{
    // 4 float32s.
    __m128 nans;
    // Set them all to 0xffffffff which should be NaN.
    memset(&nans, 0xff, 4*4);

    // cmpord should return a mask of 0xffffffff for any non-NaNs, and 0x00000000 for NaNs.
    __m128 mask = _mm_cmpord_ps(nans, nans);
    // AND the mask with nans to zero any of the nans. The result should be 0x00000000 for every component.
    __m128 z = _mm_and_ps(mask, nans);

    cout << z[0] << " " << z[1] << " " << z[2] << " " << z[3] << endl;

    return 0;
}

If I compile it with Apple Clang 7.0.2, with or without -ffast-math, I get the expected output 0 0 0 0:

$ clang --version
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin14.5.0
Thread model: posix

$ clang test.cpp -o test
$ ./test
0 0 0 0 

$ clang test.cpp -ffast-math -o test
$ ./test 
0 0 0 0

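However, after updating to 8.1.0 (sorry, I don't know which upstream Clang version this corresponds to; Apple no longer publishes that information), -ffast-math seems to break this: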

$ clang --version
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

$ clang test.cpp -o test
$ ./test
0 0 0 0 

$ clang test.cpp -ffast-math -o test
$ ./test 
nan nan nan nan

I suspect this is due to strict aliasing rules or something similar. Can anyone explain this behaviour?

Edit: I forgot to mention that if you do nans = { std::nanf(nullptr), ... it works fine.

Also, looking at godbolt, the behaviour seems to have changed between Clang 3.8.1 and Clang 3.9: the latter removes the cmpordps instruction. GCC 7.1 seems to keep it.

- Timmmm

1 Answer

- Cornstalks · Accepted Answer

This isn't a strict aliasing issue. If you read the documentation for -ffast-math, you'll see your problem:

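Enable fast-math mode. This defines the __FAST_MATH__ preprocessor macro, and lets the compiler make aggressive, potentially-lossy assumptions about floating-point math. These include: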

[...]

Operands to floating-point operations are not equal to NaN and Inf, and

[...]
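Since the documentation says the flag defines __FAST_MATH__, code that depends on NaN handling can at least detect the flag at compile time. The following is only a minimal sketch: the function name built_with_fast_math is made up for illustration, and it only looks at the macro quoted above. As far as I know, -ffinite-math-only on its own is signalled through a separate macro, __FINITE_MATH_ONLY__, which this sketch does not check.

```cpp
// Hypothetical helper (name invented for this sketch): reports whether this
// translation unit was compiled with fast-math enabled, based on the
// __FAST_MATH__ macro that the -ffast-math documentation says is defined.
constexpr bool built_with_fast_math() {
#ifdef __FAST_MATH__
    return true;
#else
    return false;
#endif
}

// A file that relies on NaN semantics could refuse to build instead:
//
//   static_assert(!built_with_fast_math(),
//                 "this file relies on NaN semantics; do not use -ffast-math");
```

Failing the build is blunt, but it turns a silent miscompilation like the one in the question into a visible error.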

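-ffast-math allows the compiler to assume that a floating-point number is never NaN (because it sets the -ffinite-math-only option). Since clang tries to match gcc's options, we can read a little of GCC's option documentation to better understand what -ffinite-math-only does: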

Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.

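This option is not turned on by any -O option since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications.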

So if your code needs to use NaNs, you cannot use -ffast-math or -ffinite-math-only. Otherwise you run the risk of the optimizer breaking your code, as you're seeing here.
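If the flag really cannot be dropped, one possible workaround is to test for NaN on the raw bit pattern with integer operations, so that no floating-point comparison is left for the optimizer to fold away. This is a sketch, not a guaranteed fix: the helper names (float_from_bits, is_nan_bits, zero_nan_lanes) are invented here, IEEE 754 binary32 floats are assumed, and I would still check the generated assembly, since fast-math formally lets the compiler assume NaNs never occur.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 32-bit pattern as a float via memcpy,
// which avoids any aliasing concerns.
inline float float_from_bits(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: NaN test using integer arithmetic only.
// In IEEE 754 binary32 a NaN has all exponent bits set and a nonzero
// mantissa, so with the sign bit masked off the value exceeds 0x7f800000
// (the bit pattern of +infinity).
inline bool is_nan_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

// Replace every NaN element with 0.0f: a scalar counterpart of the
// cmpord-and-AND masking in the question.
inline void zero_nan_lanes(float* v, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (is_nan_bits(v[i])) {
            v[i] = 0.0f;
        }
    }
}
```

For example, filling a float[4] with memset(..., 0xff, ...) as in the question and passing it to zero_nan_lanes leaves four zeros. The same idea should carry over to __m128 by casting with _mm_castps_si128 and using the SSE2 integer compare intrinsics, though I have not shown that here.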