Why is GCC failing to auto-vectorize this loop?

Question

c++ gcc vectorization

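I am trying to optimize a loop that accounts for most of my program's computation time.

But when I turn on auto-vectorization with -O3 -ffast-math -ftree-vectorizer-verbose=6, GCC reports that it cannot vectorize the loop.

I am using GCC 4.4.5.

Code: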

/// Find the point in the path with the largest v parameter
void prediction::find_knife_edge(
    const float * __restrict__ const elevation_path,
    float * __restrict__ const diff_path,
    const float path_res,
    const unsigned a,
    const unsigned b,
    const float h_a,
    const float h_b,
    const float f,
    const float r_e
) const
{
    float wavelength = (speed_of_light * 1e-6f) / f;

    float d_ab = path_res * static_cast<float>(b - a);

    for (unsigned n = a + 1; n <= b - 1; n++)
    {
        float d_an = path_res * static_cast<float>(n - a);
        float d_nb = path_res * static_cast<float>(b - n);

        float h = elevation_path[n] + (d_an * d_nb) / (2.0f * r_e) - (h_a * d_nb + h_b * d_an) / d_ab;
        float v = h * std::sqrt((2.0f * d_ab) / (wavelength * d_an * d_nb));

        diff_path[n] = v;
    }
}

Messages from GCC:

note: not vectorized: number of iterations cannot be computed.
note: not vectorized: unhandled data-ref

The auto-vectorization page (http://gcc.gnu.org/projects/tree-ssa/vectorization.html) states that unknown loop bounds are supported.

If I replace the for loop with

for (unsigned n = 0; n <= 100; n++)

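then GCC does vectorize it.

What am I doing wrong?

The lack of detailed documentation on what these messages mean, and on the internals of GCC's auto-vectorizer, is rather frustrating.

EDIT: Thanks to David, I changed the loop to this: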

 for (unsigned n = a + 1; n < b; n++)

Now GCC attempts to vectorize the loop, but reports the following:

 note: not vectorized: unhandled data-ref
 note: Alignment of access forced using peeling.
 note: Vectorizing an unaligned access.
 note: vect_model_induction_cost: inside_cost = 1, outside_cost = 2 .
 note: not vectorized: relevant stmt not supported: D.76777_65 = (float) n_34;

What does "D.76777_65 = (float) n_34;" mean?

- ljbade

As far as I know, tree-ssa is a new effort aimed at overcoming the limitations of gcc's vectorizer, and I don't think it is used on the main gcc branch yet. - Ben Voigt

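Change n <= b - 1 to n < b. - David Schwartz

I'm pretty sure my GCC has tree-ssa; otherwise it would complain that the -ftree-vectorizer-verbose flag is not supported. - ljbade

David: That worked... but now I get a new error... I'll update the question. - ljbade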
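
A likely reason the `n <= b - 1` to `n < b` change matters is unsigned wraparound: if `b` is 0, `b - 1` wraps to `UINT_MAX`, so `n <= b - 1` holds for every `n` and the compiler cannot write down a finite trip count. With `n < b` the count is simply `b - (a + 1)` when that is positive, and 0 otherwise. The helper names below are hypothetical and only illustrate the arithmetic; they are not part of the original code.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: the inclusive upper bound used by `n <= b - 1`.
// For b == 0 the subtraction wraps around to UINT_MAX.
inline unsigned last_index_inclusive(unsigned b)
{
    return b - 1u;
}

// Hypothetical helper: trip count of `for (n = a + 1; n < b; n++)`,
// computed in closed form without running the loop.
inline unsigned trip_count_exclusive(unsigned a, unsigned b)
{
    return (b > a + 1u) ? b - (a + 1u) : 0u;
}

// Small self-check that can be called from a test or from main().
inline void check_loop_bounds()
{
    assert(last_index_inclusive(0u) == UINT_MAX);  // wraps: loop bound is "every n"
    assert(trip_count_exclusive(3u, 10u) == 6u);   // n = 4..9
    assert(trip_count_exclusive(5u, 0u) == 0u);    // empty range, still well defined
}
```
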

1 Answer

Content provided by Stack Overflow.

- David Schwartz · Accepted Answer

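I may have some of the details slightly off, but this is how you need to restructure your loop to get it to vectorize. The trick is to precompute the number of iterations and iterate from 0 to one short of that number. Do not change the for statement. You may need to fix the two lines before it and the two lines at the top of the loop. They're approximately right. ;)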

const unsigned it=(b-a)-1;
const unsigned diff=b-a;
for (unsigned n = 0; n < it; n++)
{
    float d_an = path_res * static_cast<float>(n);
    float d_nb = path_res * static_cast<float>(diff - n);

    float h = elevation_path[n] + (d_an * d_nb) / (2.0f * r_e) - (h_a * d_nb + h_b * d_an) / d_ab;
    float v = h * sqrt((2.0f * d_ab) / (wavelength * d_an * d_nb));

    diff_path[n] = v;
}
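
One way to fill in the "approximately right" lines is sketched below. The indices are shifted so that iteration 0 corresponds to `n = a + 1`, which keeps the results the same as the original loop. This is a sketch under assumptions, not a confirmed fix: the function name is hypothetical, `wavelength` is passed in directly instead of being derived from `speed_of_light` and `f`, and `a < b` is assumed. It also uses a signed counter. SSE2 has a packed signed 32-bit integer to float conversion (`cvtdq2ps`) but no unsigned counterpart, which may be why the `(float) n_34` statement is reported as unsupported; that link is an assumption and is not confirmed anywhere in this thread.

```cpp
#include <cassert>
#include <cmath>

// Sketch only: a free-function stand-in for prediction::find_knife_edge.
// `wavelength` is passed in directly (the original derives it from
// speed_of_light and f), and the function name is hypothetical.
void find_knife_edge_sketch(const float* __restrict__ elevation_path,
                            float* __restrict__ diff_path,
                            const float path_res,
                            const unsigned a,
                            const unsigned b,
                            const float h_a,
                            const float h_b,
                            const float wavelength,
                            const float r_e)
{
    assert(elevation_path && diff_path);
    if (b <= a + 1u)
        return;  // no interior points between a and b

    // Precompute the trip count once; signed so that the conversions
    // to float inside the loop are signed int -> float conversions.
    const int diff = static_cast<int>(b - a);
    const int count = diff - 1;
    const float d_ab = path_res * static_cast<float>(diff);

    // Shift both pointers so that index 0 corresponds to n = a + 1.
    const float* __restrict__ in = elevation_path + (a + 1u);
    float* __restrict__ out = diff_path + (a + 1u);

    for (int i = 0; i < count; ++i)
    {
        const int k = i + 1;  // k == n - a in the original loop
        const float d_an = path_res * static_cast<float>(k);
        const float d_nb = path_res * static_cast<float>(diff - k);

        const float h = in[i] + (d_an * d_nb) / (2.0f * r_e)
                      - (h_a * d_nb + h_b * d_an) / d_ab;
        out[i] = h * std::sqrt((2.0f * d_ab) / (wavelength * d_an * d_nb));
    }
}
```

Whether a given GCC version actually vectorizes this form is best checked by re-running with -ftree-vectorizer-verbose=6 and reading the notes for this loop.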