GCC memory alignment pragma

Question

9

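Does GCC have a memory alignment pragma, akin to #pragma vector aligned in the Intel compiler? I would like to tell the compiler to optimize a particular loop using aligned load/store instructions. To avoid possible confusion, this is not about struct packing.

For example: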

#if defined (__INTEL_COMPILER)
#pragma vector aligned
#endif
        for (int a = 0; a < int(N); ++a) {
            q10 += Ix(a,0,0)*Iy(a,1,1)*Iz(a,0,0);
            q11 += Ix(a,0,0)*Iy(a,0,1)*Iz(a,1,0);
            q12 += Ix(a,0,0)*Iy(a,0,0)*Iz(a,0,1);
            q13 += Ix(a,1,0)*Iy(a,0,0)*Iz(a,0,1);
            q14 += Ix(a,0,0)*Iy(a,1,0)*Iz(a,0,1);
            q15 += Ix(a,0,0)*Iy(a,0,0)*Iz(a,1,1);
        }

Thanks

- Anycorn

3 Answers

6

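I tried your solution with g++ version 4.5.2 (on both Ubuntu and Windows), and it did not vectorize the loop.

If the alignment attribute is removed, the loop is vectorized using unaligned loads.

If the function is inlined, so that the array can be accessed directly rather than through the pointer, the loop is vectorized using aligned loads.

In both cases, the alignment attribute prevents vectorization. This is ironic: the "aligned_double *x" was supposed to make vectorization easier, but it does the opposite.

Which compiler reported vectorized loops for you? I suspect it was not gcc?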

- A Fog

4

Does GCC have a memory alignment pragma, akin to #pragma vector aligned

It looks like newer versions of GCC have the __builtin_assume_aligned built-in:

Built-in Function: void * __builtin_assume_aligned (const void *exp, size_t align, ...)

This function returns its first argument, and allows the compiler to assume that the returned pointer is at least align bytes aligned. This built-in can have either two or three arguments, if it has three, the third argument should have integer type, and if it is nonzero means misalignment offset. For example:
void *x = __builtin_assume_aligned (arg, 16);
means that the compiler can assume x, set to arg, is at least 16-byte aligned, while:
void *x = __builtin_assume_aligned (arg, 32, 8);
means that the compiler can assume for x, set to arg, that (char *) x - 8 is 32-byte aligned.

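Based on some other questions and answers on Stack Overflow from around 2010, it appears the built-in was not available in GCC 3 and early GCC 4. I do not know where the cut-off point is.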
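
As a minimal sketch of how the built-in might be applied to a loop like the one in the question: the function name dot_aligned and the choice of a 16-byte boundary are illustrative assumptions, not something taken from the question. The promise made to the compiler is not checked, so passing a pointer that is not actually aligned is undefined behavior. The assert is there only to catch that in debug builds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sum of element-wise products of two arrays.
   Both pointers are promised to be 16-byte aligned. */
double dot_aligned(const double *x_raw, const double *y_raw, int n)
{
    /* Debug-only check of the promise; compiled out with -DNDEBUG. */
    assert((uintptr_t)x_raw % 16 == 0);
    assert((uintptr_t)y_raw % 16 == 0);

    /* The alignment assumption attaches to the returned pointer,
       so the loop must use x and y, not x_raw and y_raw. */
    const double *x = __builtin_assume_aligned(x_raw, 16);
    const double *y = __builtin_assume_aligned(y_raw, 16);

    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Whether GCC then emits aligned loads depends on the version, target, and optimization flags, so it is worth checking the generated assembly.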

- jww


- Dietrich Epp · Accepted Answer

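You can use a typedef to create an over-aligned type, and then declare pointers to that type, to tell GCC that the pointer points to aligned memory.

This helps gcc but not clang 7.0 or ICC 19; see the x86-64 non-AVX asm they emit on Godbolt. (Only GCC folds a load into a memory operand for mulps, instead of using a separate movups.) You have to use __builtin_assume_aligned if you want to portably convey an alignment promise to GNU C compilers other than GCC itself.

From http://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html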

typedef double aligned_double __attribute__((aligned (16)));
// Note: sizeof(aligned_double) is 8, not 16
void some_function(aligned_double *x, aligned_double *y, int n)
{
    for (int i = 0; i < n; ++i) {
        // math!
    }
}

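This won't make aligned_double 16 bytes wide. It only aligns it to a 16-byte boundary, or rather, the first element of an array will be aligned. Looking at the disassembly on my computer, as soon as I use the alignment attribute, I start to see a lot of vector ops. I am using a Power architecture computer at the moment, so it's AltiVec code, but I think this does what you want. (Note: I wasn't using double when I tested this, because AltiVec doesn't support double-precision floats.) You can see some other examples of autovectorization using type attributes here: http://gcc.gnu.org/projects/tree-ssa/vectorization.html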
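
For illustration, here is a rough sketch of the same typedef trick with the loop body filled in, using float since that is the case AltiVec and SSE mulps handle. The names aligned_float and scale_add are made up for this example, and 16 bytes is an assumed vector width. As with any alignment promise, the caller must really pass 16-byte-aligned buffers; the assert only checks that in debug builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Over-aligned element type: sizeof(aligned_float) is still 4, but a
   pointer to it tells GCC the pointed-to address is 16-byte aligned. */
typedef float aligned_float __attribute__((aligned(16)));

/* Hypothetical kernel: dst[i] += k * src[i] for i in [0, n). */
void scale_add(aligned_float *dst, const aligned_float *src, float k, size_t n)
{
    /* Debug-only check that the caller kept the alignment promise. */
    assert((uintptr_t)dst % 16 == 0);
    assert((uintptr_t)src % 16 == 0);

    for (size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];
}
```

Compiling with optimization and inspecting the assembly is the only reliable way to tell whether a given GCC version vectorizes this with aligned loads, falls back to unaligned ones, or does not vectorize at all.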