What is the recommended way to align memory in C++11?

Question

c++ c++11 dynamic-memory-allocation memory-alignment

72

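I am working on a single-producer single-consumer ring buffer implementation. I have two requirements:

Align a single heap-allocated instance of the ring buffer to a cache line.
Align a field within the ring buffer to a cache line (to prevent false sharing).

My class looks something like this: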

#define CACHE_LINE_SIZE 64  // To be used later.

template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
public:
  ....

private:
  std::atomic<int64_t> publisher_sequence_ ;
  int64_t cached_consumer_sequence_;
  T* events_;
  std::atomic<int64_t> consumer_sequence_;  // This needs to be aligned to a cache line.

};

First, let me tackle the first point, i.e. aligning a single heap-allocated instance of the class. There are several ways:

Use the c++ 11 alignas(..) specifier:

template<typename T, uint64_t num_events>
class alignas(CACHE_LINE_SIZE) RingBuffer {
public:
  ....

private:
  // All the private fields.

};

Use posix_memalign(..) + placement new(..) without altering the class definition. This suffers from not being platform independent:

void* buffer;
if (posix_memalign(&buffer, 64, sizeof(processor::RingBuffer<int, kRingBufferSize>)) != 0) {
    perror("posix_memalign did not work!");
    abort();
}
// Use placement new on a cache aligned buffer.
auto ring_buffer = new(buffer) processor::RingBuffer<int, kRingBufferSize>();

Use the GCC/Clang extension __attribute__ ((aligned(#)))

template<typename T, uint64_t num_events>
class RingBuffer {
public:
  ....

private:
  // All the private fields.

} __attribute__ ((aligned(CACHE_LINE_SIZE)));

I tried to use the standardized aligned_alloc(..) function (standardized in C11; it only entered C++ with C++17) instead of posix_memalign(..), but GCC 4.8.1 on Ubuntu 12.04 could not find the definition in stdlib.h.

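Do all of these guarantee the same thing? My goal is cache-line alignment, so any method that has a limit on alignment (say, double word) will not do. Platform independence, which would point towards the standardized alignas(..), is a secondary goal.

I am not clear on whether alignas(..) and __attribute__((aligned(#))) have some limit that could be below the cache line size on the machine. I can't reproduce this any more, but while printing addresses I think I did not always get 64-byte aligned addresses with alignas(..). In contrast, posix_memalign(..) always seemed to work. Again, I cannot reproduce it, so I may have made a mistake.

The second aim is to align a field within a class/struct to a cache line. I am doing this to prevent false sharing. I have tried the following ways: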

Use the C++ 11 alignas(..) specifier:

template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
  public:
  ...
  private:
    std::atomic<int64_t> publisher_sequence_ ;
    int64_t cached_consumer_sequence_;
    T* events_;
    std::atomic<int64_t> consumer_sequence_ alignas(CACHE_LINE_SIZE);
};

Use the GCC/Clang extension __attribute__ ((aligned(#)))

template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
  public:
  ...
  private:
    std::atomic<int64_t> publisher_sequence_ ;
    int64_t cached_consumer_sequence_;
    T* events_;
    std::atomic<int64_t> consumer_sequence_ __attribute__ ((aligned (CACHE_LINE_SIZE)));
};

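Both of these methods seem to place consumer_sequence_ at an address 64 bytes after the beginning of the object, so whether consumer_sequence_ is cache aligned depends on whether the object itself is cache aligned. My question here is: are there better ways to do the same?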

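EDIT: The reason aligned_alloc did not work on my machine is that I was on eglibc 2.15 (Ubuntu 12.04). It works on a later version of eglibc.

From the man page: The function aligned_alloc() was added to glibc in version 2.16.

This makes it pretty useless for me, since I cannot require such a recent version of eglibc/glibc.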

- Rajiv

6

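Good question; see Michael Spencer's talk at BoostCon 2013. I don't think you can align to more than 16 bytes (so the Standard does not support alignment to a 64-byte cache line, let alone larger alignment to virtual memory pages). - TemplateRex

@TemplateRex Thanks for the link. The talk seems relevant. +1. - Rajiv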

4 Answers

10

The answer to your problem is std::aligned_storage. It can be used at the top level and for individual members of a class.

- rubenvb

3

But it has similar restrictions to alignas (up to 16 bytes / platform-dependent limits until C++17). - kwesolowski
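As a rough illustration of the std::aligned_storage suggestion, the sketch below reserves raw storage with a requested 64-byte alignment and constructs an object in it with placement new. Counter, CounterStorage, is_aligned and aligned_storage_roundtrip are hypothetical names invented for this example. It assumes the implementation honours the extended alignment (the caveat raised in the comment above); the static_assert fails to compile if it does not. Note that std::aligned_storage was later deprecated in C++23.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>

// Hypothetical payload type, used only for this illustration.
struct Counter {
    std::int64_t value;
    explicit Counter(std::int64_t v) : value(v) {}
};

// Raw storage large enough for a Counter, with a requested alignment of 64.
// Assumption: the implementation supports this extended alignment.
using CounterStorage = std::aligned_storage<sizeof(Counter), 64>::type;

static_assert(alignof(CounterStorage) == 64,
              "extended alignment was not honoured by this implementation");

inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Constructs a Counter inside aligned automatic storage and reports whether
// the object landed on a 64-byte boundary with its value intact.
inline bool aligned_storage_roundtrip(std::int64_t v) {
    CounterStorage storage;                  // aligned by the compiler
    Counter* c = new (&storage) Counter(v);  // construct in place
    const bool ok = is_aligned(c, 64) && c->value == v;
    c->~Counter();                           // placement new needs an explicit destructor call
    return ok;
}
```

This only covers objects whose storage the compiler lays out (static, automatic, or a member of an already aligned object); it does not by itself change what a plain new expression returns.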

4

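After some more research, my thoughts are:

1. As @TemplateRex pointed out, there does not seem to be a standard way to align to more than 16 bytes. So even if we use the standardized alignas(..), there is no guarantee unless the alignment boundary is less than or equal to 16 bytes. I will have to verify that it works as expected on the target platform.
2. __attribute__ ((aligned(#))) or alignas(..) cannot be used to align a heap-allocated object, as I suspected, i.e. new() does not do anything with these annotations. They seem to work for static objects or stack allocations, with the caveats from (1).

   Either posix_memalign(..) (non-standard) or aligned_alloc(..) (standardized, but I could not get it to work on GCC 4.8.1) + placement new(..) seems to be the solution. My solution for when I need platform-independent code is compiler-specific macros :)
3. Alignment of struct/class fields seems to work with both __attribute__ ((aligned(#))) and alignas(), as noted in the answer. Again, I think the caveats from (1) about alignment guarantees stand.

So my current solution is to use posix_memalign(..) + placement new(..) to align a heap-allocated instance of my class, since my target platform right now is Linux only. I am also using alignas(..) to align fields, since it is standardized and works at least on Clang and GCC. I will be happy to change it if a better answer comes along.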
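One detail the posix_memalign(..) + placement new(..) approach leaves open is cleanup: the object must be destroyed explicitly and the block released with free(), not delete. A minimal sketch of one way to package that, assuming a POSIX platform, is below. AlignedDeleter, AlignedPtr, make_aligned and is_aligned are hypothetical helper names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>   // std::free
#include <memory>
#include <new>
#include <stdlib.h>  // posix_memalign (POSIX, not ISO C++)
#include <utility>

// Deleter that mirrors the allocation path: run the destructor, then free().
template <typename T>
struct AlignedDeleter {
    void operator()(T* p) const {
        if (p) {
            p->~T();
            std::free(p);
        }
    }
};

template <typename T>
using AlignedPtr = std::unique_ptr<T, AlignedDeleter<T>>;

// Hypothetical helper: allocate and construct a T on an `alignment`-byte boundary.
// posix_memalign requires `alignment` to be a power of two and a multiple of sizeof(void*).
template <typename T, typename... Args>
AlignedPtr<T> make_aligned(std::size_t alignment, Args&&... args) {
    void* raw = nullptr;
    if (posix_memalign(&raw, alignment, sizeof(T)) != 0) {
        throw std::bad_alloc();
    }
    try {
        return AlignedPtr<T>(new (raw) T(std::forward<Args>(args)...));
    } catch (...) {
        std::free(raw);  // the constructor threw: release the raw block
        throw;
    }
}

inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With this, something like make_aligned<processor::RingBuffer<int, kRingBufferSize>>(64) would return a smart pointer that destroys and frees the buffer when it goes out of scope.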

- Rajiv

In practice, alignas(64) or higher does work. - Peter Cordes
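A claim like this is easy to check on a given toolchain by inspecting addresses. The sketch below is an assumption-laden test harness rather than a guarantee: Padded, g_static_instance and the helper functions are hypothetical names. Static and automatic objects are laid out by the compiler, so alignas(64) is expected to hold for them on GCC and Clang. A plain new expression is only required to honour over-alignment from C++17 onward (aligned operator new), so the heap check may report false when compiled as C++11 or C++14.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct alignas(64) Padded {
    std::int64_t value;
};

static_assert(alignof(Padded) == 64, "alignas(64) was not applied");
static_assert(sizeof(Padded) % 64 == 0, "size is rounded up to a multiple of the alignment");

inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

Padded g_static_instance;  // static storage duration

inline bool static_instance_is_aligned() {
    return is_aligned(&g_static_instance, 64);
}

inline bool stack_instance_is_aligned() {
    Padded local;
    local.value = 0;
    return is_aligned(&local, 64);
}

// Only guaranteed to return true when the compiler provides C++17 aligned new;
// before that, plain new may ignore the extended alignment.
inline bool heap_instance_is_aligned() {
    Padded* p = new Padded();
    const bool ok = is_aligned(p, 64);
    delete p;
    return ok;
}
```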

2

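I don't know whether this is the best way to align memory used with a new operator, but it is certainly very simple!

This is how it is done in the thread sanitizer pass in GCC 6.1.0.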

#define ALIGNED(x) __attribute__((aligned(x)))

static char myarray[sizeof(myClass)] ALIGNED(64) ;
var = new(myarray) myClass;

In sanitizer_common/sanitizer_internal_defs.h, it is also written:

// Please only use the ALIGNED macro before the type.
// Using ALIGNED after the variable declaration is not portable!

I don't know why ALIGNED is nevertheless used after the variable declaration here. But that is another story.

- Hugo

- Glenn Teitelbaum · Accepted Answer

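Unfortunately, the best I have found is to allocate extra space and then use the "aligned" part. So the RingBuffer new can request an extra 64 bytes and then return the first 64-byte-aligned part of that. It wastes space but gives you the alignment you need. You will likely need to store the actual allocation address in the memory just before what is returned, so that it can be deallocated.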

[Memory returned][ptr to start of memory][aligned memory][extra memory]

(assuming no inheritance from RingBuffer) something like:

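// Requires <memory> (std::align) and <new> (std::bad_alloc).
void * RingBuffer::operator new(size_t request)
{
     static const size_t ptr_alloc = sizeof(void *);
     static const size_t align_size = 64;
     // Room for the saved pointer, the object, and alignment slack.
     const size_t needed = ptr_alloc + request + align_size;

     void * alloc = ::operator new(needed);
     // std::align takes the pointer and the space by reference and updates
     // both, so they must be modifiable lvalues. Arithmetic on void* is not
     // valid C++, hence the cast to char*.
     void * ptr = static_cast<char *>(alloc) + ptr_alloc;
     size_t space = needed - ptr_alloc;
     if (!std::align(align_size, request, ptr, space))
     {
          ::operator delete(alloc);
          throw std::bad_alloc();
     }

     ((void **)ptr)[-1] = alloc; // save for delete calls to use
     return ptr;
}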

void RingBuffer::operator delete(void * ptr)
{
    if (ptr) // 0 is valid, but a noop, so prevent passing negative memory
    {
           void * alloc = ((void **)ptr)[-1];
           ::operator delete (alloc);
    }
}

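For the second requirement, having a data member of RingBuffer also be 64-byte aligned: if you know that the start of this is already aligned, you can pad to force the alignment of the data members.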
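The padding idea can be sketched as follows, under the assumption of 64-byte cache lines. Sequences is a hypothetical, simplified stand-in for the RingBuffer in the question (the events_ pointer and the template parameters are left out), and kCacheLineSize and consumer_on_own_cache_line are names made up for this example. The static_asserts make the compiler verify the layout rather than relying on inspection.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLineSize = 64;  // assumption: 64-byte cache lines

// Hypothetical, simplified stand-in for the RingBuffer fields in the question.
struct alignas(kCacheLineSize) Sequences {
    // Producer-side fields share the first cache line.
    std::atomic<std::int64_t> publisher_sequence{0};
    std::int64_t cached_consumer_sequence{0};
    // Explicit padding up to the next cache-line boundary.
    char pad[kCacheLineSize - sizeof(std::atomic<std::int64_t>) - sizeof(std::int64_t)];
    // Consumer-side field starts on its own cache line.
    std::atomic<std::int64_t> consumer_sequence{0};
};

static_assert(offsetof(Sequences, consumer_sequence) == kCacheLineSize,
              "consumer_sequence should start exactly one cache line in");
static_assert(sizeof(Sequences) == 2 * kCacheLineSize,
              "the struct should occupy exactly two cache lines");

// Runtime check: the two atomics live on different 64-byte lines.
inline bool consumer_on_own_cache_line(const Sequences& s) {
    const auto pub = reinterpret_cast<std::uintptr_t>(&s.publisher_sequence);
    const auto con = reinterpret_cast<std::uintptr_t>(&s.consumer_sequence);
    return con % kCacheLineSize == 0 && pub / kCacheLineSize != con / kCacheLineSize;
}
```

The offsets are only meaningful as cache-line boundaries when the object itself starts on one, which is why the struct carries alignas as well and why a heap instance still needs an aligned allocation such as the operator new above.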