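STL containers take an allocator template parameter, which can be used to align their internal buffer. The specified allocator type must implement at least allocate, deallocate, and value_type.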
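As a rough sketch of those requirements, the hypothetical MinimalAllocator below provides value_type, allocate, and deallocate, and simply forwards to std::allocator. It also adds a converting constructor and equality operators, which the standard's allocator requirements list as well; std::allocator_traits fills in the remaining members. All names here are illustrative and not taken from the answer.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical allocator for illustration; it forwards to std::allocator.
template<typename T>
struct MinimalAllocator
{
    using value_type = T;

    MinimalAllocator() noexcept = default;

    // Lets a container convert the allocator to another element type.
    template<typename U>
    MinimalAllocator( MinimalAllocator<U> const& ) noexcept {}

    T* allocate( std::size_t nElements )
    {
        return std::allocator<T>{}.allocate( nElements );
    }

    void deallocate( T* pointer, std::size_t nElements ) noexcept
    {
        std::allocator<T>{}.deallocate( pointer, nElements );
    }
};

// The allocator is stateless, so any two instances compare equal.
template<typename T, typename U>
bool operator==( MinimalAllocator<T> const&, MinimalAllocator<U> const& ) noexcept
{
    return true;
}

template<typename T, typename U>
bool operator!=( MinimalAllocator<T> const&, MinimalAllocator<U> const& ) noexcept
{
    return false;
}

// Fills a vector that uses MinimalAllocator with 1..n and returns the sum.
inline int sumOfFirstN( int n )
{
    std::vector<int, MinimalAllocator<int> > values;
    for ( int i = 1; i <= n; ++i ) {
        values.push_back( i );
    }
    int sum = 0;
    for ( int const value : values ) {
        sum += value;
    }
    return sum;
}
```

Calling sumOfFirstN( 10 ) from main exercises several reallocations through the custom allocator.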
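Compared with the other answers, the allocator implementation below avoids platform-dependent aligned malloc calls. Instead, it uses the C++17 aligned new operator.
Here is the complete example on godbolt.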
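To illustrate the C++17 aligned new operator on its own, here is a minimal sketch that allocates raw memory with a requested alignment, checks the returned address, and frees it again. The helper name allocationIsAligned is made up for this example, and the alignment is assumed to be a power of two.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: allocates nBytes with the requested alignment through
// the C++17 aligned operator new and reports whether the address is aligned.
// alignmentInBytes is assumed to be a power of two.
inline bool allocationIsAligned( std::size_t nBytes, std::size_t alignmentInBytes )
{
    auto const alignment = static_cast<std::align_val_t>( alignmentInBytes );
    void* const pointer = ::operator new( nBytes, alignment );
    bool const isAligned =
        reinterpret_cast<std::uintptr_t>( pointer ) % alignmentInBytes == 0;
    // Memory obtained from the aligned operator new has to be released with
    // the matching aligned operator delete.
    ::operator delete( pointer, alignment );
    return isAligned;
}
```

No platform-specific call such as posix_memalign or _aligned_malloc is involved; the only requirement is a compiler and standard library with C++17 support.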

    #include <cstddef>
    #include <limits>
    #include <new>

    /**
     * Returns aligned pointers when allocations are requested.
     * ALIGNMENT_IN_BYTES must be a power of two.
     */
    template<typename    ElementType,
             std::size_t ALIGNMENT_IN_BYTES = 64>
    class AlignedAllocator
    {
    private:
        static_assert(
            ALIGNMENT_IN_BYTES >= alignof( ElementType ),
            "Beware that types like int have minimum alignment requirements "
            "or access will result in crashes."
        );

    public:
        using value_type = ElementType;
        static std::align_val_t constexpr ALIGNMENT{ ALIGNMENT_IN_BYTES };

        /* Needed because the second, non-type template parameter keeps the
         * default std::allocator_traits rebind from working. */
        template<class OtherElementType>
        struct rebind
        {
            using other = AlignedAllocator<OtherElementType, ALIGNMENT_IN_BYTES>;
        };

    public:
        constexpr AlignedAllocator() noexcept = default;

        constexpr AlignedAllocator( const AlignedAllocator& ) noexcept = default;

        template<typename U>
        constexpr AlignedAllocator( AlignedAllocator<U, ALIGNMENT_IN_BYTES> const& ) noexcept
        {}

        [[nodiscard]] ElementType*
        allocate( std::size_t nElementsToAllocate )
        {
            /* Guard against overflow of the byte count computed below. */
            if ( nElementsToAllocate
                 > std::numeric_limits<std::size_t>::max() / sizeof( ElementType ) ) {
                throw std::bad_array_new_length();
            }

            auto const nBytesToAllocate = nElementsToAllocate * sizeof( ElementType );
            return static_cast<ElementType*>(
                ::operator new[]( nBytesToAllocate, ALIGNMENT ) );
        }

        void
        deallocate( ElementType*                 allocatedPointer,
                    [[maybe_unused]] std::size_t nElementsAllocated )
        {
            /* Pairs with the aligned operator new[] used in allocate. */
            ::operator delete[]( allocatedPointer, ALIGNMENT );
        }
    };

The allocator can then be used like this:

    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <stdexcept>
    #include <vector>

    template<typename T, std::size_t ALIGNMENT_IN_BYTES = 64>
    using AlignedVector = std::vector<T, AlignedAllocator<T, ALIGNMENT_IN_BYTES> >;

    int
    main()
    {
        AlignedVector<int, 1024> buffer( 3333 );
        /* The start address of the buffer must be a multiple of the alignment. */
        if ( reinterpret_cast<std::uintptr_t>( buffer.data() ) % 1024 != 0 ) {
            std::cerr << "Vector buffer is not aligned!\n";
            throw std::logic_error( "Faulty implementation!" );
        }

        std::cout << "Successfully allocated an aligned std::vector.\n";
        return 0;
    }

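_mm256_loadu_ps(&vec[i]), to avoid segfaults or potential slowdowns from cache-line splits. Note that with default tuning options, GCC splits unaligned 256-bit loads/stores into vmovups xmm / vinsertf128. So if you care about how your code compiles with GCC when someone forgets to use -mtune=... or -march=, there is an advantage to using _mm256_load over loadu. - Peter Cordes

boost::alignment::aligned_allocator's code. I can then allocate vectors with std::vector<T, aligned_allocator<float>>. This does make ordinary std::vectors not directly compatible with aligned vectors of this type, but you can always find a way around that. - Prunus Persica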
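
One possible workaround for that incompatibility, sketched under assumptions: the iterator-range constructor of std::vector copies elements between vectors whose allocator types differ. Aligned64Allocator is a compact, hypothetical stand-in fixed to 64-byte alignment, not the Boost allocator from the comment, and toAligned and isAligned64 are illustrative helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Compact stand-in for an aligned allocator (hypothetical), fixed to 64 bytes.
template<typename T>
struct Aligned64Allocator
{
    static_assert( alignof( T ) <= 64, "Element type needs a stricter alignment." );
    using value_type = T;

    Aligned64Allocator() noexcept = default;
    template<typename U>
    Aligned64Allocator( Aligned64Allocator<U> const& ) noexcept {}

    T* allocate( std::size_t nElements )
    {
        // Guard against overflow of the byte count.
        if ( nElements > std::numeric_limits<std::size_t>::max() / sizeof( T ) ) {
            throw std::bad_array_new_length();
        }
        return static_cast<T*>(
            ::operator new( nElements * sizeof( T ), std::align_val_t{ 64 } ) );
    }

    void deallocate( T* pointer, std::size_t ) noexcept
    {
        ::operator delete( pointer, std::align_val_t{ 64 } );
    }
};

template<typename T, typename U>
bool operator==( Aligned64Allocator<T> const&, Aligned64Allocator<U> const& ) noexcept
{
    return true;
}

template<typename T, typename U>
bool operator!=( Aligned64Allocator<T> const&, Aligned64Allocator<U> const& ) noexcept
{
    return false;
}

using AlignedFloats = std::vector<float, Aligned64Allocator<float> >;

// std::vector<float> and AlignedFloats are distinct types, so neither converts
// to the other implicitly. The iterator-range constructor copies the elements
// into a freshly allocated, aligned buffer.
inline AlignedFloats toAligned( std::vector<float> const& source )
{
    return AlignedFloats( source.begin(), source.end() );
}

// Reports whether the vector's buffer starts on a 64-byte boundary.
inline bool isAligned64( AlignedFloats const& values )
{
    return reinterpret_cast<std::uintptr_t>( values.data() ) % 64 == 0;
}
```

The copy costs one allocation and one pass over the data, so it fits best at API boundaries rather than inside hot loops. Functions that only read elements can avoid it by taking a pointer and a size, or by being templated on the allocator type.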