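I have never fully understood the difference between stack alignment in a function and "aligned loads/stores" on the stack.
I was reading some PTX code and saw this: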
.func function()
{
    .reg .b32 %r<2>;
    .reg .f32 %f<4>;
    .local .align 16 .b8 stack_memory[200];
    // This should mean the stack memory starts at an address aligned to 16 bytes (why would this be necessary?)
    ld.local.u8 %r1, [stack_memory+1];
    // It seems reading 1 byte is always safe (why?)
    ld.local.f32 %f1, [stack_memory+8];
    // It also seems that reading 32 bits from an address aligned to 32 bits (4 bytes) is safe (why?)
    ld.local.v2.f32 {%f2, %f3}, [stack_memory+12];
    // This should not be right (why?)
    ret;
}
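My questions are in the code comments, but the key point is:
I don't really understand why a stack allocation should be aligned to a particular address boundary, or why that matters if I can read 1 byte from a completely unaligned address and read a float32 from an address that is merely a multiple of 4.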
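
To make my mental model concrete, here is a small Python sketch of the arithmetic as I currently picture it. It is not taken from the PTX toolchain. It assumes that a load of N bytes needs an address that is a multiple of N, and that a two-float vector load counts as a single 8-byte access. The helper name `provably_aligned` is made up for illustration.

```python
def provably_aligned(base_align: int, offset: int, access_size: int) -> bool:
    """Hypothetical helper (not a PTX or CUDA API).

    Assumes an access of access_size bytes needs an address that is a
    multiple of access_size. Returns True only if base + offset is such a
    multiple for every base that is a multiple of base_align, i.e. the
    alignment can be proven without knowing the actual base address.
    """
    # The base contributes some multiple of base_align, so the proof only
    # works for access sizes that divide base_align.
    if access_size <= 0 or base_align % access_size != 0:
        return False
    # With the base out of the way, only the offset can break alignment.
    return offset % access_size == 0


if __name__ == "__main__":
    base_align = 16  # from the `.align 16` on the stack buffer
    print("1 byte  at +1 :", provably_aligned(base_align, 1, 1))
    print("4 bytes at +8 :", provably_aligned(base_align, 8, 4))
    print("8 bytes at +12:", provably_aligned(base_align, 12, 8))
    # If only byte alignment were promised for the buffer, the same
    # 4-byte load at +8 could no longer be proven aligned.
    print("4 bytes at +8, base aligned to 1:", provably_aligned(1, 8, 4))
```

If this picture is right, the offset alone tells me nothing unless the base address of the buffer is itself known to be a multiple of the access size. Is that the whole reason for the `.align 16` on the allocation?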