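The CUDA manual specifies the number of 32-bit registers per multiprocessor. Does this mean that:
1. a double variable takes two registers?
2. a pointer variable takes two registers? It has to be more than one register on Fermi with 6 GB of memory, since 32 bits can address only 4 GB, right?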
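As a rough sanity check on questions 1 and 2, the size of a type in 32-bit words gives the minimum number of 32-bit registers one live value of that type occupies. The host-side C++ sketch below assumes a 64-bit build, where CUDA device pointers have the same 8-byte size as host pointers. The helper name `regs32` is hypothetical. It only encodes the size arithmetic and says nothing about what the register allocator actually does for a given kernel. That per-kernel count is what `nvcc -Xptxas -v` reports.

```cpp
#include <cstddef>

// Hypothetical helper: number of 32-bit registers needed to hold one value
// of type T, assuming values occupy whole 4-byte registers.
template <typename T>
constexpr std::size_t regs32() {
    return (sizeof(T) + 3) / 4;  // round up to whole 32-bit words
}

static_assert(regs32<int>() == 1, "int fits in one 32-bit register");
static_assert(regs32<float>() == 1, "float fits in one 32-bit register");
static_assert(regs32<double>() == 2, "double needs a pair of 32-bit registers");

// On a 64-bit build a pointer is 8 bytes, so a live pointer needs two
// 32-bit registers (32 bits alone could address only 4 GB).
static_assert(sizeof(void*) != 8 || regs32<float*>() == 2,
              "64-bit pointer needs two 32-bit registers");
```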
If the answer to question 2 is yes, it must be better to use fewer pointer variables and more `int` indices. For example, this kernel code:
```cuda
float* p1;              // two regs
float* p2 = p1 + 1000;  // two regs
int i;                  // one reg
for ( i = 0; i < n; i++ ) {
    // CODE THAT USES p1[i] and p2[i]
}
```
theoretically requires more registers than this kernel code:
```cuda
float* p1;  // two regs
int i;      // one reg
int j;      // one reg
for ( i = 0, j = 1000; i < n; i++, j++ ) {
    // CODE THAT USES p1[i] and p1[j]
}
```
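At the source level the two loops are equivalent. With `p2 = p1 + 1000` and `j = i + 1000`, `p2[i]` and `p1[j]` name the same element. The host-side C++ sketch below checks that equivalence. The function names, the multiply-accumulate standing in for "CODE THAT USES ...", and passing the offset as a parameter are illustrative assumptions, not part of the question. Because the two forms are interchangeable, the compiler is free to lower either one to the same addressing, for example a base register plus a constant offset of `1000 * sizeof(float)` bytes. The per-variable register counts in the comments above are therefore a naive tally, not necessarily what ptxas emits.

```cpp
// Requires C++14 (loops inside constexpr functions).

// Variant 1: two pointer variables and one index, as in the first kernel.
constexpr float sumTwoPointers(const float* p1, int n, int offset) {
    const float* p2 = p1 + offset;  // offset is 1000 in the question
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        acc += p1[i] * p2[i];  // stand-in for "CODE THAT USES p1[i] and p2[i]"
    }
    return acc;
}

// Variant 2: one pointer variable and two indices, as in the second kernel.
constexpr float sumTwoIndices(const float* p1, int n, int offset) {
    float acc = 0.0f;
    for (int i = 0, j = offset; i < n; i++, j++) {
        acc += p1[i] * p1[j];  // stand-in for "CODE THAT USES p1[i] and p1[j]"
    }
    return acc;
}
```

On a 64-bit build, forming the address of `p1[j]` typically needs a 64-bit value anyway. Swapping pointers for `int` indices therefore does not necessarily save registers, and comparing the `-Xptxas -v` output of both kernels is the reliable check.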
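I have finite-difference code with indices `i0`, `i1`, `i2`. Usually I need to move from the current point `p[i]` in one of three directions, where `i = i0 + i1*stride1 + i2*stride2`. So the code would be cleaner if I introduced pointers `px1 = p+1`, `py1 = p+stride1`, `pz1 = p+stride2` (possibly more: `px2 = p+2`, etc.) and worked with `p[i]`, `px1[i]`, and so on. If the optimizer cannot eliminate all these extra pointers, will doing this increase register usage? - user2052436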
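The two styles from the comment can be compared in the same way. The host-side C++ sketch below uses a simple forward-difference sum as a hypothetical stand-in for the real stencil. The function names are illustrative, and the bounds are assumed to keep every neighbour inside the array. Both functions read the same three neighbours of `p[i]`. The shifted pointers `px1`, `py1`, `pz1` are loop-invariant, so a compiler may either keep each one live (two 32-bit registers apiece on a 64-bit build) or rewrite `py1[i]` back into `p[i + stride1]` and keep only the strides.

```cpp
// Requires C++14. Linear index i = i0 + i1*stride1 + i2*stride2.

// Style 1: explicit offsets from a single base pointer.
constexpr float diffSumIndexed(const float* p, int i, int stride1, int stride2) {
    return (p[i + 1] - p[i])          // step in the i0 direction
         + (p[i + stride1] - p[i])    // step in the i1 direction
         + (p[i + stride2] - p[i]);   // step in the i2 direction
}

// Style 2: pre-shifted neighbour pointers, as proposed in the comment.
constexpr float diffSumShifted(const float* p, int i, int stride1, int stride2) {
    const float* px1 = p + 1;        // neighbour in the i0 direction
    const float* py1 = p + stride1;  // neighbour in the i1 direction
    const float* pz1 = p + stride2;  // neighbour in the i2 direction
    return (px1[i] - p[i]) + (py1[i] - p[i]) + (pz1[i] - p[i]);
}
```

Which of the two lowerings ptxas actually chooses for a given kernel is only visible from the `-Xptxas -v` register count or the SASS printed by `cuobjdump -sass`.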