I am confused by the following statement in Section 5.3.2.1 of the Performance Guidelines chapter of the CUDA Programming Guide 4.0:
Global memory resides in device memory and device memory is accessed
via 32-, 64-, or 128-byte memory transactions.
These memory transactions must be naturally aligned: Only the 32-, 64-, or
128-byte segments of device memory
that are aligned to their size (i.e. whose first address is a
multiple of their size) can be read or written by memory
transactions.
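1) My understanding of device memory is that accesses to it by threads are uncached: if a thread accesses memory location a[i], it fetches only a[i] and none of the surrounding values. The first sentence therefore seems to contradict this. Or am I misunderstanding how the term "memory transaction" is used here?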
2) The second sentence is not very clear to me. Could someone explain it?
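
To make question 2 concrete, here is how I currently read the alignment rule, as a minimal host-side C++ sketch. It only models the arithmetic in the quoted passage (a transaction of size S can cover only the segment that starts at a multiple of S); it is not how the hardware actually chooses or combines transactions, which I assume depends on the compute capability. The function names `segment_base` and `segments_touched` are my own and do not come from the guide.

```cpp
#include <cassert>
#include <cstdint>

// Start address of the naturally aligned `size`-byte segment containing `addr`.
// Assumption: `size` is a power of two (32, 64, or 128 in the quoted passage).
std::uint64_t segment_base(std::uint64_t addr, std::uint64_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return addr - (addr % size);
}

// Number of aligned `size`-byte segments overlapped by the byte range
// [addr, addr + bytes). Under the quoted rule, each such segment would need
// its own transaction of that size.
std::uint64_t segments_touched(std::uint64_t addr, std::uint64_t bytes,
                               std::uint64_t size) {
    assert(bytes > 0);
    std::uint64_t first = segment_base(addr, size);
    std::uint64_t last = segment_base(addr + bytes - 1, size);
    return (last - first) / size + 1;
}
```

If this reading is right, 32 threads each reading a 4-byte value from contiguous addresses starting at byte 0 fall into one 128-byte segment (`segments_touched(0, 128, 128)` is 1), while the same range shifted by one element to start at byte 4 straddles two segments (`segments_touched(4, 128, 128)` is 2).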