How can I create a PyCUDA GPUArray from a GPU memory address?

Question


3

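I am using PyTorch and want to do some arithmetic on Tensor data with the help of PyCUDA. I can get the memory address of a CUDA tensor t via t.data_ptr(). Can I use this address, together with my knowledge of the size and data type, to initialize a GPUArray? I am hoping to avoid copying the data, but that would also be an option.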

- oarfish

No, I don't believe you can do this without writing some low-level PyCUDA code yourself. - talonmies

1 Answer


- oarfish · Accepted Answer

It turns out this is possible. We need a pointer to the data, and that pointer needs some additional capabilities:

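import pycuda.autoinit  # creates the CUDA context used as the default argument below
import pycuda.driver
import torch
from pycuda.driver import PointerHolderBase


class Holder(PointerHolderBase):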

    def __init__(self, tensor):
        super().__init__()
        self.tensor = tensor
        self.gpudata = tensor.data_ptr()

    def get_pointer(self):
        return self.tensor.data_ptr()

    def __int__(self):
        return self.__index__()

    # without an __index__ method, arithmetic calls to the GPUArray backed by this pointer fail
    # not sure why, this needs to return some integer, apparently
    def __index__(self):
        return self.gpudata
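
The comment above leaves open why `__index__` is needed. A plausible explanation, stated here as an assumption rather than something the answer confirms, is that some code path needs the pointer as a true integer, and `__index__` is the hook Python uses for that conversion. The sketch below shows the protocol in isolation, with no GPU involved; `FakePointer` and its address are made up for illustration.

```python
import operator


class FakePointer:
    """Hypothetical stand-in for a pointer holder; the address is invented."""

    def __init__(self, address):
        self.address = address

    def __index__(self):
        # Called wherever Python requires a real integer.
        return self.address

    def __int__(self):
        return self.__index__()


ptr = FakePointer(0x7F000000)
as_int = int(ptr)                # uses __int__
as_index = operator.index(ptr)   # uses __index__
as_hex = hex(ptr)                # hex() also goes through __index__
```

Without `__index__`, the last two calls would raise `TypeError`, which may be the kind of seemingly unrelated failure described in this answer.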

We can then use this class to instantiate GPUArrays. The code uses Reikna arrays, which are a subclass, but it should work with PyCUDA arrays as well.

def tensor_to_gpuarray(tensor, context=pycuda.autoinit.context):
    '''Convert a :class:`torch.Tensor` to a :class:`pycuda.gpuarray.GPUArray`. The underlying
    storage will be shared, so that modifications to the array will reflect in the tensor object.
    Parameters
    ----------
    tensor  :   torch.Tensor
    Returns
    -------
    pycuda.gpuarray.GPUArray
    Raises
    ------
    ValueError
        If the ``tensor`` does not live on the gpu
    '''
    if not tensor.is_cuda:
        raise ValueError('Cannot convert CPU tensor to GPUArray (call `cuda()` on it)')
    # Wrap the existing CUDA context in a Reikna thread.
    thread = cuda.cuda_api().Thread(context)
    return reikna.cluda.cuda.Array(
        thread,
        tensor.shape,
        dtype=torch_dtype_to_numpy(tensor.dtype),
        base_data=Holder(tensor),  # reuse the tensor's device memory
    )
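
One caveat worth spelling out: the conversion above passes only the shape and dtype, so the resulting array is presumably treated as densely packed in row-major order. A non-contiguous tensor, such as a transposed view, would then be read with the wrong layout. The sketch below shows the check on plain tuples, assuming strides counted in elements as `tensor.stride()` reports them; `is_c_contiguous` is a hypothetical helper, and in practice `tensor.is_contiguous()` answers the same question.

```python
def is_c_contiguous(shape, strides):
    """Return True if element strides describe a dense row-major layout."""
    expected = 1
    # Walk from the innermost dimension outwards.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        # The stride of a size-1 dimension never affects addressing.
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True


dense = is_c_contiguous((2, 3), (3, 1))        # freshly allocated 2x3 tensor
transposed = is_c_contiguous((3, 2), (1, 3))   # the same memory viewed as 3x2
```

If the check fails, calling `tensor.contiguous()` before converting gives a dense tensor, at the cost of a copy.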

We can convert back with the following code. I haven't found a way to do this without copying the data.

def gpuarray_to_tensor(gpuarray, context=pycuda.autoinit.context):
    '''Convert a :class:`pycuda.gpuarray.GPUArray` to a :class:`torch.Tensor`. The underlying
    storage will NOT be shared, since a new copy must be allocated.
    Parameters
    ----------
    gpuarray  :   pycuda.gpuarray.GPUArray
    Returns
    -------
    torch.Tensor
    '''
    shape = gpuarray.shape
    dtype = gpuarray.dtype
    out_dtype = numpy_dtype_to_torch(dtype)
    out = torch.zeros(shape, dtype=out_dtype).cuda()
    gpuarray_copy = tensor_to_gpuarray(out, context=context)
    byte_size = gpuarray.itemsize * gpuarray.size
    pycuda.driver.memcpy_dtod(gpuarray_copy.gpudata, gpuarray.gpudata, byte_size)
    return out
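
The function above calls `numpy_dtype_to_torch`, which the answer does not show. A minimal sketch of what it might look like follows, assuming the dtype has the same name in both libraries; the whitelist and the helper `numpy_name_to_torch_name` are illustrative choices, not the author's code.

```python
# Dtype names assumed to exist both in NumPy and as torch attributes.
_SHARED_DTYPE_NAMES = frozenset({
    "bool", "uint8", "int8", "int16", "int32", "int64",
    "float16", "float32", "float64", "complex64", "complex128",
})


def numpy_name_to_torch_name(name):
    """Return the torch attribute name for a NumPy dtype name."""
    if name not in _SHARED_DTYPE_NAMES:
        raise TypeError(f"no torch equivalent assumed for NumPy dtype {name!r}")
    return name


def numpy_dtype_to_torch(dtype):
    """Hypothetical helper: map a NumPy dtype to the same-named torch dtype."""
    # Imported lazily so the name mapping above works without either library.
    import numpy as np
    import torch
    return getattr(torch, numpy_name_to_torch_name(np.dtype(dtype).name))
```

Rejecting unknown names up front gives a clearer error than a failed attribute lookup on `torch`.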

Old answer

import numpy as np
from pycuda.gpuarray import GPUArray


def torch_dtype_to_numpy(dtype):
    dtype_name = str(dtype)[6:]     # remove 'torch.'
    return getattr(np, dtype_name)


def tensor_to_gpuarray(tensor):
    if not tensor.is_cuda:
        raise ValueError('Cannot convert CPU tensor to GPUArray (call `cuda()` on it)')
    else:
        array = GPUArray(tensor.shape, dtype=torch_dtype_to_numpy(tensor.dtype),
                         gpudata=tensor.data_ptr())
        return array.copy()

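Unfortunately, passing an int as the `gpudata` keyword (or a subtype of `pycuda.driver.PointerHolderBase`, as was suggested in the PyTorch forum) seems to work on the surface, but many operations fail with seemingly unrelated errors. Copying the array seems to transform it into a usable format, though. I think this is because the `gpudata` member should be a `pycuda.driver.DeviceAllocation` object, which apparently cannot be instantiated from Python.

How to go back from the raw data to a Tensor is another matter.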