Getting the total free and available GPU memory with PyTorch

Question

Getting the total free and available GPU memory with PyTorch

54

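I'm experimenting with the free GPU that Google Colab provides and would like to know how much GPU memory I have to work with. Calling torch.cuda.memory_allocated() returns the GPU memory currently occupied, but how do I determine the total memory available from PyTorch?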

- Hari Prasad

3 Answers

27

In recent versions of PyTorch you can also use torch.cuda.mem_get_info:

https://pytorch.org/docs/stable/generated/torch.cuda.mem_get_info.html#torch.cuda.mem_get_info

torch.cuda.mem_get_info()

It returns a tuple: the first element is the free memory on the device and the second is the total memory of the device, both in bytes.
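As a minimal sketch of how the returned tuple might be used, the snippet below converts the two byte counts to MiB and derives the share of memory in use. The helper name `summarize_memory` is made up for illustration and is not part of PyTorch; the only PyTorch calls are `torch.cuda.is_available` and `torch.cuda.mem_get_info`.

```python
def summarize_memory(free_bytes, total_bytes):
    """Turn a (free, total) byte pair into MiB figures and a used fraction.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    mib = 1024 ** 2
    used_bytes = total_bytes - free_bytes
    return {
        "free_mib": free_bytes // mib,
        "used_mib": used_bytes // mib,
        "total_mib": total_bytes // mib,
        "used_fraction": used_bytes / total_bytes,
    }


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print("No CUDA device is visible")
        return
    # mem_get_info() reports device-wide numbers, so memory taken by
    # other processes is already subtracted from `free`.
    free, total = torch.cuda.mem_get_info()
    print(summarize_memory(free, total))


if __name__ == "__main__":
    main()
```

Keeping the arithmetic in a plain function means it can be checked without a GPU; only `main()` touches the device.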

- Iman

4

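This answer is better than the accepted one (which uses total_memory plus reserved/allocated), because it gives the correct numbers when other processes or users share the GPU and take up memory. - krassowski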

1

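Older versions of PyTorch have a bug where the device argument is ignored and the information for the current device is always returned. The workaround is to use a context manager:

with torch.cuda.device(device):
    info = torch.cuda.mem_get_info()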

See: https://github.com/pytorch/pytorch/issues/76224 - אלימלך שרייבר

Example usage, please. - Nathan B

@NathanB Added a usage example. - Iman

4

This works well for me!

import pynvml

def get_memory_free_MiB(gpu_index):
    # Ask the NVIDIA driver (through NVML) for the free memory of one GPU, in MiB.
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(int(gpu_index))
    mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    return mem_info.free // 1024 ** 2

- Peter Pack


- prosti · Accepted Answer

PyTorch can give you the total, reserved, and allocated memory:

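import torch

t = torch.cuda.get_device_properties(0).total_memory  # total memory of device 0, in bytes
r = torch.cuda.memory_reserved(0)   # bytes held by PyTorch's caching allocator
a = torch.cuda.memory_allocated(0)  # bytes occupied by tensors
f = r - a  # free inside reserved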
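To make the relationship between these three numbers concrete, here is a small sketch. The helper `caching_allocator_view` is hypothetical (not a PyTorch function), and the figures it returns describe only what the current process holds; memory used by other processes on the same GPU does not show up in them.

```python
def caching_allocator_view(total, reserved, allocated):
    """Split one process's view of GPU memory into three byte counts.

    Hypothetical helper for illustration. The arguments are meant to come from
    get_device_properties(i).total_memory, memory_reserved(i) and
    memory_allocated(i).
    """
    return {
        # Occupied by live tensors.
        "allocated": allocated,
        # Cached by PyTorch and reusable without asking the driver again.
        "free_inside_reserved": reserved - allocated,
        # Not claimed by this process; other processes may still be using it.
        "not_reserved_by_this_process": total - reserved,
    }


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print("No CUDA device is visible")
        return
    total = torch.cuda.get_device_properties(0).total_memory
    print(caching_allocator_view(total,
                                 torch.cuda.memory_reserved(0),
                                 torch.cuda.memory_allocated(0)))


if __name__ == "__main__":
    main()
```

The three values always add up to the total, which is a quick sanity check on the bookkeeping.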

The Python bindings for NVIDIA's management library (NVML) can give you the information for the whole GPU (0 here means the first GPU device):

from pynvml import *
nvmlInit()
h = nvmlDeviceGetHandleByIndex(0)
info = nvmlDeviceGetMemoryInfo(h)
print(f'total    : {info.total}')
print(f'free     : {info.free}')
print(f'used     : {info.used}')

pip install pynvml

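You can use nvidia-smi to get memory information. You can also use nvtop, but this tool needs to be installed from source (at the time of writing). Another tool for checking memory is gpustat (pip3 install gpustat).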
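If you want the nvidia-smi numbers inside a script, one possible approach is to call its CSV query mode and parse the output. This is only a sketch: the function names `parse_memory_csv` and `query_gpu_memory` are made up for illustration, and it assumes nvidia-smi is on the PATH and reports plain integers (in MiB) for every field.

```python
import shutil
import subprocess

# Field order in the output follows the order requested here.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.total,memory.free,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse 'total, free, used' lines (MiB) into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        total, free, used = (int(field) for field in line.split(","))
        rows.append({"total_mib": total, "free_mib": free, "used_mib": used})
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return the parsed rows, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError:
        return []
    return parse_memory_csv(result.stdout)


if __name__ == "__main__":
    for index, row in enumerate(query_gpu_memory()):
        print(f"GPU {index}: {row}")
```

Because this goes through the driver rather than PyTorch, it also reflects memory used by other processes.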

If you would like to use C++ with CUDA:

#include <iostream>
#include "cuda.h"
#include "cuda_runtime_api.h"
  
using namespace std;
  
int main( void ) {
    int num_gpus;
    size_t free, total;
    cudaGetDeviceCount( &num_gpus );
    for ( int gpu_id = 0; gpu_id < num_gpus; gpu_id++ ) {
        cudaSetDevice( gpu_id );
        int id;
        cudaGetDevice( &id );
        cudaMemGetInfo( &free, &total );
        cout << "GPU " << id << " memory: free=" << free << ", total=" << total << endl;
    }
    return 0;
}