What is the best practice for generating random seeds to initialize PyTorch?

Question


What I actually want is to seed the dataset and the DataLoader. I'm adapting the following code:

https://gist.github.com/kevinzakka/d33bf8d6c7f06a9d8c76d97a7879f5cb

Does anyone know how to seed this properly? What are the best practices for seeding in PyTorch?

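To be honest, I don't know whether there are algorithm-specific approaches for the GPU versus the CPU. I mostly care about PyTorch in general and about making sure my code is "truly random", especially when it runs on the GPU, I guess...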

Related:

My answer was deleted; here is what it said:

I don't know whether this applies to PyTorch, but it seems to apply to any programming language:

In any programming language, the best random samples usually come from the operating system. In Python, you can use the os module:

import os

random_data = os.urandom(4)

This gives you a cryptographically secure sequence of random bytes, which you can convert to an integer and use as a seed:

seed = int.from_bytes(random_data, byteorder="big")

Edit: the code snippet works only in Python 3 (int.from_bytes does not exist in Python 2).

When the number of bytes passed to os.urandom is larger than 4, I get the following error:

ValueError: Seed must be between 0 and 2**32 - 1

RAND_SIZE = 4
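The error most likely comes from NumPy's np.random.seed, which only accepts seeds in the range 0 to 2**32 - 1. Since n bytes hold 8*n bits, int.from_bytes over n random bytes can reach 2**(8*n) - 1, so 4 bytes is the largest size that always fits. A minimal check of that arithmetic (the helper max_seed_for and the constant NUMPY_SEED_MAX are hypothetical names used only for illustration):

```python
import os

NUMPY_SEED_MAX = 2**32 - 1  # upper bound accepted by np.random.seed


def max_seed_for(num_bytes):
    # n bytes hold 8*n bits, so the largest big-endian value is 2**(8*n) - 1.
    return int.from_bytes(b"\xff" * num_bytes, byteorder="big")


for n in (4, 5):
    print(n, max_seed_for(n), max_seed_for(n) <= NUMPY_SEED_MAX)
# 4 4294967295 True
# 5 1099511627775 False

# A 4-byte OS seed is therefore always a valid NumPy seed.
seed = int.from_bytes(os.urandom(4), byteorder="big")
assert 0 <= seed <= NUMPY_SEED_MAX
```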

- Charlie Parker


Could you tell us what is unclear about the official documentation (https://pytorch.org/docs/stable/notes/randomness.html) on this? I could open a PR to make it clearer, but right now I don't see any problem. - Berriel


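@Berriel Judging from the answer I posted, what I was trying to do was get a "truly" random number. That has nothing to do with reproducibility, but I realized I want something else; I'll update the question or let you know. - Charlie Parker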

1 Answer


- cookiemonster · Accepted Answer

See https://pytorch.org/docs/stable/notes/randomness.html

This is what I use:

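import os
import random
import numpy as np
import torch

def seed_everything(seed=42):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)  # read at startup: affects only subprocesses started later
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the RNG on all devices (CPU and CUDA)
    # cuDNN: prefer deterministic kernels over the fastest autotuned ones
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False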

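The last two settings (cudnn) apply to the GPU: they make cuDNN use deterministic algorithms and turn off its benchmarking, which picks the fastest algorithm per run and can therefore vary between runs.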
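seed_everything does not pin down the DataLoader, which the question asks about. The randomness notes linked above describe passing a seeded torch.Generator (which fixes the shuffling order and the base seed for worker processes) and a worker_init_fn that reseeds NumPy and random inside each worker. A minimal sketch of that pattern; make_loader, the batch size, and the toy TensorDataset are hypothetical choices for illustration:

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id):
    # Inside a worker, torch.initial_seed() is base_seed + worker_id;
    # derive the NumPy and `random` seeds from it so each worker differs.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def make_loader(dataset, seed, num_workers=0):
    # A dedicated generator fixes the shuffle order and the workers' base seed.
    g = torch.Generator()
    g.manual_seed(seed)
    return DataLoader(
        dataset,
        batch_size=4,
        shuffle=True,
        num_workers=num_workers,
        worker_init_fn=seed_worker,  # only called when num_workers > 0
        generator=g,
    )


dataset = TensorDataset(torch.arange(20))
run_1 = [batch[0].tolist() for batch in make_loader(dataset, seed=0)]
run_2 = [batch[0].tolist() for batch in make_loader(dataset, seed=0)]
print(run_1 == run_2)  # same seed, same shuffle order
```

With num_workers > 0 on platforms that start workers with spawn (Windows, macOS), the loop must run under an `if __name__ == "__main__":` guard.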

You can generate a seed as follows:

import os

def get_truly_random_seed_through_os():
    """
    Usually the best random sample you could get in any programming language is generated through the operating system.
    In Python, you can use the os module.

    source: https://dev59.com/0rXna4cB1Zd3GeqPNpr0#57416967
    """
    RAND_SIZE = 4  # 4 bytes -> seed in 0..2**32 - 1, the range np.random.seed accepts
    # Return a bytestring of RAND_SIZE random bytes suitable for cryptographic use.
    random_data = os.urandom(RAND_SIZE)
    random_seed = int.from_bytes(random_data, byteorder="big")
    return random_seed
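A seed drawn from os.urandom makes every run different, so a run can only be repeated if its seed is recorded. A minimal sketch, assuming you want to seed the Python, NumPy, and PyTorch generators from one OS-derived seed and print it for later reuse (the function name seed_from_os is hypothetical):

```python
import os
import random

import numpy as np
import torch


def seed_from_os():
    # 4 bytes keep the seed within 0..2**32 - 1, the range np.random.seed accepts.
    seed = int.from_bytes(os.urandom(4), byteorder="big")
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the RNG on all devices
    print(f"seed = {seed}")  # record this value to reproduce the run later
    return seed


seed = seed_from_os()
first = torch.rand(3)
torch.manual_seed(seed)  # re-seeding with the logged value replays the same draws
print(torch.equal(first, torch.rand(3)))
```

This keeps the "truly random" seed from the question while still allowing a specific run to be reproduced by passing the logged value back to seed_everything.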