TensorFlow multi-GPU - NCCL

Question

3

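I've been looking to increase my batch size to improve my model's generalization (it is very sensitive to batch size). The solution is to go multi-GPU to make use of more memory. I'm using tensorflow.keras in my script (TensorFlow 2.1 on Windows 10) and followed the instructions for configuring a mirrored strategy for my model. The problem is that my training script runs perfectly without the mirrored strategy code, but with the mirrored strategy I get an error about NCCL. This looks like the same issue as: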

https://github.com/tensorflow/tensorflow/issues/21470

Unfortunately, the solution discussed in that link:

cross_tower_ops = tf.contrib.distribute.AllReduceCrossDeviceOps(
    'hierarchical_copy', num_packs=num_gpus)
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)

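is not compatible with TF 2.1, since the 'contrib' portion of TF appears to have been removed. Does anyone know of an alternative fix for NCCL on Windows, or a replacement for the part of 'tf contrib' that is gone?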
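For reference, a minimal sketch of the kind of mirrored-strategy setup described above. The toy model, the all-zero data, and the helper names `global_batch_size` and `train_mirrored` are hypothetical stand-ins, not the actual training script. It assumes the Keras `fit` convention that `batch_size` is the global batch, which the strategy splits across replicas.

```python
def global_batch_size(per_replica_batch, num_replicas):
    """Batch size to pass to Keras so that each GPU sees `per_replica_batch` samples."""
    return per_replica_batch * num_replicas


def train_mirrored(per_replica_batch=32):
    # Imported lazily so the helper above works without TensorFlow installed.
    import numpy as np
    import tensorflow as tf

    # Default cross-device ops; this is the configuration that can hit NCCL errors.
    strategy = tf.distribute.MirroredStrategy()
    batch = global_batch_size(per_replica_batch, strategy.num_replicas_in_sync)

    with strategy.scope():
        # Variables must be created inside the scope to be mirrored across GPUs.
        model = tf.keras.Sequential(
            [tf.keras.layers.Dense(1, input_shape=(4,))])
        model.compile(optimizer="sgd", loss="mse")

    # Placeholder data: two global batches of zeros.
    x = np.zeros((batch * 2, 4), dtype="float32")
    y = np.zeros((batch * 2, 1), dtype="float32")
    model.fit(x, y, batch_size=batch, epochs=1, verbose=0)
    return batch
```

With two GPUs and `per_replica_batch=32`, the global batch passed to `fit` would be 64.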

- Jarrod Christman

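I never actually got this working on Windows. I was able to get it working on Linux... but I would still like a way to do it on Windows as well. I've started a bounty on this question in the hope of drawing some attention to the problem. - Jarrod Christman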

2 Answers

0

In my experience, some cross_device_ops may not work and will raise errors.

This option is meant for the NVIDIA DGX-1 architecture and may underperform on other architectures:

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

Should work:

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice())

Does not work with my configuration:

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())

So the suggestion is to try the different options.
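Trying the options one by one could be automated along these lines. This is only a sketch: `first_working`, `tf_candidates`, and `tf_probe` are hypothetical helper names, and the probe uses a toy model with placeholder data rather than a real training run. A failed attempt may also leave the process in a bad state, so running each attempt in a fresh process may be safer in practice.

```python
def first_working(candidates, probe):
    """Try each (name, factory) pair in order; return the first that passes `probe`.

    Returns (name, ops, errors), where `errors` maps each failed name to its message.
    """
    errors = {}
    for name, factory in candidates:
        try:
            ops = factory()
            probe(ops)
            return name, ops, errors
        except Exception as exc:  # any failure just moves on to the next option
            errors[name] = repr(exc)
    raise RuntimeError("no cross-device option worked: %r" % errors)


def tf_candidates():
    # Imported lazily so first_working() can be used without TensorFlow.
    import tensorflow as tf
    return [
        ("NcclAllReduce", tf.distribute.NcclAllReduce),
        ("HierarchicalCopyAllReduce", tf.distribute.HierarchicalCopyAllReduce),
        ("ReductionToOneDevice", tf.distribute.ReductionToOneDevice),
    ]


def tf_probe(ops):
    """Run one tiny training step under MirroredStrategy with the given ops."""
    import numpy as np
    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy(cross_device_ops=ops)
    with strategy.scope():
        model = tf.keras.Sequential(
            [tf.keras.layers.Dense(1, input_shape=(4,))])
        model.compile(optimizer="sgd", loss="mse")
    x = np.zeros((8, 4), dtype="float32")
    y = np.zeros((8, 1), dtype="float32")
    model.fit(x, y, batch_size=8, epochs=1, verbose=0)
```

Usage would be `name, ops, errors = first_working(tf_candidates(), tf_probe)`, after which `ops` can be passed to the real `MirroredStrategy` and `errors` shows why the earlier options were rejected.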

- kiriloff

- kerasbaz · Accepted Answer

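One solution from issue 21470 is to build NCCL for Windows x64. MyCaffe provides instructions here: https://github.com/MyCaffe/NCCL/blob/master/INSTALL.md

You'll need VS 2015 or 2017 and the CUDA development package, and after compiling you have to put the resulting .dll files in the correct location.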