How to accumulate gradients in TensorFlow 2.0?

Question

8

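I am training a model using TensorFlow 2.0. The images in my training set have different resolutions. The model I've built handles variable resolution (convolutional layers followed by global average pooling). My training set is very small, and I want to use the full training set in a single batch.

Since my images have different resolutions, I can't use model.fit(). So I plan to pass each sample through the network individually, accumulate the errors/gradients, and then apply one optimizer step. I am able to compute the loss values, but I don't know how to accumulate the losses/gradients. How can I accumulate the losses/gradients and then apply a single optimizer step?

Code: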

for i in range(num_epochs):
    print(f'Epoch: {i + 1}')
    total_loss = 0
    for j in tqdm(range(num_samples)):
        sample = samples[j]
        with tf.GradientTape() as tape:
            prediction = self.model(sample)
            loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
        gradients = tape.gradient(loss_value, self.model.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))
        total_loss += loss_value

    epoch_loss = total_loss / num_samples
    print(f'Epoch loss: {epoch_loss}')

- Nagabhushan S N

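tf.Keras.fit()? Did you mean the model.fit() method of tf.keras.Model? - GPhilo

Yes. You're right. - Nagabhushan S N

Please take a look at https://www.tensorflow.org/tutorials/customization/autodiff and at the implementation of train_step in this guide. - GPhilo

Thanks. But I don't know how to accumulate the gradients. - Nagabhushan S N

2 Answers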

3

Based on the Stack Overflow answer and the explanation on the TensorFlow website, below is code for accumulating gradients in TensorFlow 2.0:

def train(epochs):
    for epoch in range(epochs):
        for (batch, (images, labels)) in enumerate(dataset):
            with tf.GradientTape() as tape:
                logits = mnist_model(images, training=True)
                tvs = mnist_model.trainable_variables
                accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
                zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
                loss_value = loss_object(labels, logits)

            loss_history.append(loss_value.numpy().mean())
            grads = tape.gradient(loss_value, tvs)
            #print(grads[0].shape)
            #print(accum_vars[0].shape)
            accum_ops = [accum_vars[i].assign_add(grad) for i, grad in enumerate(grads)]

        optimizer.apply_gradients(zip(grads, mnist_model.trainable_variables))
        print('Epoch {} finished'.format(epoch))

# Call the above function
train(epochs=3)

The complete code can be found in this GitHub Gist.

- user11530462

1

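Should accum_vars be passed to the apply_gradients() function? Like this: optimizer.apply_gradients(zip(accum_vars, mnist_model.trainable_variables)). As far as I understand, accum_vars[i].assign_add(grad) adds grad to accum_vars[i]. So at the end, accum_vars holds the accumulated gradients, while grads holds only the gradients of the last batch. - Nagabhushan S N

@NagabhushanSN, I think they are training the model normally and accumulating only for model analysis. If you want to accumulate the gradients of mini-batches, you are correct. You need to move accum_vars out of the last for loop. However, I'm not sure whether the gradients need to be averaged together before applying them. - targetXING

@FreedomToWin Thanks. But even so, the code seems to have a problem. They don't apply the gradients of every batch. The gradients are applied only after the batch loop finishes, which means only the last batch's gradients are applied. Also, what kind of analysis do you mean? Could you point me to some articles that analyze the gradients of each batch in each epoch? Just curious. Thanks! - Nagabhushan S N

I've heard of using the average gradient of the prediction with respect to the input as a measure of feature importance. You are right that the code has a problem. Why not simply accumulate the loss for training? - targetXING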

- Ramiro R.C. · Accepted Answer

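If I understand this statement correctly:

How can I accumulate the losses/gradients and then apply a single optimizer step?

@Nagabhushan is trying to accumulate the gradients and then apply the optimization on the (averaged) accumulated gradient. The answer provided by @TensorflowSupport does not answer that. To perform the optimization only once while accumulating the gradients from several tapes, you can do the following: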

for i in range(num_epochs):
    print(f'Epoch: {i + 1}')
    total_loss = 0

    # get trainable variables
    train_vars = self.model.trainable_variables
    # Create empty gradient list (not a tf.Variable list)
    accum_gradient = [tf.zeros_like(this_var) for this_var in train_vars]

    for j in tqdm(range(num_samples)):
        sample = samples[j]
        with tf.GradientTape() as tape:
            prediction = self.model(sample)
            loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
        total_loss += loss_value

        # get gradients of this tape
        gradients = tape.gradient(loss_value, train_vars)
        # Accumulate the gradients
        accum_gradient = [(acum_grad+grad) for acum_grad, grad in zip(accum_gradient, gradients)]


    # Now, after executing all the tapes you needed, we apply the optimization step
    # (but first we take the average of the gradients)
    accum_gradient = [this_grad/num_samples for this_grad in accum_gradient]
    # apply optimization step
    self.optimizer.apply_gradients(zip(accum_gradient,train_vars))
        

    epoch_loss = total_loss / num_samples
    print(f'Epoch loss: {epoch_loss}')
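
As a sanity check on the averaging step above, here is a minimal pure-Python sketch showing that the mean of per-sample gradients matches the gradient of the mean loss over the whole set. The one-parameter model y = w * x, the squared-error loss, and all function names are hypothetical, chosen only for illustration; they are not part of the original code.

```python
# Hypothetical toy problem: one-parameter model y = w * x with squared error.

def sample_grad(w, x, y):
    """Analytic gradient of (w * x - y) ** 2 with respect to w for one sample."""
    return 2.0 * (w * x - y) * x


def accumulated_grad(w, xs, ys):
    """Accumulate per-sample gradients one at a time, then average them."""
    accum = 0.0
    for x, y in zip(xs, ys):
        accum += sample_grad(w, x, y)
    return accum / len(xs)


def mean_loss(w, xs, ys):
    """Mean squared error over the whole set (the 'single batch' loss)."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def numeric_full_batch_grad(w, xs, ys, eps=1e-4):
    """Central finite difference of the full-batch loss with respect to w."""
    return (mean_loss(w + eps, xs, ys) - mean_loss(w - eps, xs, ys)) / (2.0 * eps)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]
w = 0.5

averaged = accumulated_grad(w, xs, ys)
full_batch = numeric_full_batch_grad(w, xs, ys)
print(averaged, full_batch)
```

The two values agree up to floating-point error, which is why one optimizer step on the averaged accumulated gradient behaves like one step on a batch containing all samples, assuming the batch loss is the mean of the per-sample losses.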

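You should avoid using tf.Variable() inside the training loop, since it produces errors when you try to execute the code as a graph. If you use tf.Variable() inside your training function and then decorate it with "@tf.function" or apply "tf.function(my_train_fcn)" to obtain a graph function (i.e. for better performance), the execution raises an error.

This happens because tracing the tf.Variable function results in different behaviour from what is observed in eager execution (reuse versus creation, respectively). You can find more information on the TensorFlow help page.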
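
If you do want tf.Variable accumulators together with tf.function, one option is to create them once, outside the traced function, and only call assign_add/assign inside it. The following is a minimal sketch under stated assumptions: the single weight matrix w stands in for self.model, the update is a plain SGD step written with assign_sub rather than a Keras optimizer, and the names accumulate and apply_and_reset are hypothetical.

```python
import tensorflow as tf

# Hypothetical toy model: one weight matrix instead of self.model.
w = tf.Variable([[0.5], [-0.5]], dtype=tf.float32)
train_vars = [w]

# Accumulators are created exactly once, outside any tf.function.
accum_vars = [tf.Variable(tf.zeros_like(v), trainable=False) for v in train_vars]


# The None dimension lets samples of different sizes share one trace.
@tf.function(input_signature=[
    tf.TensorSpec([None, 2], tf.float32),
    tf.TensorSpec([None, 1], tf.float32),
])
def accumulate(x, y):
    """Add the gradient of one sample's loss to the accumulators."""
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
    grads = tape.gradient(loss, train_vars)
    for acc, grad in zip(accum_vars, grads):
        acc.assign_add(grad)
    return loss


@tf.function
def apply_and_reset(num_samples, learning_rate):
    """Plain SGD step on the averaged gradient, then zero the accumulators."""
    for var, acc in zip(train_vars, accum_vars):
        var.assign_sub(learning_rate * acc / num_samples)
        acc.assign(tf.zeros_like(acc))


# Two samples with different first dimensions (standing in for resolutions).
samples = [
    (tf.constant([[1.0, 2.0]]), tf.constant([[1.0]])),
    (tf.constant([[0.0, 1.0], [1.0, 0.0]]), tf.constant([[0.0], [1.0]])),
]
for x, y in samples:
    accumulate(x, y)
apply_and_reset(tf.constant(float(len(samples))), tf.constant(0.1))
```

With a real optimizer, the averaged accumulators would presumably be passed to self.optimizer.apply_gradients in place of the assign_sub line, as in the eager loop above.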