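I have a question similar to this one.
Because of limited resources, I am using a deep model (VGG-16) to train a triplet network. I want to accumulate gradients over 128 batches, each containing a single training example, and then propagate the error and update the weights.
It is not clear to me how to do this. I work with tensorflow, but any implementation/pseudocode is welcome.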
Let's walk through the code proposed in one of the answers you linked:
## Optimizer definition - nothing different from any classical example
opt = tf.train.AdamOptimizer()
## Retrieve all trainable variables you defined in your graph
tvs = tf.trainable_variables()
## Creation of a list of variables with the same shape as the trainable ones
# initialized with 0s
accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
## Calls the compute_gradients function of the optimizer to obtain... the list of gradients
gvs = opt.compute_gradients(rmse, tvs)
## Adds to each element from the list you initialized earlier with zeros its gradient (works because accum_vars and gvs are in the same order)
accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]
## Define the training step (part with variable value update)
train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])
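This first part mainly adds new variables and ops to your graph, which allow you to:
- accumulate the gradients in the variables accum_vars, using the ops accum_ops
- update the model weights, using the op train_step

Then, to use it during training, you need to follow these steps (still from the answer you linked):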
## The while loop for training
while ...:
    # Run the zero_ops to reset the accumulators to zero
    sess.run(zero_ops)
    # Accumulate the gradients 'n_minibatches' times in accum_vars using accum_ops
    for i in range(n_minibatches):
        sess.run(accum_ops, feed_dict={X: Xs[i], y: ys[i]})
    # Run the train_step ops to update the weights based on your accumulated gradients
    sess.run(train_step)
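To make the zero / accumulate / update cycle above concrete without any TensorFlow machinery, here is a minimal pure-Python sketch on a toy one-parameter model. All names (`grad_single`, `accumulate_and_step`, `full_batch_step`) and the data are made up for illustration. It also shows one detail worth keeping in mind: the graph above sums the gradients, so with plain SGD the step is `n` times larger than a step on the mean gradient unless you divide by the number of accumulated batches.

```python
# Pure-Python sketch of gradient accumulation for a toy model y = w * x.
# Every name here is illustrative and not part of the TensorFlow code above.

def grad_single(w, x, y):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2.0 * (w * x - y) * x

def accumulate_and_step(w, samples, lr, average=True):
    """Zero the accumulator, add one gradient per sample, then update once."""
    accum = 0.0                          # plays the role of zero_ops
    for x, y in samples:                 # one batch of size one per iteration
        accum += grad_single(w, x, y)    # plays the role of accum_ops
    if average:
        accum /= len(samples)            # turn the summed gradient into a mean
    return w - lr * accum                # plays the role of train_step (plain SGD)

def full_batch_step(w, samples, lr):
    """Reference: one SGD step on the mean gradient of the whole batch."""
    mean_grad = sum(grad_single(w, x, y) for x, y in samples) / len(samples)
    return w - lr * mean_grad

# 128 single-example "batches", as in the question
samples = [(float(i), 3.0 * i) for i in range(1, 129)]
w_accum = accumulate_and_step(0.0, samples, lr=1e-5)
w_summed = accumulate_and_step(0.0, samples, lr=1e-5, average=False)
w_full = full_batch_step(0.0, samples, lr=1e-5)
```

With averaging, `w_accum` matches the full-batch step `w_full`; without it, `w_summed` moves 128 times as far from the starting point. Adaptive optimizers such as Adam are less sensitive to this scale factor, but it still matters when comparing learning rates against a real batch of 128.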
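sess.run(train_step) is placed outside the loop. That means the weight update happens after the gradients of the last batch have been computed, right? And if we put it inside the loop, it would happen after every epoch, right? - ARAT

TensorFlow 2.0 compatible answer: in line with Pop's answer above and the explanation on the TensorFlow website, here is code for accumulating gradients in TensorFlow 2.0: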
def train(epochs):
    for epoch in range(epochs):
        for (batch, (images, labels)) in enumerate(dataset):
            with tf.GradientTape() as tape:
                logits = mnist_model(images, training=True)
                tvs = mnist_model.trainable_variables
                accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
                zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
                loss_value = loss_object(labels, logits)
            loss_history.append(loss_value.numpy().mean())
            grads = tape.gradient(loss_value, tvs)
            #print(grads[0].shape)
            #print(accum_vars[0].shape)
            accum_ops = [accum_vars[i].assign_add(grad) for i, grad in enumerate(grads)]
        optimizer.apply_gradients(zip(grads, mnist_model.trainable_variables))
        print ('Epoch {} finished'.format(epoch))

# call the above function
train(epochs = 3)
The complete code can be found in this Github Gist.
Shouldn't it be optimizer.apply_gradients(zip(accum_ops, mnist_model.trainable_variables))? - Jared Nielsen
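
The question above points at a real gap: as posted, the TensorFlow 2.0 snippet recreates accum_vars for every batch and passes grads (not the accumulated values) to apply_gradients. A minimal eager-mode sketch of one way to close that gap follows. It is not the answerer's code: the toy linear model, the data, `n_accum` and `lr` are assumptions for illustration, and a hand-written SGD update stands in for the optimizer so the example depends only on core TensorFlow calls.

```python
import tensorflow as tf

# Toy stand-ins (assumptions): a linear model w*x + b and 8 batches of size one.
w = tf.Variable(0.0)
b = tf.Variable(0.0)
tvs = [w, b]
xs = [float(i) for i in range(1, 9)]
ys = [2.0 * x + 1.0 for x in xs]
n_accum = 4    # number of size-one batches accumulated per weight update
lr = 0.01      # learning rate for the hand-written SGD update

# Create the accumulators once, outside the training loop.
accum_vars = [tf.Variable(tf.zeros_like(tv), trainable=False) for tv in tvs]

for step, (x, y) in enumerate(zip(xs, ys), start=1):
    with tf.GradientTape() as tape:
        loss = tf.square(w * x + b - y)
    grads = tape.gradient(loss, tvs)

    # Add this batch's gradients to the running totals.
    for acc, g in zip(accum_vars, grads):
        acc.assign_add(g)

    # Every n_accum batches: update from the averaged totals, then reset them.
    if step % n_accum == 0:
        for tv, acc in zip(tvs, accum_vars):
            tv.assign_sub(lr * acc / n_accum)   # plain SGD step on the mean gradient
            acc.assign(tf.zeros_like(acc))
```

With a Keras optimizer, the hand-written `assign_sub` would presumably be replaced by a single `optimizer.apply_gradients(zip(avg_grads, tvs))` call, where `avg_grads` (a hypothetical name) holds the accumulated values divided by `n_accum`.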