TensorFlow model restore (resumed training seems to start from scratch)

Question

4

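I am having trouble resuming training after saving my model. I saved the model once the loss had dropped from 6 to 3, but when I restore it and continue training, the loss starts again from 6. It looks as if the restore did not really work. I don't understand why, because when I print the weights they appear to have been loaded correctly. I am using the ADAM optimizer. Thanks in advance.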

    # Assumes `import numpy as np` and `import tensorflow as tf` (TF 1.x) at module level.
    batch_size = self.batch_size
    num_classes = self.num_classes

    n_hidden = 50 #700 
    n_layers = 1 #3
    truncated_backprop = self.seq_len 
    dropout = 0.3  # passed below as output_keep_prob, i.e. the probability of keeping a unit
    learning_rate = 0.001
    epochs = 200

    with tf.name_scope('input'):
        x = tf.placeholder(tf.float32, [batch_size, truncated_backprop], name='x')
        y = tf.placeholder(tf.int32, [batch_size, truncated_backprop], name='y')

    with tf.name_scope('weights'):
        W = tf.Variable(np.random.rand(n_hidden, num_classes), dtype=tf.float32)
        b = tf.Variable(np.random.rand(1, num_classes), dtype=tf.float32)

    inputs_series = tf.split(x, truncated_backprop, 1)
    labels_series = tf.unstack(y, axis=1)

    with tf.name_scope('LSTM'):
        cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, state_is_tuple=True)
        cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=dropout)
        cell = tf.contrib.rnn.MultiRNNCell([cell] * n_layers)

    states_series, current_state = tf.contrib.rnn.static_rnn(cell, inputs_series, \
        dtype=tf.float32)

    logits_series = [tf.matmul(state, W) + b for state in states_series]
    prediction_series = [tf.nn.softmax(logits) for logits in logits_series]

    losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
              for logits, labels in zip(logits_series, labels_series)]
    total_loss = tf.reduce_mean(losses)

    train_step = tf.train.AdamOptimizer(learning_rate).minimize(total_loss)

    tf.summary.scalar('total_loss', total_loss)
    summary_op = tf.summary.merge_all()

    loss_list = []
    writer = tf.summary.FileWriter('tf_logs', graph=tf.get_default_graph())

    all_saver = tf.train.Saver()

    with tf.Session() as sess:
        #sess.run(tf.global_variables_initializer())
        tf.reset_default_graph()
        saver = tf.train.import_meta_graph('./models/tf_models/rnn_model.meta')
        saver.restore(sess, './models/tf_models/rnn_model')

        for epoch_idx in range(epochs):
            xx, yy = next(self.get_batch)
            batch_count = len(self.D.chars) // batch_size // truncated_backprop

            for batch_idx in range(batch_count):
                batchX, batchY = next(self.get_batch)

                summ, _total_loss, _train_step, _current_state, _prediction_series = sess.run(\
                    [summary_op, total_loss, train_step, current_state, prediction_series],
                    feed_dict = {
                        x : batchX,
                        y : batchY
                    })

                loss_list.append(_total_loss)
                writer.add_summary(summ, epoch_idx * batch_count + batch_idx)
                if batch_idx % 5 == 0:
                    print('Step', batch_idx, 'Batch_loss', _total_loss)

                if batch_idx % 50 == 0:
                    all_saver.save(sess, 'models/tf_models/rnn_model')

            if epoch_idx % 5 == 0:
                print('Epoch', epoch_idx, 'Last_loss', loss_list[-1])

- JimZer

The weights are restored correctly, but what about the data? Is it still the same? - Dmitriy Danevskiy

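@DanevskyiDmytro My data is processed in batches. The batches are retrieved in random order, but the loss over the whole dataset (a full epoch) is close to 3. So I would expect the loss on any batch to restart near 3 after I restore? - JimZer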

Could you restrict the dataset to a few batches and train and test on just those? - Dmitriy Danevskiy
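
The check suggested in the comment above can be sketched roughly as follows. This is only an illustration: it uses a plain NumPy softmax layer and `np.savez` as stand-ins for the TensorFlow graph and `tf.train.Saver`, and the names `cross_entropy`, `x_fixed` and `y_fixed` are hypothetical, not from the original code. The idea is to measure the loss on one fixed batch before saving and again after restoring.

```python
import io

import numpy as np


def cross_entropy(W, b, x, y):
    """Mean softmax cross-entropy of a linear layer on one batch."""
    logits = x @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(y)), y].mean())


rng = np.random.default_rng(0)
W, b = rng.random((4, 3)), rng.random((1, 3))

# A fixed batch: the same inputs AND the same labels on every run.
x_fixed = rng.random((8, 4))
y_fixed = rng.integers(0, 3, size=8)

loss_before = cross_entropy(W, b, x_fixed, y_fixed)

# Save and reload the parameters (in memory here, a file in practice).
buffer = io.BytesIO()
np.savez(buffer, W=W, b=b)
buffer.seek(0)
restored = np.load(buffer)

loss_after = cross_entropy(restored["W"], restored["b"], x_fixed, y_fixed)
```

If the two losses match on the fixed batch while the training loss still jumps after a restart, the checkpoint is probably fine and the input pipeline (batch order, label encoding) is the next thing to inspect. In the TensorFlow code, dropout would also need to be disabled for such a comparison, since it makes the loss stochastic.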

2 Answers

0

My problem was a bug in the code that builds the labels: they changed between runs. It works correctly now. Thanks for your help.
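
The answer does not say how the labels changed between runs. One common way this can happen in character-level models, offered here purely as an assumption, is building the character-to-index mapping from an unordered `set`, whose iteration order for strings can differ between Python processes. A minimal sketch of a stable mapping (`build_vocab` and `encode` are hypothetical helper names, not from the original code):

```python
def build_vocab(text):
    """Map each distinct character to an integer label, in sorted order.

    Sorting makes the mapping depend only on the characters in the text,
    so a restarted process assigns the same labels as the original run.
    """
    chars = sorted(set(text))
    return {ch: idx for idx, ch in enumerate(chars)}


def encode(text, vocab):
    """Convert a string into a list of integer labels."""
    return [vocab[ch] for ch in text]


vocab = build_vocab("hello world")
labels = encode("hello", vocab)
```

When the training text itself may change between runs, saving the mapping next to the checkpoint (for example as JSON) is another option.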

- JimZer

- Robert Kelevra · Accepted Answer

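I had the same problem. In my case the model was restored correctly, but the loss started out very high again every time. The cause was that my batch retrieval was not random. I have three classes, A, B and C, and my data was fed in order: first all of A, then B, then C. I don't know whether this is your problem, but you need to make sure that every batch you give the model contains all of your classes; in your case, each batch should contain about batch_size/num_classes inputs from each class. I changed that and everything worked perfectly :)

Please check that you are feeding your model correctly.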
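
One way to build such class-balanced batches is sketched below. It assumes one integer class label per example (as in the three-class case described in this answer), a batch size divisible by the number of classes, and a hypothetical helper name `stratified_batches`. It is not the code the answerer used.

```python
import numpy as np


def stratified_batches(labels, batch_size, num_classes, seed=0):
    """Yield index arrays in which every class appears equally often.

    Assumes batch_size is divisible by num_classes and that labels is a
    1-D NumPy array of integer class ids in [0, num_classes).
    """
    if batch_size % num_classes != 0:
        raise ValueError("batch_size must be divisible by num_classes")
    per_class = batch_size // num_classes
    rng = np.random.default_rng(seed)

    # Shuffle the example indices of each class independently.
    by_class = [rng.permutation(np.flatnonzero(labels == c))
                for c in range(num_classes)]

    # The rarest class limits how many balanced batches can be built.
    num_batches = min(len(idx) for idx in by_class) // per_class
    for b in range(num_batches):
        parts = [idx[b * per_class:(b + 1) * per_class] for idx in by_class]
        batch = np.concatenate(parts)
        rng.shuffle(batch)  # mix the classes inside the batch
        yield batch


# Data sorted by class (A, then B, then C), as described in the answer.
labels = np.repeat([0, 1, 2], 10)
batches = list(stratified_batches(labels, batch_size=6, num_classes=3))
```

Each yielded array can be used to index the input and label arrays. When exact balance is not required, shuffling all example indices once per epoch is a simpler alternative.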