How do I separate runs of my TensorFlow code in TensorBoard?

17

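TensorBoard treats successive runs of my TensorFlow code as a single run. For example, if I first run the code below with FLAGS.epochs == 10 and then rerun it with FLAGS.epochs == 40, I get a chart

[screenshot: TensorBoard chart]

that "doubles back" at the end of the first run to the beginning of the second.

Is there a way to have my runs treated as separate logs, so that they can be compared or viewed individually?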


from __future__ import (absolute_import, print_function, division, unicode_literals)

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Basic model parameters as external flags.
flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_integer('epochs', 40, 'Epochs to run')
flags.DEFINE_integer('mb_size', 40, 'Mini-batch size. Must divide evenly into the dataset sizes.')
flags.DEFINE_float('learning_rate', 0.15, 'Initial learning rate.')
flags.DEFINE_float('regularization_weight', 0.1 / 1000, 'Regularization lambda.')
flags.DEFINE_string('data_dir', './data', 'Directory to hold training and test data.')
flags.DEFINE_string('train_dir', './_tmp/train', 'Directory to log training (and the network def).')
flags.DEFINE_string('test_dir', './_tmp/test', 'Directory to log testing.')

def variable_summaries(var, name):
    with tf.name_scope("summaries"):
        mean = tf.reduce_mean(var)
        tf.scalar_summary('mean/' + name, mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
            tf.scalar_summary('stddev/' + name, stddev)
    tf.scalar_summary('max/' + name, tf.reduce_max(var))
    tf.scalar_summary('min/' + name, tf.reduce_min(var))
    tf.histogram_summary(name, var)

def nn_layer(input_tensor, input_dim, output_dim, neuron_fn, layer_name):
    with tf.name_scope(layer_name):
        # This Variable will hold the state of the weights for the layer
        with tf.name_scope("weights"):
            weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=0.1))
            variable_summaries(weights, layer_name + '/weights')
        with tf.name_scope("biases"):
            biases = tf.Variable(tf.constant(0.1, shape=[output_dim]))
            variable_summaries(biases, layer_name + '/biases')
        with tf.name_scope('activations'):
            with tf.name_scope('weighted_inputs'):
                weighted_inputs = tf.matmul(input_tensor, weights) + biases
                tf.histogram_summary(layer_name + '/weighted_inputs', weighted_inputs)
            output = neuron_fn(weighted_inputs)
            tf.histogram_summary(layer_name + '/output', output)
    return output, weights 

# Collect data
mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

# Inputs and outputs
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

# Network structure
o1, W1 = nn_layer(x, 784, 30, tf.nn.sigmoid, 'hidden_layer')
y, W2 = nn_layer(o1, 30, 10, tf.nn.softmax, 'output_layer')

with tf.name_scope('accuracy'):
    with tf.name_scope('loss'):
        cost = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
        loss = cost + FLAGS.regularization_weight * (tf.nn.l2_loss(W1) + tf.nn.l2_loss(W2))
    with tf.name_scope('correct_prediction'):
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    with tf.name_scope('accuracy'):
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.scalar_summary('accuracy', accuracy)
    tf.scalar_summary('loss', loss)

train_step = tf.train.GradientDescentOptimizer(FLAGS.learning_rate).minimize(loss)

# Logging
train_writer = tf.train.SummaryWriter(FLAGS.train_dir, tf.get_default_graph())
test_writer = tf.train.SummaryWriter(FLAGS.test_dir)
merged = tf.merge_all_summaries()

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    for ep in range(FLAGS.epochs):
        for mb in range(int(len(mnist.train.images)/FLAGS.mb_size)):
            batch_xs, batch_ys = mnist.train.next_batch(FLAGS.mb_size)
            sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

        summary = sess.run(merged, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
        test_writer.add_summary(summary, ep+1)

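3
Put them in different subdirectories and they will show up as separate runs. - etarion
@etarion: Yes, apart from that obvious approach. So: same directory means, by definition, same run, regardless of whether the code was actually run at different times or produced different files (automatically). Or, to put it another way (which is really the question): is there no way to distinguish separate log files within a directory? - orome
1
When you save/restore a run, you don't want to distinguish separate log files... There may be an option, but if there is, I don't know of it. - etarion
@etarion: I do want that, and there should be a way to do it. - orome
3 Answers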

13

You can put your runs into separate subdirectories, for example:

./logdir/2016111301/
./logdir/2016111302/
./logdir/2016111401/

Then run TensorBoard on the root directory:

tensorboard --logdir=logdir

You will then have separate logs, for example:

[screenshot: TensorBoard chart]
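Since the question compares runs that differ only in their flag values, one way to get a distinct subdirectory per run is to build its name from those values. The following is a minimal sketch; `run_subdir` is a hypothetical helper written for illustration and is not part of TensorFlow or TensorBoard.

```python
import os


def run_subdir(logdir, **params):
    """Return logdir/<name>, where <name> encodes the given hyperparameters.

    Hypothetical helper for illustration; not a TensorFlow or TensorBoard API.
    """
    # Sort the keys so the same settings always map to the same directory name.
    name = "_".join("{}={}".format(key, params[key]) for key in sorted(params))
    return os.path.join(logdir, name)


# Two runs that differ only in the number of epochs get different directories.
print(run_subdir("logdir", epochs=10, learning_rate=0.15))
print(run_subdir("logdir", epochs=40, learning_rate=0.15))
```

The returned path would be passed to the summary writer in place of the fixed directory. Note that two runs with identical settings still share a directory under this scheme, so it only separates runs whose parameters differ.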


1
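That's not the question. The question is how to do that (and it has already been answered). - orome
That's why I wrote "put your runs into separate subdirectories". Or have I misunderstood the question? If you already have an answer to it, why haven't you accepted one? - bendaf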

6
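You can number your runs with code like the following:

from fs.osfs import OSFS

# Count the existing "test*" entries in the test log directory.
folder = OSFS(FLAGS.test_dir)
test_n = len(list(n for n in folder.listdir() if n.startswith('test')))
# Write this run's summaries to the next numbered subdirectory.
this_test = FLAGS.test_dir+"/test" + str(test_n+1)
test_writer = tf.train.SummaryWriter(this_test)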
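The same numbering idea can be sketched with only the standard library, which avoids the third-party `fs` dependency. This variant also takes the highest existing number rather than the count of entries, so it should not reuse a name after an older run directory has been deleted. The function name `next_run_dir` and the `test` prefix are assumptions chosen for illustration.

```python
import os
import re


def next_run_dir(base_dir, prefix="test"):
    """Return base_dir/<prefix><n>, with n one above the highest existing number.

    Hypothetical helper for illustration; it does not create the directory.
    """
    pattern = re.compile(r"^" + re.escape(prefix) + r"(\d+)$")
    numbers = []
    if os.path.isdir(base_dir):
        for name in os.listdir(base_dir):
            match = pattern.match(name)
            if match:
                numbers.append(int(match.group(1)))
    highest = max(numbers) if numbers else 0
    return os.path.join(base_dir, prefix + str(highest + 1))


# Example: pick the directory for the next run under the question's test_dir.
print(next_run_dir("./_tmp/test"))
```

Two processes started at the same moment could still pick the same number, so this is only suitable when runs are launched one at a time.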

2
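You can use the (low-level) "time" module to get a string for the start of the run and name the directory accordingly. Here is an example using keras with the TensorFlow backend.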
from keras.callbacks import TensorBoard
import time

now = time.strftime("%c")
model.fit(X, Y, batch_size = 2, nb_epoch = 100, shuffle = True,
        verbose = 1, validation_split = 0.1, 
        callbacks =[TensorBoard(log_dir='./logs/'+now, histogram_freq=0, write_graph=True)])

This gives you a set of directories such as:
% ls logs
Fri Sep  2 23:58:39 2016/ Sat Sep  3 00:05:41 2016/

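Yes, the names contain spaces, but TensorBoard doesn't mind. You will see a list of runs, color-coded and date-time-stamped according to when they started.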
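The output of "%c" depends on the locale and contains colons, which are not allowed in file names on Windows. If that matters, a fixed numeric format is one alternative; it also sorts in chronological order when listed by name. The sketch below uses a hypothetical helper, `timestamped_log_dir`, and a format string chosen for illustration.

```python
import os
import time


def timestamped_log_dir(base_dir, when=None):
    """Return base_dir/YYYYMMDD-HHMMSS for the given struct_time (default: now).

    Hypothetical helper for illustration; it does not create the directory.
    """
    if when is None:
        when = time.localtime()
    return os.path.join(base_dir, time.strftime("%Y%m%d-%H%M%S", when))


# Example: a directory name for a run that starts now.
print(timestamped_log_dir("./logs"))
```

The result can be passed as `log_dir` to the TensorBoard callback in place of `'./logs/'+now`.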

