Tensorflow batch_norm does not work properly when testing (is_training=False)

Question

I am training the following model:

with slim.arg_scope(inception_arg_scope(is_training=True)):
    logits_v, endpoints_v = inception_v3(all_v, num_classes=25, is_training=True, dropout_keep_prob=0.8,
                     spatial_squeeze=True, reuse=reuse_variables, scope='vis')
    logits_p, endpoints_p = inception_v3(all_p, num_classes=25, is_training=True, dropout_keep_prob=0.8,
                     spatial_squeeze=True, reuse=reuse_variables, scope='pol')
    pol_features = endpoints_p['pol/features']
    vis_features = endpoints_v['vis/features']

eps = 1e-08
loss = tf.sqrt(tf.maximum(tf.reduce_sum(tf.square(pol_features - vis_features), axis=1, keep_dims=True), eps))

# rest of code
saver = tf.train.Saver(tf.global_variables())

where

def inception_arg_scope(weight_decay=0.00004,
                        batch_norm_decay=0.9997,
                        batch_norm_epsilon=0.001, is_training=True):
    normalizer_params = {
        'decay': batch_norm_decay,
        'epsilon': batch_norm_epsilon,
        'is_training': is_training
    }
    normalizer_fn = tf.contrib.layers.batch_norm

    # Set weight_decay for weights in Conv and FC layers.
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        weights_regularizer=slim.l2_regularizer(weight_decay)):
        with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training):
            with slim.arg_scope(
                    [slim.conv2d],
                    weights_initializer=slim.variance_scaling_initializer(),
                    activation_fn=tf.nn.relu,
                    normalizer_fn=normalizer_fn,
                    normalizer_params=normalizer_params) as sc:
                return sc

inception_V3 is defined here. My model trains very well: the loss drops from 60 to below 1. But when I try to test the model in another file:

with slim.arg_scope(inception_arg_scope(is_training=False)):
    logits_v, endpoints_v = inception_v3(all_v, num_classes=25, is_training=False, dropout_keep_prob=0.8,
                     spatial_squeeze=True, reuse=reuse_variables, scope='vis')
    logits_p, endpoints_p = inception_v3(all_p, num_classes=25, is_training=False, dropout_keep_prob=0.8,
                     spatial_squeeze=True, reuse=reuse_variables, scope='pol')

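my model's results are meaningless, or more precisely, the loss is 1e-8 for all training and test samples. When I set is_training=True it gives more reasonable results, but the loss is still larger than during training (even when I test on the training data). I have the same problem with VGG16. Without batch normalization my test accuracy is 100%; with batch normalization it is 0%.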

What am I missing here? Thanks.

- user3157047

What decay is your batch norm layer using? If it is decay=0.999, try relaxing it to decay=0.99 or decay=0.9 and see whether that solves your problem. - Zhongyu Kuang

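My mistake was that I left out "batchnorm_updates_op" as a dependency when applying "apply_gradient_op". After fixing that, I lowered the decay to "0.9", but it did not work at test time (huge loss). Then I chose "0.99" and it worked. Anyway, thank you very much. - user3157047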
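To make the decay discussion in the comments concrete, here is a minimal plain-Python sketch (not TensorFlow code) of the exponential moving average that batch norm keeps for its inference statistics. It assumes the usual update rule, moving = decay * moving + (1 - decay) * batch_value, a moving mean that starts at 0, and a batch statistic that stays constant. The function names are hypothetical and used only for illustration.

```python
import math


def moving_average_after(steps, decay, batch_stat, initial=0.0):
    """Apply moving = decay * moving + (1 - decay) * batch_stat for `steps` updates."""
    moving = initial
    for _ in range(steps):
        moving = decay * moving + (1.0 - decay) * batch_stat
    return moving


def steps_to_reach(fraction, decay):
    """Number of updates needed to close `fraction` of the gap to the batch statistic."""
    # After n updates the remaining gap is decay**n, so we need decay**n <= 1 - fraction.
    return math.ceil(math.log(1.0 - fraction) / math.log(decay))


if __name__ == "__main__":
    for decay in (0.9, 0.99, 0.9997):
        print(decay, steps_to_reach(0.99, decay), moving_average_after(1000, decay, 5.0))
```

Under these assumptions, a decay close to 1 (such as the 0.9997 in the question) needs many thousands of updates before the moving statistics approach the real ones, while a small decay tracks quickly but follows individual batches more closely and so may be noisier. That trade-off is one plausible reading of why 0.99 worked here when 0.9 did not.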

1 Answer

- user9067885 · Accepted Answer

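I ran into the same problem and solved it. When you use slim.batch_norm, make sure you use slim.learning.create_train_op rather than tf.train.GradientDescentOptimizer(lr).minimize(loss) or another optimizer's minimize call. Give it a try and see whether it works!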
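The reason this matters: in TensorFlow 1.x, batch norm layers place their moving-average updates in the tf.GraphKeys.UPDATE_OPS collection. slim.learning.create_train_op attaches those updates to the train op, whereas a bare minimize(loss) does not run them unless you add the dependency yourself. The toy below is a plain-Python stand-in, not the TensorFlow layer. ToyBatchNorm and its run_update_ops flag are hypothetical names, and the starting values (moving mean 0, moving variance 1) are assumed to match the library defaults. It sketches what inference sees when the updates never run.

```python
import math


class ToyBatchNorm:
    """Scalar batch norm with moving statistics (illustrative stand-in only)."""

    def __init__(self, decay=0.99, epsilon=1e-3):
        self.decay = decay
        self.epsilon = epsilon
        self.moving_mean = 0.0  # assumed initial value
        self.moving_var = 1.0   # assumed initial value

    def train_step(self, batch, run_update_ops):
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
        if run_update_ops:
            # This is the part that is silently skipped when the update ops
            # are not a dependency of the train op.
            self.moving_mean = self.decay * self.moving_mean + (1 - self.decay) * mean
            self.moving_var = self.decay * self.moving_var + (1 - self.decay) * var
        # Training always normalizes with the current batch statistics.
        return [(x - mean) / math.sqrt(var + self.epsilon) for x in batch]

    def infer(self, batch):
        # Inference (is_training=False) relies on the moving statistics.
        scale = math.sqrt(self.moving_var + self.epsilon)
        return [(x - self.moving_mean) / scale for x in batch]


def mean_inference_output(run_update_ops, steps=2000):
    """Train on a fixed batch, then return the mean of the inference output."""
    layer = ToyBatchNorm()
    batch = [48.0, 50.0, 52.0]  # activations far from mean 0, variance 1
    for _ in range(steps):
        layer.train_step(batch, run_update_ops)
    out = layer.infer(batch)
    return sum(out) / len(out)


if __name__ == "__main__":
    print("updates run:    ", mean_inference_output(True))
    print("updates skipped:", mean_inference_output(False))
```

In this toy, training behaves identically in both cases because it only uses batch statistics, which matches the symptom in the question: training loss looks fine, and the problem only appears once is_training=False switches the layer to its never-updated moving statistics.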