How to use batch normalization correctly in TensorFlow?


I have tried several versions of batch normalization in TensorFlow, but none of them worked! The results were all wrong when I set batch_size = 1 at inference time.

Version 1: directly use the official version from tensorflow.contrib

from tensorflow.contrib.layers.python.layers.layers import batch_norm

It is used as follows:

output = lrelu(batch_norm(tf.nn.bias_add(conv, biases), is_training), 0.5, name=scope.name)

where is_training = True during training and False at inference time.
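
To make the setup concrete, here is a minimal, self-contained sketch of this usage pattern (the shapes, names, and plain ReLU are illustrative assumptions, not my real network):

import tensorflow as tf
from tensorflow.contrib.layers.python.layers.layers import batch_norm

# Illustrative stand-ins for a real conv-layer output and its biases.
x = tf.placeholder(tf.float32, [None, 32, 32, 16], name='conv_out')
is_training = tf.placeholder(tf.bool, name='is_training')
biases = tf.Variable(tf.zeros([16]), name='biases')

# is_training is passed by keyword; contrib's batch_norm also accepts a boolean tensor here.
normed = batch_norm(tf.nn.bias_add(x, biases), is_training=is_training)
output = tf.nn.relu(normed)  # a leaky ReLU in the real code; plain ReLU here for brevity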

Version 2: from How could I use Batch Normalization in TensorFlow?

def batch_norm_layer(x, train_phase, scope_bn='bn'):
    bn_train = batch_norm(x, decay=0.999, epsilon=1e-3, center=True, scale=True,
            updates_collections=None,
            is_training=True,
            reuse=None, # is this right?
            trainable=True,
            scope=scope_bn)
    bn_inference = batch_norm(x, decay=0.999, epsilon=1e-3, center=True, scale=True,
            updates_collections=None,
            is_training=False,
            reuse=True, # is this right?
            trainable=True,
            scope=scope_bn)
    z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
    return z

It is used as follows:

output = lrelu(batch_norm_layer(tf.nn.bias_add(conv, biases), is_training), 0.5, name=scope.name)

During training, is_training is a placeholder fed with True; at inference time it is fed with False.

Version 3: from slim https://github.com/tensorflow/models/blob/master/inception/inception/slim/ops.py

def batch_norm_layer(inputs,
           is_training=True,
           scope='bn'):
  decay=0.999
  epsilon=0.001
  inputs_shape = inputs.get_shape()
  with tf.variable_scope(scope) as t_scope:
    axis = list(range(len(inputs_shape) - 1))
    params_shape = inputs_shape[-1:]
    # Allocate parameters for the beta and gamma of the normalization.
    beta, gamma = None, None
    beta = tf.Variable(tf.zeros_initializer(params_shape),
        name='beta',
        trainable=True)
    gamma = tf.Variable(tf.ones_initializer(params_shape),
        name='gamma',
        trainable=True)
    moving_mean = tf.Variable(tf.zeros_initializer(params_shape),
        name='moving_mean',
        trainable=False)
    moving_variance = tf.Variable(tf.ones_initializer(params_shape),
        name='moving_variance',
        trainable=False)
    if is_training:
      # Calculate the moments based on the individual batch.
      mean, variance = tf.nn.moments(inputs, axis)

      update_moving_mean = moving_averages.assign_moving_average(
          moving_mean, mean, decay)
      update_moving_variance = moving_averages.assign_moving_average(
          moving_variance, variance, decay)
    else:
      # Just use the moving_mean and moving_variance.
      mean = moving_mean
      variance = moving_variance
      # Normalize the activations.
    outputs = tf.nn.batch_normalization(
       inputs, mean, variance, beta, gamma, epsilon)
    outputs.set_shape(inputs.get_shape())
    return outputs

It is used as follows:

output = lrelu(batch_norm_layer(tf.nn.bias_add(conv, biases), is_training), 0.5, name=scope.name)

where is_training is True during training and False at inference time.

Version 4: similar to version 3, but with tf.control_dependencies added.

def batch_norm_layer(inputs,
           decay=0.999,
           center=True,
           scale=True,
           epsilon=0.001,
           moving_vars='moving_vars',
           activation=None,
           is_training=True,
           trainable=True,
           restore=True,
           scope='bn',
           reuse=None):
  inputs_shape = inputs.get_shape()
  with tf.variable_op_scope([inputs], scope, 'BatchNorm', reuse=reuse):
      axis = list(range(len(inputs_shape) - 1))
      params_shape = inputs_shape[-1:]
      # Allocate parameters for the beta and gamma of the normalization.
      beta = tf.Variable(tf.zeros(params_shape), name='beta')
      gamma = tf.Variable(tf.ones(params_shape), name='gamma')
      # Create moving_mean and moving_variance add them to
      # GraphKeys.MOVING_AVERAGE_VARIABLES collections.
      moving_mean = tf.Variable(tf.zeros(params_shape), name='moving_mean',
            trainable=False)
      moving_variance = tf.Variable(tf.ones(params_shape),   name='moving_variance', 
            trainable=False)
  control_inputs = []
  if is_training:
      # Calculate the moments based on the individual batch.
      mean, variance = tf.nn.moments(inputs, axis)

      update_moving_mean = moving_averages.assign_moving_average(
          moving_mean, mean, decay)
      update_moving_variance = moving_averages.assign_moving_average(
          moving_variance, variance, decay)
      control_inputs = [update_moving_mean, update_moving_variance]
  else:
      # Just use the moving_mean and moving_variance.
      mean = moving_mean
      variance = moving_variance
  # Normalize the activations. 
  with tf.control_dependencies(control_inputs):
      return tf.nn.batch_normalization(
        inputs, mean, variance, beta, gamma, epsilon)

It is used as follows:

output = lrelu(batch_norm_layer(tf.nn.bias_add(conv, biases), is_training), 0.5, name=scope.name)

where is_training = True during training and False at inference time.

None of the four versions of batch normalization works correctly. So, how do I use batch normalization correctly?

Another strange phenomenon is that if I replace batch_norm_layer with a no-op like the one below, the inference results are all identical.

def batch_norm_layer(inputs, is_training):
    return inputs

I strongly believe it is important to understand the basic concepts you are using. I suggest reading the batch normalization paper to really understand why and how it helps: https://arxiv.org/pdf/1502.03167.pdf - Thomas Pinetz
What do you mean when you say "none of them is correct"? - etarion
It means that all of them are wrong. - widgetxp
2 Answers


I have tested the simplified batch normalization implementation below, and it produces the same results as tf.contrib.layers.batch_norm as long as the settings are the same.

import tensorflow as tf
from tensorflow.python.training import moving_averages


def initialize_batch_norm(scope, depth):
    with tf.variable_scope(scope) as bnscope:
        gamma = tf.get_variable("gamma", depth, initializer=tf.constant_initializer(1.0))
        beta = tf.get_variable("beta", depth, initializer=tf.constant_initializer(0.0))
        moving_avg = tf.get_variable("moving_avg", depth, initializer=tf.constant_initializer(0.0), trainable=False)
        moving_var = tf.get_variable("moving_var", depth, initializer=tf.constant_initializer(1.0), trainable=False)
        bnscope.reuse_variables()


def BatchNorm_layer(x, scope, train, epsilon=0.001, decay=.99):
    # Perform a batch normalization after a conv layer or a fc layer
    # gamma: a scale factor
    # beta: an offset
    # epsilon: the variance epsilon - a small float number to avoid dividing by 0
    with tf.variable_scope(scope, reuse=True):
        with tf.variable_scope('BatchNorm', reuse=True) as bnscope:
            gamma, beta = tf.get_variable("gamma"), tf.get_variable("beta")
            moving_avg, moving_var = tf.get_variable("moving_avg"), tf.get_variable("moving_var")
            shape = x.get_shape().as_list()
            control_inputs = []
            if train:
                avg, var = tf.nn.moments(x, list(range(len(shape)-1)))
                update_moving_avg = moving_averages.assign_moving_average(moving_avg, avg, decay)
                update_moving_var = moving_averages.assign_moving_average(moving_var, var, decay)
                control_inputs = [update_moving_avg, update_moving_var]
            else:
                avg = moving_avg
                var = moving_var
            with tf.control_dependencies(control_inputs):
                output = tf.nn.batch_normalization(x, avg, var, offset=beta, scale=gamma, variance_epsilon=epsilon)
    return output
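
For completeness, here is a minimal sketch of how the two functions above might be wired together (the scope name 'fc1' and the width of 64 are assumptions for illustration, not part of the original code):

x = tf.placeholder(tf.float32, [None, 64], name='fc1_out')  # hypothetical activations

# Create gamma/beta and the moving statistics once, under the scope the layer will reuse.
with tf.variable_scope('fc1'):
    initialize_batch_norm('BatchNorm', depth=64)

bn_train = BatchNorm_layer(x, scope='fc1', train=True)    # uses batch statistics, updates moving averages
bn_test = BatchNorm_layer(x, scope='fc1', train=False)    # uses the stored moving averages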

The main tips for using the official tf.contrib.layers.batch_norm implementation are: (1) set is_training=True at training time and is_training=False at validation and test time; (2) set updates_collections=None to make sure moving_variance and moving_mean are updated in place; (3) be aware of and careful with the scope settings; (4) set decay to a smaller value (decay=0.9 or decay=0.99) instead of the default (0.999) if your dataset is small or your total number of training updates/steps is not that large.
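
Putting these tips together, a minimal sketch of the call might look like this (the placeholder shapes and the scope name are illustrative assumptions):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 128], name='activations')  # hypothetical layer output
is_training = tf.placeholder(tf.bool, name='is_training')

bn = tf.contrib.layers.batch_norm(
    x,
    decay=0.9,                  # tip (4): smaller decay for small datasets / short training
    center=True,
    scale=True,
    updates_collections=None,   # tip (2): update moving_mean / moving_variance in place
    is_training=is_training,    # tip (1): fed with True when training, False otherwise
    scope='bn')                 # tip (3): explicit scope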

Thank you, Zhongyu Kuang. I came to the same conclusions as you, except for the fourth point. - widgetxp
I keep having problems with tf.contrib.layers.batch_norm. My network converges during training, but when I test it with is_training=False the results make no sense. However, with is_training=True the test results make more sense to me (even though the accuracy is close to zero compared with a network without batch_norm). Any ideas? I asked about it here: [Tensorflow batch_norm does not work properly when testing (is_training=False)](https://dev59.com/45_ha4cB1Zd3GeqP7vVh) - user3157047
@Zhongyu Kuang, could you explain updates_collections in more detail? What happens if we update them with tf.GraphKeys.UPDATE_OPS, and how are they used at inference time? - Shamane Siriwardhana
Hi, what about updates_collections=None? Can you help me understand it? I looked at the code - what happens if we use the update ops option instead? - Shamane Siriwardhana

I found Zhongyu Kuang's code really useful, but I got stuck on how to dynamically switch between train and test ops, i.e. how to turn a Python boolean is_training into a TensorFlow boolean placeholder is_training. I need this feature to be able to test the network on the validation set during training.
Starting from his code, and inspired by this, I wrote the following code:
import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.python.ops import variables
from tensorflow.python.training import moving_averages


def batch_norm(x, scope, is_training, epsilon=0.001, decay=0.99):
    """
    Returns a batch normalization layer that automatically switch between train and test phases based on the 
    tensor is_training

    Args:
        x: input tensor
        scope: scope name
        is_training: boolean tensor or variable
        epsilon: epsilon parameter - see batch_norm_layer
        decay: epsilon parameter - see batch_norm_layer

    Returns:
        The correct batch normalization layer based on the value of is_training
    """
    assert isinstance(is_training, (ops.Tensor, variables.Variable)) and is_training.dtype == tf.bool

    return tf.cond(
        is_training,
        lambda: batch_norm_layer(x=x, scope=scope, epsilon=epsilon, decay=decay, is_training=True, reuse=None),
        lambda: batch_norm_layer(x=x, scope=scope, epsilon=epsilon, decay=decay, is_training=False, reuse=True),
    )


def batch_norm_layer(x, scope, is_training, epsilon=0.001, decay=0.99, reuse=None):
    """
    Performs a batch normalization layer

    Args:
        x: input tensor
        scope: scope name
        is_training: python boolean value
        epsilon: the variance epsilon - a small float number to avoid dividing by 0
        decay: the moving average decay

    Returns:
        The ops of a batch normalization layer
    """
    with tf.variable_scope(scope, reuse=reuse):
        shape = x.get_shape().as_list()
        # gamma: a trainable scale factor
        gamma = tf.get_variable("gamma", shape[-1], initializer=tf.constant_initializer(1.0), trainable=True)
        # beta: a trainable shift value
        beta = tf.get_variable("beta", shape[-1], initializer=tf.constant_initializer(0.0), trainable=True)
        moving_avg = tf.get_variable("moving_avg", shape[-1], initializer=tf.constant_initializer(0.0), trainable=False)
        moving_var = tf.get_variable("moving_var", shape[-1], initializer=tf.constant_initializer(1.0), trainable=False)
        if is_training:
            # tf.nn.moments == Calculate the mean and the variance of the tensor x
            avg, var = tf.nn.moments(x, list(range(len(shape)-1)))
            update_moving_avg = moving_averages.assign_moving_average(moving_avg, avg, decay)
            update_moving_var = moving_averages.assign_moving_average(moving_var, var, decay)
            control_inputs = [update_moving_avg, update_moving_var]
        else:
            avg = moving_avg
            var = moving_var
            control_inputs = []
        with tf.control_dependencies(control_inputs):
            output = tf.nn.batch_normalization(x, avg, var, offset=beta, scale=gamma, variance_epsilon=epsilon)

    return output

Then I use the batch normalization layer like this:
fc1_weights = tf.Variable(...)
fc1 = tf.matmul(x, fc1_weights)
fc1 = batch_norm(fc1, 'fc1_bn', is_training=is_training)
fc1 = tf.nn.relu(fc1)

where is_training is a boolean placeholder. Note that a bias term is not needed, since it is replaced by the beta parameter, as described in the Batch Normalization paper.
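
For reference, such a placeholder can be defined like this (the name is just an example):

is_training = tf.placeholder(tf.bool, name='is_training')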

During execution:


# Training phase
sess.run(loss, feed_dict={x: bx, y: by, is_training: True})

# Testing phase
sess.run(loss, feed_dict={x: bx, y: by, is_training: False})

Note that you can use tf.contrib.layers.batch_norm() - the is_training argument it accepts can be a boolean placeholder! - ZeDuS
Actually, this question is about another way of implementing batch normalization in TensorFlow, so I provided the code I wrote to do it without using any function from the contrib module. Officially, "the contrib module contains volatile or experimental code", so avoiding it can be useful in some cases. - Stefano P.
