How to use an exponential moving average in TensorFlow?

Question

TensorFlow includes the function tf.train.ExponentialMovingAverage, which lets us keep a moving average of the model parameters; this is very helpful for stabilizing the model at test time.
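For reference, a minimal sketch of the raw API on a single toy variable (the variable v and the update loop are just for illustration):

import tensorflow as tf

v = tf.Variable(0.0)
ema = tf.train.ExponentialMovingAverage(decay=0.99)
ema_op = ema.apply([v])   # creates a shadow variable and returns the update op
v_avg = ema.average(v)    # the shadow variable holding the running average

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(5):
        sess.run(tf.assign(v, float(i)))
        sess.run(ema_op)  # shadow <- decay * shadow + (1 - decay) * v
    print(sess.run([v, v_avg]))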

However, I find it somewhat awkward to apply this to a general model. My most successful approach so far (shown below) is to write a function decorator and then put the whole NN inside a function.

This has several downsides, though: first, the whole graph is duplicated, and second, the NN has to be defined inside a function.

Is there a better way?

Current implementation

import functools

import tensorflow as tf


def ema_wrapper(is_training, decay=0.99):
    """Use Exponential Moving Average of parameters during testing.

    Parameters
    ----------
    is_training : bool or `tf.Tensor` of type bool
        EMA is applied if ``is_training`` is False.
    decay:
        Decay rate for `tf.train.ExponentialMovingAverage`
    """
    def function(fun):
        @functools.wraps(fun)
        def fun_wrapper(*args, **kwargs):
            # Regular call
            with tf.variable_scope('ema_wrapper', reuse=False) as scope:
                result_train = fun(*args, **kwargs)

            # Set up exponential moving average
            ema = tf.train.ExponentialMovingAverage(decay=decay)
            var_class = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                          scope.name)
            ema_op = ema.apply(var_class)

            # Add to collection so they are updated
            tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, ema_op)

            # Getter for the variables with EMA applied
            def ema_getter(getter, name, *args, **kwargs):
                var = getter(name, *args, **kwargs)
                ema_var = ema.average(var)
                # ema.average returns None for variables without an average
                return ema_var if ema_var is not None else var

            # Call with EMA applied
            with tf.variable_scope('ema_wrapper', reuse=True,
                                   custom_getter=ema_getter):
                result_test = fun(*args, **kwargs)

            # Return the correct version depending on if we're training or not
            return tf.cond(is_training,
                           lambda: result_train, lambda: result_test)
        return fun_wrapper
    return function

Example usage:

@ema_wrapper(is_training)
def neural_network(x):
    # If is_training is False, we will use an EMA of a instead
    a = tf.get_variable('a', [], tf.float32)
    return a * x
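Note that the decorator only adds the EMA update to tf.GraphKeys.UPDATE_OPS, so it has to be run as part of the training op; a sketch, with the loss and the optimizer as placeholders:

minimize_op = tf.train.AdamOptimizer().minimize(loss)
with tf.control_dependencies([minimize_op]):
    # Run the EMA update(s) after each gradient step
    train_op = tf.group(*tf.get_collection(tf.GraphKeys.UPDATE_OPS))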

Not sure if you'd consider this a valid solution, but you could have an op that copies the EMA values into the original variables, and run it once training is finished. - jdehesa
Sure, that sounds valid. Is there a standardized way of doing it? - Jonas Adler
1 Answer

You can have an op that transfers the values from the EMA variables to the original ones:

import tensorflow as tf

# Make model...
minimize_op = ...
model_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
# Make EMA object and update internal variables after optimization step
ema = tf.train.ExponentialMovingAverage(decay=decay)
with tf.control_dependencies([minimize_op]):
    train_op = ema.apply(model_vars)

# Transfer EMA values to original variables
retrieve_ema_weights_op = tf.group(
    [tf.assign(var, ema.average(var)) for var in model_vars])

with tf.Session() as sess:
    # Do training
    while ...:
        sess.run(train_op, ...)
    # Copy EMA values to weights
    sess.run(retrieve_ema_weights_op)
    # Test model with EMA weights
    # ...
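If you save checkpoints during training anyway, another route is ema.variables_to_restore(), which makes a Saver load each EMA shadow value into the corresponding original variable; a sketch, with the checkpoint path being just a placeholder:

saver = tf.train.Saver(ema.variables_to_restore())
with tf.Session() as sess:
    # Restores the EMA values directly into the original variables
    saver.restore(sess, '/tmp/model.ckpt')
    # Test model with EMA weights
    # ...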

Edit:

I made a longer version that can switch between training and testing modes, backing up the variables on each switch:

import tensorflow as tf

# Make model...
minimize_op = ...
model_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)

is_training = tf.get_variable('is_training', shape=(), dtype=tf.bool, trainable=False,
                              initializer=tf.constant_initializer(True, dtype=tf.bool))

# Make EMA object and update internal variables after optimization step
ema = tf.train.ExponentialMovingAverage(decay=decay)
with tf.control_dependencies([minimize_op]):
    train_op = ema.apply(model_vars)
# Make backup variables
with tf.variable_scope('BackupVariables'):
    backup_vars = [tf.get_variable(var.op.name, dtype=var.value().dtype, trainable=False,
                                   initializer=var.initialized_value())
                   for var in model_vars]

def ema_to_weights():
    return tf.group(*(tf.assign(var, ema.average(var).read_value())
                     for var in model_vars))
def save_weight_backups():
    return tf.group(*(tf.assign(bck, var.read_value())
                     for var, bck in zip(model_vars, backup_vars)))
def restore_weight_backups():
    return tf.group(*(tf.assign(var, bck.read_value())
                     for var, bck in zip(model_vars, backup_vars)))

def to_training():
    with tf.control_dependencies([tf.assign(is_training, True)]):
        return restore_weight_backups()

def to_testing():
    with tf.control_dependencies([tf.assign(is_training, False)]):
        with tf.control_dependencies([save_weight_backups()]):
            return ema_to_weights()

switch_to_train_mode_op = tf.cond(is_training, lambda: tf.group(), to_training)
switch_to_test_mode_op = tf.cond(is_training, to_testing, lambda: tf.group())

init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init_op)
    # Unnecessary, since it begins in training mode, but harmless
    sess.run(switch_to_train_mode_op)
    # Do training
    while ...:
        sess.run(train_op, ...)
    # To test mode
    sess.run(switch_to_test_mode_op)
    # Switching multiple times should not overwrite backups
    sess.run(switch_to_test_mode_op)
    # Test model with EMA weights
    # ...
    # Back to training mode
    sess.run(switch_to_train_mode_op)
    # Keep training...

How can I "reset" the model back to the non-EMA weights? - Jonas Adler
@JonasAdler I can think of two ways: 1) within TensorFlow, create another set of shadow variables as a backup; 2) outside TensorFlow, read the variable values and store them in Python (NumPy) objects, then put them back with tf.assign ops or the variables' load method. I can expand the answer if you need any help. - jdehesa
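(For concreteness, a minimal sketch of option 2), reusing sess, model_vars and retrieve_ema_weights_op from the answer above:)

backup = {var.name: sess.run(var) for var in model_vars}  # snapshot to NumPy
sess.run(retrieve_ema_weights_op)                         # swap in EMA values
# ... test ...
for var in model_vars:                                    # restore the snapshot
    var.load(backup[var.name], session=sess)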
@JonasAdler I have updated the answer with a "sketch" of option 2). - jdehesa
@jdehesa Great answer. - Maruf
