Meaning of passing `training=True` when using the TensorFlow 2 Keras Functional API

Question


Tags: python, tensorflow, keras, tensorflow2.0, tf.keras


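When running in graph mode in TF1, I believe I needed to wire up `training=True` and `training=False` via `feed_dict`s when using the functional API. What is the proper way to do this in TF2?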

I believe this is handled automatically when using `tf.keras.Sequential`. For example, I don't need to specify `training` in the following example from the docs:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.02),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Model is the full model w/o custom layers
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# train_data, test_data and NUM_EPOCHS are defined elsewhere (not shown)
model.fit(train_data, epochs=NUM_EPOCHS)
loss, acc = model.evaluate(test_data)
print("Loss {:0.4f}, Accuracy {:0.4f}".format(loss, acc))
```

Can I assume Keras handles this automatically when training with the functional API as well? Here is the same model rewritten with the functional API:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(28, 28, 1), name="input_image")
# input_shape is not needed here: tf.keras.Input already defines the shape
hid = tf.keras.layers.Conv2D(32, 3, activation='relu',
                             kernel_regularizer=tf.keras.regularizers.l2(0.02))(inputs)
hid = tf.keras.layers.MaxPooling2D()(hid)
hid = tf.keras.layers.Flatten()(hid)
hid = tf.keras.layers.Dropout(0.1)(hid)
hid = tf.keras.layers.Dense(64, activation='relu')(hid)
hid = tf.keras.layers.BatchNormalization()(hid)
outputs = tf.keras.layers.Dense(10, activation='softmax')(hid)
model_fn = tf.keras.Model(inputs=inputs, outputs=outputs)

# Model is the full model w/o custom layers
model_fn.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])

# train_data, test_data and NUM_EPOCHS are defined elsewhere (not shown)
model_fn.fit(train_data, epochs=NUM_EPOCHS)
loss, acc = model_fn.evaluate(test_data)
print("Loss {:0.4f}, Accuracy {:0.4f}".format(loss, acc))
```
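On whether `fit()` needs anything extra: Keras's built-in `train_step` runs the forward pass as `self(x, training=True)`, while `evaluate()` and `predict()` call the model with `training=False`, so the flag reaches Dropout and BatchNormalization without manual wiring. A minimal hand-written equivalent, assuming TF 2.x `tf.keras`, could look like this. The random stand-in data and the helper names `train_step`/`eval_step` are hypothetical, and the L2 regularizer is omitted for brevity.

```python
import numpy as np
import tensorflow as tf

# Same architecture as model_fn above (regularizer omitted for brevity).
inputs = tf.keras.Input(shape=(28, 28, 1), name="input_image")
hid = tf.keras.layers.Conv2D(32, 3, activation="relu")(inputs)
hid = tf.keras.layers.MaxPooling2D()(hid)
hid = tf.keras.layers.Flatten()(hid)
hid = tf.keras.layers.Dropout(0.1)(hid)
hid = tf.keras.layers.Dense(64, activation="relu")(hid)
hid = tf.keras.layers.BatchNormalization()(hid)
outputs = tf.keras.layers.Dense(10, activation="softmax")(hid)
model_fn = tf.keras.Model(inputs=inputs, outputs=outputs)

optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def train_step(x, y):
    # Roughly what fit() does per batch: training mode, so Dropout drops units
    # and BatchNormalization uses batch statistics and updates its moving stats.
    with tf.GradientTape() as tape:
        y_pred = model_fn(x, training=True)
        loss = loss_fn(y, y_pred)
    grads = tape.gradient(loss, model_fn.trainable_variables)
    optimizer.apply_gradients(zip(grads, model_fn.trainable_variables))
    return loss

def eval_step(x, y):
    # Roughly what evaluate() does: inference mode, so Dropout is a no-op and
    # BatchNormalization normalizes with its moving mean/variance.
    y_pred = model_fn(x, training=False)
    return loss_fn(y, y_pred)

# Hypothetical random stand-in for one MNIST batch.
x = np.random.rand(8, 28, 28, 1).astype("float32")
y = np.random.randint(0, 10, size=(8,))
train_loss = float(train_step(x, y))
eval_loss = float(eval_step(x, y))
```

Nothing in the layer wiring mentions `training`; the mode is decided entirely by how the model is called.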

I'm not sure whether `hid = tf.keras.layers.BatchNormalization()(hid)` needs to become `hid = tf.keras.layers.BatchNormalization()(hid, training)`.
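One way to check this empirically is a sketch like the following (assuming TF 2.x `tf.keras`; the toy input is made up). The BatchNormalization layer is wired up without any `training` argument, yet calling the model with `training=True` or `training=False` changes its output, which suggests Keras forwards the call-time flag down to the layer.

```python
import numpy as np
import tensorflow as tf

# A tiny functional model; `training` is NOT passed when wiring up the layer.
inputs = tf.keras.Input(shape=(4,))
outputs = tf.keras.layers.BatchNormalization()(inputs)
model = tf.keras.Model(inputs, outputs)

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [3.0, 6.0, 9.0, 12.0]], dtype="float32")

# Inference mode first: the moving mean/variance are still at their initial
# values (0 and 1), so the input passes through almost unchanged.
out_infer = model(x, training=False).numpy()

# Training mode: each column is normalized with this batch's own mean and
# variance (this call also nudges the moving statistics).
out_train = model(x, training=True).numpy()
```

The inference call comes first because a training-mode call updates the moving statistics that inference mode relies on.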

A Colab notebook with these models can be found here.

- cosentiyes

Is there a specific reason you want to control the training flag, or are you asking whether it is needed at all? - Dr. Snoopy

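I guess I'd like to be able to set it in the forward pass of `model_fn()` (`tf.keras.Model#call`) so that BatchNormalization behaves correctly. I think I would need to subclass the model and define the forward pass explicitly, so that I can pass `training` to the BN call, similar to the example in https://www.tensorflow.org/api_docs/python/tf/keras/Model. I'm also wondering whether this is needed when using `model_fn.fit()`. - cosentiyes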
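For comparison, the subclassing pattern described in this comment would look roughly like the sketch below. `ConvNet` is a hypothetical name, the layers mirror `model_fn` minus the regularizer, and TF 2.x `tf.keras` is assumed. Only Dropout and BatchNormalization receive `training`, since they are the layers whose behavior depends on it.

```python
import tensorflow as tf

class ConvNet(tf.keras.Model):
    """Hypothetical subclassed version of model_fn that forwards `training`."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = tf.keras.layers.Conv2D(32, 3, activation="relu")
        self.pool = tf.keras.layers.MaxPooling2D()
        self.flat = tf.keras.layers.Flatten()
        self.dropout = tf.keras.layers.Dropout(0.1)
        self.dense = tf.keras.layers.Dense(64, activation="relu")
        self.bn = tf.keras.layers.BatchNormalization()
        self.classifier = tf.keras.layers.Dense(num_classes, activation="softmax")

    def call(self, inputs, training=False):
        x = self.conv(inputs)
        x = self.pool(x)
        x = self.flat(x)
        # Mode-dependent layers get the flag explicitly.
        x = self.dropout(x, training=training)
        x = self.dense(x)
        x = self.bn(x, training=training)
        return self.classifier(x)

model = ConvNet()
probs = model(tf.zeros((2, 28, 28, 1)), training=False)
```

With `fit()`/`evaluate()`, Keras passes `training` into `call` for you, so the explicit forwarding above only matters when you call the model yourself.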

@cosentiyes: You mentioned "I believe this is handled automatically when using `tf.keras.Sequential`." Are you sure? Do you have a reference that confirms it? - Nerxis

2 Answers


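As for the broader question of whether you have to pass the `training` flag manually when using the Keras Functional API, this example from the official docs suggests that you don't: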

```python
# ...

x = Dropout(0.5)(x)
outputs = Linear(10)(x)
model = tf.keras.Model(inputs, outputs)

# ...

# You can pass a `training` argument in `__call__`
# (it will get passed down to the Dropout layer).
y = model(tf.ones((2, 16)), training=True)
```
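A quick self-contained check of the same idea, assuming TF 2.x `tf.keras`: a functional model wrapping a single `Dropout(0.5)` layer is called with no flag, with `training=False`, and with `training=True`. In training mode the surviving units are scaled by 1 / (1 - 0.5) = 2, so every output value should be 0 or 2; without the flag, the model should run in inference mode and return its input unchanged.

```python
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(16,))
outputs = tf.keras.layers.Dropout(0.5)(inputs)
model = tf.keras.Model(inputs, outputs)

x = np.ones((2, 16), dtype="float32")

y_default = model(x).numpy()                # no flag: inference mode
y_infer = model(x, training=False).numpy()  # explicit inference mode
y_train = model(x, training=True).numpy()   # training mode: units dropped
```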

- Ben Usman


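I landed here while searching for a similar example. Afterwards I realized I could run a simple test myself: https://colab.research.google.com/gist/jjclavijo/f216cb335fdd206bf68238553f9658b0/scratchpad.ipynb . Maybe this is a useful addition to the answer. - Javier JC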


- cosentiyes · Accepted Answer

I found a bug in the BatchNormalization documentation [1]: the placeholder {{TRAINABLE_ATTRIBUTE_NOTE}} was never replaced with the intended note [2]. That note reads:

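About setting `layer.trainable = False` on a BatchNormalization layer: setting `layer.trainable = False` means freezing the layer, i.e. its internal state will not change during training: its trainable weights will not be updated during `fit()` or `train_on_batch()`, and its state updates will not be run. Usually, this does not mean that the layer runs in inference mode (which is normally controlled by the `training` argument that can be passed when calling a layer). "Frozen state" and "inference mode" are two separate concepts.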
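To illustrate that "frozen" and "inference mode" are separate, here is a minimal sketch (assuming TF 2.x `tf.keras`; the layer sizes are arbitrary): freezing a `Dense` layer moves its kernel and bias into `non_trainable_weights`, while a frozen `Dropout` layer still drops units when called with `training=True`.

```python
import numpy as np
import tensorflow as tf

dense = tf.keras.layers.Dense(3)
dense.build((None, 4))
dense.trainable = False  # frozen: kernel and bias are no longer trainable

dropout = tf.keras.layers.Dropout(0.5)
dropout.trainable = False  # frozen, but NOT switched to inference mode

x = np.ones((1, 100), dtype="float32")
y = dropout(x, training=True).numpy()  # still drops units
```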

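However, in the case of the BatchNormalization layer, setting `trainable = False` on the layer means that the layer will subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than the mean and variance of the current batch). This behavior was introduced in TensorFlow 2.0 so that `layer.trainable = False` produces the most commonly expected behavior in the convnet fine-tuning use case. Note that: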
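The BatchNormalization special case can be checked with a sketch like this, assuming TF 2.x or later `tf.keras` with the default `momentum` and `epsilon`: two fresh layers receive the same batch with `training=True`, but the one with `trainable = False` normalizes with its initial moving statistics (mean 0, variance 1), so it returns approximately its input and leaves `moving_mean` untouched.

```python
import numpy as np
import tensorflow as tf

x = np.array([[1.0, 2.0],
              [3.0, 6.0]], dtype="float32")

bn_trainable = tf.keras.layers.BatchNormalization()
bn_frozen = tf.keras.layers.BatchNormalization()
bn_frozen.trainable = False  # TF >= 2.0: also implies inference mode

# Both layers are explicitly asked to run in training mode.
out_trainable = bn_trainable(x, training=True).numpy()  # uses batch statistics
out_frozen = bn_frozen(x, training=True).numpy()        # uses moving statistics
```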

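- This behavior only occurs as of TensorFlow 2.0. In 1.*, setting `layer.trainable = False` would freeze the layer but would not switch it to inference mode.
- Setting `trainable` on a model containing other layers will recursively set the `trainable` value of all inner layers.
- If the value of the `trainable` attribute is changed after calling `compile()` on a model, the new value doesn't take effect for this model until `compile()` is called again.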
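A sketch of the fine-tuning setup these notes describe, assuming TF 2.x `tf.keras`; `base` stands in for a hypothetical pretrained model and the data is random. Setting `base.trainable = False` freezes every inner layer, `compile()` is called after freezing so the change takes effect, and `base` is called with `training=False` (the pattern used in the Keras transfer-learning guide) so its BatchNormalization stays in inference mode even if `base` is later unfrozen.

```python
import numpy as np
import tensorflow as tf

# Hypothetical "pretrained" feature extractor.
base = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.BatchNormalization(),
])

inputs = tf.keras.Input(shape=(8,))
features = base(inputs, training=False)  # keep BN in inference mode
outputs = tf.keras.layers.Dense(2, activation="softmax")(features)
model = tf.keras.Model(inputs, outputs)

# Setting trainable on the container recursively freezes its inner layers.
base.trainable = False

# Compile after changing `trainable`, otherwise the change is not picked up.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.random.rand(32, 8).astype("float32")
y = np.random.randint(0, 2, size=(32,))
before = [np.array(w) for w in base.weights]
model.fit(x, y, epochs=1, verbose=0)
after = [np.array(w) for w in base.weights]  # expected to equal `before`
```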

[1] https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization?version=stable
[2] https://github.com/tensorflow/tensorflow/blob/r2.0/tensorflow/python/keras/layers/normalization_v2.py#L26-L65