Early stopping with multiple conditions

I am doing multi-class classification for a recommender system (item recommendations), and I currently train my neural network with the sparse_categorical_crossentropy loss. Therefore it is reasonable to perform EarlyStopping by monitoring the validation loss, val_loss:

tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

Everything works as expected. However, the performance of the network (the recommender system) is measured by Average-Precision-at-10, which is tracked as a metric during training, named average_precision_at_k10. Therefore I could also perform early stopping with this metric:

tf.keras.callbacks.EarlyStopping(monitor='average_precision_at_k10', patience=10)

This also works as expected.

My problem: sometimes the validation loss increases while the Average-Precision-at-10 improves, and vice versa. Therefore I would need to monitor both, and perform early stopping only when both deteriorate. What I would like to do:

tf.keras.callbacks.EarlyStopping(monitor=['val_loss', 'average_precision_at_k10'], patience=10)

Obviously that does not work. Any ideas how to achieve this?


A simple solution/question would be: have you tried creating a third function, e.g. "avg_prc_at_10k_and_val_loss", and doing the early stopping inside that method? E.g. "if val_loss() is decreasing and avg_precision_at_k10 is decreasing -> early_stop = true"... - Andrey Bulezyuk
I thought about that, but couldn't find sufficient documentation. From what is described here, I understand that a custom EarlyStopping function can be created: https://datascience.stackexchange.com/questions/26833/is-there-away-to-change-the-metric-used-by-the-early-stopping-callback-in-keras . It extends the model class, so self.model.stop_training can be set, but I don't know how to access the values of the current metrics, e.g. val_loss, in a similar manner. Do you have any ideas? - Marcus
The custom callback framework in my answer shows how to access these metrics. With that framework you should be able to develop the code you need. - Gerry P
4 Answers

With the guidance of Gerry P I managed to create my own custom EarlyStopping callback, and I wanted to post it here in case anyone else wants to implement something similar.

Early stopping is performed if both the validation loss and the Average-Precision-at-10 have failed to improve for patience epochs.

import numpy as np
from tensorflow import keras

class CustomEarlyStopping(keras.callbacks.Callback):
    def __init__(self, patience=0):
        super(CustomEarlyStopping, self).__init__()
        self.patience = patience
        self.best_weights = None

    def on_train_begin(self, logs=None):
        # Number of epochs waited since the metrics last jointly improved.
        self.wait = 0
        # The epoch the training stopped at.
        self.stopped_epoch = 0
        # Initialize the best values: loss starts at infinity, MAP@10 at zero.
        self.best_v_loss = np.inf
        self.best_map10 = 0

    def on_epoch_end(self, epoch, logs=None):
        v_loss = logs.get('val_loss')
        map10 = logs.get('val_average_precision_at_k10')

        # Stop training early if BOTH the validation loss AND MAP@10
        # fail to improve together for 'patience' epochs.
        if np.less(v_loss, self.best_v_loss) and np.greater(map10, self.best_map10):
            self.best_v_loss = v_loss
            self.best_map10 = map10
            self.wait = 0
            # Record the best weights whenever both metrics improve.
            self.best_weights = self.model.get_weights()
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.stopped_epoch = epoch
                self.model.stop_training = True
                print("Restoring model weights from the end of the best epoch.")
                self.model.set_weights(self.best_weights)

    def on_train_end(self, logs=None):
        if self.stopped_epoch > 0:
            print("Epoch %05d: early stopping" % (self.stopped_epoch + 1))

It is then used as:
model.fit(
    x_train,
    y_train,
    batch_size=64,
    steps_per_epoch=5,
    epochs=30,
    verbose=0,
    callbacks=[CustomEarlyStopping(patience=10)],
)
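One caveat: val_loss and val_average_precision_at_k10 only show up in logs if validation data is supplied to fit; otherwise logs.get() returns None and the comparisons above fail. A minimal sketch, where x_val and y_val are hypothetical held-out arrays:

model.fit(
    x_train,
    y_train,
    validation_data=(x_val, y_val),  # x_val / y_val are assumed validation arrays;
                                     # without them the logs contain no val_* entries
    batch_size=64,
    epochs=30,
    callbacks=[CustomEarlyStopping(patience=10)],
)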

Oops! You should have waited a bit to use my code ;) Good luck. - Minions
@Ghanem Thanks! I appreciate your effort and the extension of the first answer! - Marcus

You can achieve this goal by creating a custom callback. Information on how to create a custom callback is located here. Below are some code examples of what you can do within a custom callback. The documentation I referenced shows many other options.
import tensorflow as tf
from tensorflow import keras

class LRA(keras.callbacks.Callback):  # subclass the callback class
    # Create class variables as below. These can be accessed in your code
    # outside the class definition as LRA.my_class_variable, LRA.best_weights.
    my_class_variable = None  # a class variable (placeholder value)
    best_weights = None       # another class variable

    # Define an initializer with the parameters you want to feed to the callback.
    def __init__(self, param1, param2):
        super(LRA, self).__init__()
        self.param1 = param1
        self.param2 = param2
        # ... and so on for all parameters
        # write any initialization code you need here

    def on_epoch_end(self, epoch, logs=None):  # method runs at the end of each epoch
        v_loss = logs.get('val_loss')  # example: read this epoch's validation loss from the logs
        acc = logs.get('accuracy')     # another example of reading log data
        LRA.best_weights = self.model.get_weights()  # example of setting a class variable's value
        print(f'Hello, epoch {epoch} has just ended')  # print a message at the end of every epoch
        lr = float(tf.keras.backend.get_value(self.model.optimizer.lr))  # get the current learning rate
        if v_loss > self.param1:
            new_lr = lr * self.param2
            tf.keras.backend.set_value(self.model.optimizer.lr, new_lr)  # set the learning rate in the optimizer
        # write whatever code you need
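Hooking the template into training works like any other Keras callback. A minimal sketch, where the parameter values are arbitrary placeholders and x_train, y_train, x_val, y_val are assumed to exist:

# param1 acts as a val_loss threshold and param2 as a learning-rate factor
# in the template above; the values 0.5 and 0.5 are placeholders.
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=30,
          callbacks=[LRA(param1=0.5, param2=0.5)])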

I suggest you create your own callback. Below is a solution that monitors accuracy and loss at the same time; you can replace acc with your own metric:
from tensorflow import keras

class CustomCallback(keras.callbacks.Callback):
    def __init__(self, patience=None):
        super(CustomCallback, self).__init__()
        self.patience = patience
        self.acc = {}
        self.loss = {}
        self.best_weights = None

    def on_epoch_end(self, epoch, logs=None):
        epoch += 1
        self.loss[epoch] = logs['loss']
        self.acc[epoch] = logs['accuracy']

        if self.patience and epoch > self.patience:
            # Save the best weights if the current loss is lower than the loss
            # from `patience` epochs ago; similarly for acc, but when larger.
            if self.loss[epoch] < self.loss[epoch - self.patience] and self.acc[epoch] > self.acc[epoch - self.patience]:
                self.best_weights = self.model.get_weights()
            else:
                # stop training
                self.model.stop_training = True
                # load the best weights
                self.model.set_weights(self.best_weights)
        else:
            # the best weights are the current weights
            self.best_weights = self.model.get_weights()

Keep in mind that if you want to control the minimum change in the monitored quantities (aka min_delta), you will have to integrate it into the code yourself; see the sketch below.
Here is the documentation for building a custom callback: custom_callback
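As a minimal sketch of what integrating min_delta could look like (the subclass below is an illustration, not part of the original answer), an improvement must beat the value from patience epochs ago by at least the given margin:

class CustomCallbackWithMinDelta(CustomCallback):
    # Hypothetical extension of CustomCallback above: an epoch only counts
    # as an improvement if it beats the value from `patience` epochs ago
    # by at least `min_delta`.
    def __init__(self, patience=None, min_delta=0.0):
        super().__init__(patience=patience)
        self.min_delta = min_delta

    def on_epoch_end(self, epoch, logs=None):
        epoch += 1
        self.loss[epoch] = logs['loss']
        self.acc[epoch] = logs['accuracy']

        if self.patience and epoch > self.patience:
            loss_improved = self.loss[epoch] < self.loss[epoch - self.patience] - self.min_delta
            acc_improved = self.acc[epoch] > self.acc[epoch - self.patience] + self.min_delta
            if loss_improved and acc_improved:
                self.best_weights = self.model.get_weights()
            else:
                self.model.stop_training = True
                self.model.set_weights(self.best_weights)
        else:
            self.best_weights = self.model.get_weights()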

Please check this documentation on how to write high-quality answers and the best practices for including external links in an answer: How to Answer. These are Stack Overflow community guidelines that keep answers useful and accurate. - Nicolas Gervais
That doesn't really answer the question. It's just boilerplate code from the documentation. It doesn't do what the OP wants... - Nicolas Gervais
I believe there is an example of using a custom callback for early stopping in the documentation I referenced. - Gerry P
@NicolasGervais, a solution has been added. - Minions
Rather than just doing early stopping, I suggest you modify your custom callback to adjust the learning rate based on the validation loss and the Average-Precision-at-10. That may offer an opportunity to achieve higher model performance. It should be easy to implement in your callback. - Gerry P
Thanks @GerryP, you're right. This is the basic code, and a lot can be built on top of it ;) - Minions

At this point it is simpler to write a custom training loop and just use an if statement. For example:

def main(epochs=50):
    for epoch in range(epochs):
        fit(epoch)

        if test_acc.result() > .8 and topk_acc.result() > .9:
            print(f'\nEarly stopping. Test acc is above 80% and TopK acc is above 90%.')
            break

if __name__ == '__main__':
    main(epochs=100)

Here is a simple custom training loop using that approach:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow_datasets as tfds
import tensorflow as tf

data, info = tfds.load('iris', split='train',
                       as_supervised=True,
                       shuffle_files=True,
                       with_info=True)

def preprocessing(inputs, targets):
    scaled = tf.divide(inputs, tf.reduce_max(inputs, axis=0))
    return scaled, targets

dataset = data.filter(lambda x, y: tf.less_equal(y, 2)).\
    map(preprocessing).\
    shuffle(info.splits['train'].num_examples,
            reshuffle_each_iteration=False)  # keep the take/skip split stable across epochs

train_dataset = dataset.take(120).batch(4)
test_dataset = dataset.skip(120).take(30).batch(4)


model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(info.features['label'].num_classes, activation='softmax')
    ])


# the model ends in a softmax layer, so the loss receives probabilities, not logits
loss_object = tf.losses.SparseCategoricalCrossentropy(from_logits=False)

train_loss = tf.metrics.Mean()
test_loss = tf.metrics.Mean()

train_acc = tf.metrics.SparseCategoricalAccuracy()
test_acc = tf.metrics.SparseCategoricalAccuracy()

topk_acc = tf.metrics.SparseTopKCategoricalAccuracy(k=2)

opt = tf.keras.optimizers.Adam(learning_rate=1e-3)


@tf.function
def train_step(inputs, labels):
    with tf.GradientTape() as tape:
        logits = model(inputs)
        loss = loss_object(labels, logits)

    gradients = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_acc(labels, logits)


@tf.function
def test_step(inputs, labels):
    logits = model(inputs)
    loss = loss_object(labels, logits)

    test_loss.update_state(loss)
    test_acc.update_state(labels, logits)

    topk_acc.update_state(labels, logits)

def fit(epoch):
    template = 'Epoch {:>2} Train Loss {:.3f} Test Loss {:.3f} ' \
               'Train Acc {:.2f} Test Acc {:.2f} Test TopK Acc {:.2f} '

    train_loss.reset_states()
    test_loss.reset_states()
    train_acc.reset_states()
    test_acc.reset_states()

    topk_acc.reset_states()

    for X_train, y_train in train_dataset:
        train_step(X_train, y_train)

    for X_test, y_test in test_dataset:
        test_step(X_test, y_test)

    print(template.format(
        epoch + 1,
        train_loss.result(),
        test_loss.result(),
        train_acc.result(),
        test_acc.result(),
        topk_acc.result()
    ))


def main(epochs=50):
    for epoch in range(epochs):
        fit(epoch)

        if test_acc.result() > .8 and topk_acc.result() > .9:
            print(f'\nEarly stopping. Test acc is above 80% and TopK acc is above 90%.')
            break

if __name__ == '__main__':
    main(epochs=100)
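To stop on stagnation of both monitored quantities, as the question asks, rather than on fixed thresholds, the loop can track the best values seen so far together with a wait counter. A minimal sketch reusing the metric objects above, with test_loss standing in for a validation loss:

def main_with_patience(epochs=100, patience=10):
    # Track the best values seen so far and how many epochs have passed
    # without a joint improvement of both monitored quantities.
    best_loss, best_topk, wait = float('inf'), 0.0, 0
    for epoch in range(epochs):
        fit(epoch)
        if test_loss.result() < best_loss and topk_acc.result() > best_topk:
            best_loss = float(test_loss.result())
            best_topk = float(topk_acc.result())
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                print(f'\nEarly stopping at epoch {epoch + 1}: no joint improvement for {patience} epochs.')
                break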
