Averaging weights in Keras models


How can I average the weights of several Keras models that have the same architecture but were trained from different initializations?

Right now my code looks roughly like this:

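# tf.keras (TensorFlow 2.x) imports
from tensorflow import keras
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                                     MaxPool2D, Flatten, Dense, Dropout)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import plot_model

# X_train, y_train, X_test, y_test are assumed to be loaded already
# (e.g. MNIST reshaped to (N, 28, 28, 1) with one-hot labels).

datagen = ImageDataGenerator(rotation_range=15,
                             width_shift_range=2.0/28,
                             height_shift_range=2.0/28)

epochs = 40
lr = 1.234e-3
optimizer = Adam(learning_rate=lr)

main_input = Input(shape=(28, 28, 1), name='main_input')

sub_models = []

# Five parallel sub-networks with identical structure but independent initialization
for i in range(5):

    x = Conv2D(32, kernel_size=(3,3), strides=1)(main_input)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPool2D(pool_size=2)(x)

    x = Conv2D(64, kernel_size=(3,3), strides=1)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPool2D(pool_size=2)(x)

    x = Conv2D(64, kernel_size=(3,3), strides=1)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    x = Flatten()(x)

    x = Dense(1024)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.1)(x)

    x = Dense(256)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.4)(x)

    x = Dense(10, activation='softmax')(x)

    sub_models.append(x)

# Average the five softmax outputs (this averages predictions, not weights)
main_output = keras.layers.average(sub_models)

model = Model(inputs=[main_input], outputs=[main_output])

model.compile(loss='categorical_crossentropy', metrics=['accuracy'],
              optimizer=optimizer)

model.summary()  # prints the summary itself and returns None

plot_model(model, to_file='model.png')  # requires pydot and graphviz

filepath = "weights.best.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
tensorboard = TensorBoard(log_dir='./Graph', histogram_freq=0, write_graph=True, write_images=True)
callbacks = [checkpoint, tensorboard]

# model.fit accepts generators in TF 2.x; fit_generator is deprecated
model.fit(datagen.flow(X_train, y_train, batch_size=128),
          steps_per_epoch=len(X_train) // 128,  # must be an integer
          epochs=epochs,
          callbacks=callbacks,
          verbose=1,
          validation_data=(X_test, y_test))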

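So at the moment I only average the last layer (the outputs), but I would like to average the weights of all layers after the individual models have been trained.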

Thanks!
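Weight averaging needs each network to be a separate Model that can be trained on its own. In the code above the five branches live inside one Model and are trained jointly through the averaged output. A minimal sketch, assuming tf.keras (TF 2.x), of building the five copies as independent models instead; `build_model` is a hypothetical helper that repeats the question's architecture:

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model():
    """Hypothetical factory: one standalone copy of the question's
    sub-network, with its own freshly initialized weights."""
    inputs = keras.Input(shape=(28, 28, 1))
    x = inputs
    # Three conv blocks as in the question (no pooling after the third one).
    for filters, pool in ((32, True), (64, True), (64, False)):
        x = layers.Conv2D(filters, kernel_size=(3, 3), strides=1)(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)
        if pool:
            x = layers.MaxPool2D(pool_size=2)(x)
    x = layers.Flatten()(x)
    # Two dense blocks with the question's dropout rates.
    for units, rate in ((1024, 0.1), (256, 0.4)):
        x = layers.Dense(units)(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)
        x = layers.Dropout(rate)(x)
    outputs = layers.Dense(10, activation='softmax')(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1.234e-3),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    return model


# Five independent models; each could be trained separately, e.g.
# models[i].fit(datagen.flow(X_train, y_train, batch_size=128), ...).
models = [build_model() for _ in range(5)]
```

Because every copy is built by the same function, `get_weights()` returns lists that line up position by position, which is what the averaging answers below rely on.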


You cannot simply average the weights of neural networks. - Dr. Snoopy
What have you tried so far? What happens if you call keras.layers.average() between every layer? - DarkCygnus
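I don't want to average between layers, because I want to train each model separately. Averaging after every layer gives a different result, and so does averaging the models at the last layer before training. - Miłosz Bednarzak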
@MatiasValdenegro Yes, you can: https://arxiv.org/abs/1803.05407 - Scratch
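@Scratch That paper does not support the idea this question asks about: it is about averaging along the SGD trajectory, and it appeared after this question was asked. - Dr. Snoopy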
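True. Averaging the weights of models trained from different initializations makes little sense; I only wanted to point out that weight averaging can be useful in some specific cases. - Scratch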
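The averaging in the linked SWA paper is over weight snapshots taken from a single training run, not over independently initialized models. A minimal numpy sketch of that running average, assuming each snapshot is a `get_weights()`-style list from the same model at a different epoch; `update_running_average` is a hypothetical helper, and the sketch leaves out the paper's learning-rate schedule and its extra pass to recompute BatchNormalization statistics:

```python
import numpy as np


def update_running_average(avg_weights, new_weights, n_averaged):
    """Fold one more snapshot into a running average of weights.

    avg_weights / new_weights: lists of numpy arrays with matching shapes.
    n_averaged: how many snapshots avg_weights already contains.
    Implements w_avg <- (w_avg * n + w) / (n + 1) for every tensor.
    """
    return [(a * n_averaged + w) / (n_averaged + 1)
            for a, w in zip(avg_weights, new_weights)]


# Toy snapshots of one 2x2 weight matrix from successive epochs of ONE run.
snapshots = [[np.full((2, 2), v)] for v in (1.0, 2.0, 6.0)]
avg = snapshots[0]
for n, snap in enumerate(snapshots[1:], start=1):
    avg = update_running_average(avg, snap, n)
# avg[0] is a 2x2 array filled with 3.0, the mean of 1, 2 and 6
```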
3 Answers


Assume models is a list containing your models. First, collect all the weights:

weights = [model.get_weights() for model in models]

Now create the new, averaged weights:

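import numpy

new_weights = list()

for weights_list_tuple in zip(*weights):
    # weights_list_tuple: the same weight tensor taken from every model.
    # zip(*weights_list_tuple) walks their rows in parallel; each row is averaged across models.
    new_weights.append(
        [numpy.array(weights_).mean(axis=0)
            for weights_ in zip(*weights_list_tuple)])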

All that is left is to set these weights on a new model with the same architecture:

new_model.set_weights(new_weights)

Of course, averaging weights may not be a good idea, but if you want to try it, this is how you should go about it.
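The double `zip(*...)` above averages each tensor row by row and returns nested lists. A simpler variant, sketched here with a hypothetical helper `average_model_weights`, stacks the corresponding tensor from every model along a new axis and averages over that axis. It returns plain numpy arrays with the original shapes, and it also handles scalar (0-d) weights, which the inner `zip` cannot iterate over:

```python
import numpy as np


def average_model_weights(weights):
    """Element-wise mean of several models' weights.

    weights: one list per model, each as returned by model.get_weights()
    (same architecture, so the lists line up position by position).
    Returns a list of numpy arrays suitable for new_model.set_weights().
    """
    averaged = []
    for same_tensor_from_each_model in zip(*weights):
        # Stack along a new leading "model" axis, then average over it.
        stacked = np.stack(same_tensor_from_each_model, axis=0)
        averaged.append(stacked.mean(axis=0))
    return averaged


# Two toy "models", each with a 2x2 kernel, a bias vector and a scalar weight.
model_a = [np.ones((2, 2)), np.array([0.0, 2.0]), np.array(1.0)]
model_b = [np.full((2, 2), 3.0), np.array([4.0, 6.0]), np.array(3.0)]
new_weights = average_model_weights([model_a, model_b])
```

With real models, `weights` would be `[m.get_weights() for m in models]`, and `new_model` could for example come from `keras.models.clone_model(models[0])`, which copies the architecture with freshly initialized weights, before calling `new_model.set_weights(new_weights)`.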


Why is this a bad idea? I was inspired by http://cs231n.github.io/neural-networks-3/#ensemble, which says it's a good idea ;) - Miłosz Bednarzak
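Just to give you an example of why this can go wrong: take a model and permute all of its filters consistently. The resulting network is mathematically equivalent, but the average of the two can be very far from the original function. I'm not saying it's a bad idea; I think it might be a good idea ;) - Marcin Możejko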
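To make the permutation argument concrete, here is a small numpy sketch. It uses a tiny dense ReLU network in place of conv filters (an illustrative simplification): permuting the hidden units consistently leaves the function unchanged, while averaging the original and permuted weights produces a different function.

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp(x, W1, b1, W2):
    """Tiny one-hidden-layer ReLU network: x -> relu(x W1 + b1) W2."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2


W1 = rng.normal(size=(3, 4))  # 3 inputs, 4 hidden units
b1 = rng.normal(size=4)
W2 = rng.normal(size=(4, 2))  # 2 outputs
x = rng.normal(size=(5, 3))   # 5 sample inputs

# Permute hidden units consistently: columns of W1, entries of b1, rows of W2.
perm = np.array([3, 2, 1, 0])
W1_p, b1_p, W2_p = W1[:, perm], b1[perm], W2[perm, :]

original = mlp(x, W1, b1, W2)
permuted = mlp(x, W1_p, b1_p, W2_p)  # same function, different weights
averaged = mlp(x, (W1 + W1_p) / 2, (b1 + b1_p) / 2, (W2 + W2_p) / 2)

print(np.allclose(original, permuted))  # True
print(np.allclose(original, averaged))
```

Two independently initialized models are in a similar position: nothing forces hidden unit i of one model to play the same role as hidden unit i of the other.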
https://github.com/miloszbednarzak/mnist/blob/master/mnist_averaged.ipynb - Miłosz Bednarzak
I made an implementation of that paper: https://github.com/simon-larsson/keras-swa - Simon Larsson
The error I get here is: TypeError: cannot perform reduce with flexible type - Koti


I can't comment on the accepted answer, but to make it work with TensorFlow 2.0 tf.keras I had to convert the lists in the loop to numpy arrays:

import numpy as np

new_weights = list()
for weights_list_tuple in zip(*weights):
    new_weights.append(
        # stack the row-wise means back into one array with the tensor's original shape
        np.array([np.array(w).mean(axis=0) for w in zip(*weights_list_tuple)])
    )
If the input models should be weighted differently, replace np.array(w).mean(axis=0) with np.average(np.array(w), axis=0, weights=relative_weights), where relative_weights is an array containing one weighting factor per model.
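A minimal sketch of that weighted variant, assuming `get_weights()`-style lists; `weighted_average_weights` is a hypothetical helper. It stacks whole tensors rather than zipping over rows, so it returns arrays with the original shapes. `np.average` divides by the sum of the factors, so they do not need to add up to 1:

```python
import numpy as np


def weighted_average_weights(weights, relative_weights):
    """Weighted element-wise average of several models' weights.

    weights: one get_weights()-style list per model.
    relative_weights: one non-negative factor per model.
    """
    relative_weights = np.asarray(relative_weights, dtype=float)
    return [
        # Stack the same tensor from every model along axis 0 and weight that axis.
        np.average(np.stack(tensors, axis=0), axis=0, weights=relative_weights)
        for tensors in zip(*weights)
    ]


# Two toy models with one bias vector each; the second counts three times as much.
models_weights = [[np.array([0.0, 4.0])], [np.array([4.0, 8.0])]]
result = weighted_average_weights(models_weights, relative_weights=[1, 3])
# result[0] -> array([3., 7.])
```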

I get the error "TypeError: zip argument #5 must support iteration". Why does this happen? - Koti

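I have a TensorFlow/Keras function that computes the average of the trainable parameters of several client models. The average is computed per trainable variable (each layer's kernel and bias separately). Here is the function I use: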
import tensorflow as tf


def average_client_weights(client_models):
    """
    Compute the average of the trainable parameters across multiple client models.

    This function takes a list of client models and calculates the average of their
    trainable parameters. The averaging is done variable by variable: each trainable
    variable (e.g. a layer's kernel or bias) is averaged across clients separately,
    and the results are returned in the same order as `trainable_variables`.

    Args:
    - client_models (list of objects): A list of objects representing the client models.
      Each object is expected to have an attribute `trainable_variables` that returns a
      list of `tf.Variable` objects representing the trainable parameters of the model.

    Returns:
    - avg_weights (list of tf.Tensor): A list of tensors representing the average weights
      of the trainable parameters of the client models. Each tensor in the list corresponds
      to one trainable variable.

    Note: only trainable variables are averaged; non-trainable state such as
    BatchNormalization moving mean/variance is not included.

    Example:
    If client_models[0].trainable_variables = [W1, b1, W2, b2], where W1, b1, W2, b2 are
    tensors, then avg_weights = [avg_W1, avg_b1, avg_W2, avg_b2], where avg_W1, avg_b1,
    avg_W2, avg_b2 are the averages of the corresponding variables.
    """
    # Retrieve the trainable variables from each client model
    client_weights = [model.trainable_variables for model in client_models]

    # Stack the matching variable from every client along a new axis 0, then average over it
    avg_weights = [
        tf.reduce_mean(tf.stack(layer_weight_tensors), axis=0)
        for layer_weight_tensors in zip(*client_weights)
    ]

    return avg_weights
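`average_client_weights` only returns the averaged tensors. In a federated-averaging setup they are usually written back into a global model. A minimal sketch, assuming the global model has the same `trainable_variables` structure as the clients; `ToyClient` and `assign_average` are hypothetical names, and `tf.Module` stands in for a Keras model (both expose `trainable_variables`). The averaging function is repeated so the sketch runs on its own:

```python
import tensorflow as tf


class ToyClient(tf.Module):
    """Hypothetical stand-in for a client model with two trainable variables."""

    def __init__(self, kernel_value, bias_value):
        super().__init__()
        self.kernel = tf.Variable(tf.fill([2, 2], kernel_value))
        self.bias = tf.Variable(tf.fill([2], bias_value))


def average_client_weights(client_models):
    # Same per-variable averaging as above.
    client_weights = [m.trainable_variables for m in client_models]
    return [tf.reduce_mean(tf.stack(vs), axis=0) for vs in zip(*client_weights)]


def assign_average(global_model, client_models):
    """Overwrite global_model's trainable variables with the client average."""
    averaged = average_client_weights(client_models)
    for var, avg in zip(global_model.trainable_variables, averaged):
        var.assign(avg)


clients = [ToyClient(1.0, 0.0), ToyClient(3.0, 2.0)]
global_model = ToyClient(0.0, 0.0)
assign_average(global_model, clients)
# global_model.kernel is now all 2.0 and global_model.bias all 1.0
```

Assigning in place keeps the global model's existing variables (and anything that references them, such as an optimizer), instead of building a new model.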

Page content provided by Stack Overflow; the original English post is available via the original link.