这里有一个关于VGG16微调的例子,可以在keras博客中找到,但我无法复现。
更准确地说,以下是用于初始化没有顶层的VGG16并冻结除最顶部之外的所有块的代码:
WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5'
weights_path = get_file('vgg16_weights.h5', WEIGHTS_PATH_NO_TOP)
model = Sequential()
model.add(InputLayer(input_shape=(150, 150, 3)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3'))
model.add(MaxPooling2D((2, 2), strides=(2, 2), name='block5_maxpool'))
model.load_weights(weights_path)
for layer in model.layers:
layer.trainable = False
for layer in model.layers[-4:]:
layer.trainable = True
print("Layer '%s' is trainable" % layer.name)
下一步,创建一个只有单隐藏层的顶层模型:
top_model = Sequential()
top_model.add(Flatten(input_shape=model.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(1, activation='sigmoid'))
top_model.load_weights('top_model.h5')
请注意,它先前是在瓶颈特征上训练的,就像博客文章中所描述的那样。接下来,将此顶部模型添加到基本模型中并进行编译:
model.add(top_model)
model.compile(loss='binary_crossentropy',
optimizer=SGD(lr=1e-4, momentum=0.9),
metrics=['accuracy'])
最终,适合猫/狗数据:
batch_size = 16
train_datagen = ImageDataGenerator(rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
train_gen = train_datagen.flow_from_directory(
TRAIN_DIR,
target_size=(150, 150),
batch_size=batch_size,
class_mode='binary')
valid_gen = test_datagen.flow_from_directory(
VALID_DIR,
target_size=(150, 150),
batch_size=batch_size,
class_mode='binary')
model.fit_generator(
train_gen,
steps_per_epoch=nb_train_samples // batch_size,
epochs=nb_epoch,
validation_data=valid_gen,
validation_steps=nb_valid_samples // batch_size)
但是,当我试图拟合时,出现了以下错误:
ValueError:检查模型目标时出错:期望block5_maxpool有4个维度,但得到形状为(16,1)的数组
因此,似乎基本模型中的最后一个池化层出现了问题。或者可能是在尝试连接基本模型与顶部模型时做错了什么。
有人遇到过类似的问题吗?或者也许有更好的方法来构建这样的“连接”模型?我正在使用带有theano后端的keras==2.0.0。
注意:我使用了gist和applications.VGG16实用程序的示例,但在尝试连接模型时出现了问题,我对keras函数API不太熟悉。因此,我提供的解决方案是最“成功”的解决方案,即仅在拟合阶段失败。
更新 #1
好的,这里是我尝试做的事情的简要解释。首先,我正在生成来自VGG16的瓶颈特征,步骤如下:
def save_bottleneck_features():
datagen = ImageDataGenerator(rescale=1./255)
model = applications.VGG16(include_top=False, weights='imagenet')
generator = datagen.flow_from_directory(
TRAIN_DIR,
target_size=(150, 150),
batch_size=batch_size,
class_mode=None,
shuffle=False)
print("Predicting train samples..")
bottleneck_features_train = model.predict_generator(generator, nb_train_samples)
np.save(open('bottleneck_features_train.npy', 'w'), bottleneck_features_train)
generator = datagen.flow_from_directory(
VALID_DIR,
target_size=(150, 150),
batch_size=batch_size,
class_mode=None,
shuffle=False)
print("Predicting valid samples..")
bottleneck_features_valid = model.predict_generator(generator, nb_valid_samples)
np.save(open('bottleneck_features_valid.npy', 'w'), bottleneck_features_valid)
然后,我创建一个顶层模型,并按以下方式对其进行训练:
def train_top_model():
train_data = np.load(open('bottleneck_features_train.npy'))
train_labels = np.array([0]*(nb_train_samples / 2) +
[1]*(nb_train_samples / 2))
valid_data = np.load(open('bottleneck_features_valid.npy'))
valid_labels = np.array([0]*(nb_valid_samples / 2) +
[1]*(nb_valid_samples / 2))
model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_data, train_labels,
nb_epoch=nb_epoch,
batch_size=batch_size,
validation_data=(valid_data, valid_labels),
verbose=1)
model.save_weights('top_model.h5')
基本上,有两个经过训练的模型,具有ImageNet权重的
base_model
和由瓶颈特征生成的权重的top_model
。我想知道如何将它们连接起来?这可能吗,还是我做错了什么?因为我可以看到,@thomas-pinetz的回复认为顶部模型没有单独训练,而是直接附加到模型中。不确定我是否清楚,以下是博客中的引用:为了进行微调,所有层都应该从合适的训练权重开始:例如,你不应该在预先训练的卷积基础上随意添加随机初始化的全连接网络。这是因为由随机初始化的权重引发的大梯度更新会破坏卷积基础中学习到的权重。在我们的情况下,这就是为什么我们首先训练顶层分类器,然后才开始微调卷积权重。