ResNet network not performing as expected


Hi, I am trying to train a ResNet neural network on a cancer dataset using a fine-tuning approach.

Here is my fine-tuning code:

from keras.applications.resnet50 import ResNet50
from keras.layers import Input, Flatten, Dense
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import TensorBoard

image_input = Input(shape=(224, 224, 3))

model = ResNet50(input_tensor=image_input, include_top=True, weights='imagenet')
model.summary()
last_layer = model.get_layer('avg_pool').output
x = Flatten(name='flatten')(last_layer)
out = Dense(num_classes, activation='softmax', name='output_layer')(x)
custom_resnet_model = Model(inputs=image_input, outputs=out)
custom_resnet_model.summary()

# Freeze everything except the new output layer
for layer in custom_resnet_model.layers[:-1]:
    layer.trainable = False

custom_resnet_model.layers[-1].trainable = True

custom_resnet_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

custom_resnet_model.summary()

tensorboard = TensorBoard(log_dir='./logs', histogram_freq=0,
                      write_graph=True, write_images=False)

hist = custom_resnet_model.fit(X_train, Y_train, batch_size=32, epochs=nb_epoch, verbose=1,
                               validation_data=(X_valid, Y_valid), callbacks=[tensorboard])

(loss, accuracy) = custom_resnet_model.evaluate(X_valid, Y_valid, batch_size=batch_size, verbose=1)

print("[INFO] loss={:.4f}, accuracy: {:.4f}%".format(loss,accuracy * 100))

df = pd.read_csv('C:/CT_SCAN_IMAGE_SET/resnet_50/dbs2017/data/stage1_sample_submission.csv')
df2 = pd.read_csv('C:/CT_SCAN_IMAGE_SET/resnet_50/dbs2017/data/stage1_solution.csv')
x = np.array([np.mean(np.load('E:/224x224/%s.npy' % str(id)), axis=0) for id in df['id'].tolist()])

x = x.transpose(0,2,3,1)
# Make predictions 
pred = model.predict(x, batch_size=batch_size, verbose=1) #predict(self, x, batch_size=None, verbose=0, steps=None)

print(pred)

I use the validation set to predict the final results, but even after 100 epochs the probabilities for the predicted classes are still very low.

 5.41865666e-05 2.16298591e-04 2.77880055e-04 7.53038039e-05
 6.03657216e-04 1.30494649e-04 4.92068466e-05 5.37877844e-04
 1.61486780e-04 6.16881996e-04 9.92802554e-04 5.50923753e-04
 3.62671199e-05 3.44127137e-03 7.17231014e-05 2.79643398e-04
 2.86785862e-03 1.70384112e-04 6.59705256e-05 7.11611006e-04
 2.09898906e-04 1.82953620e-04 8.88684444e-05 1.87824480e-04
 1.32007655e-04 2.11239138e-04 7.63713342e-06 1.29785520e-04
 1.09007429e-04 3.14327976e-04 4.73849563e-04 4.22359008e-04
 6.27386966e-04 2.03593503e-04 1.72056989e-05 8.38911365e-05
 1.91937244e-04 1.59160278e-04 5.24159847e-03 1.45429352e-04
 4.30631888e-04 6.92744215e-04 1.00537611e-04 6.27409827e-05
 3.87431937e-04 1.37840703e-04 1.04467930e-04 1.74013167e-05
 1.18957250e-04 2.77637475e-04 2.25973461e-04 1.21678226e-04
 2.42197304e-04 2.99750012e-04 1.16530759e-03 1.29382452e-03
 7.35349662e-04 5.71311277e-04 1.26631945e-04 4.74024746e-05
 3.71460657e-04 1.23646241e-04]
This is how the results look in TensorBoard (screenshot omitted). Why are the probabilities not improving, and do you have any suggestions on how to fix this?
P.S. The tail of the ResNet model summary, after the changes suggested in one of the answers:
add_15 (Add)                    (None, 7, 7, 2048)   0           bn5b_branch2c[0][0]              
                                                                 activation_43[0][0]              
__________________________________________________________________________________________________
activation_46 (Activation)      (None, 7, 7, 2048)   0           add_15[0][0]                     
__________________________________________________________________________________________________
res5c_branch2a (Conv2D)         (None, 7, 7, 512)    1049088     activation_46[0][0]              
__________________________________________________________________________________________________
bn5c_branch2a (BatchNormalizati (None, 7, 7, 512)    2048        res5c_branch2a[0][0]             
__________________________________________________________________________________________________
activation_47 (Activation)      (None, 7, 7, 512)    0           bn5c_branch2a[0][0]              
__________________________________________________________________________________________________
res5c_branch2b (Conv2D)         (None, 7, 7, 512)    2359808     activation_47[0][0]              
__________________________________________________________________________________________________
bn5c_branch2b (BatchNormalizati (None, 7, 7, 512)    2048        res5c_branch2b[0][0]             
__________________________________________________________________________________________________
activation_48 (Activation)      (None, 7, 7, 512)    0           bn5c_branch2b[0][0]              
__________________________________________________________________________________________________
res5c_branch2c (Conv2D)         (None, 7, 7, 2048)   1050624     activation_48[0][0]              
__________________________________________________________________________________________________
bn5c_branch2c (BatchNormalizati (None, 7, 7, 2048)   8192        res5c_branch2c[0][0]             
__________________________________________________________________________________________________
add_16 (Add)                    (None, 7, 7, 2048)   0           bn5c_branch2c[0][0]              
                                                                 activation_46[0][0]              
__________________________________________________________________________________________________
activation_49 (Activation)      (None, 7, 7, 2048)   0           add_16[0][0]                     
__________________________________________________________________________________________________
avg_pool (AveragePooling2D)     (None, 1, 1, 2048)   0           activation_49[0][0]              
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 2048)         0           avg_pool[0][0]                   
__________________________________________________________________________________________________
fc1000 (Dense)                  (None, 1000)         2049000     flatten_1[0][0]                  
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 512)          512512      fc1000[0][0]                     
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 512)          0           dense_1[0][0]                    
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 512)          262656      dropout_1[0][0]                  
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, 512)          0           dense_2[0][0]                    
__________________________________________________________________________________________________
output_layer (Dense)            (None, 2)            1026        dropout_2[0][0]                  
==================================================================================================

Just a small note: are you sure you need such a huge network? What is your data? Are you working with microscopic images of cells? Or groups of cells? Or mammograms? Or PET/CT scans? How large is your dataset? Choosing the right architecture matters a lot: too small and it won't have enough capacity, too large and it will overfit. - Chan Kha Vu
Hi, I am working with CT scan images of size 512 x 512. There are 1582 images in total, which makes training from scratch quite difficult, so I chose a fine-tuning approach to train the network. - user3789200
2 Answers


As far as I can tell from your graphs, you have an overfitting problem. To avoid overfitting you should use Dropout. Below, I have added two new dense layers, each followed by a Dropout layer (you can reduce this to one pair or add more). You can tune their parameters.

I would also like to suggest a simpler way of expressing the network, although your way is fine too.

from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, Dropout
from keras.models import Model
from keras.optimizers import Adam

base_model = ResNet50(input_shape=(224, 224, 3), include_top=False, weights='imagenet', pooling='avg')
x = base_model.output

x = Dense(512, activation='relu')(x)  # add new layer
x = Dropout(0.5)(x)                   # add new layer
x = Dense(512, activation='relu')(x)  # add new layer
x = Dropout(0.5)(x)                   # add new layer

out = Dense(num_classes, activation='softmax', name='output_layer')(x)
custom_resnet_model = Model(inputs=base_model.input, outputs=out)

# Freeze the pretrained ResNet base and train only the new layers
for layer in base_model.layers:
    layer.trainable = False

custom_resnet_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

custom_resnet_model.summary()
...

Finally, you can also try different learning rates and different pretrained models.
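For example, a minimal sketch of that suggestion, keeping the same new classifier head but swapping in a different pretrained backbone and a lower learning rate (InceptionV3 and the 0.0001 value here are illustrative choices, not part of the original answer):

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, Dropout
from keras.models import Model
from keras.optimizers import Adam

# Illustrative alternative backbone; any keras.applications model could be tried
base_model = InceptionV3(input_shape=(224, 224, 3), include_top=False, weights='imagenet', pooling='avg')

x = base_model.output
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
out = Dense(num_classes, activation='softmax', name='output_layer')(x)
alt_model = Model(inputs=base_model.input, outputs=out)

# Freeze the pretrained base, as before
for layer in base_model.layers:
    layer.trainable = False

# Illustrative lower learning rate than the original 0.001
alt_model.compile(Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])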


Thank you very much for the suggestion. I will check it and get back to you with the results soon. - user3789200
Hi Ioannis. I actually found out why the predicted probabilities were so low: it was a silly mistake on my part. I was calling pred = model.predict instead of custom_resnet_model.predict. Still, your answer was very helpful for understanding that my dataset was overfitting. Adding the dropout layers gave more accurate results, with Accuracy: 0.6161616161616161, Sensitivity: 0.6595744680851063 and Specificity: 0.5087719298245614. I feel I can improve on this further. Do you have any suggestions? Can I fine-tune the model more than this? - user3789200
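(For reference, a minimal sketch of how such accuracy, sensitivity and specificity figures can be computed for a binary problem, assuming pred holds the softmax outputs of custom_resnet_model.predict and Y_valid the one-hot validation labels; scikit-learn is an assumption here, not something used in the question:)

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.argmax(Y_valid, axis=1)   # one-hot labels -> class indices
y_pred = np.argmax(pred, axis=1)      # softmax outputs -> predicted classes

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / float(tp + tn + fp + fn)
sensitivity = tp / float(tp + fn)     # true positive rate
specificity = tn / float(tn + fp)     # true negative rate
print(accuracy, sensitivity, specificity)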
Fine-tuning takes many attempts. The more you try, the more certain you can be that there is no further improvement to be found. I don't know your data, so I can't tell you whether it can be improved further. I think it is worth trying different pretrained models. Also, ensembling the outputs of several models may improve the score. - Ioannis Nasios
Hi Ioannis, thanks for the advice. If I wanted to freeze fewer layers and train more of the convolutional layers, how would I do that in code? Say, for example, I wanted to train the network starting from the 'res5c_branch2a' (Conv2D) layer onwards; how would I do that? Could you give a small example? - user3789200
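(A minimal sketch of one way to do what this comment asks, using the layer name to decide where unfreezing starts; it reuses the variable names from the answer above and is an illustration rather than code from the answer:)

from keras.optimizers import Adam

# Freeze everything before 'res5c_branch2a', train that layer and everything after it
set_trainable = False
for layer in base_model.layers:
    if layer.name == 'res5c_branch2a':
        set_trainable = True
    layer.trainable = set_trainable

# Recompile after changing the trainable flags, ideally with a low learning rate
custom_resnet_model.compile(Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])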

No, I don't think this is overfitting. Why did you set include_top=True? For fine-tuning it should be set to False.
The following is from the Keras Applications documentation (https://keras.io/applications/):
base_model = InceptionV3(weights='imagenet', include_top=False)

Setting include_top=True is wrong here, because you don't need Inception's dense layers after the convolutions: they were designed for ImageNet with its 1000 classes (Dense(1000, activation='softmax')), which does not apply to your dataset.

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)

In my experience, avoid the following four lines at first; I think they can cause problems. You can try them, but preferably not right away, and keep the learning rate low. If you do, experiment with them in a separate .py file.

for layer in model.layers[:249]:
   layer.trainable = False
for layer in model.layers[249:]:
   layer.trainable = True

# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# train the model on the new data for a few epochs
model.fit_generator(...)
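(If you do end up unfreezing those upper layers, note that the same Keras Applications guide recompiles afterwards with a much lower learning rate, roughly along these lines:)

from keras.optimizers import SGD

# Recompile after unfreezing so the pretrained weights are only adjusted gently;
# SGD with a small learning rate is what the Keras fine-tuning example uses
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')

# then continue training (fit / fit_generator) on your data for a few more epochs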

Thank you for the suggestion. :) I will try this. - user3789200
