How do I use the output of KFold cross-validation as input to a CNN for image processing?

6

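I want to classify images with a convolutional neural network (CNN), and I'd like to use KFold cross-validation to split the data for training and testing. I'm new to this and don't really understand how to do it.

I have tried KFold cross-validation and a CNN in separate pieces of code, but I don't know how to combine them.

I'm using iris_data.csv, which has 3 classes, as example input.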

import pandas as pd
from sklearn.model_selection import KFold
from sklearn.preprocessing import MinMaxScaler

dataset = pd.read_csv('iris_data.csv')

# Assuming the usual Iris layout: columns 0-3 hold the four features,
# column 4 holds the class label
x = dataset.iloc[:, 0:4].to_numpy()
y = dataset.iloc[:, 4].to_numpy()

scaler = MinMaxScaler(feature_range=(0, 1))
x = scaler.fit_transform(x)

cv = KFold(n_splits=10, shuffle=False)
for train_index, test_index in cv.split(x):
    print("Train Index: ", train_index, "\n")
    print("Test Index: ", test_index)

    # Select the rows of this fold by position
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]

Below is the CNN code example.

import numpy as np
import tensorflow as tf
from keras.models import Model
from keras.layers import Input, Activation, Dense, Conv2D, MaxPooling2D, Flatten
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from keras.callbacks import TensorBoard

# Images Dimensions
img_width, img_height = 200, 200

# Data Path
train_data_dir = 'data/train'
validation_data_dir = 'data/validation'

# Parameters
nb_train_samples = 100
nb_validation_samples = 50
epochs = 50
batch_size = 10

# TensorBoard Callbacks
callbacks = TensorBoard(log_dir='./Graph')

# Training Data Augmentation
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# Rescale Testing Data
test_datagen = ImageDataGenerator(rescale=1. / 255)

# Train Data Generator
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

# Testing Data Generator
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

# Feature Extraction Layer KorNet
inputs = Input(shape=(img_width, img_height, 3))
conv_layer = Conv2D(16, (5, 5), strides=(3,3), activation='relu')(inputs) 
conv_layer = MaxPooling2D((2, 2))(conv_layer) 
conv_layer = Conv2D(32, (5, 5), strides=(3,3), activation='relu')(conv_layer) 
conv_layer = MaxPooling2D((2, 2))(conv_layer) 

# Flatten Layer
flatten = Flatten()(conv_layer) 

# Fully Connected Layer
fc_layer = Dense(32, activation='relu')(flatten)
outputs = Dense(3, activation='softmax')(fc_layer)

model = Model(inputs=inputs, outputs=outputs)

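# Adam Optimizer and Cross Entropy Loss
# (older Keras releases spell this argument `lr`)
adam = Adam(learning_rate=0.0001)
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])

# Print Model Summary (summary() prints by itself and returns None)
model.summary()

# fit() accepts generators in current Keras; fit_generator is deprecated
model.fit(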
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size, 
    callbacks=[callbacks])

model.save('./models/model.h5')
model.save_weights('./models/weights.h5')

I want to use the splits produced by KFold cross-validation as the training and test data for the CNN.
2 Answers

3

Just do it like this:

from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import KFold
import numpy

# fix the random seed for reproducibility (the value is arbitrary)
seed = 7
# loadtxt needs an all-numeric CSV, so class labels must already be encoded as numbers
dataset = numpy.loadtxt("iris_data.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:, 0:4]
Y = dataset[:, 4]
# define 10-fold cross validation test harness
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
cvscores = []
for train, test in kfold.split(X, Y):
    # create a fresh model for every fold
    model = Sequential()
    model.add(Dense(12, input_dim=X.shape[1], activation='relu'))
    # ... (remaining layers omitted in the original answer)
    # Compile model; integer class labels need the sparse loss
    # (use 'categorical_crossentropy' only with one-hot encoded labels)
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # Fit the model
    model.fit(X[train], Y[train], epochs=150, batch_size=10, verbose=0)
    # evaluate the model
    scores = model.evaluate(X[test], Y[test], verbose=0)
    print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
    cvscores.append(scores[1] * 100)
print("%.2f%% (+/- %.2f%%)" % (numpy.mean(cvscores), numpy.std(cvscores)))

See this link.
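One caveat worth sketching: the question uses shuffle=False, and if the rows of the CSV are sorted by class (as in the classic Iris file), an unshuffled KFold can produce test folds that contain a single class. Assuming the labels are available as an array, scikit-learn's StratifiedKFold keeps the class proportions roughly equal in every fold. The toy labels and placeholder features below are invented for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels sorted by class: 3 classes with 10 samples each.
y = np.repeat([0, 1, 2], 10)
X = np.zeros((len(y), 4))  # placeholder features; only the shape matters here

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_class_counts = []
for train_index, test_index in skf.split(X, y):
    # Count how many test samples of each class this fold received.
    counts = np.bincount(y[test_index], minlength=3)
    fold_class_counts.append(counts.tolist())

print(fold_class_counts)
```

With 10 samples per class and 5 folds, every test fold receives the same number of samples from each class.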


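The CSV file contains strings, and I get the error "ValueError: could not convert string to float". How can I fix this? - Mars
Modify your CSV file so that it doesn't contain strings? - christk
If my answer helped you, please upvote it. - christk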
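As an alternative to editing the file by hand, the string labels can be converted to integers at load time. The sketch below is one possible approach, not the answerer's method: it assumes a headerless file with four numeric columns followed by a class name, and it uses an inline string as a stand-in for iris_data.csv so it runs on its own.

```python
import csv
import io

import numpy as np

# Stand-in for iris_data.csv (assumed layout: 4 numeric columns + string label).
csv_text = """5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
4.9,3.0,1.4,0.2,setosa
"""

rows = list(csv.reader(io.StringIO(csv_text)))
X = np.array([row[:4] for row in rows], dtype=float)
labels = np.array([row[4] for row in rows])

# np.unique returns the sorted class names and, for each row,
# the index of its class in that sorted list.
class_names, Y = np.unique(labels, return_inverse=True)

print(class_names)
print(Y)
```

For a real file, replace io.StringIO(csv_text) with an open file handle. The integer array Y can then be indexed with the fold indices exactly like X.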

0

First, you need to understand how K-fold cross-validation works. I assume you already do; if not, this resource gives a more in-depth explanation:

https://vitalflux.com/k-fold-cross-validation-python-example/
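To make the mechanics concrete, here is a minimal sketch using scikit-learn's KFold on a toy array of 10 samples (the array and variable names are made up for illustration). Each sample lands in the test fold exactly once, and the train and test indices of a fold never overlap.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 10 samples, 2 features each (illustrative values only).
X = np.arange(20).reshape(10, 2)

kf = KFold(n_splits=5, shuffle=True, random_state=0)

test_counts = np.zeros(len(X), dtype=int)
for fold, (train_index, test_index) in enumerate(kf.split(X)):
    # The two index sets of a fold are disjoint.
    assert len(np.intersect1d(train_index, test_index)) == 0
    test_counts[test_index] += 1
    print(f"fold {fold}: train={len(train_index)} test={len(test_index)}")
```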

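Depending on your dataset, you need to split it into k folds. I suggest looking at earlier notebooks to see how many folds they used for the same dataset; otherwise, use a larger number of folds if the dataset is small, and fewer if it is large.
The last step before applying K-fold is to split the dataset into X and y: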
X = np_image_list
y = image_labels

At this point you can initialize lists for averaging each metric, such as accuracy, recall, and so on.
Train_accuracy = []
Test_accuracy = []


from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

k = 10
kf = KFold(n_splits=k, random_state=1, shuffle=True)

for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Your entire model goes here (build it, then fit it on X_train, y_train)
    # Then evaluate the model: accuracy, recall, etc.
    # Model_pred_train stands for the model's predictions on X_train
    train_accuracy = accuracy_score(y_train, Model_pred_train) * 100
    # Append the calculated accuracy to your list
    Train_accuracy.append(train_accuracy)

Finally, take the average. With k = 10 folds:

print("Average Train Accuracy = " + str(sum(Train_accuracy) / k))
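It can also help to report the spread across folds alongside the mean. The following is a small sketch using only the standard library; the accuracy values are made up and simply stand in for the contents of a list like Train_accuracy.

```python
import statistics

# Made-up per-fold accuracies (in percent) for k = 5 folds.
train_accuracy = [91.0, 93.5, 90.0, 92.5, 93.0]

mean_acc = statistics.mean(train_accuracy)
# Population standard deviation, the same definition numpy.std uses by default.
std_acc = statistics.pstdev(train_accuracy)

print(f"Average Train Accuracy = {mean_acc:.2f}% (+/- {std_acc:.2f}%)")
```

A large spread relative to the mean suggests the result depends heavily on which samples fall in which fold.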

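You can look at my code; I wrote a tutorial there on applying the K-fold cross-validation method to an image classification model: https://github.com/ZienabEsam/Image-Classifcation-using-K-fold-method

The convolutional neural network used is AlexNet.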
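When the images do not fit in memory as one array, a common pattern is to run KFold over the file paths instead of the pixel data, and to build a fresh model inside each fold so that no trained weights carry over between folds. The sketch below shows only that harness. `cross_validate` and `train_and_score` are hypothetical names: the callback stands in for your own Keras code (build the CNN, fit it on the training paths, return the test accuracy), and the file names are made up.

```python
import numpy as np
from sklearn.model_selection import KFold


def cross_validate(paths, labels, train_and_score, n_splits=5, seed=0):
    """Run k-fold CV over image paths and return one score per fold.

    train_and_score(train_paths, train_labels, test_paths, test_labels)
    is a user-supplied function that builds a NEW model, trains it on
    the training paths, and returns a score measured on the test paths.
    """
    paths = np.asarray(paths)
    labels = np.asarray(labels)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_index, test_index in kf.split(paths):
        score = train_and_score(
            paths[train_index], labels[train_index],
            paths[test_index], labels[test_index],
        )
        scores.append(score)
    return scores


# Hypothetical file names and labels, for illustration only.
paths = [f"data/img_{i:03d}.jpg" for i in range(20)]
labels = [i % 3 for i in range(20)]


def dummy_train_and_score(train_paths, train_labels, test_paths, test_labels):
    # Placeholder: a real version would train a CNN here.
    # It returns the test-fold fraction so the harness can run as-is.
    return len(test_paths) / (len(train_paths) + len(test_paths))


scores = cross_validate(paths, labels, dummy_train_and_score)
print(scores, np.mean(scores))
```

In a real run, `train_and_score` could hand the fold's paths to Keras through `ImageDataGenerator.flow_from_dataframe`, which reads file names and classes from a DataFrame rather than from a directory layout; treat that as one option rather than the only way to wire it up.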

