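In this post, I suggest using the split-folders package to randomly split your main data directory into training and validation directories while keeping the class subfolders intact. You can then point Keras's .flow_from_directory method at your training and validation paths.
Splitting your folders, from the split-folders documentation: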
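import split_folders

# Split by ratio into train/val/test. Pass a two-value tuple, e.g. (.8, .2), to get only train/val.
split_folders.ratio('input_folder', output="output", seed=1337, ratio=(.8, .1, .1))

# Split val/test with a fixed number of items per class (here 100 each); the remainder goes to train.
split_folders.fixed('input_folder', output="output", seed=1337, fixed=(100, 100), oversample=False)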
The input folder should have the following format:
input/
    class1/
        img1.jpg
        img2.jpg
        ...
    class2/
        imgWhatever.jpg
        ...
    ...
This gives you the following output:
output/
    train/
        class1/
            img1.jpg
            ...
        class2/
            imga.jpg
            ...
    val/
        class1/
            img2.jpg
            ...
        class2/
            imgb.jpg
            ...
    test/  # optional
        class1/
            img3.jpg
            ...
        class2/
            imgc.jpg
            ...
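To make the layout above concrete, here is a minimal standard-library sketch of a ratio split that preserves class subfolders. It is not how split-folders is implemented; the function name `split_by_ratio`, its rounding rule, and its return value are assumptions made for illustration only.

```python
import random
import shutil
from pathlib import Path


def split_by_ratio(input_dir, output_dir, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Copy input_dir/<class>/* into output_dir/{train,val,test}/<class>/.

    Hypothetical helper for illustration; returns {split: {class: file_count}}.
    """
    split_names = ("train", "val", "test")[:len(ratio)]
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    counts = {name: {} for name in split_names}

    class_dirs = sorted(p for p in Path(input_dir).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)

        start = 0
        cumulative = 0.0
        for i, name in enumerate(split_names):
            cumulative += ratio[i]
            # The last split takes whatever remains, so no file is dropped.
            if i == len(split_names) - 1:
                end = len(files)
            else:
                end = int(cumulative * len(files))

            dest = Path(output_dir) / name / class_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in files[start:end]:
                shutil.copy2(f, dest / f.name)  # copy, leaving the input untouched

            counts[name][class_dir.name] = end - start
            start = end
    return counts
```

Calling `split_by_ratio('input', 'output', ratio=(0.8, 0.2))` would create only `output/train` and `output/val`. Because each class is shuffled and cut separately, every class keeps roughly the same proportions in each split.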
Building the training and validation datasets with Keras's ImageDataGenerator:
import os

import split_folders
import tensorflow as tf

main_dir = '/Volumes/WMEL/Independent Research Project/Data/test_train/Data'
output_dir = '/Volumes/WMEL/Independent Research Project/Data/test_train/output'

# 70% of each class goes to output_dir/train, 30% to output_dir/val
split_folders.ratio(main_dir, output=output_dir, seed=1337, ratio=(.7, .3))

# Scale 8-bit pixel values from [0, 255] down to [0, 1]
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    os.path.join(output_dir, 'train'),
    class_mode='categorical',
    batch_size=32,
    target_size=(224, 224),
    shuffle=True)

validation_generator = train_datagen.flow_from_directory(
    os.path.join(output_dir, 'val'),
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
    shuffle=True)
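
# Must match target_size above, plus 3 channels for the default RGB color mode
IMG_SHAPE = (224, 224, 3)

base_model = tf.keras.applications.ResNet50V2(
    input_shape=IMG_SHAPE,
    include_top=False,
    weights=None)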

maxpool_layer = tf.keras.layers.GlobalMaxPooling2D()
# 4 output units: one per class subfolder in this dataset
prediction_layer = tf.keras.layers.Dense(4, activation='softmax')

model = tf.keras.Sequential([
    base_model,
    maxpool_layer,
    prediction_layer
])

# `lr` is a deprecated alias; use `learning_rate`
opt = tf.keras.optimizers.Adam(learning_rate=0.004)
model.compile(optimizer=opt,
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // 32,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // 32,
    epochs=20)
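
One detail worth checking in the fit call above is the step arithmetic: `samples // 32` rounds down, so any final partial batch is skipped in each epoch, and a split with fewer than 32 images yields zero steps. The small sketch below compares floor and ceiling division; the helper name `steps_for` and the sample counts are made up for illustration and are not part of Keras.

```python
import math


def steps_for(samples, batch_size, drop_remainder=True):
    """Number of batches per epoch for a dataset of `samples` items.

    Hypothetical helper: floor division drops the last partial batch,
    ceiling division keeps it.
    """
    if drop_remainder:
        return samples // batch_size
    return math.ceil(samples / batch_size)


# Made-up example: 1000 training images with a batch size of 32
full_batches = steps_for(1000, 32)                       # 31 full batches, 8 images skipped
all_batches = steps_for(1000, 32, drop_remainder=False)  # 32 batches, the last one has 8 images
```

If the validation folder is small, printing `validation_generator.samples` before training is a quick way to confirm that `validation_steps` does not come out as zero.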