Keras iterator with augmented images and other features

Question

python keras conv-neural-network data-augmentation

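Suppose you have a dataset of images plus, for each image, some additional data in a .csv file. Your goal is to build a NN with a convolutional branch and a second branch (in my case an MLP).

Now, there are plenty of guides (here and here, for example) on how to create the network, so that is not the problem.

The problem is how to create an iterator of the form [[convolution_input, other_features], target] when convolution_input comes from a Keras ImageDataGenerator flow that adds augmented images.

More specifically, when the nth image (which may be augmented or not) is fed to the NN, I want its original features in other_features. I found a few attempts (here and here). The second one looks promising, but I could not figure out how to handle augmented images, and neither seems to account for the dataset manipulation that a Keras generator may perform.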

- Lamberto Basti

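Question: can you use flow, or do you need flow_from_directory? (flow means you can load all the images into memory) - Daniel Möller

I just want a flow that handles the image transformations automatically. In my case I used flow_from_dataframe, because I have the file names, features, and classes. - Lamberto Basti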

1 Answer

- venkata krishnan · Accepted Answer

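Suppose you have a CSV file that lists your images together with their other features. The "id" column holds the image file name, followed by the feature columns and finally the target (a class for classification, a number for regression).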

|         id          | feat1 | feat2 | feat3 | class |
|---------------------|-------|-------|-------|-------|
| 1_face_IMG_NAME.jpg |   1   |   0   |   1   |   A   |
| 3_face_IMG_NAME.jpg |   1   |   0   |   1   |   B   |
| 2_face_IMG_NAME.jpg |   1   |   0   |   1   |   A   |
|         ...         |  ...  |  ...  |  ...  |  ...  |

First, let's define a data generator that we can wrap later.

We can read the CSV into a pandas DataFrame and use Keras's flow_from_dataframe to read from that DataFrame.

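import pandas
from keras.preprocessing.image import ImageDataGenerator

batch_size = 1  # samples per batch

df = pandas.read_csv("dummycsv.csv")
datagen = ImageDataGenerator(rescale=1/255.)
generator = datagen.flow_from_dataframe(
                df,
                directory="out/",
                x_col="id",
                # every column after "id": the features, then the class
                y_col=list(df.columns[1:]),
                class_mode="raw",
                batch_size=batch_size)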

You can add augmentation to the ImageDataGenerator at any time.
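
For example, augmentation could be switched on with a few constructor arguments. This is only a sketch and not part of the original answer: the specific values are arbitrary assumptions, and the import assumes the tf.keras packaging of Keras. Because the random transforms are applied to each image while its batch is being assembled, the y row returned next to an image is still the row of the original file.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative values only; tune them for your own images
datagen = ImageDataGenerator(
    rescale=1 / 255.,
    rotation_range=20,        # random rotations of up to 20 degrees
    width_shift_range=0.1,    # horizontal shift, as a fraction of the width
    height_shift_range=0.1,   # vertical shift, as a fraction of the height
    zoom_range=0.1,           # random zoom of up to 10% in or out
    horizontal_flip=True,     # random left-right flips
)
```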

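A few things to note about flow_from_dataframe in the code above:

x_col = the image file name

y_col = normally the column that holds the class name, but here we override it by passing all the other CSV columns instead, i.e. feat1, feat2, ... up to and including class

class_mode = raw, which makes the generator return all the values in y exactly as they are.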
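
To make the effect of y_col and class_mode="raw" more concrete, here is a small sketch that uses pandas only. It assumes a hypothetical DataFrame shaped like the table above (the file names and values are made up) and roughly mimics what the generator hands back as y: the raw rows of the selected columns, which can then be split into features and target.

```python
import pandas as pd

# Hypothetical data shaped like the table above
df = pd.DataFrame({
    "id": ["1_face.jpg", "3_face.jpg", "2_face.jpg"],
    "feat1": [1, 1, 0],
    "feat2": [0, 1, 0],
    "feat3": [1, 0, 1],
    "class": ["A", "B", "A"],
})

y_col = list(df.columns[1:])   # every column after "id"
raw_y = df[y_col].values       # one untouched row of y per image

features = raw_y[:, :-1].astype("float32")  # everything except the last column
targets = raw_y[:, -1]                       # the last column is the class
```

Note that string labels such as "A" and "B" come back unchanged, so they still have to be encoded as numbers before a model can train on them.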

Now let's wrap the generator above in a new generator that returns [img, otherfeatures], [target]

Here is the code, with comments:

def my_custom_generator():
    # the Keras iterator loops over the dataframe indefinitely and starts each
    # new epoch by itself, so no manual counting, reset or break is needed;
    # a generator that stops after one pass would fail on the second epoch
    while True:
        # get the data from the generator
        data = next(generator)

        # the data looks like this (imgs, other_cols), with one row per
        # sample in the batch

        # the first array contains all images
        imgs = data[0]

        # the second array contains all features with last column as class, so [:, :-1]
        cols = data[1][:, :-1].astype("float32")

        # the last column in the second array from data is the class
        # (string labels still have to be encoded as numbers before training)
        targets = data[1][:, -1]

        # this will yield the result as you expect.
        yield [imgs, cols], targets
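
The reason this keeps augmented images and their features together is that the underlying iterator returns each image and its row of y at the same position in the batch, so splitting y never reorders anything. Below is a minimal sketch of that idea using a NumPy stand-in instead of a real Keras iterator. The names fake_flow and wrap, and the toy data, are hypothetical and exist only for illustration.

```python
import numpy as np

def fake_flow(images, rows, batch_size, seed=0):
    """Stand-in for a Keras iterator: shuffles, 'augments', yields (x, y) forever."""
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # pretend augmentation: perturb the pixels, leave the rows untouched
            x = images[idx] + rng.normal(0, 0.01, images[idx].shape)
            yield x, rows[idx]

def wrap(flow):
    """Split each raw y batch into the feature columns and the target column."""
    for x, y in flow:
        yield [x, y[:, :-1]], y[:, -1]

# toy data: image i is filled with the value i, and its row is [i, 10 * i, i % 2]
images = np.stack([np.full((4, 4, 1), float(i)) for i in range(6)])
rows = np.array([[i, 10 * i, i % 2] for i in range(6)], dtype="float32")

flow = fake_flow(images, rows, batch_size=3)
(batch_imgs, batch_feats), batch_targets = next(wrap(flow))
```

Even though the batch is shuffled and the pixels are perturbed, the first feature of every row still identifies the image sitting at the same batch position.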

Create a similar function for your validation generator. If needed, split the dataframe with train_test_split, create two generators, and wrap each of them.
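
One way to do the split, sketched with pandas only so that it does not assume scikit-learn is installed (train_test_split would serve the same purpose). The DataFrame contents, the 80/20 ratio, and the random_state value are assumptions for illustration. Each resulting frame would then go to its own flow_from_dataframe call and be wrapped in the same way as the training generator.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [f"{i}_face.jpg" for i in range(10)],   # hypothetical file names
    "feat1": [i % 2 for i in range(10)],
    "class": ["A" if i < 5 else "B" for i in range(10)],
})

train_df = df.sample(frac=0.8, random_state=42)   # 80% of the rows, shuffled
val_df = df.drop(train_df.index)                  # the remaining 20%

# fresh 0..n-1 indices for each frame
train_df = train_df.reset_index(drop=True)
val_df = val_df.reset_index(drop=True)
```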

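Pass the function to model.fit_generator like this (a plain Python generator has no length, so steps_per_epoch is one of the other parameters you need to supply):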

model.fit_generator(my_custom_generator(),.....other params)