Overlapping predictions on split images


Background and examples of the symptom

I am using a neural network for super-resolution (increasing the resolution of images). However, since an image can be large, I need to split it into several smaller images, run the prediction on each of them separately, and then merge the results back together.

Here are some examples of this process:

(Images: example 1, example 2, example 3)

Example 1: in the output image you can see a subtle vertical line running through the skier's shoulder.

Example 2: once you start noticing them, you can see subtle lines forming squares across the whole image (the residue of splitting the image for separate predictions).

Example 3: you can clearly see the vertical line crossing the lake.


Source of the problem

Basically, my network makes poor predictions at the edges, which I think is normal since there is less surrounding information there.


Source code

import numpy as np
import matplotlib.pyplot as plt
import skimage.io

from keras.models import load_model

from constants import verbosity, save_dir, overlap, \
    model_name, tests_path, input_width, input_height
from utils import float_im

def predict(args):
    model = load_model(save_dir + '/' + args.model)

    image = skimage.io.imread(tests_path + args.image)[:, :, :3]  # removing possible extra channels (Alpha)
    print("Image shape:", image.shape)

    predictions = []
    images = []

    crops = seq_crop(image)  # crops the image into multiple sub-parts based on the 'input_' constants

    for i in range(len(crops)):  # amount of vertical crops
        for j in range(len(crops[0])):  # amount of horizontal crops
            current_image = crops[i][j]
            images.append(current_image)

    print("Moving on to predictions. Amount:", len(images))

    for p in range(len(images)):
        if p%3 == 0 and verbosity == 2:
            print("--prediction #", p)
        # Hack because GPU can only handle one image at a time
        input_img = (np.expand_dims(images[p], 0))       # Add the image to a batch where it's the only member
        predictions.append(model.predict(input_img)[0])  # returns a list of lists, one for each image in the batch

    return predictions, image, crops


def show_pred_output(input, pred):
    plt.figure(figsize=(20, 20))
    plt.suptitle("Results")

    plt.subplot(1, 2, 1)
    plt.title("Input : " + str(input.shape[1]) + "x" + str(input.shape[0]))
    plt.imshow(input, cmap=plt.cm.binary).axes.get_xaxis().set_visible(False)

    plt.subplot(1, 2, 2)
    plt.title("Output : " + str(pred.shape[1]) + "x" + str(pred.shape[0]))
    plt.imshow(pred, cmap=plt.cm.binary).axes.get_xaxis().set_visible(False)

    plt.show()


# adapted from  https://dev59.com/MK7la4cB1Zd3GeqPfpcA#52463034
def seq_crop(img):
    """
    To crop the whole image into a list of sub-images of the same size.
    Size comes from "input_" variables in the 'constants' (Evaluation).
    Pads the bottom and right edges with 0.
    :param img: input image
    :return: list of sub-images with defined size
    """
    width_shape = ceildiv(img.shape[1], input_width)
    height_shape = ceildiv(img.shape[0], input_height)
    sub_images = []  # will contain all the cropped sub-parts of the image

    for j in range(height_shape):
        horizontal = []
        for i in range(width_shape):
            horizontal.append(crop_precise(img, i*input_width, j*input_height, input_width, input_height))
        sub_images.append(horizontal)

    return sub_images


def crop_precise(img, coord_x, coord_y, width_length, height_length):
    """
    To crop a precise portion of an image.
    When trying to crop outside of the boundaries, the input is padded with zeros.
    :param img: image to crop
    :param coord_x: width coordinate (top left point)
    :param coord_y: height coordinate (top left point)
    :param width_length: width of the cropped portion starting from coord_x
    :param height_length: height of the cropped portion starting from coord_y
    :return: the cropped part of the image
    """

    tmp_img = img[coord_y:coord_y + height_length, coord_x:coord_x + width_length]

    return float_im(tmp_img)  # From [0,255] to [0.,1.]


# from  https://dev59.com/Y2Uq5IYBdhLWcg3wEcRZ#17511341
def ceildiv(a, b):
    return -(-a // b)


# adapted from  https://stackoverflow.com/a/52733370/9768291
def reconstruct(predictions, crops):

    # unflatten predictions
    def nest(data, template):
        data = iter(data)
        return [[next(data) for _ in row] for row in template]

    if len(crops) != 0:
        predictions = nest(predictions, crops)

    H = np.cumsum([x[0].shape[0] for x in predictions])
    W = np.cumsum([x.shape[1] for x in predictions[0]])
    D = predictions[0][0]
    recon = np.empty((H[-1], W[-1], D.shape[2]), D.dtype)
    for rd, rs in zip(np.split(recon, H[:-1], 0), predictions):
        for d, s in zip(np.split(rd, W[:-1], 1), rs):
            d[...] = s
    return recon


if __name__ == '__main__':
    print("   -  ", args)

    preds, original, crops = predict(args)  # returns the predictions along with the original
    enhanced = reconstruct(preds, crops)    # reconstructs the enhanced image from predictions

    plt.imsave('output/' + args.save, enhanced, cmap=plt.cm.gray)

    show_pred_output(original, enhanced)
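
As a quick sanity check, reconstruct can be exercised on dummy arrays to see how the grid of predictions is stitched back together; the shapes below are purely hypothetical (a 2x3 grid of 4x4 crops, upscaled 2x) and are only meant as an illustration:

import numpy as np

# hypothetical 2x3 grid of 4x4 RGB crops, and their flattened 8x8 "predictions" (2x upscaling)
crops = [[np.zeros((4, 4, 3)) for _ in range(3)] for _ in range(2)]
preds = [np.full((8, 8, 3), k, dtype=np.float32) for k in range(6)]  # row-major order

recon = reconstruct(preds, crops)
print(recon.shape)  # (16, 24, 3): the 2x3 grid of 8x8 tiles, stitched together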

Question (what I want)

There are plenty of obvious, naive ways to approach this, but I believe there must be a very concise way to do it: how can I add an overlap_amount variable that lets me make overlapping predictions, so that the "edge part" of each sub-image ("segment") is discarded and replaced by the predictions of the surrounding segments (since those do not contain "edge predictions")?

Of course, I want to minimize the amount of "useless" predictions (pixels that get discarded). It is also worth noting that an input segment produces an output segment 4 times as large (i.e. for a 20x20 pixel input you now get an 80x80 pixel image as output).
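
To make the bookkeeping concrete, here is the arithmetic with hypothetical numbers (a 20x20 input tile, an overlap_amount of 4 and 4x upscaling), purely as an illustration:

tile_in   = 20
overlap   = 4                               # hypothetical overlap_amount
scale     = 4
stride_in = tile_in - overlap               # 16 px between consecutive crops
trim_out  = (overlap // 2) * scale          # 8 px discarded on each shared side of the 80x80 output
kept_out  = tile_in * scale - 2 * trim_out  # 64 px == stride_in * scale, so the kept parts abut exactly

At the image border there is no neighbouring segment to discard against, which is why the second answer below pads the image before cropping.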


Why split the image into independent parts? So that each part can be processed on a separate thread/process? Maybe the work should be split at the network level instead. - Eran W
@EranW Trying to pass the whole image through the neural network for prediction on my computer's GPU gives an "OOM" (out of memory) error, which is why I need to split the image into separate parts and merge them back together properly on the CPU. - payne
I would start with the overlapping approach (overlapping in both rows and columns) and try to find as small a value as possible to reduce the extra inference. You will still need to figure out how to blend the overlapping predictions (e.g. averaging or taking the maximum). - m33n
2 Answers


I solved a similar problem by moving inference to the CPU. It is much slower, but in my case it handled the patch-boundary problem better than the ROI-voting or discarding approaches I tested.

Assuming you are using the TensorFlow backend:

from tensorflow.python import device

with device('cpu:0'):
    prediction = model.predict(...)

Provided, of course, that you have enough RAM to fit your model. If that is not the case, leave a comment below and I will check whether my code could be used here.
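
Depending on the TensorFlow version, the same effect can also be achieved with the public tf.device context manager; a minimal sketch, assuming a TensorFlow backend and enough system RAM (input_img being a batch of one sub-image, as in the question's predict()):

import tensorflow as tf

with tf.device('/cpu:0'):
    prediction = model.predict(input_img)  # the forward pass is placed on the CPU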


Interesting, I had never even considered this as a solution, but it is indeed legitimate. However, I would still prefer a GPU-oriented solution. - payne
Do you still have the code from the various tests whose results eventually led you to choose this solution? - payne
Just to let you know: trying to use the CPU caused some trouble with my computer and I had to reboot. - payne
Ah, sorry, I completely forgot about this. Before the crash, did you check whether it was filling up the entire RAM? - Tapio
When I opened the task manager everything was frozen, so I think it is fairly safe to assume that is what happened, and I did not want to force the computer into another state requiring a manual restart. Anyway, I just got back to this project and wanted to try this lazy solution, but I actually really want a segmented solution (I have just started working on the naive implementation). - payne


I solved it with a naive approach. It could probably be done better, but at least it works.

Process

Basically, it takes the initial image, adds padding around it, and then crops it into multiple sub-images that are all collected into an array. The crops are made so that every sub-image also overlaps its surrounding neighbours.

Each sub-image is then fed to the network and the predictions are collected (in this case the resolution of the image is increased 4x). When reconstructing the image, each prediction is taken individually and its edges are cropped away (since that is where the errors are). The cropping is done so that once all the predictions are stitched together there is no overlap left, and only the central parts of the network's predictions are glued to each other.

Finally, the surrounding padding is removed.
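
With the constants listed further down (64x64 inputs, overlap = 16, scale_fact = 4), the geometry works out as follows (a worked example of the code below):

stride_in = input_width - overlap     # 64 - 16 = 48 px between consecutive crops
tile_out  = input_width * scale_fact  # 64 * 4  = 256 px per upscaled prediction
trim_out  = overlap * 2               # 32 px removed from each side ('upscaled_overlap' in the code)
kept_out  = tile_out - 2 * trim_out   # 256 - 64 = 192 px == stride_in * scale_fact,
                                      # so the trimmed predictions tile the output exactly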

Results

No more lines! :D

(Image: proper prediction)

Code

import numpy as np
import matplotlib.pyplot as plt
import skimage.io

from keras.models import load_model

from constants import verbosity, save_dir, overlap, \
    model_name, tests_path, input_width, input_height, scale_fact
from utils import float_im


def predict(args):
    """
    Super-resolution on the input image using the model.

    :param args:
    :return:
        'predictions' contains an array of every single cropped sub-image once enhanced (the outputs of the model).
        'image' is the original image, untouched.
        'crops' is the array of every single cropped sub-image that will be used as input to the model.
    """
    model = load_model(save_dir + '/' + args.model)

    image = skimage.io.imread(tests_path + args.image)[:, :, :3]  # removing possible extra channels (Alpha)
    print("Image shape:", image.shape)

    predictions = []
    images = []

    # Padding and cropping the image
    overlap_pad = (overlap, overlap)  # padding tuple
    pad_width = (overlap_pad, overlap_pad, (0, 0))  # assumes color channel as last
    padded_image = np.pad(image, pad_width, 'constant')  # padding the border
    crops = seq_crop(padded_image)  # crops the image into multiple sub-parts based on the 'input_' constants

    # Arranging the divided image into a single-dimension array of sub-images
    for i in range(len(crops)):         # amount of vertical crops
        for j in range(len(crops[0])):  # amount of horizontal crops
            current_image = crops[i][j]
            images.append(current_image)

    print("Moving on to predictions. Amount:", len(images))
    upscaled_overlap = overlap * 2
    for p in range(len(images)):
        if p % 3 == 0 and verbosity == 2:
            print("--prediction #", p)

        # Hack due to some GPUs that can only handle one image at a time
        input_img = (np.expand_dims(images[p], 0))  # Add the image to a batch where it's the only member
        pred = model.predict(input_img)[0]          # returns a list of lists, one for each image in the batch

        # Cropping the useless parts of the overlapped predictions (to prevent the repeated erroneous edge-prediction)
        pred = pred[upscaled_overlap:pred.shape[0]-upscaled_overlap, upscaled_overlap:pred.shape[1]-upscaled_overlap]

        predictions.append(pred)
    return predictions, image, crops


def show_pred_output(input, pred):
    plt.figure(figsize=(20, 20))
    plt.suptitle("Results")

    plt.subplot(1, 2, 1)
    plt.title("Input : " + str(input.shape[1]) + "x" + str(input.shape[0]))
    plt.imshow(input, cmap=plt.cm.binary).axes.get_xaxis().set_visible(False)

    plt.subplot(1, 2, 2)
    plt.title("Output : " + str(pred.shape[1]) + "x" + str(pred.shape[0]))
    plt.imshow(pred, cmap=plt.cm.binary).axes.get_xaxis().set_visible(False)

    plt.show()


# adapted from  https://dev59.com/MK7la4cB1Zd3GeqPfpcA#52463034
def seq_crop(img):
    """
    To crop the whole image into a list of sub-images of the same size.
    Size comes from "input_" variables in the 'constants' (Evaluation).
    Pads the bottom and right edges with 0.

    :param img: input image
    :return: list of sub-images with defined size (as per 'constants')
    """
    sub_images = []  # will contain all the cropped sub-parts of the image
    j, shifted_height = 0, 0
    while shifted_height < (img.shape[0] - input_height):
        horizontal = []
        shifted_height = j * (input_height - overlap)
        i, shifted_width = 0, 0
        while shifted_width < (img.shape[1] - input_width):
            shifted_width = i * (input_width - overlap)
            horizontal.append(crop_precise(img,
                                           shifted_width,
                                           shifted_height,
                                           input_width,
                                           input_height))
            i += 1
        sub_images.append(horizontal)
        j += 1

    return sub_images


def crop_precise(img, coord_x, coord_y, width_length, height_length):
    """
    To crop a precise portion of an image.
    When trying to crop outside of the boundaries, the input is padded with zeros.

    :param img: image to crop
    :param coord_x: width coordinate (top left point)
    :param coord_y: height coordinate (top left point)
    :param width_length: width of the cropped portion starting from coord_x (toward right)
    :param height_length: height of the cropped portion starting from coord_y (toward bottom)
    :return: the cropped part of the image
    """
    tmp_img = img[coord_y:coord_y + height_length, coord_x:coord_x + width_length]
    return float_im(tmp_img)  # From [0,255] to [0.,1.]


# adapted from  https://stackoverflow.com/a/52733370/9768291
def reconstruct(predictions, crops):
    """
    Used to reconstruct a whole image from an array of mini-predictions.
    The image had to be split in sub-images because the GPU's memory
    couldn't handle the prediction on a whole image.

    :param predictions: an array of upsampled images, from left to right, top to bottom.
    :param crops: 2D array of the cropped images
    :return: the reconstructed image as a whole
    """

    # unflatten predictions
    def nest(data, template):
        data = iter(data)
        return [[next(data) for _ in row] for row in template]

    if len(crops) != 0:
        predictions = nest(predictions, crops)

    # At this point "predictions" is a 2D grid (list of lists) of the individual output images
    H = np.cumsum([x[0].shape[0] for x in predictions])
    W = np.cumsum([x.shape[1] for x in predictions[0]])
    D = predictions[0][0]
    recon = np.empty((H[-1], W[-1], D.shape[2]), D.dtype)
    for rd, rs in zip(np.split(recon, H[:-1], 0), predictions):
        for d, s in zip(np.split(rd, W[:-1], 1), rs):
            d[...] = s

    # Removing the pad from the reconstruction
    tmp_overlap = overlap * (scale_fact - 1)  # using "-2" leaves the outer edge-prediction error
    return recon[tmp_overlap:recon.shape[0]-tmp_overlap, tmp_overlap:recon.shape[1]-tmp_overlap]


if __name__ == '__main__':
    print("   -  ", args)

    preds, original, crops = predict(args)  # returns the predictions along with the original
    enhanced = reconstruct(preds, crops)    # reconstructs the enhanced image from predictions

    # Save and display the result
    plt.imsave('output/' + args.save, enhanced, cmap=plt.cm.gray)
    show_pred_output(original, enhanced)

Constants and extra bits

verbosity = 2
input_width = 64
input_height = 64
overlap = 16
scale_fact = 4

def float_im(img):
    return np.divide(img, 255.)

Alternative

If you run into the same problem I did, there may be a better alternative out there: essentially the same idea, but more polished and refined.
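
One refinement in that spirit, suggested by m33n in the comments above, is to keep the overlapping regions and blend them (for example by averaging) instead of discarding them. Below is a minimal sketch of that idea; the helper name and its arguments are hypothetical, and it assumes the upscaled tiles and their top-left offsets in the output image are already known:

import numpy as np

def blend_tiles(tiles, offsets, out_shape):
    """Average overlapping upscaled tiles into a single image.
    tiles:     list of HxWxC arrays (the model outputs)
    offsets:   list of (row, col) top-left positions of each tile in the output image
    out_shape: (H, W, C) shape of the reconstructed image
    """
    acc = np.zeros(out_shape, dtype=np.float64)                 # running sum of tile values
    weight = np.zeros(out_shape[:2] + (1,), dtype=np.float64)   # how many tiles cover each pixel
    for tile, (r, c) in zip(tiles, offsets):
        h, w = tile.shape[:2]
        acc[r:r + h, c:c + w] += tile
        weight[r:r + h, c:c + w] += 1.0
    return acc / np.maximum(weight, 1e-8)                       # average where tiles overlap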

