How do you normalize the pixel values of a UIImage in Swift?


We are trying to normalize a UIImage so that it can be passed correctly into a CoreML model.

The way we retrieve the RGB values from each pixel is by first initializing a [CGFloat] array named rawData that holds a value for every pixel, so that the red, green, blue, and alpha values each have their own position. With bitmapInfo, we get the raw pixel values from the original UIImage itself and process them. This is used to fill the context parameter, which is a CGContext variable. Later, we use the context variable to draw a CGImage, which converts the normalized CGImage back into a UIImage.

Using nested for loops that iterate over the x and y coordinates, we find the minimum and maximum pixel color values across all colors of all pixels (found via the raw data array of CGFloats). A bound variable is set to terminate the for loops, otherwise an out-of-range error occurs.

range indicates the range of possible RGB values (i.e., the difference between the maximum and minimum color values).

The equation used to normalize each pixel value:

A = Image
curPixel = current pixel (R, G, B, or alpha)
NormalizedPixel = (curPixel - minPixel(A)) / range
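
For example, if an image's color values range from 31 to 63, then range = 63 - 31 = 32, and a pixel value of 47 normalizes to (47 - 31) / 32 = 0.5.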

We need a nested loop similar to the one above to traverse the rawData array and modify each pixel's colors according to this normalization.

Most of our code comes from:

  1. UIImage to UIColor array of pixel colors
  2. Change color of certain pixels in a UIImage
  3. https://gist.github.com/pimpapare/e8187d82a3976b851fc12fe4f8965789

We use CGFloat instead of UInt8 because the normalized pixel values should be real numbers between 0 and 1, not either 0 or 1.

func normalize() -> UIImage?{

    let colorSpace = CGColorSpaceCreateDeviceRGB()

    guard let cgImage = cgImage else {
        return nil
    }

    let width = Int(size.width)
    let height = Int(size.height)

    var rawData = [CGFloat](repeating: 0, count: width * height * 4)
    let bytesPerPixel = 4
    let bytesPerRow = bytesPerPixel * width
    let bytesPerComponent = 8

    let bitmapInfo = CGImageAlphaInfo.premultipliedLast.rawValue | CGBitmapInfo.byteOrder32Big.rawValue & CGBitmapInfo.alphaInfoMask.rawValue

    let context = CGContext(data: &rawData,
                            width: width,
                            height: height,
                            bitsPerComponent: bytesPerComponent,
                            bytesPerRow: bytesPerRow,
                            space: colorSpace,
                            bitmapInfo: bitmapInfo)

    let drawingRect = CGRect(origin: .zero, size: CGSize(width: width, height: height))
    context?.draw(cgImage, in: drawingRect)

    let bound = rawData.count

    //find minimum and maximum
    var minPixel: CGFloat = 1.0
    var maxPixel: CGFloat = 0.0

    for x in 0..<width {
        for y in 0..<height {

            let byteIndex = (bytesPerRow * x) + y * bytesPerPixel

            if(byteIndex > bound - 4){
                break
            }
            minPixel = min(CGFloat(rawData[byteIndex]), minPixel)
            minPixel = min(CGFloat(rawData[byteIndex + 1]), minPixel)
            minPixel = min(CGFloat(rawData[byteIndex + 2]), minPixel)

            minPixel = min(CGFloat(rawData[byteIndex + 3]), minPixel)


            maxPixel = max(CGFloat(rawData[byteIndex]), maxPixel)
            maxPixel = max(CGFloat(rawData[byteIndex + 1]), maxPixel)
            maxPixel = max(CGFloat(rawData[byteIndex + 2]), maxPixel)

            maxPixel = max(CGFloat(rawData[byteIndex + 3]), maxPixel)
        }
    }

    let range = maxPixel - minPixel
    print("minPixel: \(minPixel)")
    print("maxPixel : \(maxPixel)")
    print("range: \(range)")

    for x in 0..<width {
        for y in 0..<height {
            let byteIndex = (bytesPerRow * x) + y * bytesPerPixel

            if(byteIndex > bound - 4){
                break
            }
            rawData[byteIndex] = (CGFloat(rawData[byteIndex]) - minPixel) / range
            rawData[byteIndex+1] = (CGFloat(rawData[byteIndex+1]) - minPixel) / range
            rawData[byteIndex+2] = (CGFloat(rawData[byteIndex+2]) - minPixel) / range

            rawData[byteIndex+3] = (CGFloat(rawData[byteIndex+3]) - minPixel) / range

        }
    }

    let cgImage0 = context!.makeImage()
    return UIImage.init(cgImage: cgImage0!)
}

Before normalization, we expect the pixel values to be in the range 0-255; after normalization, the pixel values should be in the range 0-1.

The normalization formula is able to normalize pixel values to between 0 and 1. But when we tried to print out the pixel values before normalization (just adding print statements while looping over them) to verify that we were retrieving the raw pixel values correctly, we found that the values were in the wrong range. For example, one pixel had a value of 3.506e+305 (greater than 255). We suspect we are getting the raw pixel values wrong in the first place.

We are not familiar with image processing in Swift and are unsure whether the whole normalization process is correct. Any help would be appreciated!


Should minPixel be the smallest value in the entire rawData array, or only the smallest among the pixels before the current one? And should it be the smallest across all 4 channels, or does it depend on the channel? - ielyamani
@ielyamani - No, he only wants to scale the color channels, not the alpha value. For example, if the colors range from 31-63 (in UInt8), but the alpha is 255, you would still want to scale the colors based on the maximum of 63, not 255. - Rob
@Rob Doesn't rawData[byteIndex+3] = ... update the alpha channel? - ielyamani
@ielyamani - Yes, but I believe that is a bug. If he really wanted to normalize the color and alpha values, he would want to keep separate min-max ranges for each. But I would wager he only cares about the color range. - Rob
2 Answers

A few observations:
  1. Your rawData is a floating point, CGFloat, array, but your context isn't populating it with floating point data, but rather with UInt8 data. If you want a floating point buffer, build a floating point context with CGBitmapInfo.floatComponents and tweak the context parameters accordingly. E.g.:

    func normalize() -> UIImage? {
        let colorSpace = CGColorSpaceCreateDeviceRGB()
    
        guard let cgImage = cgImage else {
            return nil
        }
    
        let width = cgImage.width
        let height = cgImage.height
    
        var rawData = [Float](repeating: 0, count: width * height * 4)
        let bytesPerPixel = 16
        let bytesPerRow = bytesPerPixel * width
        let bitsPerComponent = 32
    
        let bitmapInfo = CGImageAlphaInfo.premultipliedLast.rawValue | CGBitmapInfo.floatComponents.rawValue | CGBitmapInfo.byteOrder32Little.rawValue
    
        guard let context = CGContext(data: &rawData,
                                      width: width,
                                      height: height,
                                      bitsPerComponent: bitsPerComponent,
                                      bytesPerRow: bytesPerRow,
                                      space: colorSpace,
                                      bitmapInfo: bitmapInfo) else { return nil }
    
        let drawingRect = CGRect(origin: .zero, size: CGSize(width: width, height: height))
        context.draw(cgImage, in: drawingRect)
    
        var maxValue: Float = 0
        var minValue: Float = 1
    
        for pixel in 0 ..< width * height {
            let baseOffset = pixel * 4
            for offset in baseOffset ..< baseOffset + 3 {
                let value = rawData[offset]
                if value > maxValue { maxValue = value }
                if value < minValue { minValue = value }
            }
        }
        let range = maxValue - minValue
        guard range > 0 else { return nil }
    
        for pixel in 0 ..< width * height {
            let baseOffset = pixel * 4
            for offset in baseOffset ..< baseOffset + 3 {
                rawData[offset] = (rawData[offset] - minValue) / range
            }
        }
    
        return context.makeImage().map { UIImage(cgImage: $0, scale: scale, orientation: imageOrientation) }
    }
    
  2. But this raises the question of why you’d bother with floating point data at all. If you were returning this floating point data back to your ML model, then I can imagine it might be useful, but you’re just creating a new image. Because of that, you also have the opportunity to just retrieve the UInt8 data, do the floating point math, update the UInt8 buffer, and create the image from that. Thus:

    func normalize() -> UIImage? {
        let colorSpace = CGColorSpaceCreateDeviceRGB()
    
        guard let cgImage = cgImage else {
            return nil
        }
    
        let width = cgImage.width
        let height = cgImage.height
    
        var rawData = [UInt8](repeating: 0, count: width * height * 4)
        let bytesPerPixel = 4
        let bytesPerRow = bytesPerPixel * width
        let bitsPerComponent = 8
    
        let bitmapInfo = CGImageAlphaInfo.premultipliedLast.rawValue
    
        guard let context = CGContext(data: &rawData,
                                      width: width,
                                      height: height,
                                      bitsPerComponent: bitsPerComponent,
                                      bytesPerRow: bytesPerRow,
                                      space: colorSpace,
                                      bitmapInfo: bitmapInfo) else { return nil }
    
        let drawingRect = CGRect(origin: .zero, size: CGSize(width: width, height: height))
        context.draw(cgImage, in: drawingRect)
    
        var maxValue: UInt8 = 0
        var minValue: UInt8 = 255
    
        for pixel in 0 ..< width * height {
            let baseOffset = pixel * 4
            for offset in baseOffset ..< baseOffset + 3 {
                let value = rawData[offset]
                if value > maxValue { maxValue = value }
                if value < minValue { minValue = value }
            }
        }
        let range = Float(maxValue - minValue)
        guard range > 0 else { return nil }
    
        for pixel in 0 ..< width * height {
            let baseOffset = pixel * 4
            for offset in baseOffset ..< baseOffset + 3 {
                rawData[offset] = UInt8(Float(rawData[offset] - minValue) / range * 255)
            }
        }
    
        return context.makeImage().map { UIImage(cgImage: $0, scale: scale, orientation: imageOrientation) }
    }
    

    It just depends upon whether you really needed this floating point buffer for your ML model (in which case, you might return the array of floats in the first example rather than creating a new image, as sketched below) or whether the goal was just to create the normalized UIImage.

    I benchmarked this, and it was a tad faster on an iPhone XS Max than the floating point rendition, but it takes a quarter of the memory (e.g., a 2000×2000px image takes 16mb with UInt8, but 64mb with Float).
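
    If it is the floating point buffer you need, a variant of the first example might simply return the normalized [Float] array instead of rendering a new image. A minimal sketch, assuming the same floating point context setup as the first example (the method name normalizedPixelBuffer is ours):

    func normalizedPixelBuffer() -> [Float]? {
        guard let cgImage = cgImage else { return nil }

        let width = cgImage.width
        let height = cgImage.height

        // Same floating point context configuration as the first example.
        var rawData = [Float](repeating: 0, count: width * height * 4)
        let bitmapInfo = CGImageAlphaInfo.premultipliedLast.rawValue | CGBitmapInfo.floatComponents.rawValue | CGBitmapInfo.byteOrder32Little.rawValue

        guard let context = CGContext(data: &rawData,
                                      width: width,
                                      height: height,
                                      bitsPerComponent: 32,
                                      bytesPerRow: 16 * width,
                                      space: CGColorSpaceCreateDeviceRGB(),
                                      bitmapInfo: bitmapInfo) else { return nil }

        context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))

        // Min-max over the color channels only, as before.
        var maxValue: Float = 0
        var minValue: Float = 1

        for pixel in 0 ..< width * height {
            let baseOffset = pixel * 4
            for offset in baseOffset ..< baseOffset + 3 {
                maxValue = max(maxValue, rawData[offset])
                minValue = min(minValue, rawData[offset])
            }
        }

        let range = maxValue - minValue
        guard range > 0 else { return nil }

        // Normalize in place and hand the buffer back to the caller
        // (e.g., to populate an MLMultiArray) instead of making an image.
        for pixel in 0 ..< width * height {
            let baseOffset = pixel * 4
            for offset in baseOffset ..< baseOffset + 3 {
                rawData[offset] = (rawData[offset] - minValue) / range
            }
        }

        return rawData
    }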

  3. Finally, I should mention that vImage has a highly optimized function, vImageContrastStretch_ARGB8888, which does something very similar to what we’ve done above. Just import Accelerate and then you can do something like:

    func normalize3() -> UIImage? {
        let colorSpace = CGColorSpaceCreateDeviceRGB()
    
        guard let cgImage = cgImage else { return nil }
    
        var format = vImage_CGImageFormat(bitsPerComponent: UInt32(cgImage.bitsPerComponent),
                                          bitsPerPixel: UInt32(cgImage.bitsPerPixel),
                                          colorSpace: Unmanaged.passRetained(colorSpace),
                                          bitmapInfo: cgImage.bitmapInfo,
                                          version: 0,
                                          decode: nil,
                                          renderingIntent: cgImage.renderingIntent)
    
        var source = vImage_Buffer()
        var result = vImageBuffer_InitWithCGImage(
            &source,
            &format,
            nil,
            cgImage,
            vImage_Flags(kvImageNoFlags))
    
        guard result == kvImageNoError else { return nil }
    
        defer { free(source.data) }
    
        var destination = vImage_Buffer()
        result = vImageBuffer_Init(
            &destination,
            vImagePixelCount(cgImage.height),
            vImagePixelCount(cgImage.width),
            32,
            vImage_Flags(kvImageNoFlags))
    
        guard result == kvImageNoError else { return nil }
    
        result = vImageContrastStretch_ARGB8888(&source, &destination, vImage_Flags(kvImageNoFlags))
        guard result == kvImageNoError else { return nil }
    
        defer { free(destination.data) }
    
        return vImageCreateCGImageFromBuffer(&destination, &format, nil, nil, vImage_Flags(kvImageNoFlags), nil).map {
            UIImage(cgImage: $0.takeRetainedValue(), scale: scale, orientation: imageOrientation)
        }
    }
    

    While this employs a slightly different algorithm, it’s worth considering, because in my benchmarking, on my iPhone XS Max it was over 5 times as fast as the floating point rendition.


A few unrelated observations:

  1. Your code snippet is also normalizing the alpha channel. I'm not sure you'd want to do that. Usually the color and alpha channels are independent. Above, I assume you only wanted to normalize the color channels. If you want to normalize the alpha channel as well, you might want a separate min-max range of values for the alpha channel and process it separately (see the sketch after this list). But it doesn't make much sense to normalize the alpha channel with the same range of values as the color channels (or vice versa).

  2. I use the dimensions of the CGImage rather than the UIImage. This is important if your images might not have a scale of 1.

  3. You might consider an early exit if the range is already 0-255 (i.e., no normalization needed); this is also illustrated in the sketch below.
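
To illustrate points 1 and 3, here is a minimal sketch; the function name and signature are ours, and rawData is assumed to be the UInt8 buffer from the second example above:

    // Scans the buffer once, tracking a separate min-max range for the
    // color channels and for alpha.
    func channelRanges(of rawData: [UInt8], width: Int, height: Int)
        -> (color: ClosedRange<UInt8>, alpha: ClosedRange<UInt8>) {
        precondition(width * height > 0 && rawData.count >= width * height * 4)

        var minColor: UInt8 = 255, maxColor: UInt8 = 0   // color channels (R, G, B)
        var minAlpha: UInt8 = 255, maxAlpha: UInt8 = 0   // alpha tracked separately

        for pixel in 0 ..< width * height {
            let baseOffset = pixel * 4
            for offset in baseOffset ..< baseOffset + 3 {
                minColor = min(minColor, rawData[offset])
                maxColor = max(maxColor, rawData[offset])
            }
            minAlpha = min(minAlpha, rawData[baseOffset + 3])
            maxAlpha = max(maxAlpha, rawData[baseOffset + 3])
        }

        return (minColor ... maxColor, minAlpha ... maxAlpha)
    }

    // The caller can then exit early when the colors already span 0-255:
    //     let ranges = channelRanges(of: rawData, width: width, height: height)
    //     if ranges.color == 0 ... 255 { return self }   // nothing to stretch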


Returning nil in guard range > 0 else { return nil } might be too strict: the image could simply be pure gray. I don't think returning an optional image is necessary here. - ielyamani
If the image is pure gray (or any solid color with a zero range), then what should it be normalized to? To zero? To 255? To some other magic sentinel value? IMHO, the whole idea of normalizing color values is meaningless for a solid, single-color image. I might consider other refinements to the above, but personally I think it's useful to let the caller know that it couldn't be normalized (e.g., if it was loading some asset, it might be useful to know that the normalized image doesn't need to be saved as a new asset). - Rob
Maybe it's subjective, but in the solid-color case I would choose to return the original image. Either way, it's up to the OP and what they expect from a normalized image. - ielyamani

There may be a better way to do the normalization: have the Core ML model itself do it when converting a PyTorch or TensorFlow model to Core ML. This is done during model conversion with coremltools; when specifying the input type, you can specify a scale (and bias) factor used to scale the input image.
import coremltools as ct

input_shape = (1, 3, 256, 256)

# Set the image scale and bias for input image preprocessing
scale = 1 / (0.226 * 255.0)
bias = [-0.485 / 0.229, -0.456 / 0.224, -0.406 / 0.225]

image_input = ct.ImageType(name="input_1",
                           shape=input_shape,
                           scale=scale, bias=bias,
                           color_layout=ct.colorlayout.RGB)

There is more information on the coremltools website. If your model came from somewhere other than a conversion to Core ML, this approach won't apply to you. However, for the common case where we train a model in PyTorch or TF and run inference on an iPhone, this is a saner path than fiddling with CVPixelBuffers in Swift. Hope this helps!
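
With the scale and bias baked into the converted model, the Swift side can then hand the image straight to the model with no manual pixel math at all. A minimal sketch, assuming a hypothetical Xcode-generated model class named MyConvertedModel:

import UIKit
import CoreML
import Vision

// `MyConvertedModel` is a hypothetical class that Xcode generates from the
// converted .mlmodel; preprocessing happens inside the model itself because
// scale/bias were specified at conversion time.
func classify(_ image: UIImage) throws {
    let coreMLModel = try MyConvertedModel(configuration: MLModelConfiguration()).model
    let visionModel = try VNCoreMLModel(for: coreMLModel)

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        print(request.results ?? [])  // e.g., [VNClassificationObservation]
    }

    guard let cgImage = image.cgImage else { return }
    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
}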

