iOS Tesseract OCR image preparation

Question

Tags: ios, image-processing, ocr, tesseract

16

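I want to build an OCR application that can recognize text from photos.

I have successfully compiled and integrated the Tesseract engine on iOS. When I photograph a clear document (or this text displayed on a screen), I get reasonable detection results, but for other text, such as road signs, shop signs, or text on colored backgrounds, detection fails.

The question is what image-processing preparation is needed to get better recognition results. For example, I expect we need to convert the image to grayscale or black and white, fix the contrast, and so on.

How can this be done on iOS? Is there a package available for it?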

- alandalusi

2 Answers

10

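I used the code above and added two more function calls to convert the image into a format that suits Tesseract.

First, I used an image resize routine to convert the image to 640 x 640, which seems to be easier for Tesseract to handle.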

-(UIImage *)resizeImage:(UIImage *)image {

    CGImageRef imageRef = [image CGImage];
    CGImageAlphaInfo alphaInfo = CGImageGetAlphaInfo(imageRef);
    CGColorSpaceRef colorSpaceInfo = CGColorSpaceCreateDeviceRGB();

    if (alphaInfo == kCGImageAlphaNone)
        alphaInfo = kCGImageAlphaNoneSkipLast;

    int width, height;

    width = 640;//[image size].width;
    height = 640;//[image size].height;

    CGContextRef bitmap;

    // Logical OR; the code as originally posted used bitwise | here.
    // bytesPerRow is 0 so Core Graphics computes the stride for the new width
    // instead of reusing the stride of the source image.
    if (image.imageOrientation == UIImageOrientationUp || image.imageOrientation == UIImageOrientationDown) {
        bitmap = CGBitmapContextCreate(NULL, width, height, CGImageGetBitsPerComponent(imageRef), 0, colorSpaceInfo, alphaInfo);

    } else {
        bitmap = CGBitmapContextCreate(NULL, height, width, CGImageGetBitsPerComponent(imageRef), 0, colorSpaceInfo, alphaInfo);

    }

    if (image.imageOrientation == UIImageOrientationLeft) {
        NSLog(@"image orientation left");
        CGContextRotateCTM (bitmap, radians(90));
        CGContextTranslateCTM (bitmap, 0, -height);

    } else if (image.imageOrientation == UIImageOrientationRight) {
        NSLog(@"image orientation right");
        CGContextRotateCTM (bitmap, radians(-90));
        CGContextTranslateCTM (bitmap, -width, 0);

    } else if (image.imageOrientation == UIImageOrientationUp) {
        NSLog(@"image orientation up");

    } else if (image.imageOrientation == UIImageOrientationDown) {
        NSLog(@"image orientation down");
        CGContextTranslateCTM (bitmap, width,height);
        CGContextRotateCTM (bitmap, radians(-180.));

    }

    CGContextDrawImage(bitmap, CGRectMake(0, 0, width, height), imageRef);
    CGImageRef ref = CGBitmapContextCreateImage(bitmap);
    UIImage *result = [UIImage imageWithCGImage:ref];

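    CGContextRelease(bitmap);
    CGImageRelease(ref);
    // The color space came from a Create call, so it must be released manually (ARC does not manage it).
    CGColorSpaceRelease(colorSpaceInfo);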

    return result;
}

For the radians function to work, declare it before @implementation.

static inline double radians (double degrees) {return degrees * M_PI/180;}
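
The resize step above stretches every photo to 640 x 640 regardless of its shape. As a side note, here is a minimal sketch in plain C (which compiles unchanged in an Objective-C file) of how the target width and height could instead be computed so that the longer side becomes 640 and the aspect ratio is kept. The names FitSize and fit_within are hypothetical and not part of the answer's code, and whether this actually improves Tesseract's results is an assumption that would need testing.

```c
#include <stddef.h>

/* Hypothetical helper type, not part of the answer's code. */
typedef struct {
    size_t width;
    size_t height;
} FitSize;

/* Scale (srcW, srcH) so the longer side equals maxSide, keeping the
   aspect ratio. The shorter side is rounded to the nearest pixel and
   never drops below 1. Returns {0, 0} for degenerate input. */
static FitSize fit_within(size_t srcW, size_t srcH, size_t maxSide) {
    FitSize out = {0, 0};
    if (srcW == 0 || srcH == 0 || maxSide == 0) {
        return out;
    }
    if (srcW >= srcH) {
        out.width = maxSide;
        out.height = (srcH * maxSide + srcW / 2) / srcW;
    } else {
        out.height = maxSide;
        out.width = (srcW * maxSide + srcH / 2) / srcH;
    }
    if (out.width == 0) out.width = 1;
    if (out.height == 0) out.height = 1;
    return out;
}
```

The resulting width and height could then be used in place of the hard-coded 640 values when creating the bitmap context and when drawing.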

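Then I convert it to grayscale.

I learned how to do this from a post on converting an image to grayscale.

I have used the code from that post successfully, and I can now read text of different colors on backgrounds of different colors.

I modified the code slightly so that it works as a method inside a class rather than as its own class, which is how the other author wrote it.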

- (UIImage *) toGrayscale:(UIImage*)img
{
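    // Byte offsets within one pixel for kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast:
    // each pixel sits in memory as alpha (0), blue (1), green (2), red (3).
    const int BLUE = 1;
    const int GREEN = 2;
    const int RED = 3;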

    // Create image rectangle with current image width/height
    CGRect imageRect = CGRectMake(0, 0, img.size.width * img.scale, img.size.height * img.scale);

    int width = imageRect.size.width;
    int height = imageRect.size.height;

    // the pixels will be painted to this array
    uint32_t *pixels = (uint32_t *) malloc(width * height * sizeof(uint32_t));

    // clear the pixels so any transparency is preserved
    memset(pixels, 0, width * height * sizeof(uint32_t));

    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();

    // create a context with RGBA pixels
    CGContextRef context = CGBitmapContextCreate(pixels, width, height, 8, width * sizeof(uint32_t), colorSpace,
                                                 kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast);

    // paint the bitmap to our context which will fill in the pixels array
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), [img CGImage]);

    for(int y = 0; y < height; y++) {
        for(int x = 0; x < width; x++) {
            uint8_t *rgbaPixel = (uint8_t *) &pixels[y * width + x];

            // convert to grayscale using recommended method:     http://en.wikipedia.org/wiki/Grayscale#Converting_color_to_grayscale
            uint32_t gray = 0.3 * rgbaPixel[RED] + 0.59 * rgbaPixel[GREEN] + 0.11 * rgbaPixel[BLUE];

            // set the pixels to gray
            rgbaPixel[RED] = gray;
            rgbaPixel[GREEN] = gray;
            rgbaPixel[BLUE] = gray;
        }
    }

    // create a new CGImageRef from our context with the modified pixels
    CGImageRef image = CGBitmapContextCreateImage(context);

    // we're done with the context, color space, and pixels
    CGContextRelease(context);
    CGColorSpaceRelease(colorSpace);
    free(pixels);

    // make a new UIImage to return
    UIImage *resultUIImage = [UIImage imageWithCGImage:image
                                             scale:img.scale
                                       orientation:UIImageOrientationUp];

    // we're done with image now too
    CGImageRelease(image);

    return resultUIImage;
}
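
Which byte gets the 0.3 weight and which gets 0.11 depends on how the pixel is laid out in memory. For a context created with kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast, the logical RGBA word is stored little-endian, so the bytes appear as alpha, blue, green, red. The sketch below, in plain C and independent of Core Graphics, illustrates that layout and the same weighted sum. The names store_le32, luma and gray_in_place are hypothetical helpers for this illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offsets inside one pixel when the logical word is RGBA
   (R in the most significant byte) and is stored little-endian. */
enum { PX_ALPHA = 0, PX_BLUE = 1, PX_GREEN = 2, PX_RED = 3 };

/* Write a 32-bit word to memory in little-endian byte order. */
static void store_le32(uint8_t out[4], uint32_t word) {
    out[0] = (uint8_t)(word & 0xFFu);
    out[1] = (uint8_t)((word >> 8) & 0xFFu);
    out[2] = (uint8_t)((word >> 16) & 0xFFu);
    out[3] = (uint8_t)((word >> 24) & 0xFFu);
}

/* Integer version of 0.3 R + 0.59 G + 0.11 B. */
static uint8_t luma(uint8_t r, uint8_t g, uint8_t b) {
    return (uint8_t)((30u * r + 59u * g + 11u * b) / 100u);
}

/* Replace the color bytes of every pixel with its luma; alpha is left alone. */
static void gray_in_place(uint8_t *bytes, size_t pixelCount) {
    for (size_t i = 0; i < pixelCount; i++) {
        uint8_t *px = bytes + 4 * i;
        uint8_t y = luma(px[PX_RED], px[PX_GREEN], px[PX_BLUE]);
        px[PX_RED] = y;
        px[PX_GREEN] = y;
        px[PX_BLUE] = y;
    }
}
```

For an opaque pure-red pixel (word 0xFF0000FF), the red value ends up at offset 3 and alpha at offset 0, so reading offset 1 as red would apply the red weight to the blue channel.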

- Adam Richardson

I've been trying this. My image gets converted, but the UIImage still crashes on my iPhone. Any suggestions? Could you share your source code? - Tha Leang

1

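Are you getting the image back from the camera or loading it from another source? Also, the code I provided assumes you are using ARC. If you are not, you need to release the images and other objects at the appropriate times, otherwise it will crash because of memory pressure. - Adam Richardson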

image.imageOrientation == UIImageOrientationUp | image.imageOrientation == UIImageOrientationDown - pronebird

I'm trying the code above, but I get an "undeclared identifier radians" error. - Daniel P

1

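@daniel-p Make sure you have included math.h. Then add the following before your view controller's implementation: static inline double radians (double degrees) {return degrees * M_PI/180;} - Adam Richardson

@Adam Richardson, I modified the code as you suggested, but I still can't get accurate results. - Aruna kumari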

- roocell · Accepted Answer

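I'm currently working on something similar. I found that PNG images saved from Photoshop work fine, but images imported into the app from the camera do not. Don't ask me why, but running them through this function makes those images work. Maybe it will work for you too.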

// this does the trick to have tesseract accept the UIImage.
UIImage * gs_convert_image (UIImage * src_img) {
    CGColorSpaceRef d_colorSpace = CGColorSpaceCreateDeviceRGB();
    /*
     * Note we specify 4 bytes per pixel here even though we ignore the
     * alpha value; you can't specify 3 bytes per-pixel.
     */
    size_t d_bytesPerRow = src_img.size.width * 4;
    unsigned char * imgData = (unsigned char*)malloc(src_img.size.height*d_bytesPerRow);
    CGContextRef context =  CGBitmapContextCreate(imgData, src_img.size.width,
                                                  src_img.size.height,
                                                  8, d_bytesPerRow,
                                                  d_colorSpace,
                                                  kCGImageAlphaNoneSkipFirst);

    UIGraphicsPushContext(context);
    // These next two lines 'flip' the drawing so it doesn't appear upside-down.
    CGContextTranslateCTM(context, 0.0, src_img.size.height);
    CGContextScaleCTM(context, 1.0, -1.0);
    // Use UIImage's drawInRect: instead of the CGContextDrawImage function, otherwise you'll have issues when the source image is in portrait orientation.
    [src_img drawInRect:CGRectMake(0.0, 0.0, src_img.size.width, src_img.size.height)];
    UIGraphicsPopContext();

    /*
     * At this point, we have the raw ARGB pixel data in the imgData buffer, so
     * we can perform whatever image processing here.
     */


    // After we've processed the raw data, turn it back into a UIImage instance.
    CGImageRef new_img = CGBitmapContextCreateImage(context);
    UIImage * convertedImage = [[UIImage alloc] initWithCGImage:
                                 new_img];

    CGImageRelease(new_img);
    CGContextRelease(context);
    CGColorSpaceRelease(d_colorSpace);
    free(imgData);
    return convertedImage;
}

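I have also done a lot of experimenting with preparing images for Tesseract. Resizing, converting to grayscale, and then adjusting brightness and contrast seems to work best.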
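
As an illustration of the brightness and contrast step, here is a minimal sketch in plain C that works on a buffer of 8-bit grayscale values. It uses a common linear formula (scale around mid-gray, then add an offset). This is an assumption for illustration, not necessarily the adjustment used in my experiments, and the names clamp_u8 and adjust_gray are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Round to the nearest integer and clamp to the 0..255 range. */
static uint8_t clamp_u8(float v) {
    if (v < 0.0f) return 0;
    if (v > 255.0f) return 255;
    return (uint8_t)(v + 0.5f);
}

/* Linear adjustment on 8-bit gray values:
   out = (in - 128) * contrast + 128 + brightness
   contrast > 1 spreads values away from mid-gray,
   brightness shifts every value up or down. */
static void adjust_gray(uint8_t *gray, size_t count,
                        float contrast, float brightness) {
    for (size_t i = 0; i < count; i++) {
        float v = ((float)gray[i] - 128.0f) * contrast + 128.0f + brightness;
        gray[i] = clamp_u8(v);
    }
}
```

Suitable contrast and brightness values depend on the photo, so they would have to be tuned by trial against the OCR output.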

I also tried the GPUImage library: https://github.com/BradLarson/GPUImage. GPUImageAverageLuminanceThresholdFilter seems to give me a nicely adjusted image, but Tesseract does not seem to handle it very well.
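
To make the idea behind an average-luminance threshold concrete, here is a minimal CPU sketch in plain C. It is not GPUImage's implementation, only the general technique as I understand it: compute the mean gray level of the image, then turn every pixel above a multiple of that mean white and everything else black. The function name threshold_by_average and the multiplier parameter are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Binarize an 8-bit grayscale buffer in place.
   Pixels brighter than multiplier * (mean gray level) become 255,
   all others become 0. Does nothing for an empty buffer. */
static void threshold_by_average(uint8_t *gray, size_t count, float multiplier) {
    if (count == 0) {
        return;
    }
    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += gray[i];
    }
    float cutoff = multiplier * ((float)sum / (float)count);
    for (size_t i = 0; i < count; i++) {
        gray[i] = ((float)gray[i] > cutoff) ? 255 : 0;
    }
}
```

A single global cutoff like this can wipe out text in unevenly lit photos, which may be one reason a thresholded image looks clean to the eye but still reads poorly.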

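I have also added OpenCV to my project and plan to try its image routines, possibly even some box detection to find the text regions (which I hope will speed up Tesseract).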