Detecting white text on a bright background with Tesseract

Question

Tags: java, opencv, tesseract, tess4j, pokemon-go

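I'm having trouble reading white text on a bright background. Tesseract finds the text itself, but it does not read it correctly.

Image:

The result I keep getting is LanEerus, which is actually not that far off.

What I would like to know is whether some image preprocessing would fix this. I am preprocessing the image by hand in Photoshop first, to find out what should work, before trying to do the same thing in code.

I tried converting the image to a bitmap, but that made the edges of the text very rough, and Tesseract then read it as random characters.

Inverting the colors and/or converting to grayscale does not seem to help either.

Does anyone have an idea? I know this is a pretty bad background for this kind of task. Believe me, I wish the background were different!

My test code: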

File file = new File("C:\\tess\\lando.png");
ITesseract tess = new Tesseract();
tess.setDatapath("tessdata");

System.out.println(tess.doOCR(file));

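Edit
I have read "Improving Quality", but I could not get those tips to work.

Edit 2
After using OpenCV to grayscale the image, invert the colors, apply a Gaussian blur and run adaptive thresholding, I got this image as a result, but the recognition is no better. If anything, it got worse.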

- Jonathan Öhrström

Try Canny edge detection -> then run OCR. - Onkar Chougule

1 Answer

- stateMachine · Accepted Answer

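Here is a possible solution. It is Python code, but it should be clear enough to port to Java. We will apply a method called gain division. The idea is to build a model of the background and then weight each input pixel by that model. The output gain should stay relatively constant across most of the image, which removes most of the background color variation. We can then clean up the result a little with a chain of morphological operations. Here is the code: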

# imports:
import cv2
import numpy as np
# OCR imports:
from PIL import Image
import pyocr
import pyocr.builders

# image path
path = "D://opencvImages//"
fileName = "c552h.png"

# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)

# Get local maximum:
kernelSize = 5
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
localMax = cv2.morphologyEx(inputImage, cv2.MORPH_CLOSE, maxKernel, None, None, 1, cv2.BORDER_REFLECT101)

# Perform gain division
gainDivision = np.where(localMax == 0, 0, (inputImage/localMax))

# Clip the values to [0,255]
gainDivision = np.clip((255 * gainDivision), 0, 255)

# Convert the mat type from float to uint8:
gainDivision = gainDivision.astype("uint8")
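
To see why dividing by a local maximum flattens the background, here is a minimal NumPy-only sketch on a synthetic image. It is an illustration under stated assumptions, not the code above: it uses a plain maximum filter as the background model instead of OpenCV's morphological closing, and the helper names `local_max` and `gain_division` are made up for this example.

```python
import numpy as np

def local_max(img, k):
    """Hypothetical helper: maximum over a k x k window (edge-padded).

    Assumption: a plain max filter stands in for the morphological
    closing used in the answer; both estimate the bright background.
    """
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.max(axis=(2, 3))

def gain_division(img, k=5):
    """Divide every pixel by its local background estimate, rescale to uint8."""
    background = local_max(img, k).astype(np.float64)
    gain = np.divide(
        img,
        background,
        out=np.zeros(img.shape, dtype=np.float64),
        where=background != 0,  # leave 0 where the background is 0
    )
    return np.clip(255 * gain, 0, 255).astype(np.uint8)

# Synthetic test image: a left-to-right brightness gradient (120..240)
# with one darker vertical stroke standing in for a text edge.
h, w = 20, 40
img = np.tile(np.linspace(120, 240, w), (h, 1))
img[5:15, 20] *= 0.4
img = img.astype(np.uint8)

out = gain_division(img)

# The gradient spans more than 100 gray levels in the input...
input_spread = int(img[0].max()) - int(img[0].min())
# ...but only a handful after gain division, while the stroke stays dark.
output_spread = int(out[0].max()) - int(out[0].min())
print(input_spread, output_spread, int(out[10, 20]))
```

The background ends up close to white everywhere, so a single global threshold can separate it from the darker edges afterwards.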

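That first step applies gain division. The operations you need are straightforward: a morphological closing with a large rectangular structuring element, plus some data type conversions. Be careful with the latter: the division produces floats, which have to be scaled and clipped before converting back to uint8. After applying this method you should see the following image: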

Very cool, the background is almost gone. Now let's get a binary image with Otsu's thresholding:

# Convert BGR (OpenCV's channel order) to grayscale:
grayscaleImage = cv2.cvtColor(gainDivision, cv2.COLOR_BGR2GRAY)

# Get binary image via Otsu:
_, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
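
For readers who want to see what the Otsu flag computes, the following is a small NumPy sketch of the idea: pick the threshold that maximizes the between-class variance of the histogram. The function name `otsu_threshold` and the synthetic image are assumptions made for this example, and OpenCV's own implementation may break ties differently.

```python
import numpy as np

def otsu_threshold(gray):
    """Hypothetical helper: threshold t maximizing between-class variance.

    Pixels <= t form one class, pixels > t the other.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    sum_all = float((hist * np.arange(256)).sum())

    best_t, best_var = 0, -1.0
    w0 = 0.0    # number of pixels in the dark class so far
    sum0 = 0.0  # sum of gray levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t

# Synthetic image: near-white background with a darker band (the "edges").
gray = np.full((10, 10), 245, dtype=np.uint8)
gray[4:6, :] = 100

t = otsu_threshold(gray)

# Same convention as THRESH_BINARY_INV: pixels above t become 0, the rest 255.
binary_inv = np.where(gray > t, 0, 255).astype(np.uint8)
print(t, int(np.count_nonzero(binary_inv)))
```

With the inverted flag, the dark edges become white foreground and the bright background becomes black, which is the polarity the later steps rely on.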

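This is the binary image:

We now have a nice image of the text edges. If we flood-fill the background with white, we can get a black background with white text (after inverting the image later on). We have to be careful with the characters, though: if a character outline is broken, the flood-fill will leak into it and erase it. Let's first make sure the characters are closed by applying a morphological closing: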

# Set kernel (structuring element) size:
kernelSize = 3
# Set morph operation iterations:
opIterations = 1

# Get the structuring element:
morphKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))

# Perform closing:
binaryImage = cv2.morphologyEx( binaryImage, cv2.MORPH_CLOSE, morphKernel, None, None, opIterations, cv2.BORDER_REFLECT101 )
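
As a rough illustration of why closing repairs broken outlines, here is a NumPy-only sketch of a 3 x 3 closing (a dilation followed by an erosion) applied to a line with a one-pixel gap. The helpers `dilate`, `erode` and `close_3x3` are hypothetical names for this example, and the border handling (padding so the border never influences the result) is an assumption that differs from the `BORDER_REFLECT101` used above.

```python
import numpy as np

def _filter_3x3(img, reducer, pad_value):
    """Apply a 3 x 3 min or max filter with constant padding."""
    padded = np.pad(img, 1, mode="constant", constant_values=pad_value)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return reducer(windows, axis=(2, 3))

def dilate(img):
    # Max filter: white regions grow by one pixel.
    return _filter_3x3(img, np.max, 0)

def erode(img):
    # Min filter: white regions shrink by one pixel.
    return _filter_3x3(img, np.min, 255)

def close_3x3(img):
    # Closing = dilation followed by erosion: small gaps get bridged,
    # and the shape then shrinks back to its original thickness.
    return erode(dilate(img))

# A horizontal white stroke (columns 2..8) with a one-pixel gap at column 5.
stroke = np.zeros((7, 11), dtype=np.uint8)
stroke[3, 2:9] = 255
stroke[3, 5] = 0

closed = close_3x3(stroke)
print(int(stroke[3, 5]), int(closed[3, 5]))
```

The gap is filled while the stroke keeps its original length and thickness, which is what makes the outline safe for the flood-fill that follows.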

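This is the resulting image:

As you can see, the edges are more pronounced and, most importantly, they are closed. Now we can flood-fill the background with white. Here, the flood-fill seed point is at the image origin (x = 0, y = 0):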

# Flood fill (white + black):
cv2.floodFill(binaryImage, mask=None, seedPoint=(int(0), int(0)), newVal=(255))
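
The warning about broken characters can be made concrete with a tiny pure-Python flood fill. This is only a sketch of the idea (a 4-connected breadth-first fill, which is also the default connectivity of `cv2.floodFill`); the function `flood_fill` and the two toy grids are invented for the example.

```python
from collections import deque

def flood_fill(grid, seed, new_val):
    """Hypothetical 4-connected flood fill, in place.

    `seed` is (x, y), matching the order of OpenCV's seedPoint.
    """
    h, w = len(grid), len(grid[0])
    x0, y0 = seed
    old_val = grid[y0][x0]
    if old_val == new_val:
        return grid
    grid[y0][x0] = new_val
    queue = deque([(x0, y0)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] == old_val:
                grid[ny][nx] = new_val
                queue.append((nx, ny))
    return grid

# A closed square outline (255) around a single interior pixel (0).
closed_outline = [
    [0,   0,   0,   0, 0],
    [0, 255, 255, 255, 0],
    [0, 255,   0, 255, 0],
    [0, 255, 255, 255, 0],
    [0,   0,   0,   0, 0],
]

# The same outline with a one-pixel break in its top edge.
broken_outline = [row[:] for row in closed_outline]
broken_outline[1][2] = 0

flood_fill(closed_outline, (0, 0), 255)
flood_fill(broken_outline, (0, 0), 255)

# Closed outline: the interior pixel survives the background fill.
# Broken outline: the fill leaks through the gap and overwrites it.
print(closed_outline[2][2], broken_outline[2][2])
```

In the broken case the whole grid ends up white, so the character would disappear from the mask, which is why the closing step comes first.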

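We get this image:

We are almost there. As you can see, the holes inside some characters (e.g. the "a", "d" and "o") are not filled, which could confuse the OCR. Let's try to fill them. We can exploit the fact that these holes are all child contours of a parent contour: we can isolate the child contours and apply flood-fill again to fill them. But first, don't forget to invert the image: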

# Invert image so target blobs are colored in white:
binaryImage = 255 - binaryImage

# Find the blobs on the binary image:
contours, hierarchy = cv2.findContours(binaryImage, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Process the contours:
for i, c in enumerate(contours):

    # Get contour hierarchy:
    currentHierarchy = hierarchy[0][i][3]

    # Look only for children contours (the holes):
    if currentHierarchy != -1:

        # Get the contour bounding rectangle:
        boundRect = cv2.boundingRect(c)

        # Get the dimensions of the bounding rect:
        rectX = boundRect[0]
        rectY = boundRect[1]
        rectWidth = boundRect[2]
        rectHeight = boundRect[3]

        # Get the center of the contour's bounding rectangle; it will
        # act as the seed point for the Flood-Fill:
        fx = rectX + 0.5 * rectWidth
        fy = rectY + 0.5 * rectHeight

        # Fill the hole:
        cv2.floodFill(binaryImage, mask=None, seedPoint=(int(fx), int(fy)), newVal=(0))

# Write result to disk:
cv2.imwrite("text.png", binaryImage, [cv2.IMWRITE_PNG_COMPRESSION, 0])
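
The `hierarchy[0][i][3]` lookup in the loop above is easier to follow with the row layout spelled out. With `cv2.RETR_TREE`, `hierarchy` has shape (1, N, 4) and each row is `[next, previous, first_child, parent]`, where -1 means "none". The sketch below uses a hand-written hierarchy for a single letter "o" as an assumption about what this pipeline might produce; it is not output captured from the real image, and `child_contour_indices` is a hypothetical helper.

```python
import numpy as np

# Hand-written example hierarchy (assumption, not real output):
#   contour 0: outer boundary of the letter, no parent
#   contour 1: boundary of the hole, child of contour 0
#   contour 2: white blob inside the hole, child of contour 1
hierarchy = np.array([[
    [-1, -1,  1, -1],
    [-1, -1,  2,  0],
    [-1, -1, -1,  1],
]])

def child_contour_indices(hierarchy):
    """Return indices of contours that have a parent (column 3 != -1)."""
    parents = hierarchy[0][:, 3]
    return [i for i, parent in enumerate(parents) if parent != -1]

children = child_contour_indices(hierarchy)
print(children)
```

Only contours with a parent are used as flood-fill targets, so top-level character outlines are left alone. One caveat worth keeping in mind: the center of a bounding rectangle is not guaranteed to lie inside an irregularly shaped hole, so the seed point may need adjusting for other fonts.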

This is the resulting mask:

OK, let's apply OCR. I am using pyocr:

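# Assumption: the snippets above never define `tool` and `lang`.
# This is one common way to set them up with pyocr.
tools = pyocr.get_available_tools()
tool = tools[0]   # first OCR backend pyocr finds, e.g. Tesseract
lang = "eng"      # assumed language; must be installed for that backend

# Run OCR on the mask written to disk earlier:
txt = tool.image_to_string(
    Image.open("text.png"),
    lang=lang,
    builder=pyocr.builders.TextBuilder()
)

print(txt)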

Output:

Landorus