How to use Tesseract for image recognition (OCR)

Question

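I'm learning OpenCV and Tesseract, and I've run into trouble with what looks like a very simple example.

Here is the image I'm trying to OCR. It reads "171 m":

I did some preprocessing. Since the dominant color of the text is blue, I extracted the blue channel and applied a simple threshold.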

img = cv2.imread('171_m.png')[:, :, 0]  # channel 0 is blue, since OpenCV loads images as BGR
_, thresh = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY_INV)
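To make explicit what these two steps compute, here is a minimal plain-Python sketch on a tiny hand-made image. The helper names `blue_channel` and `threshold_binary_inv` and the pixel values are made up for illustration; they only mimic the channel indexing and the `cv2.THRESH_BINARY_INV` rule, and are not part of OpenCV.

```python
# Plain-Python illustration (no OpenCV needed) of blue-channel extraction
# followed by an inverse binary threshold.

def blue_channel(bgr_image):
    """Return channel 0 of a nested-list image whose pixels are (B, G, R)."""
    return [[pixel[0] for pixel in row] for row in bgr_image]

def threshold_binary_inv(gray, thresh, maxval=255):
    """Mimic cv2.THRESH_BINARY_INV: values above thresh become 0, the rest maxval."""
    return [[0 if value > thresh else maxval for value in row] for row in gray]

# Hypothetical 2x2 image: bluish "text" pixels on a dark background.
tiny = [
    [(230, 120, 40), (60, 50, 40)],
    [(60, 50, 40), (230, 120, 40)],
]

blue = blue_channel(tiny)
mask = threshold_binary_inv(blue, 150)
print(blue)
print(mask)
```

With an inverse threshold, pixels that are bright in the blue channel end up black and everything else ends up white, which gives dark text on a light background.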

The resulting image looks like this:

Then I fed it to Tesseract, using psm 7 to treat it as a single line of text:

text = pytesseract.image_to_string(thresh, config='--psm 7')
print(text)
>>> lim

I also tried restricting the allowed characters. That improved the result, but it is still not fully correct.

text = pytesseract.image_to_string(thresh, config='--psm 7 -c tessedit_char_whitelist=1234567890m')
print(text)
>>> 17m
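When experimenting with several page segmentation modes and whitelists, it can help to build the config string in one place. The helper below is a hypothetical convenience function, not part of pytesseract; it only assembles the same `--psm` and `-c tessedit_char_whitelist=` options used above.

```python
def build_tesseract_config(psm, whitelist=None):
    """Assemble a Tesseract config string from a PSM value and an optional whitelist.

    Hypothetical helper for illustration; it assumes the whitelist
    contains no spaces or quote characters.
    """
    parts = ["--psm {}".format(psm)]
    if whitelist:
        parts.append("-c tessedit_char_whitelist={}".format(whitelist))
    return " ".join(parts)

config = build_tesseract_config(7, "1234567890m")
print(config)
```

The resulting string can be passed as the `config` argument of `pytesseract.image_to_string`.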

OpenCV v4.1.1.
Tesseract v5.0.0-alpha.20190708

Any help is appreciated.

- Anton Babkin

3 Answers

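I think your image is not sharp enough, so I used the process described in "How do I increase the contrast of an image in Python OpenCV" to boost its contrast first, then extracted the blue layer and ran tesseract.

Hope this helps.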

import cv2
import pytesseract 

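img = cv2.imread('test.png')  # test.png is your original image
s = 128
# dsize must be integers; pass interpolation by keyword so it is not
# mistaken for the positional dst/fx/fy arguments
img = cv2.resize(img, (s, s // 2), interpolation=cv2.INTER_AREA)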

def apply_brightness_contrast(input_img, brightness = 0, contrast = 0):

    if brightness != 0:
        if brightness > 0:
            shadow = brightness
            highlight = 255
        else:
            shadow = 0
            highlight = 255 + brightness
        alpha_b = (highlight - shadow)/255
        gamma_b = shadow

        buf = cv2.addWeighted(input_img, alpha_b, input_img, 0, gamma_b)
    else:
        buf = input_img.copy()

    if contrast != 0:
        f = 131*(contrast + 127)/(127*(131-contrast))
        alpha_c = f
        gamma_c = 127*(1-f)

        buf = cv2.addWeighted(buf, alpha_c, buf, 0, gamma_c)

    return buf

out = apply_brightness_contrast(img,0,64)

b, g, r = cv2.split(out)  # split the channels and use only the blue one

# The blue channel has a black background and white digits, so 255-b inverts
# it to dark text on a light background before OCR.
text = pytesseract.image_to_string(255-b, config='--psm 7 -c tessedit_char_whitelist=1234567890m')
print(text)
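To see what a contrast value of 64 does to individual pixels, the arithmetic in the contrast branch can be reproduced without OpenCV. This is only a rough sketch: `contrast_coefficients` and `adjust_pixel` are illustrative names, and the clamp to 0..255 approximates the saturation that `cv2.addWeighted` applies to 8-bit images.

```python
def contrast_coefficients(contrast):
    """Return (alpha, gamma) using the same formula as the contrast branch above."""
    f = 131 * (contrast + 127) / (127 * (131 - contrast))
    return f, 127 * (1 - f)

def adjust_pixel(value, contrast):
    """Apply alpha * value + gamma and clamp to the 8-bit range."""
    alpha, gamma = contrast_coefficients(contrast)
    return max(0, min(255, round(alpha * value + gamma)))

alpha, gamma = contrast_coefficients(64)
print(alpha, gamma)

# Mid-gray stays where it is; darker pixels are pushed toward 0
# and brighter pixels toward 255.
for value in (50, 127, 200):
    print(value, adjust_pixel(value, 64))
```

The mapping pivots around 127, so with a slope of roughly 2.9 most pixels end up at one of the two extremes, which is why the blue channel comes out almost binary.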

- b3rt0

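With OpenCV 4.1, (s, s/2) needs to be changed to (s, int(s/2)); otherwise you get TypeError: integer argument expected, got float. - drec4s

Thanks for pointing that out, I'll edit the answer. Apart from that, does it work on your end? Did you get the correct result? - b3rt0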

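Disclaimer: this is not a full solution, only an attempt to solve the problem partially.

This approach works only if you know in advance how many characters the image contains. Here is the trial code: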

img0 = cv2.imread('171_m.png', 0)
adap_thresh = cv2.adaptiveThreshold(img0, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
text_adth = pytesseract.image_to_string(adap_thresh, config='--psm 7')

After adaptive thresholding, the resulting image looks like this:

The Pytesseract output is:

171 mi.

Now, if you know the number of characters in advance, you can slice the string that pytesseract returns and get the desired output '171m'.
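As an alternative to slicing by a fixed character count, a regular expression can pull the reading out of the raw string. This swaps in a different technique and rests on an assumption not stated in the question: that the text is always a run of digits followed by the unit "m". The function name `extract_distance` is made up for this sketch.

```python
import re

def extract_distance(ocr_text):
    """Return '<digits>m' if the text contains digits followed by 'm', else None."""
    match = re.search(r"(\d+)\s*m", ocr_text)
    if match is None:
        return None
    return "{}m".format(match.group(1))

print(extract_distance("171 mi."))
print(extract_distance("lim"))
```

This tolerates trailing junk such as "i." but cannot recover digits that Tesseract misread or dropped, so it complements the preprocessing rather than replacing it.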

- Arkistarvh Kltzuonstev

- nathancy · Accepted Answer

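Preprocessing the image before passing it to Pytesseract can help. The desired text should be black and the background white. Here is one approach:

Convert the image to grayscale and enlarge it
Gaussian blur
Otsu's threshold
Invert the image

After converting the image to grayscale, we enlarge it with imutils.resize() and apply a Gaussian blur. From there, we apply Otsu's threshold to obtain a binary image.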
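For intuition about how Otsu's method picks a threshold automatically, here is a minimal pure-Python sketch of the idea: try every threshold and keep the one that maximizes the between-class variance. It is not OpenCV's implementation, and `otsu_threshold` and the sample pixel values are illustrative only.

```python
def otsu_threshold(pixels):
    """Return the threshold t in 0..255 that maximizes between-class variance.

    Pixels <= t form one class and pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class so far
    sum0 = 0  # sum of intensities in the lower class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # lower class still empty
        if w1 == 0:
            break     # upper class is empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t

# Hypothetical bimodal data: dark pixels around 20-30, bright pixels around 200-210.
sample = [20] * 60 + [30] * 40 + [200] * 50 + [210] * 50
t = otsu_threshold(sample)
print(t)
```

Any threshold that falls between the two clusters separates them equally well here; real images have a continuous histogram, which is why blurring first tends to give a cleaner split.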

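If your image is noisy, you can use morphological operations to smooth it or remove the noise. Since your image is already clean enough, we can simply invert it to get the result.

Output from Pytesseract using --psm 6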

171m

import cv2
import pytesseract
import imutils

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

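image = cv2.imread('1.png', 0)  # flag 0 loads the image as grayscale
image = imutils.resize(image, width=400)  # resize to 400 px wide, keeping the aspect ratio
blur = cv2.GaussianBlur(image, (7,7), 0)  # smooth edges before thresholding
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]  # Otsu picks the threshold
result = 255 - thresh  # invert to get black text on a white background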

data = pytesseract.image_to_string(result, lang='eng',config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()