Recognizing simple digits with pytesser

3

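I'm learning to do OCR with PyTesser and Tesseract. As a first milestone, I want to write a tool that recognizes CAPTCHAs consisting only of digits. I read some tutorials and wrote the following test program.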

from pytesser.pytesser import image_to_string
from PIL import Image, ImageFilter, ImageEnhance

im = Image.open("test.tiff")
im = im.filter(ImageFilter.MedianFilter())  # smooth out speckle noise
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(2)                    # double the contrast
im = im.convert('1')                        # reduce to 1-bit black and white
text = image_to_string(im)
print("text={}".format(text))

I tested my code with the image below, but the result is 2(T?770. I've tested some other similar images as well, and in 80% of cases the results are incorrect.

enter image description here

I'm not familiar with image processing. I have two questions:

  1. Is it possible to tell PyTesser to guess digits only?

  2. I think the image is quite easy for a human to read. If it is so difficult for PyTesser to read a digits-only image, are there any alternatives that can do better OCR?

Any hints are much appreciated.

1 Answer

1
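I think your code is fine. It can recognize 207770. The problem is with the pytesser installation: the Tesseract bundled with pytesser is outdated. You need to download the latest version and overwrite the corresponding files. You also need to edit pytesser.py and change

tesseract_exe_name = 'tesseract'

to

import os.path
tesseract_exe_name = os.path.join(os.path.dirname(__file__), 'tesseract')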
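
On the digits-only question, one possible approach is to skip the wrapper and call the Tesseract executable directly with a character whitelist. This is a minimal sketch, not part of pytesser. The helper names `build_digits_command` and `ocr_digits` are hypothetical. It assumes a `tesseract` binary on the PATH that is recent enough to accept the `-c` option, and whether the whitelist is honored may depend on the Tesseract version and recognition engine in use.

```python
import subprocess


def build_digits_command(image_path, output_base, exe="tesseract"):
    """Build a Tesseract command line that restricts output to digits.

    Hypothetical helper: `exe` is assumed to point at a Tesseract
    binary that supports the `-c` option.
    """
    return [
        exe,
        image_path,
        output_base,  # Tesseract writes its result to output_base + ".txt"
        "-c",
        "tessedit_char_whitelist=0123456789",
    ]


def ocr_digits(image_path, output_base="ocr_out", exe="tesseract"):
    """Run Tesseract on image_path and return the recognized text."""
    subprocess.check_call(build_digits_command(image_path, output_base, exe))
    with open(output_base + ".txt") as f:
        return f.read().strip()
```

If Tesseract is installed, something like `ocr_digits("test.tiff")` could be tried on the preprocessed image after saving it to disk. Building the command in a separate function makes it easy to inspect or adjust the arguments before running anything.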

How did you know that the Tesseract in pytesser is outdated? - DanGoodrick
