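I am using Tesseract with Python to read digits from an energy meter. Everything works except for the digit "1": Tesseract does not read it. When I send Tesseract the image of the meter display, it returns "0000027", so the "1" is dropped. How can I tell Tesseract that the vertical bar is a "1"? This is my Tesseract initialization: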
import tesseract
TESSERACT_LIBRARY_PATH = "C:\\Program Files (x86)\\Tesseract-OCR"
LANGUAGE = "eng"
CHARACTERS = "0123456789"
FALSE = "0"
TRUE = "1"
def init_ocr():
    """
    .. py:function:: init_ocr()

    Use the Tesseract-OCR library to create a tesseract_ocr instance that
    predicts the numbers to be read off of the meter.

    :return: tesseract_ocr, Tesseract's OCR API.
    :rtype: tesseract.TessBaseAPI
    """
    # Initialize the tesseract_ocr with the English language package.
    tesseract_ocr = tesseract.TessBaseAPI()
    tesseract_ocr.Init(TESSERACT_LIBRARY_PATH, LANGUAGE,
                       tesseract.OEM_DEFAULT)
    # Limit the characters being searched for to numerics.
    tesseract_ocr.SetVariable("tessedit_char_whitelist", CHARACTERS)
    # Set the page segmentation mode. PSM_AUTO lets Tesseract segment the
    # page automatically; it does not limit prediction to one character.
    tesseract_ocr.SetPageSegMode(tesseract.PSM_AUTO)
    # Tesseract's dictionaries (Directed Acyclic Word Graphs).
    # The system and frequent-word ones are not necessary for number
    # recognition.
    tesseract_ocr.SetVariable("load_system_dawg", FALSE)
    tesseract_ocr.SetVariable("load_freq_dawg", FALSE)
    tesseract_ocr.SetVariable("load_number_dawg", TRUE)
    # Disable adaptive learning and the adaptive matcher.
    tesseract_ocr.SetVariable("classify_enable_learning", FALSE)
    tesseract_ocr.SetVariable("classify_enable_adaptive_matcher", FALSE)
    return tesseract_ocr
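
One workaround I am considering, in case the bar is being classified as a look-alike character and then discarded by the digits-only whitelist: widen the whitelist to include those look-alikes and map them back to "1" after recognition. The following is only a minimal sketch under that assumption. I have not confirmed that this is what Tesseract actually does with the bar. The names `LOOKALIKES_FOR_ONE`, `WIDENED_WHITELIST`, and `normalize_reading` are hypothetical and not part of Tesseract.

```python
# Hypothetical look-alike set: characters a recognizer might emit for a
# sans-serif "1". This list is an assumption, not something Tesseract defines.
LOOKALIKES_FOR_ONE = "|lI!"
DIGITS = "0123456789"

# Whitelist that would be passed to "tessedit_char_whitelist" in place of
# the digits-only CHARACTERS constant.
WIDENED_WHITELIST = DIGITS + LOOKALIKES_FOR_ONE

# Translation table that turns every look-alike into "1".
_TO_ONE = str.maketrans({ch: "1" for ch in LOOKALIKES_FOR_ONE})


def normalize_reading(raw_text):
    """Map look-alike characters to "1" and drop anything that is not a digit."""
    mapped = raw_text.translate(_TO_ONE)
    return "".join(ch for ch in mapped if ch in DIGITS)
```

With this in place, the text returned by Tesseract would be passed through `normalize_reading` before being interpreted as a meter value. If the output still lacks the "1", the bar is being lost before classification, and the whitelist is not the cause.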