OpenCV and pytesseract for OCR

Question


3

How can I use OpenCV and pytesseract to extract text from an image?

import cv2

import pytesseract
from PIL import Image
import numpy as np
from matplotlib import pyplot as plt

img = Image.open('test.jpg').convert('L')
img.show()
img.save('test','png')
img = cv2.imread('test.png',0)
edges = cv2.Canny(img,100,200)
#contour = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
#print pytesseract.image_to_string(Image.open(edges))
print pytesseract.image_to_string(edges)

But this gives an error:

Traceback (most recent call last):
  File "open.py", line 14, in <module>
    print pytesseract.image_to_string(edges)
  File "/home/sroy8091/.local/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 143, in image_to_string
    if len(image.split()) == 4:
AttributeError: 'NoneType' object has no attribute 'split'

- sumitroy

2 Answers

0

You can't pass an OpenCV object directly to the Tesseract methods.

Try:

from PIL import Image
import pytesseract

image_file = 'test.png'
print(pytesseract.image_to_string(Image.open(image_file)))
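
To see why a conversion is needed, here is a minimal sketch comparing the two object types. It assumes NumPy and Pillow are installed, and the zero-filled array is only a stand-in for what cv2.imread or cv2.Canny would return:

```python
import numpy as np
from PIL import Image

# Stand-in for an OpenCV result: cv2.imread and cv2.Canny return NumPy arrays.
edges = np.zeros((40, 120), dtype=np.uint8)

# A NumPy array has no PIL image methods such as split().
print(type(edges).__name__, hasattr(edges, "split"))

# Image.fromarray wraps the same pixel data in a PIL image, which does.
pil_img = Image.fromarray(edges)
print(type(pil_img).__name__, hasattr(pil_img, "split"), pil_img.mode, pil_img.size)
```

The PIL image is the kind of object that the pytesseract version in the traceback expects, since it calls image.split() internally.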

- Paul Alexandru Pop


- Deepan Raj · Accepted Answer

If you want to do some preprocessing in OpenCV first (for example, edge detection) and then extract the text, you can use the following:

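# Imports needed for this snippet
import cv2
import pytesseract
from PIL import Image

img = cv2.imread('test.png', 0)  # 0 = load as grayscale
edges = cv2.Canny(img, 100, 200)  # edge detection as preprocessing
img_new = Image.fromarray(edges)  # wrap the NumPy array in a PIL image
text = pytesseract.image_to_string(img_new, lang='eng')
print(text)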
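
The same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: `to_pil` and `ocr_from_array` are hypothetical names (not part of OpenCV or pytesseract), and the Tesseract binary is assumed to be installed. It also covers two cases the snippet above does not: cv2.imread returns None when a file can't be read, and OpenCV stores color images in BGR order while PIL expects RGB.

```python
import numpy as np
from PIL import Image


def to_pil(array):
    """Convert an OpenCV-style NumPy array to a PIL image (hypothetical helper)."""
    if array is None:
        # cv2.imread returns None instead of raising when a file cannot be read.
        raise ValueError("image array is None; check the file path")
    if array.ndim == 3 and array.shape[2] == 3:
        # OpenCV stores color images as BGR; reverse the channel axis to get RGB.
        array = array[:, :, ::-1]
    return Image.fromarray(np.ascontiguousarray(array))


def ocr_from_array(array, lang="eng"):
    """Run Tesseract on an OpenCV-style array (hypothetical helper)."""
    # Imported here so to_pil can be used without pytesseract installed.
    import pytesseract
    return pytesseract.image_to_string(to_pil(array), lang=lang)
```

With this in place, the call would look something like ocr_from_array(cv2.Canny(cv2.imread('test.png', 0), 100, 200)), and a bad path fails early with a clear message rather than inside pytesseract.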