Cluster bounding boxes and draw lines on them (OpenCV, Python)

Question


With this code I created bounding boxes around some of the characters in the image below:

import csv
import cv2
from pytesseract import pytesseract as pt

pt.run_tesseract('bb.png', 'output', lang=None, boxes=True, config="hocr")

# To read the coordinates
boxes = []
with open('output.box', 'rt') as f:
    reader = csv.reader(f, delimiter=' ')
    for row in reader:
        if len(row) == 6:
            boxes.append(row)

# Draw the bounding box
img = cv2.imread('bb.png')
h, w, _ = img.shape
for b in boxes:
    img = cv2.rectangle(img, (int(b[1]), h-int(b[2])), (int(b[3]), h-int(b[4])), (0, 255, 0), 2)

cv2.imshow('output', img)
cv2.waitKey(0)

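Output:

What I want is this:

The program should draw a vertical line at the x positions of the bounding boxes (only for the first and third text regions; the middle one should not take part in the process).

The goal is this (if there is another way to achieve it, please explain): once I have these two lines (or, better, groups of coordinates), use a mask to cover these two regions.

Is this possible?

Source image:

Printed CSV boxes, as requested: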

[['l', '56', '328', '63', '365', '0'], ['i', '69', '328', '76', '365', '0'], ['n', '81', '328', '104', '354', '0'], ['e', '108', '328', '130', '354', '0'], ['1', '147', '328', '161', '362', '0'], ['m', '102', '193', '151', '227', '0'], ['i', '158', '193', '167', '242', '0'], ['d', '173', '192', '204', '242', '0'], ['d', '209', '192', '240', '242', '0'], ['l', '247', '193', '256', '242', '0'], ['e', '262', '192', '292', '227', '0'], ['t', '310', '192', '331', '235', '0'], ['e', '334', '192', '364', '227', '0'], ['x', '367', '193', '398', '227', '0'], ['t', '399', '192', '420', '235', '0'], ['-', '440', '209', '458', '216', '0'], ['n', '481', '193', '511', '227', '0'], ['o', '516', '192', '548', '227', '0'], ['n', '553', '193', '583', '227', '0'], ['t', '602', '192', '623', '235', '0'], ['o', '626', '192', '658', '227', '0'], ['t', '676', '192', '697', '235', '0'], ['o', '700', '192', '732', '227', '0'], ['u', '737', '192', '767', '227', '0'], ['c', '772', '192', '802', '227', '0'], ['h', '806', '193', '836', '242', '0'], ['l', '597', '49', '604', '86', '0'], ['i', '610', '49', '617', '86', '0'], ['n', '622', '49', '645', '75', '0'], ['e', '649', '49', '671', '75', '0'], ['2', '686', '49', '710', '83', '0']]

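Edit:

To use zindarod's answer you need to install tesserocr. Installing it with pip install tesserocr can produce various errors. After hours of trying to install it and resolving errors, I found a wheel build of it (see my comment under the answer below...): it can be found and downloaded here.

Hope this helps...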

- lucians

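I suggest clustering the bounding boxes, then taking the maximum y value in the first row's cluster and the minimum y value in the second row's cluster, and then using those two y values and the full width to create rectangles for the mask. - api55

That looks fine. Do you know how to do it? I also found a keyword: "connected component labeling". - lucians

Connected components won't work. That approach would be usable if the letters were all connected in some way. However, you can run k-means on their y values with k = 3. You will then have 3 clusters of letters according to their y values. k-means is already implemented in OpenCV. - api55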

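Once you have found the boxes, each box has two y coordinates (top and bottom); you can average them to get one y value per letter. This gives an array that you pass to kmeans, and kmeans will label each value (each y from each letter) as 1, 2, 3 (not sure whether it is 0, 1, 2). Now you can put each group of letters into one box. From there you can get the values needed to create the mask... I could write a full answer, but it would take a few hours. Can you post the CSV and the initial image so it can be tested? - api55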
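The clustering described in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a small hand-written 1-D k-means in pure Python as a stand-in for OpenCV's kmeans (which the comment refers to), the helper names `kmeans_1d` and `cluster_rows` are hypothetical, and `SAMPLE_BOXES` is a subset of the box list printed in the question. Coordinates stay in box-file convention, where y is measured from the bottom of the image.

```python
def kmeans_1d(values, k, iterations=20):
    """Tiny 1-D k-means (Lloyd's algorithm); returns (labels, centres)."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the k centres evenly over the value range.
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels, centres


def cluster_rows(boxes, k=3):
    """Group (char, left, bottom, right, top) boxes into k rows by y midpoint."""
    mids = [(b[2] + b[4]) / 2 for b in boxes]
    labels, centres = kmeans_1d(mids, k)
    rows = []
    for c in range(k):
        members = [b for b, lab in zip(boxes, labels) if lab == c]
        if not members:
            continue
        rows.append({
            "centre": centres[c],
            "left": min(b[1] for b in members),
            "bottom": min(b[2] for b in members),
            "right": max(b[3] for b in members),
            "top": max(b[4] for b in members),
        })
    # Box-file y grows upwards, so the largest centre is the top row of the image.
    rows.sort(key=lambda r: r["centre"], reverse=True)
    return rows


# A subset of the boxes from the question, already converted to integers.
SAMPLE_BOXES = [
    ("l", 56, 328, 63, 365), ("e", 108, 328, 130, 354), ("1", 147, 328, 161, 362),
    ("m", 102, 193, 151, 227), ("-", 440, 209, 458, 216), ("h", 806, 193, 836, 242),
    ("l", 597, 49, 604, 86), ("n", 622, 49, 645, 75), ("2", 686, 49, 710, 83),
]

rows = cluster_rows(SAMPLE_BOXES)
# The first and third text regions are the outer rows; their left/right
# values are the x positions where the vertical lines would be drawn.
for row in (rows[0], rows[-1]):
    print(row["left"], row["right"], row["bottom"], row["top"])
```

To draw with OpenCV, each y would still need to be flipped as in the question's code (h - y), since image rows are counted from the top.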

See here. - Miki

2 Answers

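I'm late here; I was actually looking for something else. I have never used the tesser wrappers; they seem to just get in my way without any real benefit. All they do is abstract away the call to a subprocess?

This is how I access the psm configuration through the arguments passed to the subprocess. For completeness I have also included the oem, pdf and hocr parameters, but these are not necessary; you can pass only the psm parameter. Run the help calls in a terminal, since there are 13 psm options and 4 oem options. Depending on what you are doing, the quality can depend heavily on psm.

With subprocess.Popen() you can pipe input and output, and if you are feeling adventurous you can use asyncio.create_subprocess_exec() to do the same thing asynchronously.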

import subprocess

# args
# 'tesseract' - the executable name
# path to the image file
# output file name - no extension; tesseract adds .txt, .pdf, .hocr, etc.
# optional params
# -psm x to set the page segmentation mode; see more with tesseract --help-psm at the CLI
# -oem x to set the OCR engine mode; see more with tesseract --help-oem
# (newer Tesseract releases spell these flags --psm and --oem)
# each flag and its value must be separate list elements, not one string
# can add a mode parameter to the end of the args list to get output as:
# searchable pdf - just add a parameter 'pdf' as below
# hOCR output (html) - just add 'hocr' as below

args = ['tesseract', 'Im1.tiff', 'Im1', '-psm', '1', '-oem', '2']

# args = ['tesseract', 'Im1.tiff', 'Im1', '-psm', '1', '-oem', '2', 'pdf']
# args = ['tesseract', 'Im1.tiff', 'Im1', '-psm', '1', '-oem', '2', 'hocr']

try:
    # check_call returns 0 on success and raises CalledProcessError otherwise
    retcode = subprocess.check_call(args)
    print('subprocess retcode {r}'.format(r=retcode))
except subprocess.CalledProcessError as exp:
    print('subprocess.CalledProcessError : ', exp)
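
For the asynchronous variant mentioned above, a minimal sketch could look like this. The helper name `run_command` is hypothetical, and the command shown is a stand-in that runs the Python interpreter so the example works without Tesseract installed; in practice the list would be replaced by tesseract arguments like the ones above.

```python
import asyncio
import sys


async def run_command(args):
    """Run a command without blocking the event loop.

    Returns (returncode, stdout_text, stderr_text).
    """
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # communicate() waits for the process to exit and collects both pipes.
    stdout, stderr = await proc.communicate()
    return proc.returncode, stdout.decode(), stderr.decode()


if __name__ == "__main__":
    # Stand-in command (an assumption for illustration), not a tesseract call.
    cmd = [sys.executable, "-c", "print('ok')"]
    code, out, err = asyncio.run(run_command(cmd))
    print(code, out.strip())
```

Several such coroutines could be awaited together with asyncio.gather() to process a batch of images concurrently.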

- Chanonry


- zindarod · Accepted Answer

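Google's tesseract-ocr already has this functionality in its page segmentation method (psm). You just need a better Python wrapper, one that exposes more of tesseract's functionality than pytesseract does. One of them is tesserocr.

Here is a simple example: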

import cv2
import numpy as np
import tesserocr as tr
from PIL import Image

cv_img = cv2.imread('text.png', cv2.IMREAD_UNCHANGED)

# since tesserocr accepts PIL images, converting opencv image to pil
pil_img = Image.fromarray(cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB))

# initialize api
api = tr.PyTessBaseAPI()
try:
    # set pil image for ocr
    api.SetImage(pil_img)
    # Google tesseract-ocr has a page segmentation method (psm) option for specifying ocr types
    # psm values can be: block of text, single text line, single word, single character etc.
    # api.GetComponentImages method exposes this functionality
    # function returns:
    # image (:class:`PIL.Image`): Image object.
    # bounding box (dict): dict with x, y, w, h keys.
    # block id (int): textline block id (if blockids is ``True``). ``None`` otherwise.
    # paragraph id (int): textline paragraph id within its block (if paraids is True).
    # ``None`` otherwise.
    boxes = api.GetComponentImages(tr.RIL.TEXTLINE, True)
    # get text
    text = api.GetUTF8Text()
    # iterate over returned list, draw rectangles
    for (im, box, _, _) in boxes:
        x, y, w, h = box['x'], box['y'], box['w'], box['h']
        cv2.rectangle(cv_img, (x, y), (x + w, y + h), color=(0, 0, 255))
finally:
    api.End()

cv2.imshow('output', cv_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
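
Once text-line boxes with x, y, w, h keys are available, the mask the question asks for could be built along these lines. This is a sketch under assumptions, not part of the original answer: `mask_from_boxes` is a hypothetical helper, and the box values and the 400 x 900 image size are made up for illustration rather than measured from the question's image.

```python
import numpy as np


def mask_from_boxes(image_shape, boxes):
    """Return a uint8 mask that is 255 inside every box and 0 elsewhere."""
    height, width = image_shape[:2]
    mask = np.zeros((height, width), dtype=np.uint8)
    for box in boxes:
        x, y, w, h = box["x"], box["y"], box["w"], box["h"]
        # Clip to the image so boxes touching the border stay valid.
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, width), min(y + h, height)
        mask[y0:y1, x0:x1] = 255
    return mask


# Made-up text-line boxes in image coordinates (origin at the top-left).
line_boxes = [
    {"x": 102, "y": 158, "w": 734, "h": 50},
    {"x": 56, "y": 35, "w": 105, "h": 37},
    {"x": 597, "y": 314, "w": 113, "h": 37},
]

# Sort top to bottom, then keep only the first and last lines.
ordered = sorted(line_boxes, key=lambda b: b["y"])
outer = [ordered[0], ordered[-1]]
mask = mask_from_boxes((400, 900), outer)
print(mask.shape, np.count_nonzero(mask))
```

With OpenCV, a mask like this could then be applied to the image, for example through cv2.bitwise_and(img, img, mask=mask).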