Unable to understand the coordinates when extracting a document with the Tesseract OCR engine.

Question

ocr tesseract text-extraction hocr

7

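I ran an image document through Tesseract and the extraction succeeded. However, I cannot understand the coordinates in the extracted output.

Problem description:

The output shows coordinates, but please tell me whether they are in pixels or in some other unit. There are four of them, for example title="bbox 10 13 43 46". What do 10, 13, 43 and 46 mean? Which positions do they represent?

The complete extracted output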

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>
</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name='ocr-system' content='tesseract'/>
</head>
<body>
<div class='ocr_page' id='page_1' title='image "D:\ABC.tif"; bbox 0 0 464 101'>
    <div class='ocr_carea' id='block_1_1' title="bbox 10 13 330 55">
    <p class='ocr_par'>
        <span class='ocr_line' id='line_1_1' title="bbox 10 13 330 55">
            <span class='ocr_word' id='word_1_1' title="bbox 10 13 43 46">
                <span class='ocrx_word' id='xword_1_1' title="x_wconf -1"><strong>hi</strong></span>
            </span> 
            <span class='ocr_word' id='word_1_2' title="bbox 148 13 268 47">
                <span class='ocrx_word' id='xword_1_2' title="x_wconf -1"><strong>whats</strong></span>
            </span> 
            <span class='ocr_word' id='word_1_3' title="bbox 283 22 330 55">
                <span class='ocrx_word' id='xword_1_3' title="x_wconf -1"><strong>up</strong></span>
            </span>
        </span>
    </p>
    </div>
</div>
</body>
</html>

- user2326687

Can you show the image you used as input? - jambono

3 Answers

7

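Maybe this will help someone in the future. I think the picture speaks for itself. You can compute the height or the distance from the top from these values (for example, height = y1 - y0).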

- hepifish

1

Except that the y-axis is inverted, as in most graphics applications: https://github.com/kba/hocr-spec/issues/34#issuecomment-252418295 - Waylon Flinn
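To make the inverted y-axis concrete, here is a minimal sketch that converts an hOCR box (origin at the top-left, y growing downward) into coordinates with the origin at the bottom-left, which some plotting and PDF tools expect. The function name `flip_y` is hypothetical, and the page height of 101 is taken from the page-level `bbox 0 0 464 101` in the question.

```python
def flip_y(bbox, page_height):
    """Convert a top-left-origin bbox to bottom-left-origin coordinates.

    `flip_y` is an illustrative helper, not part of Tesseract or hOCR.
    """
    x0, y0, x1, y1 = bbox
    # The old bottom edge (y1) becomes the new lower y value, and the
    # old top edge (y0) becomes the new upper y value.
    return x0, page_height - y1, x1, page_height - y0


# Page bbox in the question is "0 0 464 101", so the page height is 101.
print(flip_y((10, 13, 43, 46), 101))  # (10, 55, 43, 88)
```

The x values are unchanged; only the two y values are mirrored and swapped so that the first y is still the smaller one.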

4

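These numbers give the corner positions of the rectangle (the bounding box) that contains a single word. This is the hOCR format. According to your output, Tesseract recognized the sentence "hi whats up".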
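In hOCR, a bbox is written as x0 y0 x1 y1 in pixels, measured from the top-left corner of the page image. As a rough sketch of how the word boxes could be pulled out of output like the one in the question, the following uses Python's standard `html.parser`. The class name `WordBoxParser` and the trimmed `SAMPLE` string are assumptions made for illustration; newer Tesseract versions may put the bbox on `ocrx_word` instead of `ocr_word`, in which case the class check would need adjusting.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class WordBoxParser(HTMLParser):
    """Collect (text, bbox) pairs for every span with class 'ocr_word'."""

    def __init__(self):
        super().__init__()
        self.words = []    # finished (text, (x0, y0, x1, y1)) pairs
        self._bbox = None  # bbox of the ocr_word currently open, if any
        self._depth = 0    # span nesting depth inside that word
        self._text = []    # text fragments seen inside that word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._bbox is not None:
            # Already inside a word: track nested spans only.
            if tag == "span":
                self._depth += 1
        elif tag == "span" and attrs.get("class") == "ocr_word":
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(n) for n in match.groups())
                self._depth = 1
                self._text = []

    def handle_endtag(self, tag):
        if self._bbox is not None and tag == "span":
            self._depth -= 1
            if self._depth == 0:  # the ocr_word span itself just closed
                text = "".join(self._text).strip()
                self.words.append((text, self._bbox))
                self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


# Trimmed copy of the line element from the question's output.
SAMPLE = """
<span class='ocr_line' id='line_1_1' title="bbox 10 13 330 55">
  <span class='ocr_word' id='word_1_1' title="bbox 10 13 43 46">
    <span class='ocrx_word' id='xword_1_1' title="x_wconf -1"><strong>hi</strong></span>
  </span>
  <span class='ocr_word' id='word_1_2' title="bbox 148 13 268 47">
    <span class='ocrx_word' id='xword_1_2' title="x_wconf -1"><strong>whats</strong></span>
  </span>
  <span class='ocr_word' id='word_1_3' title="bbox 283 22 330 55">
    <span class='ocrx_word' id='xword_1_3' title="x_wconf -1"><strong>up</strong></span>
  </span>
</span>
"""

parser = WordBoxParser()
parser.feed(SAMPLE)
parser.close()
for text, (left, top, right, bottom) in parser.words:
    print(text, left, top, right, bottom)
```

For the sample this lists "hi" at 10 13 43 46, "whats" at 148 13 268 47 and "up" at 283 22 330 55, each as left, top, right, bottom.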

- jambono

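Please tell me the positions of these words. - user2326687

Are they expressed as left, top, right and bottom pixel positions? - user2326687

I gave you a link, but you did not use it. Please look at the first link on Wikipedia. - jambono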

- AbdulMueed · Accepted Answer

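For anyone who still does not know how the coordinate system works, I finally found the answer. It goes like this:

10 13 43 46 = startx, starty, endx, endy

If you want the width and height of a word, then

width = endx - startx, height = endy - starty

Split the string on spaces, drop the leading "bbox" token, and you are done.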
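A minimal sketch of that split-and-subtract approach, assuming the title attribute looks like the ones in the question. The helper names `parse_bbox` and `bbox_size` are made up for this example. It also copes with titles that carry several `;`-separated properties, such as the page-level one.

```python
def parse_bbox(title):
    """Return (startx, starty, endx, endy) from an hOCR title attribute."""
    # A title may hold several ';'-separated properties; find the bbox one.
    for prop in title.split(";"):
        parts = prop.split()
        if parts and parts[0] == "bbox":
            startx, starty, endx, endy = (int(p) for p in parts[1:5])
            return startx, starty, endx, endy
    raise ValueError("no bbox property in title: %r" % title)


def bbox_size(bbox):
    """Return (width, height) of a (startx, starty, endx, endy) box."""
    startx, starty, endx, endy = bbox
    return endx - startx, endy - starty


box = parse_bbox("bbox 10 13 43 46")
print(box, bbox_size(box))  # (10, 13, 43, 46) (33, 33)
```

So the word "hi" in the question starts 10 pixels from the left and 13 pixels from the top, and is 33 pixels wide and 33 pixels high.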