Detect image orientation angle based on text direction

Question


Tags: python, image, opencv, image-processing, computer-vision


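I am working on an OCR task to extract information from multiple ID proof documents. One of the challenges is the orientation of the scanned image: I need to fix the orientation of scanned images of PAN cards, Aadhaar cards, driving licenses, or any other ID proof.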

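I have tried all the approaches suggested on Stack Overflow and other forums, such as OpenCV minAreaRect, the Hough line transform, FFT, homography, and Tesseract OSD with --psm 0, but none of them solved the problem.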

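The logic should return the angle of the text orientation: 0, 90, or 270 degrees. Images at 0, 90, and 270 degrees are attached. This is not a question about determining skew.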

- Ravi


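A straightforward approach is to run OCR on the 4 rotated images and keep the one that contains the word "India", or to use some test that picks the rotation whose segmented strings get the best score. The opencv, numpy, Image, and pytesseract libraries could be used for this. Could you post a minimal code example showing what you have tried? - francis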

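@francis, thanks for your comment and suggestions. Because of the comment character limit, and for brevity, I am posting the code snippets below as separate comments, one at a time. For some reason the code shows up as plain text. - Ravi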

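This is the code using pytesseract. The intent was to ignore the text orientation and let Tesseract handle it implicitly, but it does not work well:

import sys
import cv2
import pytesseract

config = ('stdout --psm 0 --oem 0 -l osd -c min_characters_to_try=5')
imgPath = sys.argv[1]
img = cv2.imread(imgPath)
text = pytesseract.image_to_osd(img, config=config)
print(text)

- Ravi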
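`image_to_osd` returns Tesseract's orientation report as plain text. The sketch below turns that report into numbers so a program can act on it. It assumes the usual "Key: value" lines that Tesseract prints, such as `Rotate: 90` and `Orientation confidence: 1.73`. The function names and the `min_confidence` threshold are hypothetical. A low confidence usually means OSD found too little text to decide, so in that case the sketch returns None rather than trusting the angle.

```python
def parse_osd(osd_text):
    """Parse Tesseract OSD output (one "Key: value" pair per line) into a dict."""
    result = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key: value" shape
        value = value.strip()
        try:
            result[key.strip()] = float(value) if "." in value else int(value)
        except ValueError:
            result[key.strip()] = value  # non-numeric values, e.g. "Script: Latin"
    return result


def osd_rotation(osd_text, min_confidence=2.0):
    """Return the 'Rotate' angle, or None if the orientation confidence is too low.

    min_confidence is an illustrative threshold, not a Tesseract default.
    """
    info = parse_osd(osd_text)
    if info.get("Orientation confidence", 0.0) < min_confidence:
        return None
    return info.get("Rotate")
```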

This is the HOG algorithm code:

import sys
import cv2
import numpy as np

imgPath = sys.argv[1]
im = cv2.imread(imgPath)
im = np.float32(im) / 255.0
gx = cv2.Sobel(im, cv2.CV_32F, 1, 0, ksize=1)
gy = cv2.Sobel(im, cv2.CV_32F, 0, 1, ksize=1)
mag, angle = cv2.cartToPolar(gx, gy, angleInDegrees=True)
print(angle[0])

- Ravi

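Using the Hough line transform:

import math
import sys
import cv2
import numpy as np

img_before = cv2.imread(sys.argv[1])
gray = cv2.cvtColor(img_before, cv2.COLOR_BGR2GRAY)
img_edges = cv2.Canny(gray, 100, 200, apertureSize=3)
lines = cv2.HoughLinesP(img_edges, 1, math.pi / 180.0, 100, minLineLength=100, maxLineGap=5)
angles = []
# HoughLinesP returns an (N, 1, 4) array (or None); iterate over every line, not just lines[0]
for line in (lines if lines is not None else []):
    x1, y1, x2, y2 = line[0]
    cv2.line(img_before, (x1, y1), (x2, y2), (255, 0, 0), 3)
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    angles.append(angle)
median_angle = np.median(angles)
#print(median_angle)
print("Angle is {}".format(median_angle))

- Ravi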
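A line segment's `atan2` angle only fixes its direction up to 180°, because the same segment can be traced from either end. An angle-based method like the Hough snippet above can therefore at best tell whether text lines run horizontally or vertically. It cannot separate 0 from 180, or 90 from 270. The sketch below shows that coarse classification, assuming angles in degrees as produced by the loop above. The function name and the 45° boundary are illustrative.

```python
def text_axis(angles_deg):
    """Classify segment angles (degrees, from atan2) as 'horizontal' or 'vertical'.

    Each angle is folded into [0, 180) because a segment from (x1, y1) to
    (x2, y2) is the same line as one drawn in the opposite direction.
    """
    horizontal = vertical = 0
    for a in angles_deg:
        folded = a % 180.0
        # Within 45 degrees of 0 (or 180) counts as horizontal, otherwise vertical.
        if folded < 45.0 or folded >= 135.0:
            horizontal += 1
        else:
            vertical += 1
    return "horizontal" if horizontal >= vertical else "vertical"
```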

1 Answer


- nathancy · Accepted Answer

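Here's an approach based on the assumption that most of the text is concentrated on one side of the card. The idea is that we can determine the angle from where the main text region is located.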

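1. Load the image, convert it to grayscale, and apply a Gaussian blur
2. Apply adaptive thresholding to obtain a binary image
3. Find contours and filter them by contour area
4. Draw the filtered contours onto a mask
5. Split the image horizontally or vertically depending on its orientation
6. Count the number of white pixels in each half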

After converting the image to grayscale and applying a Gaussian blur, we use adaptive thresholding to obtain a binary image.

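From here we find contours and filter them by contour area to remove small noise particles and the large border. Any contour that passes this filter is drawn onto a mask.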

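To determine the angle, we split the image in half based on its dimensions. If width > height, it must be a horizontal image, so we split it vertically into left and right halves. If height > width, it must be a vertical image, so we split it horizontally into top and bottom halves.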

Now that we have the two halves, we can use cv2.countNonZero() to count the white pixels in each half. Here is the logic for determining the angle:

if horizontal
    if left >= right 
        degree -> 0
    else 
        degree -> 180
if vertical
    if top >= bottom
        degree -> 270
    else
        degree -> 90
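The decision rule can be separated from the image processing as a pure function of the two half counts. That makes it easy to check against the pixel counts reported in this answer. Below is a minimal sketch of that function. The name `angle_from_counts` is hypothetical, and ties are resolved as in the pseudocode above.

```python
def angle_from_counts(horizontal, first, second):
    """Map white-pixel counts of the two halves to an angle, following the rule above.

    horizontal=True:  first/second are the left/right halves (width > height).
    horizontal=False: first/second are the top/bottom halves.
    """
    if horizontal:
        return 0 if first >= second else 180
    return 270 if first >= second else 90
```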

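Left: 9703

Right: 3975

Therefore, the image is at 0 degrees. Here are the results for the other orientation:

Left: 3975

Right: 9703

We can conclude that the image is flipped 180 degrees.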

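Here are the results for a vertical image. Note that because this is a vertical image, we split it horizontally.

Top: 3947

Bottom: 9550

Therefore, the result is 90 degrees.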

import cv2
import numpy as np

def detect_angle(image):
    # Blank canvas that will hold the filtered text contours
    mask = np.zeros(image.shape, dtype=np.uint8)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (3,3), 0)
    # Inverted binary image: text becomes white (255) on a black background
    adaptive = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,15,4)

    # findContours returns 2 values in OpenCV 2.x/4.x and 3 values in 3.x
    cnts = cv2.findContours(adaptive, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]

    # Keep mid-sized contours: drop tiny noise specks and the large card border
    for c in cnts:
        area = cv2.contourArea(c)
        if area < 45000 and area > 20:
            cv2.drawContours(mask, [c], -1, (255,255,255), -1)

    mask = cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY)
    h, w = mask.shape

    # Horizontal (landscape): compare the left and right halves
    if w > h:
        left = mask[0:h, 0:w//2]
        right = mask[0:h, w//2:]
        left_pixels = cv2.countNonZero(left)
        right_pixels = cv2.countNonZero(right)
        return 0 if left_pixels >= right_pixels else 180
    # Vertical (portrait): compare the top and bottom halves
    else:
        top = mask[0:h//2, 0:w]
        bottom = mask[h//2:, 0:w]
        top_pixels = cv2.countNonZero(top)
        bottom_pixels = cv2.countNonZero(bottom)
        return 270 if top_pixels >= bottom_pixels else 90

if __name__ == '__main__':
    image = cv2.imread('1.png')
    if image is None:
        raise FileNotFoundError('Could not read 1.png')
    angle = detect_angle(image)
    print(angle)