Removing background text and noise from an image using OpenCV


I have these images

[captcha image 1]

[captcha image 2]

I want to remove the text in the background and keep only the captcha characters (K6PwKA, YabVzu). The goal is to recognize these characters with tesseract afterwards.

Here is my attempt, but the accuracy is not very good.

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Users\HPO2KOR\AppData\Local\Tesseract-OCR\tesseract.exe"
img = cv2.imread("untitled.png")
gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray_filtered = cv2.inRange(gray_image, 0, 75)
cv2.imwrite("cleaned.png", gray_filtered)

How can I improve this?

Note: I tried all the suggestions offered for this question, but none of them worked for me.

Edit: Following Elias's suggestion, I converted the captcha text to grayscale in Photoshop and found that its intensity lies in the range [100, 105]. I then thresholded the image based on this range, but the result I got was not satisfactory.

gray_filtered = cv2.inRange(gray_image, 100, 105)
cv2.imwrite("cleaned.png", gray_filtered)
gray_inv = ~gray_filtered
cv2.imwrite("cleaned.png", gray_inv)
data = pytesseract.image_to_string(gray_inv, lang='eng')

Output:

'KEP wKA'

Result:

[result image]

Edit 2:

def get_text(img_name):
    lower = (100, 100, 100)
    upper = (104, 104, 104) 
    img = cv2.imread(img_name)
    img_rgb_inrange = cv2.inRange(img, lower, upper)
    neg_rgb_image = ~img_rgb_inrange
    cv2.imwrite('neg_img_rgb_inrange.png', neg_rgb_image)
    data = pytesseract.image_to_string(neg_rgb_image, lang='eng')
    return data

Which gives:

[result image]

along with the text:

GXuMuUZ

Is there any way to soften the result a little?

3 Answers


Here are two possible approaches and a method to correct skewed text:

Method #1: Morphological operations + contour filtering

  1. Obtain binary image. Load the image, convert to grayscale, then binarize with Otsu's threshold.

  2. Remove noise. Create a rectangular kernel with cv2.getStructuringElement(), then perform a morphological open to remove noise.

  3. Filter and remove small noise. Find contours and filter by contour area to remove small particles. We effectively erase the noise by filling in the contours with cv2.drawContours().

  4. Perform OCR. We invert the image and then apply a slight Gaussian blur. We then OCR with Pytesseract using the --psm 6 configuration option to treat the image as a single block of text. See Tesseract improve quality for other methods to improve detection, and Pytesseract configuration options for additional settings.


Input image -> Binary -> Morph opening


Contour-area filtering -> Invert -> Apply blur to get the result


OCR result

YabVzu

Code

import cv2
import pytesseract
import numpy as np

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, grayscale, Otsu's threshold
image = cv2.imread('2.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Morph open to remove noise
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)

# Find contours and remove small noise
cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 50:
        cv2.drawContours(opening, [c], -1, 0, -1)

# Invert and apply slight Gaussian blur
result = 255 - opening
result = cv2.GaussianBlur(result, (3,3), 0)

# Perform OCR
data = pytesseract.image_to_string(result, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.imshow('result', result)
cv2.waitKey()     

Method #2: Color segmentation

Observing that the text to extract has a distinguishable contrast from the noise in the image, we can use color thresholding to isolate it. The idea is to convert the image to HSV format, then color threshold with lower and upper color ranges to obtain a mask. From there we use the same process to OCR with Pytesseract.


Input image -> Mask -> Result


Code

import cv2
import pytesseract
import numpy as np

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, convert to HSV, color threshold to get mask
image = cv2.imread('2.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0, 0, 0])
upper = np.array([100, 175, 110])
mask = cv2.inRange(hsv, lower, upper)

# Invert image and OCR
invert = 255 - mask
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)

cv2.imshow('mask', mask)
cv2.imshow('invert', invert)
cv2.waitKey()

Correcting skewed text

OCR works best when the image is horizontal. To ensure the text is in an ideal format for OCR, we can perform a perspective transform. After removing all the noise to isolate the text, we can perform a morph close to combine the individual text contours into a single contour. From here we find the rotated bounding box with cv2.minAreaRect and then perform a four-point perspective transform with imutils.perspective.four_point_transform. Continuing from the cleaned mask, here are the results:

Mask -> Morph close -> Detected rotated bounding box -> Result


Output with the other image


Updated code including the perspective transform

import cv2
import pytesseract
import numpy as np
from imutils.perspective import four_point_transform

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, convert to HSV, color threshold to get mask
image = cv2.imread('1.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0, 0, 0])
upper = np.array([100, 175, 110])
mask = cv2.inRange(hsv, lower, upper)

# Morph close to connect individual text into a single contour
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=3)

# Find rotated bounding box then perspective transform
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
rect = cv2.minAreaRect(cnts[0])
box = cv2.boxPoints(rect)
box = np.intp(box)  # np.int0 (an alias of np.intp) was removed in NumPy 2.0
cv2.drawContours(image,[box],0,(36,255,12),2)
warped = four_point_transform(255 - mask, box.reshape(4, 2))

# OCR
data = pytesseract.image_to_string(warped, lang='eng', config='--psm 6')
print(data)

cv2.imshow('mask', mask)
cv2.imshow('close', close)
cv2.imshow('warped', warped)
cv2.imshow('image', image)
cv2.waitKey()

Note: the color threshold ranges were determined using this HSV thresholding script

import cv2
import numpy as np

def nothing(x):
    pass

# Load image
image = cv2.imread('2.png')

# Create a window
cv2.namedWindow('image')

# Create trackbars for color change
# Hue is from 0-179 for Opencv
cv2.createTrackbar('HMin', 'image', 0, 179, nothing)
cv2.createTrackbar('SMin', 'image', 0, 255, nothing)
cv2.createTrackbar('VMin', 'image', 0, 255, nothing)
cv2.createTrackbar('HMax', 'image', 0, 179, nothing)
cv2.createTrackbar('SMax', 'image', 0, 255, nothing)
cv2.createTrackbar('VMax', 'image', 0, 255, nothing)

# Set default value for Max HSV trackbars
cv2.setTrackbarPos('HMax', 'image', 179)
cv2.setTrackbarPos('SMax', 'image', 255)
cv2.setTrackbarPos('VMax', 'image', 255)

# Initialize HSV min/max values
hMin = sMin = vMin = hMax = sMax = vMax = 0
phMin = psMin = pvMin = phMax = psMax = pvMax = 0

while(1):
    # Get current positions of all trackbars
    hMin = cv2.getTrackbarPos('HMin', 'image')
    sMin = cv2.getTrackbarPos('SMin', 'image')
    vMin = cv2.getTrackbarPos('VMin', 'image')
    hMax = cv2.getTrackbarPos('HMax', 'image')
    sMax = cv2.getTrackbarPos('SMax', 'image')
    vMax = cv2.getTrackbarPos('VMax', 'image')

    # Set minimum and maximum HSV values to display
    lower = np.array([hMin, sMin, vMin])
    upper = np.array([hMax, sMax, vMax])

    # Convert to HSV format and color threshold
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower, upper)
    result = cv2.bitwise_and(image, image, mask=mask)

    # Print if there is a change in HSV value
    if((phMin != hMin) | (psMin != sMin) | (pvMin != vMin) | (phMax != hMax) | (psMax != sMax) | (pvMax != vMax) ):
        print("(hMin = %d , sMin = %d, vMin = %d), (hMax = %d , sMax = %d, vMax = %d)" % (hMin , sMin , vMin, hMax, sMax , vMax))
        phMin = hMin
        psMin = sMin
        pvMin = vMin
        phMax = hMax
        psMax = sMax
        pvMax = vMax

    # Display result image
    cv2.imshow('image', result)
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()

Please help me solve this problem - Scrappy Coco


Your code is better than this one. I set a threshold based on the histogram CDF values and a tolerance limit, giving the lowerb and upperb bounds. Press the ESC key to move on to the next image.

The code is overly complicated and needs all kinds of optimization; it could be reordered to skip some steps. I am leaving it as is because parts of it may help others. Some of the existing noise can be removed by keeping only the contours whose area is above some threshold (a quick sketch of that filtering follows below). Suggestions for other noise removal methods are welcome.
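
For reference, a minimal sketch of that contour-area filtering on a binary mask, in the same spirit as the filtering in the first answer; the file name and the 50 px cutoff are placeholders to tune per image:

import cv2

# Binary mask with white foreground on a black background (placeholder file name)
mask = cv2.imread('mask.png', cv2.IMREAD_GRAYSCALE)

# Keep only contours whose area is above the threshold; small blobs are erased
cnts = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    if cv2.contourArea(c) < 50:
        cv2.drawContours(mask, [c], -1, 0, -1)  # fill the small contour with black

cv2.imwrite('mask_filtered.png', mask)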

A similarly simple approach for getting the 4 corner points for the perspective transform can be found here:

Accurate corner detection?

Code walkthrough:

  • Original image
  • Median filter (denoise and identify the ROI)
  • Otsu thresholding
  • Invert the image
  • Use the inverted black-and-white image as a mask to keep most of the ROI from the original image
  • Dilate to find the largest contour
  • Mark the ROI on the original image with a rectangle and corner points

  • Deskew the ROI and extract its contents

  • Median filter
  • Otsu thresholding
  • Invert the image to get a mask
  • Mask the deskewed image to remove most of the remaining noise around the text
  • Apply inRange with the lowerb and upperb values obtained from the histogram CDF to reduce the noise further
  • Eroding the image at this step might already give a reasonably acceptable result. Here, however, the image is dilated again and used as a mask to get a less noisy ROI from the perspective-transformed image.

Code:

## Press ESC button to get next image

import cv2
import cv2 as cv
import numpy as np


frame = cv2.imread('extra/c1.png')
#frame = cv2.imread('extra/c2.png')


## keeping a copy of original
print(frame.shape)
original_frame = frame.copy()
original_frame2 = frame.copy()


## Show the original image
winName = 'Original'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)



## Apply median blur
frame = cv2.medianBlur(frame,9)


## Show the median-blurred image
winName = 'Median Blur'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)


#kernel = np.ones((5,5),np.uint8)
#frame = cv2.dilate(frame,kernel,iterations = 1)



# Otsu's thresholding
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
ret2,thresh_n = cv.threshold(frame,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)
frame = thresh_n


## Show Otsu's thresholding result
winName = 'Otsu Thresholding'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)




## invert color
frame = cv2.bitwise_not(frame)

## Show the inverted image
winName = 'Invert Image'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)


## Dilate image
kernel = np.ones((5,5),np.uint8)
frame = cv2.dilate(frame,kernel,iterations = 1)


##
## Show the grayscale image masked by the dilated inverted threshold
winName = 'SUB'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
img_gray = cv2.cvtColor(original_frame, cv2.COLOR_BGR2GRAY)
cv.imshow(winName, img_gray & frame)
cv.waitKey(0)


## Show the dilated image
winName = 'Dilate Image'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)


## Get largest contour from contours
contours, hierarchy = cv2.findContours(frame, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)


## Get minimum area rectangle and corner points
rect = cv2.minAreaRect(max(contours, key = cv2.contourArea))
print(rect)
box = cv2.boxPoints(rect)
print(box)


## Sorted points by x and y
## Not used in this code
print(sorted(box , key=lambda k: [k[0], k[1]]))



## draw anchor points on corner
frame = original_frame.copy()
z = 6
for b in box:
    cv2.circle(frame, (int(b[0]), int(b[1])), z, 255, -1)  # cv2.circle needs integer coordinates


## show original image with corners
box2 = np.intp(box)  # np.int0 (an alias of np.intp) was removed in NumPy 2.0
cv2.drawContours(frame,[box2],0,(0,0,255), 2)
cv2.imshow('Detected Corners',frame)
cv2.waitKey(0)
cv2.destroyAllWindows()



## https://dev59.com/kmgu5IYBdhLWcg3wDSwG
def subimage(image, center, theta, width, height):
   shape = ( image.shape[1], image.shape[0] ) # cv2.warpAffine expects shape in (length, height)

   matrix = cv2.getRotationMatrix2D( center=center, angle=theta, scale=1 )
   image = cv2.warpAffine( src=image, M=matrix, dsize=shape )

   x = int(center[0] - width / 2)
   y = int(center[1] - height / 2)

   image = image[ y:y+height, x:x+width ]

   return image



## Show the deskewed ROI
winName = 'Deskewed ROI'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)


## use the calculated rectangle attributes to rotate and extract it
frame = subimage(original_frame, center=rect[0], theta=int(rect[2]), width=int(rect[1][0]), height=int(rect[1][1]))
original_frame = frame.copy()
cv.imshow(winName, frame)
cv.waitKey(0)

perspective_transformed_image = frame.copy()



## Apply median blur
frame = cv2.medianBlur(frame,11)


## Show the median-blurred image
winName = 'Median Blur'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)


#kernel = np.ones((5,5),np.uint8)
#frame = cv2.dilate(frame,kernel,iterations = 1)



# Otsu's thresholding
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
ret2,thresh_n = cv.threshold(frame,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)
frame = thresh_n


## Show Otsu's thresholding result
winName = 'Otsu Thresholding'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)



## invert color
frame = cv2.bitwise_not(frame)

## Show the inverted image
winName = 'Invert Image'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)


## Dilate image
kernel = np.ones((5,5),np.uint8)
frame = cv2.dilate(frame,kernel,iterations = 1)

##
## Mask the grayscale ROI with the dilated image, then clamp black pixels to white
winName = 'SUB'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
img_gray = cv2.cvtColor(original_frame, cv2.COLOR_BGR2GRAY)
frame = img_gray & frame
frame[np.where(frame==0)] = 255
cv.imshow(winName, frame)
cv.waitKey(0)





hist,bins = np.histogram(frame.flatten(),256,[0,256])

cdf = hist.cumsum()
cdf_normalized = cdf * hist.max()/ cdf.max()
print(cdf)
print(cdf_normalized)
hist_image = frame.copy()




## "Two decreasing range" heuristic: derive lowerb/upperb for cv2.inRange from the CDF
low_index = -1
# lowerb: the first intensity that actually occurs in the image
for i in range(0, 256):
   if cdf[i] > 0:
      low_index = i
      break
print(low_index)

tol = 0
tol_limit = 20
broken_index = -1
# upperb: walk the CDF and stop once its successive differences have decreased more than tol_limit times
past_val = cdf[low_index] - cdf[low_index + 1]
for i in range(low_index + 1, 255):
   cur_val = cdf[i] - cdf[i+1]
   if tol > tol_limit:
      broken_index = i
      break
   if cur_val < past_val:
      tol += 1
   past_val = cur_val

print(broken_index)




## overall min/max intensities (printed for reference; only used by the commented-out inRange below)
lower = min(frame.flatten())
upper = max(frame.flatten())
print(min(frame.flatten()))
print(max(frame.flatten()))

#img_rgb_inrange = cv2.inRange(frame_HSV, np.array([lower,lower,lower]), np.array([upper,upper,upper]))
img_rgb_inrange = cv2.inRange(frame, (low_index), (broken_index))
neg_rgb_image = ~img_rgb_inrange
## Show the inverted inRange result
winName = 'Final'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, neg_rgb_image)
cv.waitKey(0)


## Erode the inverted image (this thickens the dark strokes)
kernel = np.ones((3,3),np.uint8)
frame = cv2.erode(neg_rgb_image,kernel,iterations = 1)
winName = 'Final Dilate'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)


##
winName = 'Final Subtracted'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
img2 = np.zeros_like(perspective_transformed_image)
img2[:,:,0] = frame
img2[:,:,1] = frame
img2[:,:,2] = frame
frame = img2
cv.imshow(winName, perspective_transformed_image | frame)
cv.waitKey(0)


##
import matplotlib.pyplot as plt
plt.plot(cdf_normalized, color = 'b')
plt.hist(hist_image.flatten(),256,[0,256], color = 'r')
plt.xlim([0,256])
plt.legend(('cdf','histogram'), loc = 'upper left')
plt.show()

Step-by-step outputs:

1. Median blur
2. Otsu threshold
3. Invert
4. Dilate the inverted image
5. Mask extraction
6. Transformed ROI points
7. Perspective-corrected image
8. Median blur
9. Otsu threshold
10. Inverted image
11. ROI extraction
12. Clamping
13. Dilation
14. Final ROI
15. Histogram of the image from step 11


I haven't tried it, but this might work (a rough sketch of steps 1-4 follows the steps below).
Step 1: Use PS to find the color of the captcha characters. For example, the color of "YabVzu" is (128, 128, 128).
Step 2: Use pillow's getdata()/getcolor() method, which returns a sequence containing the color of every pixel.
Then we map every item in the sequence back onto the original captcha image.
So we know the position of every pixel in the image.
Step 3: Find all the pixels whose colors are closest to (128, 128, 128). You can set a threshold to control the accuracy. This step returns another sequence; let's call it Seq a.
Step 4: Generate a picture with exactly the same height and width as the original. Draw every pixel in Seq a at exactly the same position in the new picture. Here we get a clean training item.
Step 5: Use a Keras project to crack the code. The accuracy should be above 72%.
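
A minimal sketch of steps 1 through 4 with pillow; the target color (128, 128, 128) comes from step 1, and the distance threshold of 30 is a made-up value that would need tuning per captcha:

from PIL import Image

def extract_color_pixels(img_path, target=(128, 128, 128), threshold=30):
    # getdata() returns a flat, row-major sequence of (R, G, B) tuples (step 2)
    img = Image.open(img_path).convert('RGB')
    width, height = img.size
    pixels = list(img.getdata())

    # Blank white canvas with exactly the same size as the original (step 4)
    clean = Image.new('RGB', (width, height), (255, 255, 255))

    for index, (r, g, b) in enumerate(pixels):
        # Squared Euclidean distance to the target color (step 3)
        dist = (r - target[0]) ** 2 + (g - target[1]) ** 2 + (b - target[2]) ** 2
        if dist <= threshold ** 2:
            # Recover (x, y) from the flat index and copy the pixel over
            clean.putpixel((index % width, index // width), (r, g, b))

    return clean

extract_color_pixels('untitled.png').save('clean.png')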

The reason for suggesting the project is that, in theory, the image generated after the processing above should have a pattern similar to the captcha images the project deals with. - Elias
