Real-time OCR in Python

Question


Tags: python, image, ocr, image-recognition, python-tesseract



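I'm trying to capture my desktop with OpenCV and use Tesseract OCR to find text and store it in a variable. For example, if I'm playing a game and the captured frame shows the amount of a resource, I want to be able to print that value and use it. A perfect example is Michael Reeves's video where, whenever he loses health in a game, the value is read and sent to his Bluetooth-enabled air gun, which then shoots him. So far I have the following code: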

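# imports
import cv2
import numpy as np
import pytesseract  # not used yet; this is what should read the text
from PIL import ImageGrab

# Screen region to capture: top-left corner (x, y) and size (ox, oy)
x = 760
y = 968
ox = 50
oy = 22

fourcc = cv2.VideoWriter_fourcc(*'XVID')
# The size given to VideoWriter must match the captured frames,
# otherwise the frames are typically not written to the file
out = cv2.VideoWriter("output.avi", fourcc, 5.0, (ox, oy))

while True:
    # screen capture; PIL returns RGB, OpenCV works in BGR
    img = ImageGrab.grab(bbox=(x, y, x + ox, y + oy))
    img_np = np.array(img)
    frame = cv2.cvtColor(img_np, cv2.COLOR_RGB2BGR)
    cv2.imshow("Screen", frame)
    out.write(frame)

    # waitKey returns -1 when no key is pressed; stop on Esc (key code 27)
    if cv2.waitKey(1) == 27:
        break

out.release()
cv2.destroyAllWindows()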

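It captures in real time and shows the result in a window, but I don't know how to make it recognize the text in each frame and output it.

Any help?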

- Novet

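Maybe you could show some examples of what it looks like? - Mark Setchell

It's probably not handwritten text, and the font is very repetitive, so real OCR may not be needed at all. An example would make that clear. - Vincenzooo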
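Following up on the comment above: if the digits always use the same fixed-width font at the same place on screen, you could compare each character against stored glyph images instead of running OCR. This is template matching, a swapped-in technique rather than anything posted in this thread. The minimal sketch below assumes the strip is already cropped to the digits, uses the same grayscale or binary scale as the templates, and that every glyph is exactly `glyph_width` pixels wide; `match_glyph`, `read_fixed_width` and the tiny templates are hypothetical. When the position of a template is not known in advance, OpenCV's `cv2.matchTemplate` is the usual tool.

```python
import numpy as np

def match_glyph(glyph, templates):
    """Return the label of the template closest to `glyph` (smallest sum of
    squared differences), or None if no template has the same shape."""
    best_label, best_score = None, float("inf")
    g = glyph.astype(np.float32)
    for label, tmpl in templates.items():
        if tmpl.shape != glyph.shape:
            continue
        score = float(np.sum((g - tmpl.astype(np.float32)) ** 2))
        if score < best_score:
            best_label, best_score = label, score
    return best_label

def read_fixed_width(strip, templates, glyph_width):
    """Cut a cropped, fixed-width text strip into equal glyph columns and
    match each one; glyphs that match nothing become '?'."""
    count = strip.shape[1] // glyph_width
    chars = []
    for i in range(count):
        glyph = strip[:, i * glyph_width:(i + 1) * glyph_width]
        chars.append(match_glyph(glyph, templates) or "?")
    return "".join(chars)

if __name__ == "__main__":
    # Hypothetical 3x2 "glyphs"; real templates would be cropped from the game
    templates = {
        "0": np.array([[1, 1], [1, 1], [1, 1]], dtype=np.uint8),
        "1": np.array([[0, 1], [0, 1], [0, 1]], dtype=np.uint8),
    }
    strip = np.hstack([templates["1"], templates["0"], templates["1"]])
    print(read_fixed_width(strip, templates, glyph_width=2))  # 101
```

This only holds up while the font, scale and background stay constant; anti-aliasing or a changing background usually calls for thresholding the strip first.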

3 Answers


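Tesseract is a single-shot command-line application that uses files for input and output. This means every OCR call creates a new process and initializes a new Tesseract engine, which includes reading multi-megabyte data files from disk. How well it works as a real-time OCR engine depends on the exact use case (more pixels take more time) and on the parameters you pass to tune the engine. Some experimentation will probably be needed to tune it for your situation, but also expect that OCR may take longer than one frame, so you may need to run OCR less often, for example at 10-20 FPS rather than the 60+ FPS the game runs at.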
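A minimal sketch of the "run OCR less often than you capture" idea, assuming the capture loop itself runs at roughly `capture_fps`: the hypothetical helper below (not part of pytesseract) calls `ocr` only on every n-th frame and reuses the last result in between. In the question's loop, `frames` would yield the ImageGrab captures and `ocr` would be `pytesseract.image_to_string`.

```python
def throttled_ocr(frames, ocr, capture_fps=60.0, ocr_fps=15.0):
    """Yield (frame, text) for every frame, but call `ocr` only on every
    n-th frame, where n = capture_fps / ocr_fps (rounded, at least 1).
    Frames in between reuse the most recent OCR result."""
    step = max(1, round(capture_fps / ocr_fps))
    last_text = None
    for index, frame in enumerate(frames):
        if index % step == 0:
            last_text = ocr(frame)
        yield frame, last_text

if __name__ == "__main__":
    # Stand-in frames and OCR function, just to show the call pattern
    for frame, text in throttled_ocr(range(6), lambda f: f"text of frame {f}", 60, 20):
        print(frame, text)
```

A single slow OCR call still blocks the loop while it runs; keeping the display completely smooth would mean moving OCR to a separate thread.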

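In my experience, a fairly complex document in a 2200x1700 px image can take anywhere from 0.5 s to 2 s on an aging CPU with the fast English model and 4 cores (the default). That "complex document" is a worst case, though, and makes no assumptions about the structure of the text being recognized. In many scenarios, such as extracting data from a game screen, you can make some assumptions that allow optimizations and speed up OCR: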

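Reduce the size of the input image. When extracting specific information from the screen, crop the captured image as tightly as possible so that it contains only that information. If you're trying to extract a numeric value such as health, crop the image to just the area around the health value.
Use the "fast" trained models to gain speed at the cost of accuracy. You can select a different model with the -l option and point to the directory containing the model files with the --tessdata-dir option. You can download several models and rename the files to "eng_fast.traineddata", "eng_best.traineddata", etc., then select one with, for example, -l eng_fast.
Use the --psm parameter to skip page segmentation you don't need. For a single piece of information, --psm 7 (treat the image as a single text line) may be the best choice, but experiment with different values to find what works best for you.
If you know which characters can appear, for example if you're only looking for numbers, you can restrict the allowed character set by changing the whitelist configuration value: -c tessedit_char_whitelist='1234567890'.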
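The page segmentation and whitelist options above are passed to pytesseract as one `config` string. A small sketch of assembling that string (the helper name `tesseract_config` is hypothetical; the flags are the ones listed above):

```python
def tesseract_config(psm=7, whitelist=None, extra=()):
    """Build a pytesseract `config` string.

    psm       -- page segmentation mode (7 = treat the image as a single text line)
    whitelist -- optional string of allowed characters, e.g. "0123456789"
    extra     -- further raw Tesseract arguments, appended unchanged
    """
    parts = [f"--psm {int(psm)}"]
    if whitelist:
        # Left unquoted: pytesseract splits this string into arguments itself,
        # and an unquoted value avoids platform-specific quote handling
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    parts.extend(extra)
    return " ".join(parts)

if __name__ == "__main__":
    print(tesseract_config(7, "0123456789"))
```

Usage would then be something like `pytesseract.image_to_string(frame, lang="eng_fast", config=tesseract_config(7, "0123456789"))`, where `eng_fast` assumes you renamed the fast model file as described above. A whitelist containing spaces would need extra quoting, so this sketch is only meant for simple character sets such as digits.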

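pytesseract is the best way to use Tesseract from Python. The library accepts images directly as input (although it saves the image to a file before passing it to Tesseract) and returns the recognized text from image_to_string(...).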

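import pytesseract

# Capture frame (for example with the screen-capture loop from the question)...

# If the frame requires cropping (NumPy arrays are indexed [row, column], i.e. [y, x]):
frame = frame[y:y + h, x:x + w]

# Perform OCR; "eng_fast" is the renamed fast model described above
text = pytesseract.image_to_string(frame, lang="eng_fast", config="--psm 7")

# Process the result; int() raises ValueError if the OCR output is not a clean number
try:
    health = int(text.strip())
except ValueError:
    health = None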

- Enigma

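This is a very good explanation and helps with understanding, but I don't think it directly answers the question. The OP has code that captures live images. Suppose, for example, we want to capture subtitles in a small, fixed area of a live video: how would we do that? - Havard Kleven

The end of the code block shows how to extract the text from the frame with the image_to_string function. - Enigma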


OK, I had the same problem as you, so I did some research, and I'm confident I found a solution! First, you need these libraries:

cv2
pytesseract
Pillow (PIL)
numpy

Installation:

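To install cv2, run the following command in the command line / command prompt: pip install opencv-python
Installing pytesseract is a bit harder, because you also need to install Tesseract first; Tesseract is the program that actually performs the OCR. First, follow this tutorial to install Tesseract. After that, run the following command in the command line / command prompt: pip install pytesseract. If Tesseract is not installed correctly, you won't be able to use OCR.
To install Pillow, run the following command in the command line / command prompt: python -m pip install --upgrade Pillow or python3 -m pip install --upgrade Pillow. For me, the command with python worked.
To install NumPy, run the following command in the command line / command prompt: pip install numpy. It is also installed automatically as a dependency of opencv-python, so you may already have it.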
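Because pytesseract only wraps the `tesseract` executable, it can help to confirm that the executable can actually be found before debugging any Python code. A minimal standard-library sketch (the function name `find_tesseract` is hypothetical). If Tesseract is installed but not on `PATH`, pytesseract lets you set `pytesseract.pytesseract.tesseract_cmd` to the full path of the executable instead.

```python
import shutil

def find_tesseract(candidates=("tesseract",)):
    """Return the full path of the first candidate executable found on PATH,
    or None if none of them can be found."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None

if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found on PATH; install it or set "
              "pytesseract.pytesseract.tesseract_cmd to its full path")
    else:
        print("Found Tesseract at", path)
```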

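Code: I wrote this code myself. It currently works the way I want and produces an effect similar to Michael's. It grabs an image of an area near the top-left corner of your screen and opens a window showing the image that is currently being read with OCR. It then prints the text read from the screen to the console.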

# OCR Screen Scanner
# By Dornu Inene
# Libraries that you should have all installed
import cv2
import numpy as np
import pytesseract

# We only need the ImageGrab module from PIL
from PIL import ImageGrab

# Run forever unless you press Esc
while True:
    # Grab the screen region between the points (115, 143) and (569, 283),
    # given as (x, y) coordinates
    cap = ImageGrab.grab(bbox=(115, 143, 569, 283))

    # To use cv2.imshow we need the image as a NumPy array;
    # PIL gives RGB while OpenCV expects BGR, so swap the channels for display
    cap_arr = cv2.cvtColor(np.array(cap), cv2.COLOR_RGB2BGR)

    # This isn't needed for getting the text, but it shows
    # the image that the text is being read from
    cv2.imshow("OCR region", cap_arr)

    # Read the text from the grabbed image with pytesseract.image_to_string.
    # This is the main step that collects the text from that area of the screen
    text = pytesseract.image_to_string(cap)

    # Remove whitespace from the beginning and end of the text
    # so the output is cleaner
    text = text.strip()

    # If any text was recognized in the image, print it
    if len(text) > 0:
        print(text)

    # Break out of the while loop when you press Esc
    if cv2.waitKey(1) == 27:
        break

# Make sure all windows created by cv2 are destroyed
cv2.destroyAllWindows()

I hope this helps you; it certainly helped me a lot!

- DonDoesProgramming

Content provided by Stack Overflow.

- Kingsley · Accepted Answer

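Grabbing the screen and passing it to Tesseract for OCR is quite simple.

The PIL (Pillow) library can easily grab frames on macOS and Windows. However, Linux support for this was added only recently, so the code below works around its absence. (I'm on Ubuntu 19.10, and my Pillow doesn't support it.)

The user starts the program with the coordinates of a screen rectangle. The main loop repeatedly grabs this area of the screen and feeds it to Tesseract. If Tesseract finds any non-whitespace text in the image, it writes it to stdout.

Note that this is not a true real-time system. There are no timing guarantees, and each frame takes a different amount of time. Your computer might reach 60 FPS or only 6 FPS. The rate is also heavily affected by the size of the rectangle you are monitoring.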
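Since the achievable rate depends on the machine and on the rectangle size, it is worth measuring it. A minimal timing sketch (the helper `measure_fps` is hypothetical); in this answer's program, `step` would be something like `lambda: pytesseract.image_to_string(screenGrab(screen_rect))`.

```python
import time

def measure_fps(step, iterations=10, clock=time.perf_counter):
    """Run `step()` `iterations` times and return the achieved
    iterations per second."""
    start = clock()
    for _ in range(iterations):
        step()
    elapsed = clock() - start
    return iterations / elapsed if elapsed > 0 else float("inf")

if __name__ == "__main__":
    # Stand-in workload; replace it with a grab + OCR of your screen rectangle
    print(f"{measure_fps(lambda: sum(range(100_000))):.1f} iterations/s")
```

Timing a few sizes of rectangle this way shows quickly how much cropping helps on your hardware.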

#! /usr/bin/env python3

import sys
import pytesseract
from PIL import Image

# Import ImageGrab if possible, might fail on Linux
try:
    from PIL import ImageGrab
    use_grab = True
except Exception as ex:
    # Some older versions of pillow don't support ImageGrab on Linux
    # In which case we will use XLib 
    if ( sys.platform == 'linux' ):
        from Xlib import display, X   
        use_grab = False
    else:
        raise ex


def screenGrab( rect ):
    """ Given a rectangle, return a PIL Image of that part of the screen.
        Handles a Linux installation with an older Pillow by falling back
        to using XLib """
    x, y, width, height = rect

    if ( use_grab ):
        # ImageGrab was imported with "from PIL import ImageGrab",
        # so the name PIL itself is not defined here
        image = ImageGrab.grab( bbox=[ x, y, x+width, y+height ] )
    else:
        # ImageGrab can be missing under Linux
        dsp  = display.Display()
        root = dsp.screen().root
        raw_image = root.get_image( x, y, width, height, X.ZPixmap, 0xffffffff )
        image = Image.frombuffer( "RGB", ( width, height ), raw_image.data, "raw", "BGRX", 0, 1 )
        # DEBUG image.save( '/tmp/screen_grab.png', 'PNG' )
    return image


### Do some rudimentary command line argument handling
### so the user can specify the area of the screen to watch
if ( __name__ == "__main__" ):
    EXE = sys.argv[0]
    del( sys.argv[0] )

    # EDIT: catch zero-args
    if ( len( sys.argv ) != 4 or sys.argv[0] in ( '--help', '-h', '-?', '/?' ) ):  # some minor help
        sys.stderr.write( EXE + ": monitors section of screen for text\n" )
        sys.stderr.write( EXE + ": Give x, y, width, height as arguments\n" )
        sys.exit( 1 )

    # TODO - add error checking
    x      = int( sys.argv[0] )
    y      = int( sys.argv[1] )
    width  = int( sys.argv[2] )
    height = int( sys.argv[3] )

    # Area of screen to monitor
    screen_rect = [ x, y, width, height ]  
    print( EXE + ": watching " + str( screen_rect ) )

    ### Loop forever, monitoring the user-specified rectangle of the screen
    while ( True ): 
        image = screenGrab( screen_rect )              # Grab the area of the screen
        text  = pytesseract.image_to_string( image )   # OCR the image

        # IF the OCR found anything, write it to stdout.
        text = text.strip()
        if ( len( text ) > 0 ):
            print( text )

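This answer was pieced together from other answers on Stack Overflow.

If you use this a lot, it would be a good idea to add a rate limiter to save some CPU. Sleeping for half a second in each loop iteration would probably do.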
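A minimal sketch of such a rate limiter, assuming a fixed minimum interval between OCR passes. Unlike a plain `time.sleep(0.5)` at the end of the loop, it only sleeps for whatever part of the interval the grab and OCR did not already use. The class name `RateLimiter` is hypothetical; the clock and sleep functions are injectable only so the behavior can be checked without waiting.

```python
import time

class RateLimiter:
    """Make successive wait() calls return at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._next = None  # earliest time the next wait() may return

    def wait(self):
        now = self.clock()
        if self._next is not None and now < self._next:
            # Sleep only for the part of the interval that is left
            self.sleep(self._next - now)
            now = self._next
        self._next = now + self.interval

if __name__ == "__main__":
    limiter = RateLimiter(0.5)
    for i in range(3):
        limiter.wait()  # in the program above: call at the top of the while loop
        print("iteration", i)
```

In the program above, you would create `limiter = RateLimiter(0.5)` before the `while` loop and call `limiter.wait()` at the start of each iteration.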