Apple Vision framework: extracting text from an image

Question

swift machine-learning swiftui coreml apple-vision

24

I am using the iOS 11 Vision framework to detect text in an image.

The text is detected successfully, but how can we get the detected text itself?

- iOS

2

@Alex The detected regions are already available. I need a solution for reading the text inside those regions. - iOS

3

Possible duplicate of "Converting a Vision VNTextObservation to a String". - Artem Novichkov

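@PoojaM.Bohora We can extract the text with the following combination, which works on iOS 11+: Vision framework + ML (model) + the open-source Tesseract OCR. Steps: 1) Vision framework + ML detects the text regions. 2) Convert each text region to a CGRect and crop the image of the text in that region. 3) Pass the cropped text strips to Tesseract OCR, rather than the full image, to get the text. - iOS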
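Step 2 of the comment above can be sketched roughly as follows. Vision reports bounding boxes in normalized coordinates (0...1) with the origin at the bottom-left, while cropping works in pixels with a top-left origin, so the rectangle has to be scaled and flipped. The helper name `pixelRect(forNormalized:imageWidth:imageHeight:)` is hypothetical and not part of Vision; this is a minimal sketch, not the commenter's actual code.

```swift
import Foundation

/// Hypothetical helper: converts a Vision-style normalized rectangle
/// (0...1, bottom-left origin) into a pixel rectangle with a top-left origin.
func pixelRect(forNormalized box: CGRect, imageWidth: Int, imageHeight: Int) -> CGRect {
    let width = CGFloat(imageWidth)
    let height = CGFloat(imageHeight)
    return CGRect(
        x: box.minX * width,
        y: (1 - box.maxY) * height,   // flip the y-axis
        width: box.width * width,
        height: box.height * height
    )
}
```

On Apple platforms the result can then be passed to `CGImage.cropping(to:)` to obtain the strip that is handed to the OCR engine in step 3, for example `cgImage.cropping(to: pixelRect(forNormalized: observation.boundingBox, imageWidth: cgImage.width, imageHeight: cgImage.height))`.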

I am using pod 'TesseractOCRiOS', '4.0.0', but I still don't get accurate results. Do you have any suggestions? - Pooja M. Bohora

2

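In my experience, extraction accuracy is about 70-80%. Many factors affect text extraction, such as the font size of the text (small text does not work well) and the Tesseract configuration (configure the Tesseract engine for your needs). Use Tesseract's "black and white" mode when extracting text. Also consider the image size: the larger, the better. - iOS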

3 Answers

0

Not exactly the same, but similar to "Converting a Vision VNTextObservation to a String".

You need to use CoreML or another library (SwiftOCR, etc.) to perform the OCR.

- nathan

-7

This returns an overlay image with rectangles drawn over the detected text.

Here is the full Xcode project: https://github.com/cyruslok/iOS11-Vision-Framework-Demo

Hope it helps.

import UIKit
import Vision

// Text detection. Both functions are methods of a class such as a view controller.
// Returns an overlay image with the detected text regions outlined, or the
// unmodified input image if detection fails.
// `display_image_view` is unused here and kept only for existing call sites.
func textDetect(detect_image: UIImage, display_image_view: UIImageView) -> UIImage {
    var result_img = detect_image
    guard let cgImage = detect_image.cgImage else { return result_img }

    let handler = VNImageRequestHandler(cgImage: cgImage)
    let request = VNDetectTextRectanglesRequest { request, error in
        if let error = error {
            print("Text detection request failed: \(error)")
            return
        }
        guard let results = request.results as? [VNTextObservation] else { return }
        result_img = self.drawRectangleForTextDetect(image: detect_image, results: results)
    }
    // Also report a bounding box for each character.
    request.reportCharacterBoxes = true

    // perform(_:) is synchronous, so the completion handler has already run
    // by the time it returns.
    do {
        try handler.perform([request])
    } catch {
        print("Could not perform the text detection request: \(error)")
    }
    return result_img
}

func drawRectangleForTextDetect(image: UIImage, results: [VNTextObservation]) -> UIImage {
    let renderer = UIGraphicsImageRenderer(size: image.size)
    // Vision bounding boxes are normalized (0...1) with a bottom-left origin.
    // Scale them to the image size and flip the y-axis for UIKit.
    let t = CGAffineTransform.identity
        .scaledBy(x: image.size.width, y: -image.size.height)
        .translatedBy(x: 0, y: -1)

    return renderer.image { ctx in
        let cg = ctx.cgContext
        cg.setLineWidth(1)
        for observation in results {
            // Green: the whole text region.
            cg.setStrokeColor(UIColor.green.cgColor)
            cg.stroke(observation.boundingBox.applying(t))

            // Red: each character box, if any were reported.
            cg.setStrokeColor(UIColor.red.cgColor)
            for box in observation.characterBoxes ?? [] {
                cg.stroke(box.boundingBox.applying(t))
            }
        }
    }
}

- C4L

This does not answer the question. Please add more explanation to the code. - Unterbelichtet

- Andy Jazz · Accepted Answer

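Recognizing text in an image

VNRecognizeTextRequest is available in iOS 13.0 and macOS 10.15 and later.

In Apple Vision you can easily extract text from an image with the VNRecognizeTextRequest class, an image analysis request that finds and recognizes text in an image.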
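Independently of any UI framework, the request can be sketched as a plain function. The function name `recognizeText(in:)` is made up for this illustration, and the sketch assumes an SDK (Xcode 13 or later) in which `VNRecognizeTextRequest.results` is already typed as `[VNRecognizedTextObservation]?`.

```swift
import CoreGraphics
import Vision

/// Illustrative helper (not part of Vision): runs text recognition on a
/// CGImage and returns the best candidate string for each detected line.
func recognizeText(in image: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = true

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    // perform(_:) blocks until the request finishes, so call this
    // function off the main thread for large images.
    try handler.perform([request])

    let observations = request.results ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

Each observation can offer several candidates; `topCandidates(1)` keeps only the most confident one, and its `confidence` property is available if low-quality lines need to be filtered out.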

Here is a SwiftUI solution that shows how to do this (tested with Xcode 13.4 on iOS 15.5):

import SwiftUI
import Vision

struct ContentView: View {
        
    var body: some View {
        ZStack {
            Color.black.ignoresSafeArea()
            Image("imageText").scaleEffect(0.5)
            SomeText()
        }
    }
}

The logic is as follows:

struct SomeText: UIViewRepresentable {

    func makeUIView(context: Context) -> UITextView {
        let label = UITextView(frame: .zero)
        label.backgroundColor = .clear
        label.textColor = .systemYellow
        label.textAlignment = .center
        label.font = .boldSystemFont(ofSize: 25)
        return label
    }

    func updateUIView(_ uiView: UITextView, context: Context) {
        // Look up the bundled image; do nothing if it is missing.
        guard let url = Bundle.main.url(forResource: "imageText", withExtension: "png")
        else { return }
        let requestHandler = VNImageRequestHandler(url: url, options: [:])

        let request = VNRecognizeTextRequest { (request, _) in
            guard let obs = request.results as? [VNRecognizedTextObservation]
            else { return }

            // Take the best candidate of each observation and show every
            // recognized line, not only the last one.
            let lines = obs.compactMap { $0.topCandidates(1).first?.string }
            uiView.text = lines.joined(separator: "\n")
        }
        // .accurate: slower, but more accurate text recognition.
        // Use .fast instead for nearly real-time but less accurate results.
        request.recognitionLevel = .accurate
        try? requestHandler.perform([request])
    }
}

If you want to know the list of languages supported for recognition, read this post.
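The list can also be queried at runtime. The following is a minimal sketch that assumes iOS 15 / macOS 12 or later, where `VNRecognizeTextRequest` has the instance method `supportedRecognitionLanguages()`; the wrapper name `supportedLanguages(for:)` is made up for this example.

```swift
import Vision

/// Illustrative wrapper: returns the language identifiers the text
/// recognizer supports for the given recognition level.
@available(iOS 15.0, macOS 12.0, *)
func supportedLanguages(for level: VNRequestTextRecognitionLevel) throws -> [String] {
    let request = VNRecognizeTextRequest()
    // The supported set depends on the level (and the request revision).
    request.recognitionLevel = level
    return try request.supportedRecognitionLanguages()
}
```

One of the returned identifiers can then be assigned to the request before it is performed, for example `request.recognitionLanguages = ["en-US"]`.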