Recognizing text from a live video stream with ML Kit (using CMSampleBuffer)

I want to modify the on-device text recognition sample provided by Google, linked here, so that it works with a live camera stream.

When I hold the camera over text (which works fine with the image sample), my console produces the following stream while it runs, and eventually runs out of memory:

2018-05-16 10:48:22.129901+1200 TextRecognition[32138:5593533] An empty result returned from from GMVDetector for VisionTextDetector.

Here is my video capture method:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {

    if let textDetector = self.textDetector {

        let visionImage = VisionImage(buffer: sampleBuffer)
        let metadata = VisionImageMetadata()
        metadata.orientation = .rightTop
        visionImage.metadata = metadata

        textDetector.detect(in: visionImage) { (features, error) in
            guard error == nil, let features = features, !features.isEmpty else {
                // Error. You should also check the console for error messages.
                // ...
                return
            }

            // Recognized and extracted text
            print("Detected text has: \(features.count) blocks")
            // ...
        }
    }
}

Is this the correct way to be doing this?


Something must be off here. I'm running into the same problem as this question https://dev59.com/-6vka4cB1Zd3GeqPpzvJ and yours looks related too. It would be great if someone from Firebase could see this :) - Jason
@dave, currently the SDK only accepts upright images. Is your image rotated? This is stated in the developer docs (please search for "Create a VisionImage object using a UIImage or CMSampleBufferRef." at https://firebase-dot-devsite.googleplex.com/docs/ml-kit/ios/recognize-text#1-run-the-text-detector). - Isabella Chen
Hi @IsabellaChen, the camera is in portrait mode, but the empty-result message appears regardless of orientation. - dave
@IsabellaChen, is there a working example available of text detection on a live video stream? I found I can detect barcodes in a live stream with the barcode detector, but if I use the same approach for text recognition I get the error above. - dave
@Josh Robbins (and Dave), I've posted some Objective-C snippets below using CMSampleBuffer that should work. Could you give them a try? If it still doesn't work for either of you, could you share 1) your device type, 2) whether you set any value for kCVPixelBufferPixelFormatTypeKey, and 3) the first video CVPixelFormatType available on your device? Thanks. - Isabella Chen
2 Answers


ML Kit has moved out of Firebase and become a standalone SDK (migration guide).

A Swift quickstart sample app that does text recognition from a live video stream with ML Kit (using CMSampleBuffer) is available here:

https://github.com/googlesamples/mlkit/tree/master/ios/quickstarts/textrecognition/TextRecognitionExample

The live video stream implementation is in the CameraViewController.swift file:

https://github.com/googlesamples/mlkit/blob/master/ios/quickstarts/textrecognition/TextRecognitionExample/CameraViewController.swift
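The per-frame call site in the standalone SDK looks roughly like the following (a sketch based on the GoogleMLKit quickstart; module names and the `TextRecognizer` factory signature have varied between SDK versions, so treat this as an outline rather than copy-paste code):

```swift
import AVFoundation
import MLKitVision
import MLKitTextRecognition

// Called on the video data output's delegate queue for each frame.
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    let visionImage = VisionImage(buffer: sampleBuffer)
    // In the standalone SDK, orientation is set directly on the image
    // (as a UIImage.Orientation) instead of via a metadata object.
    visionImage.orientation = .right  // compute from device/camera orientation

    let recognizer = TextRecognizer.textRecognizer()
    recognizer.process(visionImage) { result, error in
        guard error == nil, let result = result else { return }
        print("Detected text has: \(result.blocks.count) blocks")
    }
}
```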


I've updated this 3+ year old answer to reflect the latest state of ML Kit. - Dong Chen
I believe this example has since been moved into a single file that combines face detection, barcode scanning, etc.: https://github.com/googlesamples/mlkit/blob/master/ios/quickstarts/vision/VisionExample/CameraViewController.swift - btraas


ML Kit is in the process of adding CMSampleBuffer sample code to the Firebase Quick Start.

In the meantime, the code below works for CMSampleBuffer.

Set up the AV capture (use kCVPixelFormatType_32BGRA as the value for kCVPixelBufferPixelFormatTypeKey):

@property(nonatomic, strong) AVCaptureSession *session;
@property(nonatomic, strong) AVCaptureVideoDataOutput *videoDataOutput;

- (void)setupVideoProcessing {
  self.videoDataOutput = [[AVCaptureVideoDataOutput alloc] init];
  NSDictionary *rgbOutputSettings = @{
      (__bridge NSString*)kCVPixelBufferPixelFormatTypeKey :  @(kCVPixelFormatType_32BGRA)
  };
  [self.videoDataOutput setVideoSettings:rgbOutputSettings];

  if (![self.session canAddOutput:self.videoDataOutput]) {
    [self cleanupVideoProcessing];
    NSLog(@"Failed to setup video output");
    return;
  }
  [self.videoDataOutput setAlwaysDiscardsLateVideoFrames:YES];
  [self.videoDataOutput setSampleBufferDelegate:self queue:self.videoDataOutputQueue];
  [self.session addOutput:self.videoDataOutput];
}
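Since the question is in Swift, the same setup translates roughly as follows (a sketch, assuming a `session` and `videoDataOutputQueue` analogous to the properties above):

```swift
import AVFoundation

// Force 32BGRA output so the detector receives a pixel format it understands.
let videoDataOutput = AVCaptureVideoDataOutput()
videoDataOutput.videoSettings = [
    kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA
]
videoDataOutput.alwaysDiscardsLateVideoFrames = true
videoDataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
if session.canAddOutput(videoDataOutput) {
    session.addOutput(videoDataOutput)
} else {
    print("Failed to setup video output")
}
```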

Consume the CMSampleBuffer and run detection:

- (void)runDetection:(AVCaptureOutput *)captureOutput
    didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
           fromConnection:(AVCaptureConnection *)connection {

  CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
  size_t imageWidth = CVPixelBufferGetWidth(imageBuffer);
  size_t imageHeight = CVPixelBufferGetHeight(imageBuffer);

  AVCaptureDevicePosition devicePosition = self.isUsingFrontCamera ? AVCaptureDevicePositionFront : AVCaptureDevicePositionBack;

  // Calculate the image orientation.
  UIDeviceOrientation deviceOrientation = [[UIDevice currentDevice] orientation];
  ImageOrientation orientation =
      [ImageUtility imageOrientationFromOrientation:deviceOrientation
                        withCaptureDevicePosition:devicePosition
                         defaultDeviceOrientation:[self deviceOrientationFromInterfaceOrientation]];
  // Invoke text detection.
  FIRVisionImage *image = [[FIRVisionImage alloc] initWithBuffer:sampleBuffer];
  FIRVisionImageMetadata *metadata = [[FIRVisionImageMetadata alloc] init];
  metadata.orientation = orientation;
  image.metadata = metadata;

  FIRVisionTextDetectionCallback callback =
      ^(NSArray<id<FIRVisionText>> *_Nullable features, NSError *_Nullable error) {
     ...
  };

 [self.textDetector detectInImage:image completion:callback];
}

The ImageUtility helper used above to determine the orientation:

+ (FIRVisionDetectorImageOrientation)imageOrientationFromOrientation:(UIDeviceOrientation)deviceOrientation
                             withCaptureDevicePosition:(AVCaptureDevicePosition)position
                              defaultDeviceOrientation:(UIDeviceOrientation)defaultOrientation {
  if (deviceOrientation == UIDeviceOrientationFaceDown ||
      deviceOrientation == UIDeviceOrientationFaceUp ||
      deviceOrientation == UIDeviceOrientationUnknown) {
    deviceOrientation = defaultOrientation;
  }
  FIRVisionDetectorImageOrientation orientation = FIRVisionDetectorImageOrientationTopLeft;
  switch (deviceOrientation) {
    case UIDeviceOrientationPortrait:
      if (position == AVCaptureDevicePositionFront) {
        orientation = FIRVisionDetectorImageOrientationLeftTop;
      } else {
        orientation = FIRVisionDetectorImageOrientationRightTop;
      }
      break;
    case UIDeviceOrientationLandscapeLeft:
      if (position == AVCaptureDevicePositionFront) {
        orientation = FIRVisionDetectorImageOrientationBottomLeft;
      } else {
        orientation = FIRVisionDetectorImageOrientationTopLeft;
      }
      break;
    case UIDeviceOrientationPortraitUpsideDown:
      if (position == AVCaptureDevicePositionFront) {
        orientation = FIRVisionDetectorImageOrientationRightBottom;
      } else {
        orientation = FIRVisionDetectorImageOrientationLeftBottom;
      }
      break;
    case UIDeviceOrientationLandscapeRight:
      if (position == AVCaptureDevicePositionFront) {
        orientation = FIRVisionDetectorImageOrientationTopRight;
      } else {
        orientation = FIRVisionDetectorImageOrientationBottomRight;
      }
      break;
    default:
      orientation = FIRVisionDetectorImageOrientationTopLeft;
      break;
  }

  return orientation;
}

I really can't explain why… since I only installed the pods a few days ago… I updated them today and now all my code works… no idea why! - BlackMirrorz
@JoshRobbins Glad to hear it. Nothing should have changed since the I/O release, but that's a nice surprise :) Thanks for sharing! - Isabella Chen
