How do I get the Y component from a CMSampleBuffer produced by AVCaptureSession?

Question

iphone stream avcapturesession

12

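Hi, I am trying to access the raw data from the iPhone camera using AVCaptureSession. I followed the guide provided by Apple (link here).

The raw data I get from the sample buffer is in YUV format (am I right that this is the raw video frame format?). How can I directly get the Y component of the raw data stored in the sample buffer?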

- Nihao

1

Both Brad Larson and Codo helped me a lot with this question. By combining their answers I finally reached my goal. Thank you very much, Brad Larson and Codo! - Nihao

4 Answers

19

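In addition to Brad's answer and your own code, you should consider the following:

Since your image has two separate planes, the function CVPixelBufferGetBaseAddress will not return the base address of the plane but the base address of an additional data structure. You probably get an address close enough to the first plane to see the image only because of the current implementation. That is also why the image is shifted and has garbage in the top left corner. The correct way to get the first plane is: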

unsigned char *rowBase = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);

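A row in the image may be longer than the width of the image (due to rounding). That is why there are separate functions for getting the width and the number of bytes per row. You don't have this problem at the moment, but that might change with the next version of iOS. So your code should be: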

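// The Core Video getters return size_t, so avoid truncating to int
size_t bufferHeight = CVPixelBufferGetHeight(pixelBuffer);
size_t bufferWidth = CVPixelBufferGetWidth(pixelBuffer);   // pixels per row; may be less than bytesPerRow
size_t bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);
size_t size = bufferHeight * bytesPerRow;

unsigned char *pixel = (unsigned char *)malloc(size);

// Base address of plane 0 (the Y plane); the pixel buffer must be locked at this point
unsigned char *rowBase = (unsigned char *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
memcpy(pixel, rowBase, size);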

Please also note that your code will fail completely on an iPhone 3G.
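
To illustrate the bytes-per-row point, here is a minimal sketch in plain C of copying a plane into a tightly packed buffer, one row at a time, so that any padding at the end of each row is dropped. The helper name copy_plane_packed is hypothetical and not part of Core Video; in the capture callback you would pass it the values returned by CVPixelBufferGetBaseAddressOfPlane, CVPixelBufferGetWidthOfPlane, CVPixelBufferGetHeightOfPlane and CVPixelBufferGetBytesPerRowOfPlane for plane 0, while the buffer is locked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a Core Video API): copies a plane whose rows
 * start `bytesPerRow` bytes apart into a newly allocated buffer of exactly
 * width * height bytes. Returns NULL if the allocation fails; the caller
 * is responsible for freeing the result. */
unsigned char *copy_plane_packed(const unsigned char *planeBase,
                                 size_t width, size_t height,
                                 size_t bytesPerRow)
{
    assert(planeBase != NULL);
    assert(bytesPerRow >= width);   /* a row can be padded, never shorter */

    unsigned char *packed = malloc(width * height);
    if (packed == NULL) {
        return NULL;
    }
    for (size_t row = 0; row < height; row++) {
        /* Copy only the visible pixels of this row and skip the padding. */
        memcpy(packed + row * width, planeBase + row * bytesPerRow, width);
    }
    return packed;
}
```

The single memcpy of height * bytesPerRow bytes shown above is faster, but then the consumer has to keep using bytesPerRow as the row stride. A packed copy costs one memcpy per row and lets later code index with the width alone.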

- Codo

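Shouldn't that be CVPixelBufferGetHeightOfPlane? Just curious. - akaru

Since we know that the Y plane has the same number of pixels as the image, it makes no difference here. But if we accessed the UV plane, which has a reduced number of pixels, we would have to use CVPixelBufferGetHeightOfPlane. - Codo

This article explains what kind of errors result from using CVPixelBufferGetBaseAddress instead of CVPixelBufferGetBaseAddressOfPlane: https://mkonrad.net/2014/06/24/cvvideocamera-vs-native-ios-camera-apis.html - zevarito

For planar buffers, CVPixelBufferGetBaseAddress returns a pointer to a CVPlanarComponentInfo structure, or NULL if no such structure is present. So if your buffer is planar, you must use CVPixelBufferGetBaseAddressOfPlane. - papirosnik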
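
As a rough sketch of the point about the reduced UV plane: in the bi-planar 4:2:0 formats, plane 1 stores interleaved Cb/Cr byte pairs, and each pair is shared by a 2x2 block of luma pixels, so the chroma plane has half the rows and half as many samples per row. The plain C below assumes that layout. The type ChromaSample and the function chroma_at are hypothetical names for illustration; chromaBase and chromaBytesPerRow stand for what CVPixelBufferGetBaseAddressOfPlane and CVPixelBufferGetBytesPerRowOfPlane would return for plane 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration: one Cb/Cr pair. */
typedef struct {
    unsigned char cb;
    unsigned char cr;
} ChromaSample;

/* Returns the chroma pair that applies to the luma pixel at (x, y).
 * Assumes a bi-planar 4:2:0 layout where plane 1 holds interleaved
 * Cb, Cr bytes and each pair covers a 2x2 block of luma pixels. */
ChromaSample chroma_at(const unsigned char *chromaBase,
                       size_t chromaBytesPerRow,
                       size_t x, size_t y)
{
    assert(chromaBase != NULL);
    /* Halve both coordinates, then step two bytes per chroma sample. */
    const unsigned char *pair =
        chromaBase + (y / 2) * chromaBytesPerRow + (x / 2) * 2;
    ChromaSample sample = { pair[0], pair[1] };
    return sample;
}
```

When looping over the whole chroma plane, the row count to use is the one from CVPixelBufferGetHeightOfPlane for plane 1, not the image height.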

8

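If you only need the luminance channel, I recommend against using the BGRA format, as it comes with conversion overhead. Apple suggests BGRA if you are doing rendering, but you don't need it for extracting the luminance information. As Brad already mentioned, the most efficient format is the camera-native YUV format.

However, extracting the right bytes from the sample buffer is a bit tricky, especially on the iPhone 3G with its interleaved YUV 422 format. So here is my code, which works fine with the iPhone 3G, 3GS, iPod Touch 4 and iPhone 4S.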

#pragma mark -
#pragma mark AVCaptureVideoDataOutputSampleBufferDelegate Methods
#if !(TARGET_IPHONE_SIMULATOR)
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    // get image buffer reference
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);

    // extract needed information from image buffer
    CVPixelBufferLockBaseAddress(imageBuffer, 0);
    size_t bufferSize = CVPixelBufferGetDataSize(imageBuffer);
    void *baseAddress = CVPixelBufferGetBaseAddress(imageBuffer);
    CGSize resolution = CGSizeMake(CVPixelBufferGetWidth(imageBuffer), CVPixelBufferGetHeight(imageBuffer));

    // variables for grayscaleBuffer 
    void *grayscaleBuffer = 0;
    size_t grayscaleBufferSize = 0;

    // the pixelFormat differs between iPhone 3G and later models
    OSType pixelFormat = CVPixelBufferGetPixelFormatType(imageBuffer);

    if (pixelFormat == '2vuy') { // iPhone 3G
        // kCVPixelFormatType_422YpCbCr8     = '2vuy',    
        /* Component Y'CbCr 8-bit 4:2:2, ordered Cb Y'0 Cr Y'1 */

        // copy every second byte (luminance bytes from Y-channel) to new buffer
        grayscaleBufferSize = bufferSize/2;
        grayscaleBuffer = malloc(grayscaleBufferSize);
        if (grayscaleBuffer == NULL) {
            NSLog(@"ERROR in %@:%@:%d: couldn't allocate memory for grayscaleBuffer!", NSStringFromClass([self class]), NSStringFromSelector(_cmd), __LINE__);
            // unlock before leaving; the method returns void, so no value is returned
            CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
            return;
        }
        // luminance sits at the odd byte offsets (Cb Y'0 Cr Y'1)
        const unsigned char *source = (const unsigned char *)baseAddress + 1;
        unsigned char *destination = (unsigned char *)grayscaleBuffer;
        // strict upper bound: never write past the end of grayscaleBuffer
        for (size_t i = 0; i < grayscaleBufferSize; i++) {
            destination[i] = source[2 * i];
        }
    }

    if (pixelFormat == '420v' || pixelFormat == '420f') {
        // kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange = '420v', 
        // kCVPixelFormatType_420YpCbCr8BiPlanarFullRange  = '420f',
        // Bi-Planar Component Y'CbCr 8-bit 4:2:0, video-range (luma=[16,235] chroma=[16,240]).  
        // Bi-Planar Component Y'CbCr 8-bit 4:2:0, full-range (luma=[0,255] chroma=[1,255]).
        // baseAddress points to a big-endian CVPlanarPixelBufferInfo_YCbCrBiPlanar struct,
        // so ask for plane 0 (the Y plane) explicitly; it holds about two thirds of the image data
        size_t bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0);
        size_t planeHeight = CVPixelBufferGetHeightOfPlane(imageBuffer, 0);
        baseAddress = CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 0);
        grayscaleBufferSize = planeHeight * bytesPerRow;
        grayscaleBuffer = malloc(grayscaleBufferSize);
        if (grayscaleBuffer == NULL) {
            NSLog(@"ERROR in %@:%@:%d: couldn't allocate memory for grayscaleBuffer!", NSStringFromClass([self class]), NSStringFromSelector(_cmd), __LINE__);
            // unlock before leaving; the method returns void, so no value is returned
            CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
            return;
        }
        memcpy(grayscaleBuffer, baseAddress, grayscaleBufferSize);
    }

    // do whatever you want with the grayscale buffer
    ...

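    // clean-up: balance the CVPixelBufferLockBaseAddress call above
    CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
    free(grayscaleBuffer);
}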
#endif
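
For the interleaved '2vuy' case, the code above derives the luma size from the total data size, which assumes rows carry no padding. A hedged variant in plain C that walks row by row and honors the row stride could look like this. The function name extract_luma_2vuy is hypothetical; the byte order Cb Y'0 Cr Y'1 is taken from the header comment quoted in the code above, and bytesPerRow stands for the value of CVPixelBufferGetBytesPerRow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: extracts the luma bytes from an interleaved
 * 4:2:2 buffer ordered Cb Y'0 Cr Y'1, so luma sits at odd byte offsets.
 * `dst` must have room for width * height bytes. */
void extract_luma_2vuy(const unsigned char *src,
                       size_t width, size_t height,
                       size_t bytesPerRow,
                       unsigned char *dst)
{
    assert(src != NULL && dst != NULL);
    assert(bytesPerRow >= width * 2);   /* two bytes per pixel, plus padding */

    for (size_t row = 0; row < height; row++) {
        const unsigned char *line = src + row * bytesPerRow;
        for (size_t x = 0; x < width; x++) {
            /* Byte 2x is chroma (Cb or Cr), byte 2x + 1 is the luma of pixel x. */
            dst[row * width + x] = line[2 * x + 1];
        }
    }
}
```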

- Tafkadasoh

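Hi, thanks for your answer, I am facing the same problem. One thing is that I also want the Cr and Cb components, but I am not sure how to get them. I am trying to build a skin detector and I need those values as well, as I found out in another post on SO. I have already done this with the BGRA format and converted it to YCbCr, but I would like to avoid the conversion step if possible to improve the FPS. That is why I want separate Y, Cb and Cr values for every pixel in the image. Any ideas? - George

How did you determine the byte order of the component signals? The documentation I found from Microsoft lists it as Y0CrY1Cb. - Pescolly

I found a hint in one of Apple's header files. Sorry, I can't tell you which header it was. - Tafkadasoh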

4

This is merely the culmination of everyone else's hard work, here and in other threads, converted to Swift 3 for anyone who finds it useful.

func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    if let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
        CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags.readOnly)

        let pixelFormatType = CVPixelBufferGetPixelFormatType(pixelBuffer)
        if pixelFormatType == kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
           || pixelFormatType == kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange {

            let bufferHeight = CVPixelBufferGetHeight(pixelBuffer)
            let bufferWidth = CVPixelBufferGetWidth(pixelBuffer)

            let lumaBytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
            let size = bufferHeight * lumaBytesPerRow
            let lumaBaseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0)
            let lumaByteBuffer = unsafeBitCast(lumaBaseAddress, to:UnsafeMutablePointer<UInt8>.self)

            let releaseDataCallback: CGDataProviderReleaseDataCallback = { (info: UnsafeMutableRawPointer?, data: UnsafeRawPointer, size: Int) -> () in
                // https://developer.apple.com/reference/coregraphics/cgdataproviderreleasedatacallback
                // N.B. 'CGDataProviderRelease' is unavailable: Core Foundation objects are automatically memory managed
                return
            }

            if let dataProvider = CGDataProvider(dataInfo: nil, data: lumaByteBuffer, size: size, releaseData: releaseDataCallback) {
                let colorSpace = CGColorSpaceCreateDeviceGray()
                let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.noneSkipFirst.rawValue)

                let cgImage = CGImage(width: bufferWidth, height: bufferHeight, bitsPerComponent: 8, bitsPerPixel: 8, bytesPerRow: lumaBytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo, provider: dataProvider, decode: nil, shouldInterpolate: false, intent: CGColorRenderingIntent.defaultIntent)

                let greyscaleImage = UIImage(cgImage: cgImage!)
                // do what you want with the greyscale image.
            }
        }

        CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags.readOnly)
    }
}

- Awesomeness

If the solution above doesn't work for someone, try let bitmapInfo = CGBitmapInfo(rawValue: CGImageByteOrderInfo.orderDefault.rawValue) instead of let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.noneSkipFirst.rawValue). - Harman

- Brad Larson · Accepted Answer

When setting up the AVCaptureVideoDataOutput that returns the raw camera frames, you can set the format of the frames using code like the following:

[videoOutput setVideoSettings:[NSDictionary dictionaryWithObject:[NSNumber numberWithInt:kCVPixelFormatType_32BGRA] forKey:(id)kCVPixelBufferPixelFormatTypeKey]];

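In this case a BGRA pixel format is specified (I used this to match the color format of an OpenGL ES texture). Each pixel in that format has one byte each for blue, green, red, and alpha, in that order. Going with this makes it easy to pull out color components, but you sacrifice a little performance because of the conversion from the camera-native YUV colorspace.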
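
To make the BGRA byte order concrete, here is a small plain C sketch that reads the four components of the pixel at a given coordinate. BGRAPixel and bgra_at are hypothetical names for illustration; base and bytesPerRow stand for the values returned by CVPixelBufferGetBaseAddress and CVPixelBufferGetBytesPerRow on a locked, non-planar BGRA buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration: one pixel in BGRA byte order. */
typedef struct {
    unsigned char b, g, r, a;
} BGRAPixel;

/* Reads the pixel at (x, y) from a 32-bit BGRA buffer: four bytes per
 * pixel, stored blue, green, red, alpha, with rows bytesPerRow apart. */
BGRAPixel bgra_at(const unsigned char *base, size_t bytesPerRow,
                  size_t x, size_t y)
{
    assert(base != NULL);
    const unsigned char *p = base + y * bytesPerRow + x * 4;
    BGRAPixel pixel = { p[0], p[1], p[2], p[3] };
    return pixel;
}
```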

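Other supported colorspaces are kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange and kCVPixelFormatType_420YpCbCr8BiPlanarFullRange on newer devices, and kCVPixelFormatType_422YpCbCr8 on the iPhone 3G. The VideoRange or FullRange suffix simply indicates whether the bytes are returned between 16 - 235 for Y and 16 - 240 for UV, or the full 0 - 255 for each component.

I believe the default colorspace used by an AVCaptureVideoDataOutput instance is the YUV 4:2:0 planar colorspace (except on the iPhone 3G, where it is YUV 4:2:2 interleaved). This means that there are two planes of image data contained within the video frame, with the Y plane coming first. For every pixel in your resulting image, there is one byte for the Y value at that pixel.

You can get at this raw Y data by implementing something like the following in your delegate callback: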
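
If you capture in the VideoRange format but want luma that spans the full 0 - 255 range, one possible approach is a linear stretch of the 16 - 235 interval, clamping anything outside it. This is only a sketch in plain C with hypothetical function names; whether a stretch is appropriate depends on what you do with the values afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Maps one video-range luma value (nominally 16..235) to full range
 * (0..255) with rounding. Values outside the nominal range are clamped. */
unsigned char luma_video_to_full(unsigned char y)
{
    int scaled = ((int)y - 16) * 255;
    if (scaled < 0) {
        return 0;
    }
    scaled = (scaled + 109) / 219;   /* 219 = 235 - 16; +109 rounds to nearest */
    return (unsigned char)(scaled > 255 ? 255 : scaled);
}

/* Applies the mapping in place to `count` luma bytes. */
void expand_luma_range(unsigned char *luma, size_t count)
{
    assert(luma != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        luma[i] = luma_video_to_full(luma[i]);
    }
}
```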

- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferLockBaseAddress(pixelBuffer, 0);

    // For the planar YUV formats, ask for plane 0 (the Y plane) explicitly
    unsigned char *rawPixelBase = (unsigned char *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);

    // Do something with the raw pixels here

    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
}

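You can then work out the location in the frame data for each X, Y coordinate on the image and pull out the byte that corresponds to the Y component at that coordinate.

Apple's FindMyiCone sample from WWDC 2010 (accessible along with the videos) shows how to process raw BGRA data from each frame. I also created a sample application, which you can download the code for here, that performs color-based object tracking using the live video from the iPhone's camera. Both show how to process raw pixel data, but neither of them works in the YUV colorspace.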
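
As a sketch of that lookup in plain C: with one byte per pixel in the Y plane, the byte for coordinate (x, y) sits at y * bytesPerRow + x from the start of the plane. The function names luma_at and average_luma are hypothetical; yPlane and bytesPerRow stand for the plane 0 base address and bytes per row of the locked pixel buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the Y (luma) byte for the pixel at (x, y) in a plane with
 * one byte per pixel and rows `bytesPerRow` bytes apart. */
unsigned char luma_at(const unsigned char *yPlane, size_t bytesPerRow,
                      size_t x, size_t y)
{
    assert(yPlane != NULL);
    return yPlane[y * bytesPerRow + x];
}

/* Example use of the lookup: mean brightness over the visible pixels.
 * Returns 0 for an empty image. */
unsigned int average_luma(const unsigned char *yPlane, size_t bytesPerRow,
                          size_t width, size_t height)
{
    if (width == 0 || height == 0) {
        return 0;
    }
    unsigned long long sum = 0;
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            sum += luma_at(yPlane, bytesPerRow, x, y);
        }
    }
    return (unsigned int)(sum / (width * height));
}
```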