Extracting h264 from CMBlockBuffer

I am using Apple's Video Toolbox framework to compress raw frames captured by the device camera.

My callback is invoked with a CMSampleBufferRef object that contains a CMBlockBuffer.

The CMBlockBuffer object contains the H264 elementary stream, but I haven't found any way to get a pointer to the elementary stream.

When I print the CMSampleBufferRef object to the console, I get:

(lldb) po blockBufferRef
CMBlockBuffer 0x1701193e0 totalDataLength: 4264 retainCount: 1 allocator: 0x1957c2c80 subBlockCapacity: 2
[0] 4264 bytes @ offset 128 Buffer Reference:
CMBlockBuffer 0x170119350 totalDataLength: 4632 retainCount: 1 allocator: 0x1957c2c80 subBlockCapacity: 2
[0] 4632 bytes @ offset 0 Memory Block 0x10295c000, 4632 bytes (custom V=0 A=0x0 F=0x18498bb44 R=0x0)
It seems that the CMBlockBuffer object I managed to get a pointer to contains another, inaccessible CMBlockBufferRef (4632 bytes).

Can anyone post how to access the H264 elementary stream?

Thanks!

3 Answers

I struggled with this myself for quite some time, and have finally figured everything out.

The function CMBlockBufferGetDataPointer gives you access to all the data you need, but there are a few not-so-obvious things you have to do to convert it into an elementary stream.

AVCC vs. Annex B format

The data in the CMBlockBuffer is stored in AVCC format, while elementary streams typically follow the Annex B specification (there is an excellent overview of the two formats here). In the AVCC format, the first 4 bytes contain the length of the NAL unit (another word for an H264 packet). You need to replace this header with the 4-byte start code 0x00 0x00 0x00 0x01, which serves as a separator between NAL units in an Annex B elementary stream (the 3-byte version 0x00 0x00 0x01 works fine too).
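
To make the framing difference concrete, here is a minimal Swift sketch; the 2-byte payload is invented purely for illustration:

// A hypothetical 2-byte NAL unit payload, purely for illustration.
let nalUnitPayload: [UInt8] = [0x65, 0x88]

// AVCC framing: a 4-byte Big-Endian length header (length 2 here) precedes the payload.
let avccFramed: [UInt8] = [0x00, 0x00, 0x00, 0x02] + nalUnitPayload

// Annex B framing: the 4-byte start code replaces the length header.
let annexBFramed: [UInt8] = [0x00, 0x00, 0x00, 0x01] + nalUnitPayload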

Multiple NAL units in a single CMBlockBuffer

The next not-so-obvious thing is that a single CMBlockBuffer will sometimes contain multiple NAL units. Apple seems to add an additional NAL unit (SEI) containing metadata to every I-Frame NAL unit (also called IDR). This is probably why you are seeing multiple buffers in a single CMBlockBuffer object. However, the CMBlockBufferGetDataPointer function gives you a single pointer with access to all of the data. That being said, the presence of multiple NAL units complicates the conversion of the AVCC headers. Now you actually have to read the length value contained in the AVCC header to find the next NAL unit, and continue converting headers until you have reached the end of the buffer.

Big-Endian vs. Little-Endian

The next not-so-obvious thing is that the AVCC header is stored in Big-Endian format, while iOS is natively Little-Endian. So when you read the length value contained in an AVCC header, pass it to the CFSwapInt32BigToHost function first.
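
A tiny Swift sketch of that read, using a made-up header encoding a length of 3796:

import Foundation

// Example AVCC header: the bytes 0x00 0x00 0x0E 0xD4 encode 3796 in Big-Endian.
let headerBytes: [UInt8] = [0x00, 0x00, 0x0E, 0xD4]
var NALUnitLength: UInt32 = 0
memcpy(&NALUnitLength, headerBytes, 4)
// Interpreted directly on a Little-Endian host the raw value would be 0xD40E0000,
// so swap it to host byte order first.
NALUnitLength = CFSwapInt32BigToHost(NALUnitLength)
print(NALUnitLength) // 3796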

SPS and PPS NAL units

The final not-so-obvious thing is that the data in the CMBlockBuffer does not contain the parameter NAL units SPS and PPS, which hold configuration parameters for the decoder such as profile, level, resolution and frame rate. These are stored as metadata in the sample buffer's format description and can be accessed via the function CMVideoFormatDescriptionGetH264ParameterSetAtIndex. Note that you have to add the start codes to these NAL units before sending them. The SPS and PPS NAL units do not have to be sent with every new frame; a decoder only needs to read them once, but it is common to resend them periodically, for example before every new I-Frame NAL unit.

Code example

Below is a code example that takes all of these things into account.

static void videoFrameFinishedEncoding(void *outputCallbackRefCon,
                                       void *sourceFrameRefCon,
                                       OSStatus status,
                                       VTEncodeInfoFlags infoFlags,
                                       CMSampleBufferRef sampleBuffer) {
    // Check if there were any errors encoding
    if (status != noErr) {
        NSLog(@"Error encoding video, err=%lld", (int64_t)status);
        return;
    }

    // In this example we will use a NSMutableData object to store the
    // elementary stream.
    NSMutableData *elementaryStream = [NSMutableData data];


    // Find out if the sample buffer contains an I-Frame.
    // If so we will write the SPS and PPS NAL units to the elementary stream.
    BOOL isIFrame = NO;
    CFArrayRef attachmentsArray = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, 0);
    // Guard against a NULL array, since we pass createIfNecessary = 0 above
    if (attachmentsArray && CFArrayGetCount(attachmentsArray)) {
        CFBooleanRef notSync;
        CFDictionaryRef dict = CFArrayGetValueAtIndex(attachmentsArray, 0);
        BOOL keyExists = CFDictionaryGetValueIfPresent(dict,
                                                       kCMSampleAttachmentKey_NotSync,
                                                       (const void **)&notSync);
        // An I-Frame is a sync frame
        isIFrame = !keyExists || !CFBooleanGetValue(notSync);
    }

    // This is the start code that we will write to
    // the elementary stream before every NAL unit
    static const size_t startCodeLength = 4;
    static const uint8_t startCode[] = {0x00, 0x00, 0x00, 0x01};

    // Write the SPS and PPS NAL units to the elementary stream before every I-Frame
    if (isIFrame) {
        CMFormatDescriptionRef description = CMSampleBufferGetFormatDescription(sampleBuffer);

        // Find out how many parameter sets there are
        size_t numberOfParameterSets;
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(description,
                                                           0, NULL, NULL,
                                                           &numberOfParameterSets,
                                                           NULL);

        // Write each parameter set to the elementary stream
        for (int i = 0; i < numberOfParameterSets; i++) {
            const uint8_t *parameterSetPointer;
            size_t parameterSetLength;
            CMVideoFormatDescriptionGetH264ParameterSetAtIndex(description,
                                                               i,
                                                               &parameterSetPointer,
                                                               &parameterSetLength,
                                                               NULL, NULL);

            // Write the parameter set to the elementary stream
            [elementaryStream appendBytes:startCode length:startCodeLength];
            [elementaryStream appendBytes:parameterSetPointer length:parameterSetLength];
        }
    }

    // Get a pointer to the raw AVCC NAL unit data in the sample buffer
    size_t blockBufferLength;
    uint8_t *bufferDataPointer = NULL;
    CMBlockBufferGetDataPointer(CMSampleBufferGetDataBuffer(sampleBuffer),
                                0,
                                NULL,
                                &blockBufferLength,
                                (char **)&bufferDataPointer);

    // Loop through all the NAL units in the block buffer
    // and write them to the elementary stream with
    // start codes instead of AVCC length headers
    size_t bufferOffset = 0;
    static const int AVCCHeaderLength = 4;
    while (bufferOffset < blockBufferLength - AVCCHeaderLength) {
        // Read the NAL unit length
        uint32_t NALUnitLength = 0;
        memcpy(&NALUnitLength, bufferDataPointer + bufferOffset, AVCCHeaderLength);
        // Convert the length value from Big-endian to Little-endian
        NALUnitLength = CFSwapInt32BigToHost(NALUnitLength);
        // Write start code to the elementary stream
        [elementaryStream appendBytes:startCode length:startCodeLength];
        // Write the NAL unit without the AVCC length header to the elementary stream
        [elementaryStream appendBytes:bufferDataPointer + bufferOffset + AVCCHeaderLength
                               length:NALUnitLength];
        // Move to the next NAL unit in the block buffer
        bufferOffset += AVCCHeaderLength + NALUnitLength;
    }
}

Hi Anton, you are correct! I solved it myself a day after posting the question. My main problem was understanding that the NALU size is stored big-endian, but after inspecting the contents of the memory buffer I figured it out and was able to parse and copy the ES into my buffer. One thing I had to add was the AUD (see the sketch after these comments). In some scenarios we mux the ES together with an audio stream into mpeg-ts or another container that requires the AUD. I added an AUD before every NALU, and now my decoder can decode the stream. - koby
Quick question: you mentioned that NSMutableData is not a great way to hold the ES data. Why? What would you recommend instead? - dcheng
One problem with NSMutableData is that it keeps allocating space at runtime as bytes are appended. That is unnecessary work. It can be avoided by preallocating space for the data with [NSMutableData dataWithCapacity:]. Another option is to keep the buffer as an instance variable and reuse it on every call to videoFrameFinishedEncoding. But all of these suggestions are micro-optimizations that probably won't affect performance much. - Anton Holmberg
Why does Apple choose to split one frame into multiple NALUs in the CMBlockBuffer? Can I merge those NALUs into a single NALU? - ideawu
Hmm... I'm trying this, and Video Toolbox is now giving me the length values in little-endian. Is anyone else running into this? For what it's worth, I'm running it on an iPhone X. - Xavier L.
Can I write this elementaryStream directly to a file? Will it be playable? - prabhu
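
Regarding the AUD that koby mentions: none of the answers' code emits one, so here is a minimal hedged sketch in Swift. The byte values follow the common form of an access unit delimiter (NAL unit type 9); 0xF0 encodes primary_pic_type = 7 ("any slice type") followed by the RBSP stop bit:

import Foundation

// A minimal sketch of an Annex B access unit delimiter (AUD) NAL unit:
// 4-byte start code, 0x09 (NAL header, type 9 = AUD), 0xF0
// (primary_pic_type 7 plus the RBSP stop bit).
let audNALUnit = Data([0x00, 0x00, 0x00, 0x01, 0x09, 0xF0])

// Hypothetical Data holding one frame's converted Annex B NAL units.
let frameElementaryStream = Data()

// Prepend the AUD before each frame's NAL units when muxing into e.g. MPEG-TS.
var accessUnit = audNALUnit
accessUnit.append(frameElementaryStream)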

Thanks Anton for the great answer! Below is a naive Swift version for anyone who wants to use the concepts discussed here directly in a Swift project.

public func didEncodeFrame(frame: CMSampleBuffer)
{
    print ("Received encoded frame in delegate...")

    //----AVCC to Elem stream-----//
    var elementaryStream = NSMutableData()

    //1. check if CMBuffer had I-frame
    var isIFrame:Bool = false
    let attachmentsArray:CFArray = CMSampleBufferGetSampleAttachmentsArray(frame, false)!
    //check how many attachments
    if ( CFArrayGetCount(attachmentsArray) > 0 ) {
        let dict = CFArrayGetValueAtIndex(attachmentsArray, 0)
        let dictRef:CFDictionaryRef = unsafeBitCast(dict, CFDictionaryRef.self)
        //get value: a sync frame (I-frame) has no kCMSampleAttachmentKey_NotSync entry
        let value = CFDictionaryGetValue(dictRef, unsafeBitCast(kCMSampleAttachmentKey_NotSync, UnsafePointer<Void>.self))
        if ( value == nil ){
            print ("IFrame found...")
            isIFrame = true
        }
    }

    //2. define the start code
    let nStartCodeLength:size_t = 4
    let nStartCode:[UInt8] = [0x00, 0x00, 0x00, 0x01]

    //3. write the SPS and PPS before I-frame
    if ( isIFrame == true ){
        let description:CMFormatDescriptionRef = CMSampleBufferGetFormatDescription(frame)!
        //how many params
        var numParams:size_t = 0
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(description, 0, nil, nil, &numParams, nil)

        //write each param-set to elementary stream
        print("Write param to elementaryStream ", numParams)
        for i in 0..<numParams {
            var parameterSetPointer:UnsafePointer<UInt8> = nil
            var parameterSetLength:size_t = 0
            CMVideoFormatDescriptionGetH264ParameterSetAtIndex(description, i, &parameterSetPointer, &parameterSetLength, nil, nil)
            elementaryStream.appendBytes(nStartCode, length: nStartCodeLength)
            elementaryStream.appendBytes(parameterSetPointer, length: unsafeBitCast(parameterSetLength, Int.self))
        }
    }

    //4. Get a pointer to the raw AVCC NAL unit data in the sample buffer
    var blockBufferLength:size_t = 0
    var bufferDataPointer: UnsafeMutablePointer<Int8> = nil
    CMBlockBufferGetDataPointer(CMSampleBufferGetDataBuffer(frame)!, 0, nil, &blockBufferLength, &bufferDataPointer)
    print ("Block length = ", blockBufferLength)

    //5. Loop through all the NAL units in the block buffer
    var bufferOffset:size_t = 0
    let AVCCHeaderLength:Int = 4
    while (bufferOffset < (blockBufferLength - AVCCHeaderLength) ) {
        // Read the NAL unit length
        var NALUnitLength:UInt32 =  0
        memcpy(&NALUnitLength, bufferDataPointer + bufferOffset, AVCCHeaderLength)
        //Convert from Big-Endian to host byte order
        NALUnitLength = CFSwapInt32BigToHost(NALUnitLength)
        if ( NALUnitLength > 0 ){
            print ( "NALUnitLen = ", NALUnitLength)
            // Write start code to the elementary stream
            elementaryStream.appendBytes(nStartCode, length: nStartCodeLength)
            // Write the NAL unit without the AVCC length header to the elementary stream
            elementaryStream.appendBytes(bufferDataPointer + bufferOffset + AVCCHeaderLength, length: Int(NALUnitLength))
            // Move to the next NAL unit in the block buffer
            bufferOffset += AVCCHeaderLength + size_t(NALUnitLength);
            print("Moving to next NALU...")
        }
        else {
            // A NALU length of 0 would never advance bufferOffset,
            // so bail out to avoid looping forever
            break
        }
    }
    print("Read completed...")
}

@shary, could you post the decoding part as well? - Sreejith S
I'm writing this code in Swift 3 but can't get the iFrame. Let me know if you can help me figure this out? Thanks. - Ashish
A tip for fellow archaeologists: CFDictionaryRef is now called CFDictionary, and UnsafePointer<Void> is now UnsafeRawPointer. You also need to add a to: label to the second argument of the unsafeBitCast calls (sketched below). - sinewave440hz
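
To illustrate those renames, here is a rough sketch of what the I-frame check (step 1 in the answer above) might look like with the current API names; treat it as an approximation, not a drop-in replacement:

import CoreMedia

// Sketch: step 1 of the answer with the modern names sinewave440hz lists
// (CFDictionary, UnsafeRawPointer, and a `to:` label on unsafeBitCast).
func isIFrame(_ frame: CMSampleBuffer) -> Bool {
    guard let attachments = CMSampleBufferGetSampleAttachmentsArray(frame, createIfNecessary: false),
          CFArrayGetCount(attachments) > 0 else {
        return false
    }
    let dict = CFArrayGetValueAtIndex(attachments, 0)
    let dictRef = unsafeBitCast(dict, to: CFDictionary.self)
    let notSync = CFDictionaryGetValue(dictRef,
                                       unsafeBitCast(kCMSampleAttachmentKey_NotSync, to: UnsafeRawPointer.self))
    // A sync frame (I-frame) carries no kCMSampleAttachmentKey_NotSync entry.
    return notSync == nil
}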

Thanks for all the great answers so far. Here is another, cleaner and more modern Swift implementation I wrote based on the other answers. It also takes advantage of the fact that the start code is the same size as the AVCC 32-bit length value, which lets you do a single memcpy and then overwrite the length values in place with start codes.

https://github.com/foxglove/foxglove-ios-bridge/blob/5aa3f9822fcd2fa9590a4a7232439ad044cbd831/WebSocketDemo-Shared/CameraManager.swift#L336-L406

enum VideoError: Error {
  case failedToGetParameterSetCount
  case failedToGetParameterSet(index: Int)
}

extension CMSampleBuffer {
  /// Convert a CMSampleBuffer holding a CMBlockBuffer in AVCC format into Annex B format.
  func dataBufferAsAnnexB() -> Data? {
    guard let dataBuffer, let formatDescription else {
      return nil
    }

    do {
      var result = Data()
      let startCode = Data([0x00, 0x00, 0x00, 0x01])
      
      try formatDescription.forEachParameterSet { buf in
        result.append(startCode)
        result.append(buf)
      }
      
      try dataBuffer.withContiguousStorage { rawBuffer in
        // Since the startCode is 4 bytes, we can append the whole AVCC buffer to the output,
        // and then replace the 4-byte length values with start codes.
        var offset = result.count
        result.append(rawBuffer.assumingMemoryBound(to: UInt8.self))
        result.withUnsafeMutableBytes { resultBuffer in
          while offset + 4 < resultBuffer.count {
            let nalUnitLength = Int(UInt32(bigEndian: resultBuffer.loadUnaligned(fromByteOffset: offset, as: UInt32.self)))
            resultBuffer[offset..<offset+4].copyBytes(from: startCode)
            offset += 4 + nalUnitLength
          }
        }
      }
      
      return result
    } catch let err {
      print("Error converting to Annex B: \(err)")
      return nil
    }
  }
}

extension CMFormatDescription {
  func forEachParameterSet(_ callback: (UnsafeBufferPointer<UInt8>) -> Void) throws {
    var parameterSetCount = 0
    var status = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(
      self,
      parameterSetIndex: 0,
      parameterSetPointerOut: nil,
      parameterSetSizeOut: nil,
      parameterSetCountOut: &parameterSetCount,
      nalUnitHeaderLengthOut: nil
    )
    guard noErr == status else {
      throw VideoError.failedToGetParameterSetCount
    }
    
    for idx in 0..<parameterSetCount {
      var ptr: UnsafePointer<UInt8>? = nil
      var size = 0
      status = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(
        self,
        parameterSetIndex: idx,
        parameterSetPointerOut: &ptr,
        parameterSetSizeOut: &size,
        parameterSetCountOut: nil,
        nalUnitHeaderLengthOut: nil
      )
      guard noErr == status else {
        throw VideoError.failedToGetParameterSet(index: idx)
      }
      callback(UnsafeBufferPointer(start: ptr, count: size))
    }
  }
}
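
As a possible usage sketch (the writeFrame helper and the FileHandle are assumptions, not part of the linked code): appending each converted frame to a file yields a raw .h264 elementary stream that players such as ffplay can open directly.

import CoreMedia
import Foundation

// Sketch: write one encoded frame's Annex B data to an open file handle.
// `outputHandle` is a hypothetical FileHandle the caller opened for writing.
func writeFrame(_ sampleBuffer: CMSampleBuffer, to outputHandle: FileHandle) {
    guard let annexB = sampleBuffer.dataBufferAsAnnexB() else { return }
    outputHandle.write(annexB)
}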
