FFmpeg delay in decoding H264

Question

I am taking raw RGB frames, encoding them to h264, then decoding them back to raw RGB frames.

[RGB frame] ------ encoder ------> [h264 stream] ------ decoder ------> [RGB frame]
              ^               ^                    ^               ^
        encoder_write    encoder_read        decoder_write    decoder_read

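I want to retrieve decoded frames as soon as possible, but there always seems to be a one-frame delay no matter how long one waits.¹ In this example, I feed the encoder one frame every 2 seconds: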

$ python demo.py 2>/dev/null
time=0 frames=1 encoder_write
time=2 frames=2 encoder_write
time=2 frames=1 decoder_read   <-- decoded output is delayed by extra frame
time=4 frames=3 encoder_write
time=4 frames=2 decoder_read
time=6 frames=4 encoder_write
time=6 frames=3 decoder_read
...

What I want instead:

$ python demo.py 2>/dev/null
time=0 frames=1 encoder_write
time=0 frames=1 decoder_read   <-- decode immediately after encode
time=2 frames=2 encoder_write
time=2 frames=2 decoder_read
time=4 frames=3 encoder_write
time=4 frames=3 decoder_read
time=6 frames=4 encoder_write
time=6 frames=4 decoder_read
...

The encoder and decoder ffmpeg processes are run with the following arguments:

encoder: ffmpeg -f rawvideo -pix_fmt rgb24 -s 224x224 -i pipe: \
                -f h264 -tune zerolatency pipe:

decoder: ffmpeg -probesize 32 -flags low_delay \
                -f h264 -i pipe: \
                -f rawvideo -pix_fmt rgb24 -s 224x224 pipe:

Complete reproducible example below. No external video files needed. Just copy, paste, and run python demo.py 2>/dev/null!

import subprocess
from queue import Queue
from threading import Thread
from time import sleep, time
import numpy as np

WIDTH = 224
HEIGHT = 224
NUM_FRAMES = 256

def t(epoch=time()):
    return int(time() - epoch)

def make_frames(num_frames):
    x = np.arange(WIDTH, dtype=np.uint8)
    x = np.broadcast_to(x, (num_frames, HEIGHT, WIDTH))
    x = x[..., np.newaxis].repeat(3, axis=-1)
    x[..., 1] = x[:, :, ::-1, 1]
    scale = np.arange(1, len(x) + 1, dtype=np.uint8)
    scale = scale[:, np.newaxis, np.newaxis, np.newaxis]
    x *= scale
    return x

def encoder_write(writer):
    """Feeds encoder frames to encode"""
    frames = make_frames(num_frames=NUM_FRAMES)
    for i, frame in enumerate(frames):
        writer.write(frame.tobytes())
        writer.flush()
        print(f"time={t()} frames={i + 1:<3} encoder_write")
        sleep(2)
    writer.close()

def encoder_read(reader, queue):
    """Puts chunks of encoded bytes into queue"""
    while chunk := reader.read1():
        queue.put(chunk)
        # print(f"time={t()} chunk={len(chunk):<4} encoder_read")
    queue.put(None)

def decoder_write(writer, queue):
    """Feeds decoder bytes to decode"""
    while chunk := queue.get():
        writer.write(chunk)
        writer.flush()
        # print(f"time={t()} chunk={len(chunk):<4} decoder_write")
    writer.close()

def decoder_read(reader):
    """Retrieves decoded frames"""
    buffer = b""
    frame_len = HEIGHT * WIDTH * 3
    targets = make_frames(num_frames=NUM_FRAMES)
    i = 0
    while chunk := reader.read1():
        buffer += chunk
        while len(buffer) >= frame_len:
            frame = np.frombuffer(buffer[:frame_len], dtype=np.uint8)
            frame = frame.reshape(HEIGHT, WIDTH, 3)
            # Cast to float first: uint8 subtraction and squaring wrap around.
            diff = frame.astype(np.float64) - targets[i]
            psnr = 10 * np.log10(255**2 / np.mean(diff**2))
            buffer = buffer[frame_len:]
            i += 1
            print(f"time={t()} frames={i:<3} decoder_read  psnr={psnr:.1f}")

cmd = (
    "ffmpeg "
    "-f rawvideo -pix_fmt rgb24 -s 224x224 "
    "-i pipe: "
    "-f h264 "
    "-tune zerolatency "
    "pipe:"
)
encoder_process = subprocess.Popen(
    cmd.split(), stdin=subprocess.PIPE, stdout=subprocess.PIPE
)

cmd = (
    "ffmpeg "
    "-probesize 32 "
    "-flags low_delay "
    "-f h264 "
    "-i pipe: "
    "-f rawvideo -pix_fmt rgb24 -s 224x224 "
    "pipe:"
)
decoder_process = subprocess.Popen(
    cmd.split(), stdin=subprocess.PIPE, stdout=subprocess.PIPE
)

queue = Queue()

threads = [
    Thread(target=encoder_write, args=(encoder_process.stdin,),),
    Thread(target=encoder_read, args=(encoder_process.stdout, queue),),
    Thread(target=decoder_write, args=(decoder_process.stdin, queue),),
    Thread(target=decoder_read, args=(decoder_process.stdout,),),
]

for thread in threads:
    thread.start()

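¹ I did some testing, and it seems the decoder waits for the next frame's NAL header 00 00 00 01 41 88 (in hex) before it decodes the current frame. One would hope that the prefix 00 00 00 01 would be enough, but it also waits for the next two bytes!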

² ~~Prior revision of the question.~~

- Mateen Ulhaq

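I understand that h264 encoding tries to encode the differences between frames to save bandwidth, so surely it has to wait for the second frame before it can get the difference? - Mark Setchell

@MarkSetchell It only needs to wait if the current frame is a B-frame, because that requires information from a future frame. - Mateen Ulhaq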

1 Answer

- Rotem · Accepted Answer

Add -probesize 32 to your decoder arguments.

Set decoder command to:

cmd = "ffmpeg -probesize 32 -f h264 -i pipe: -f rawvideo -pix_fmt rgb24 -s 224x224 pipe:"

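I found the solution here: How to minimize the delay in a live streaming with FFmpeg.

According to the FFmpeg StreamingGuide:

"Also setting -probesize and -analyzeduration to low values may help your stream start up more quickly."

After adding the -probesize 32 argument, I get 9 lines of "Decoder written 862 bytes" instead of about 120 lines.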

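Update: I could not find a solution, but I managed to build a simple demonstration of the problem.

Instead of two sub-processes and 4 threads, the code sample uses one sub-process and no Python threads.

The sample uses the following "filter graph":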

 _________              ______________            _________
| BMP     |            |              |          | BMP     |
| encoded |  demuxer   | encoded data |  muxer   | encoded |
| frames  | ---------> | packets      | -------> | frames  |
|_________|            |______________|          |_________|
input PIPE                                       output PIPE

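See: the Stream copy chapter.

I found that to "push" the first frame from the input to the output, we need to write at least 4112 additional bytes from the beginning of the second frame.

Here is the code sample: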

import cv2
import numpy as np
import subprocess as sp

width, height, n_frames, fps = 256, 256, 10, 1  # 10 frames, resolution 256x256, and 1 fps


def make_bmp_frame_as_bytes(i):
    """ Build synthetic image for testing, encode as BMP and convert to bytes sequence """
    p = width//50
    img = np.full((height, width, 3), 60, np.uint8)
    cv2.putText(img, str(i+1), (width//2-p*10*len(str(i+1)), height//2+p*10), cv2.FONT_HERSHEY_DUPLEX, p, (255, 30, 30), p*2)  # Blue number

    # BMP Encode img into bmp_img
    _, bmp_img = cv2.imencode(".BMP", img)
    bmp_img_bytes = bmp_img.tobytes()

    return bmp_img_bytes



# BMP in, BMP out:
# The command is split into an argument list so it also runs on POSIX without shell=True.
process = sp.Popen(f'ffmpeg -debug_ts -probesize 32 -f bmp_pipe -framerate {fps} -an -sn -dn -i pipe: -f image2pipe -codec copy -an -sn -dn pipe:'.split(), stdin=sp.PIPE, stdout=sp.PIPE)

# Build image (number -1) before the loop.
bmp_img_bytes = make_bmp_frame_as_bytes(-1)

# Write one BMP encoded image before the loop.
process.stdin.write(bmp_img_bytes)
process.stdin.flush()

for i in range(n_frames):
    # Build image (number i).
    bmp_img_bytes = make_bmp_frame_as_bytes(i)

    # Write the first 4112 bytes of the BMP encoded image.
    # Writing 4112 bytes "pushes" the previous image forward (writing fewer than 4112 bytes halts on the first frame).
    process.stdin.write(bmp_img_bytes[0:4112])
    process.stdin.flush()

    # Read output BMP encoded image from stdout PIPE.
    buffer = process.stdout.read(width*height*3 + 54)   # BMP header is 54 bytes
    buffer = np.frombuffer(buffer, np.uint8)
    frame = cv2.imdecode(buffer, cv2.IMREAD_COLOR)  # Decode BMP image (using OpenCV).

    # Display the image
    cv2.imshow('frame', frame)
    cv2.waitKey(1000)

    # Write the next bytes of the BMP encoded image (from byte 4112 to the end).
    process.stdin.write(bmp_img_bytes[4112:])
    process.stdin.flush()


process.stdin.close()
buffer = process.stdout.read(width*height*3 + 54)   # Read last image
process.stdout.close()

# Wait for sub-process to finish
process.wait()

cv2.destroyAllWindows()

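I have no idea why it is 4112 bytes.
I used FFmpeg version 4.2.2, statically linked (ffmpeg.exe), under Windows 10.
I didn't check whether the value is also 4112 bytes on other versions or platforms.
I suspect the "latency issue" is inherent to the FFmpeg demuxers.
I could not find any argument or flag that prevents it.
The rawvideo demuxer is the only demuxer (that I found) that does not add delay.

I hope this simpler sample code helps in finding a solution to the latency issue...

Update:

H.264 stream example:

This sample uses the following "filter graph":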

 _________              ______________              _________ 
| H.264   |            |              |            |         |
| encoded |  demuxer   | encoded data |  decoder   | decoded |
| frames  | ---------> | packets      | ---------> | frames  |
|_________|            |______________|            |_________|
input PIPE                                         output PIPE

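The code sample writes an AUD NAL unit after writing each encoded frame.

The AUD (Access Unit Delimiter) is an optional NAL unit that comes at the beginning of an encoded frame.
Apparently, writing an AUD after the encoded frame "pushes" that frame from the demuxer to the decoder.

Here is the code sample: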

import cv2
import numpy as np
import subprocess as sp
import json

width, height, n_frames, fps = 256, 256, 100, 1  # 100 frames, resolution 256x256, and 1 fps


def make_raw_frame_as_bytes(i):
    """ Build synthetic "raw BGR" image for testing, convert the image to bytes sequence """
    p = width//60
    img = np.full((height, width, 3), 60, np.uint8)
    cv2.putText(img, str(i+1), (width//2-p*10*len(str(i+1)), height//2+p*10), cv2.FONT_HERSHEY_DUPLEX, p, (255, 30, 30), p*2)  # Blue number

    raw_img_bytes = img.tobytes()

    return raw_img_bytes


# Build input file input.264 (AVC encoded elementary stream)
################################################################################
# The command is split into an argument list so it also runs on POSIX without shell=True.
process = sp.Popen(f'ffmpeg -y -video_size {width}x{height} -pixel_format bgr24 -f rawvideo -r {fps} -an -sn -dn -i pipe: -f h264 -g 1 -pix_fmt yuv444p -crf 10 -tune zerolatency -an -sn -dn input.264'.split(), stdin=sp.PIPE)

# Alternative: -x264-params aud=1
# adds [0, 0, 0, 1, 9, 16] to the beginning of each encoded frame.
aud_bytes = b'\x00\x00\x00\x01\t\x10'  # Access Unit Delimiter
#process = sp.Popen(f'ffmpeg -y -video_size {width}x{height} -pixel_format bgr24 -f rawvideo -r {fps} -an -sn -dn -i pipe: -f h264 -g 1 -pix_fmt yuv444p -crf 10 -tune zerolatency -x264-params aud=1 -an -sn -dn input.264'.split(), stdin=sp.PIPE)

for i in range(n_frames):
    raw_img_bytes = make_raw_frame_as_bytes(i)
    process.stdin.write(raw_img_bytes) # Write raw video frame to input stream of ffmpeg sub-process.

process.stdin.close()
process.wait()
################################################################################

# Execute FFprobe and create JSON file (showing pkt_pos and pkt_size for every encoded frame):
with open('input_probe.json', 'w') as probe_file:
    sp.run('ffprobe -print_format json -show_frames input.264'.split(), stdout=probe_file)

# Read FFprobe output to dictionary p
with open('input_probe.json') as f:
    p = json.load(f)['frames']


# Input PIPE: H.264 encoded video, output PIPE: decoded video frames in raw BGR video format
process = sp.Popen(f'ffmpeg -probesize 32 -flags low_delay -f h264 -framerate {fps} -an -sn -dn -i pipe: -f rawvideo -s {width}x{height} -pix_fmt bgr24 -an -sn -dn pipe:'.split(), stdin=sp.PIPE, stdout=sp.PIPE)

f = open('input.264', 'rb')

process.stdin.write(aud_bytes)  # Write AUD NAL unit before the first encoded frame.

for i in range(n_frames-1):
    # Read H.264 encoded video frame
    h264_frame_bytes = f.read(int(p[i]['pkt_size']))

    process.stdin.write(h264_frame_bytes)
    process.stdin.write(aud_bytes)  # Write AUD NAL unit after the encoded frame.
    process.stdin.flush()

    # Read decoded video frame (in raw video format) from stdout PIPE.
    buffer = process.stdout.read(width*height*3)
    frame = np.frombuffer(buffer, np.uint8).reshape(height, width, 3)

    # Display the decoded video frame
    cv2.imshow('frame', frame)
    cv2.waitKey(1)

# Write last encoded frame
h264_frame_bytes = f.read(int(p[n_frames-1]['pkt_size']))
process.stdin.write(h264_frame_bytes)

f.close()


process.stdin.close()
buffer = process.stdout.read(width*height*3)   # Read the last video frame
process.stdout.close()

# Wait for sub-process to finish
process.wait()

cv2.destroyAllWindows()
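
To make the byte values above less opaque, here is a minimal sketch that splits a one-byte H.264 NAL unit header into its three fields. The helper name parse_nal_header is hypothetical (it is not part of FFmpeg or of the code above); the bit layout is the standard 1 + 2 + 5 bit split of the NAL header. It is applied to the first byte after the start code in aud_bytes, and to the 41 byte mentioned in the question's footnote.

```python
def parse_nal_header(byte: int) -> dict:
    """Split the one-byte H.264 NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (byte >> 7) & 0x01,  # 1 bit, must be 0
        "nal_ref_idc": (byte >> 5) & 0x03,         # 2 bits
        "nal_unit_type": byte & 0x1F,              # 5 bits
    }


aud_bytes = b"\x00\x00\x00\x01\t\x10"

# Byte 4 is the NAL header that follows the 4-byte start code.
aud_header = parse_nal_header(aud_bytes[4])
print(aud_header)  # nal_unit_type 9 is the access unit delimiter

# 0x41 is the byte that follows the start code in the question's footnote.
slice_header = parse_nal_header(0x41)
print(slice_header)  # nal_unit_type 1 is a coded slice of a non-IDR picture
```

So the AUD is recognizable from its header byte alone, whereas for a slice NAL unit the parser apparently has to look past the header byte to decide whether a new picture starts, which would match the two extra bytes observed in the question.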

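Update:

The reason for the extra frame of delay is that an H.264 elementary stream has no "end of frame" signal, and the NAL unit header has no "payload size" field.

The only way to detect where a frame ends is to see where the next frame begins.

See: Detect ending of frame in H.264 video stream, and How to know the number of NAL units in an H.264 stream that represent a picture.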
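
The point can be illustrated with a toy splitter, sketched here under simplifying assumptions: only 4-byte start codes are recognized (real streams may also use 3-byte ones), each toy "frame" is a single NAL unit, and the payloads are placeholder bytes rather than real H.264 data. The function name split_complete_units is hypothetical. A unit is reported as complete only once another start code follows it.

```python
START_CODE = b"\x00\x00\x00\x01"


def split_complete_units(buffer: bytes):
    """Return (complete_units, remainder).

    A unit counts as complete only when another start code follows it,
    because the byte stream carries no length field. Bytes before the
    first start code are discarded.
    """
    positions = []
    pos = buffer.find(START_CODE)
    while pos != -1:
        positions.append(pos)
        pos = buffer.find(START_CODE, pos + len(START_CODE))
    if not positions:
        return [], buffer
    units = [buffer[a:b] for a, b in zip(positions, positions[1:])]
    return units, buffer[positions[-1]:]


# Placeholder payloads, not real encoded video.
frame1 = START_CODE + b"\x65" + b"frame-1-payload"
frame2 = START_CODE + b"\x41" + b"frame-2-payload"

# After the whole first frame has arrived, nothing can be emitted yet.
units, pending = split_complete_units(frame1)
print(len(units), pending == frame1)

# Only the beginning of the second frame releases the first one.
units, pending = split_complete_units(pending + frame2[:6])
print(units == [frame1], pending == frame2[:6])
```

This mirrors the behavior in the question: the first frame stays buffered until the first bytes of the second frame arrive.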

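To avoid waiting for the start of the next frame, you must use a "transport stream" layer or a video container format. Transport streams and a few container formats let the receiver (the demuxer) detect the "end of frame".

I tried using MPEG-2 transport stream, but it added a delay of one frame.
[I did not try the RTSP protocol, because it does not work with pipes].

Using the Flash Video (FLV) container reduces the delay to a single frame.
The FLV container has a "payload size" field in the packet header, which lets the demuxer avoid waiting for the next frame.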
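
As a rough illustration of why a size field removes the wait, here is a sketch that reads one FLV tag from a byte buffer. It assumes the standard 11-byte FLV tag header (1-byte tag type, 3-byte big-endian data size, 4 bytes of timestamp, 3-byte stream ID) and ignores the FLV file header and the 4-byte PreviousTagSize that follows each tag. The function names try_read_tag and make_video_tag are hypothetical, and the payload is a placeholder rather than real video data.

```python
FLV_TAG_HEADER_LEN = 11
FLV_TAG_TYPE_VIDEO = 9


def make_video_tag(payload: bytes) -> bytes:
    """Build a minimal FLV video tag with zero timestamp and stream ID."""
    header = (
        bytes([FLV_TAG_TYPE_VIDEO])
        + len(payload).to_bytes(3, "big")  # data size field
        + bytes(7)                         # timestamp (3 + 1) and stream ID (3)
    )
    return header + payload


def try_read_tag(buffer: bytes):
    """Return (tag_type, payload, rest), or None if the tag is incomplete."""
    if len(buffer) < FLV_TAG_HEADER_LEN:
        return None
    tag_type = buffer[0]
    data_size = int.from_bytes(buffer[1:4], "big")
    end = FLV_TAG_HEADER_LEN + data_size
    if len(buffer) < end:
        return None
    return tag_type, buffer[FLV_TAG_HEADER_LEN:end], buffer[end:]


tag = make_video_tag(b"encoded-frame")

# One byte short: the reader knows exactly how much is still missing.
print(try_read_tag(tag[:-1]))

# Complete tag: the payload can be handed on at once, with no following tag.
print(try_read_tag(tag))
```

Unlike the start-code case, the reader can hand the payload to the decoder as soon as the last byte of the tag arrives.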

Commands for using the FLV container and the H.264 codec:

cmd = (
    "ffmpeg "
    "-f rawvideo -pix_fmt rgb24 -s 224x224 "
    "-i pipe: "
    "-vcodec libx264 "
    "-f flv "
    "-tune zerolatency "
    "pipe:"
)
encoder_process = subprocess.Popen(
    cmd.split(), stdin=subprocess.PIPE, stdout=subprocess.PIPE
)

cmd = (
    "ffmpeg "
    "-probesize 32 "
    "-flags low_delay "
    "-f flv "
    "-vcodec h264 "
    "-i pipe: "
    "-f rawvideo -pix_fmt rgb24 -s 224x224 "
    "pipe:"
)

decoder_process = subprocess.Popen(
    cmd.split(), stdin=subprocess.PIPE, stdout=subprocess.PIPE
)

In the commands above, FFmpeg uses the FLV muxer in the encoder process and the FLV demuxer in the decoder process.

Output:

time=0 frames=1   encoder_write
time=0 frames=1   decoder_read  psnr=49.0
time=2 frames=2   encoder_write
time=2 frames=2   decoder_read  psnr=48.3
time=4 frames=3   encoder_write
time=4 frames=3   decoder_read  psnr=45.8
time=6 frames=4   encoder_write
time=6 frames=4   decoder_read  psnr=46.7

As you can see, there is no extra frame of delay.

Other container formats that also work: AVI and MKV.
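
For trying those containers, a small helper that builds both command lines from a format name may be convenient. This is only a sketch: build_commands is a hypothetical helper, it assumes the flags from the FLV commands carry over unchanged, and it uses FFmpeg's format names flv, avi, and matroska (the format name for MKV). It does not run FFmpeg itself, and the resulting latency for AVI and MKV was not re-measured here.

```python
def build_commands(container="flv", size="224x224"):
    """Return (encoder_argv, decoder_argv) for a given FFmpeg format name."""
    raw = f"-f rawvideo -pix_fmt rgb24 -s {size}"
    encoder = (
        f"ffmpeg {raw} -i pipe: "
        f"-vcodec libx264 -tune zerolatency -f {container} pipe:"
    )
    decoder = (
        f"ffmpeg -probesize 32 -flags low_delay "
        f"-f {container} -vcodec h264 -i pipe: {raw} pipe:"
    )
    return encoder.split(), decoder.split()


# Print the command pair for each container mentioned in the answer.
for name in ("flv", "avi", "matroska"):
    encoder_argv, decoder_argv = build_commands(name)
    print(" ".join(encoder_argv))
    print(" ".join(decoder_argv))
```

Each returned list can be passed directly to subprocess.Popen in place of cmd.split() in the demo script.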