Importing video frames into a numpy array with ffmpeg without loading the whole movie into memory

Question

7

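I'm not sure whether what I'm asking is feasible or practical, but I'm trying to load frames from a video in an ordered but "on-demand" fashion.

Basically, what I have now reads the entire uncompressed video into a buffer by piping it through stdout, e.g.: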

import subprocess
import numpy as np

H, W = 1080, 1920 # video dimensions
video = '/path/to/video.mp4' # path to video

# ffmpeg command
command = [ "ffmpeg",
            '-i', video,
            '-pix_fmt', 'rgb24',
            '-f', 'rawvideo',
            'pipe:1' ]

# run ffmpeg and load all frames into numpy array (num_frames, H, W, 3)
pipe = subprocess.run(command, stdout=subprocess.PIPE, bufsize=10**8)
video = np.frombuffer(pipe.stdout, dtype=np.uint8).reshape(-1, H, W, 3)

# or alternatively load individual frames in a loop
nb_img = H*W*3 # H * W * 3 channels * 1-byte/channel
for i in range(0, len(pipe.stdout), nb_img):
    img = np.frombuffer(pipe.stdout, dtype=np.uint8, count=nb_img, offset=i).reshape(H, W, 3)

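I'd like to know whether the same thing can be done in Python without first loading the entire video into memory. What I have in mind is:

Open a buffer
Seek to memory locations on demand
Load frames into numpy arrays

I know there are other libraries, such as OpenCV, that enable this same sort of behavior, but I'm wondering:

Is it possible to do this operation efficiently with this sort of ffmpeg-pipe-to-numpy-array approach?
Does this defeat the speed-up benefit of using ffmpeg directly, compared with seeking/loading through OpenCV, or with first extracting the frames and then loading individual files?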

- marcman

1

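It's not very clear to me what the real problem is that you are trying to solve. I infer it may be that ffmpeg seeks are too slow? If that is the issue, how long do seeks currently take? How long are your videos in seconds, and at what fps? How long do you typically spend processing a video before loading another one? I am trying to understand what you really want to optimise, and how it could be traded off against more RAM, more disk, or better data structures. - Mark Setchell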

1

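@MarkSetchell: Currently I'm parsing the mp4 file, writing the individual frames to files, and then reading those frames back in various later processes. This is definitely not efficient, but the frames are needed at different stages of the downstream pipeline, and I don't have enough memory to hold all the frames for the whole run. Because ffmpeg is very efficient at loading/parsing mp4, I'd like to skip the per-frame file IO and rely on ffmpeg alone. However, my initial solution above means that a 100-second RGB video at 1920x1440 needs 24GB of RAM. - marcman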

1

I'm a bit confused now. 1920*1440*3 = 8MB/frame. So if you have 700 frames, you should only need 5GB of RAM, not 24GB? - Mark Setchell

1

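@MarkSetchell Sorry, I gave two examples without making that clear: the 24GB figure is for a 100-second, 30Hz video (3000 frames), whereas the second example is the one I actually benchmarked; it has roughly 700 frames and produced those load times. - marcman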
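
The arithmetic in this exchange can be checked with a short sketch. The helper name `raw_video_bytes` is made up for illustration; it assumes 8-bit samples and 3 channels (RGB), as in the question. For 1920x1440 at 30 fps over 100 seconds it gives roughly 8.3 MB per frame and about 24.9 GB in total, and about 5.8 GB for the 700-frame clip.

```python
def raw_video_bytes(width, height, fps, seconds, channels=3):
    """Bytes needed to hold every decoded frame as 8-bit samples.

    Hypothetical helper, not part of any library.
    """
    bytes_per_frame = width * height * channels
    num_frames = round(fps * seconds)
    return bytes_per_frame, num_frames, bytes_per_frame * num_frames


# The 100-second, 30 Hz, 1920x1440 example from the comments
per_frame, n, total = raw_video_bytes(1920, 1440, fps=30, seconds=100)
print(f"{per_frame / 1e6:.1f} MB/frame, {n} frames, {total / 1e9:.1f} GB total")

# The ~700-frame clip that was actually benchmarked
print(f"{700 * per_frame / 1e9:.1f} GB for 700 frames")
```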

1

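OK, have you considered these three options? 1) Store the frames in memory as YUV, with chroma subsampling of the U and V channels. That gives you full-resolution luma (Y) and half-resolution colour (U and V), and cuts the RAM requirement by 50% relative to RGB. Or 2) reduce the colours to <256, which means you can use a palettised image that needs only 1 byte per pixel instead of the 3 bytes of RGB, cutting storage to 1/3 of RGB. Or 3) use Redis, so that you can make use of the RAM of other computers on your network. - Mark Setchell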
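
As a rough illustration of option 1, the sketch below assumes the frames are requested from ffmpeg with `-pix_fmt yuv420p` instead of `rgb24`, so each raw frame is a full-size Y plane followed by quarter-size U and V planes. The function names are hypothetical, width and height are assumed to be even, and a zero-filled buffer stands in for one frame read from the pipe.

```python
import numpy as np


def yuv420p_frame_size(width, height):
    """Bytes in one planar YUV 4:2:0 frame (assumes even width and height)."""
    return width * height * 3 // 2


def split_yuv420p(buffer, width, height):
    """Return (Y, U, V) array views into one raw yuv420p frame."""
    y_size = width * height
    c_size = (width // 2) * (height // 2)
    data = np.frombuffer(buffer, dtype=np.uint8)
    y = data[:y_size].reshape(height, width)
    u = data[y_size:y_size + c_size].reshape(height // 2, width // 2)
    v = data[y_size + c_size:y_size + 2 * c_size].reshape(height // 2, width // 2)
    return y, u, v


W, H = 1920, 1440
rgb_bytes = W * H * 3
yuv_bytes = yuv420p_frame_size(W, H)
print(yuv_bytes / rgb_bytes)  # 0.5, i.e. half the memory of rgb24

# Zero-filled stand-in for one frame read from the ffmpeg pipe
fake_frame = bytes(yuv_bytes)
y, u, v = split_yuv420p(fake_frame, W, H)
print(y.shape, u.shape, v.shape)
```

The trade-off is that any stage needing RGB has to convert the planes back, which costs some CPU time per frame.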

1 Answer

- Rotem · Accepted Answer

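Seeking and extracting frames without loading the entire movie into memory is possible, and relatively simple.

There is some speed penalty when the requested frame is not a key frame.
When a non-key frame is requested, FFmpeg seeks to the closest key frame before the requested frame, and decodes all the frames from that key frame up to the requested frame.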
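
To give a feel for that penalty, here is a simplified model, not a measurement. It assumes a key frame at every multiple of a fixed GOP size (the test video later in this answer asks for `-g 20`); real encoders may place key frames irregularly, so treat the numbers as an upper-level intuition only. The function name is made up for illustration.

```python
def frames_decoded_for_seek(target_frame, gop_size):
    """Frames the decoder must process to output `target_frame`.

    Simplified model: assumes a key frame at every multiple of `gop_size`.
    Returns (index of the key frame used, number of frames decoded).
    """
    keyframe = (target_frame // gop_size) * gop_size
    return keyframe, target_frame - keyframe + 1


for target in (0, 11, 19, 20, 39):
    keyframe, cost = frames_decoded_for_seek(target, gop_size=20)
    print(f"frame {target}: start at key frame {keyframe}, decode {cost} frame(s)")
```

Under this model the worst case is a request for the last frame of a GOP, which costs a full GOP of decoding, so a smaller GOP makes random access cheaper at the price of a larger file.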

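The demonstration code sample does the following:

Builds a synthetic 1 fps video with a running frame counter, which is very useful for testing.
Executes FFmpeg as a sub-process, with stdout set as an output PIPE.
Seeks to the 11th second and sets the duration to 5 seconds.
Reads (and displays) decoded video frames from the PIPE until there are no more frames to read.

Here is the code sample: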

import numpy as np
import cv2
import subprocess as sp
import shlex

# Build synthetic 1fps video (with a frame counter):
# Set GOP size to 20 frames (place key frame every 20 frames - for testing).
#########################################################################
W, H = 320, 240 # video dimensions
video_path = 'video.mp4'  # path to video
sp.run(shlex.split(f'ffmpeg -y -f lavfi -i testsrc=size={W}x{H}:rate=1 -vcodec libx264 -g 20 -crf 17 -pix_fmt yuv420p -t 60 {video_path}'))
#########################################################################


# ffmpeg command
command = [ 'ffmpeg',
            '-ss', '00:00:11',    # Seek to the 11th second.
            '-i', video_path,
            '-pix_fmt', 'bgr24',  # bgr24 for matching OpenCV
            '-f', 'rawvideo',
            '-t', '5',            # Play 5 seconds long
            'pipe:' ]

# Execute FFmpeg as sub-process with stdout as a pipe
process = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)

# Load individual frames in a loop
nb_img = H*W*3  # H * W * 3 channels * 1-byte/channel

# Read decoded video frames from the PIPE until no more frames to read
while True:
    # Read decoded video frame (in raw video format) from stdout process.
    buffer = process.stdout.read(W*H*3)

    # Break the loop if buffer length is not W*H*3 (when FFmpeg streaming ends).
    if len(buffer) != W*H*3:
        break

    img = np.frombuffer(buffer, np.uint8).reshape(H, W, 3)

    cv2.imshow('img', img)  # Show the image for testing
    cv2.waitKey(1000)

process.stdout.close()
process.wait()
cv2.destroyAllWindows()

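Note:
The argument -t 5 is relevant when the playing duration is known in advance.
If the playing duration is not known in advance, you may remove -t and break the loop when needed.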
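
One way to "break the loop when needed" is to wrap the read loop in a generator that shuts the sub-process down when the caller stops iterating. This is only a sketch: `iter_raw_frames` is a hypothetical name, and a tiny Python child process stands in for FFmpeg so the example runs anywhere. With FFmpeg, `command` would be the list shown above with `'-t'` removed.

```python
import subprocess as sp
import sys

import numpy as np


def iter_raw_frames(command, width, height, channels=3):
    """Yield (height, width, channels) uint8 frames from a process that
    writes raw video to stdout. Hypothetical helper for illustration."""
    frame_bytes = width * height * channels
    process = sp.Popen(command, stdout=sp.PIPE)
    try:
        while True:
            buffer = process.stdout.read(frame_bytes)
            if len(buffer) != frame_bytes:
                break  # end of stream (or a truncated final frame)
            yield np.frombuffer(buffer, np.uint8).reshape(height, width, channels)
    finally:
        # Runs when the stream ends and when the generator is closed early
        process.stdout.close()
        process.terminate()
        process.wait()


# Stand-in producer (an assumption, replaces FFmpeg): emits ten 4x2 frames,
# where every byte of frame i has the value i.
W, H = 4, 2
producer = [sys.executable, '-c',
            'import sys\n'
            f'for i in range(10): sys.stdout.buffer.write(bytes([i]) * {W * H * 3})']

frames = iter_raw_frames(producer, W, H)
frames_seen = []
for frame in frames:
    frames_seen.append(int(frame[0, 0, 0]))
    if len(frames_seen) == 3:  # stop on demand, no '-t' needed
        break
frames.close()  # triggers the finally block and stops the sub-process

print(frames_seen)  # [0, 1, 2]
```

Only one frame is held in memory at a time, and calling `close()` explicitly avoids relying on garbage collection to terminate the child process.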

Time measurements:

Measuring the time to read all frames at once.
Measuring the time to read frame by frame in a loop.

import time

# 6000 frames:
sp.run(shlex.split(f'ffmpeg -y -f lavfi -i testsrc=size={W}x{H}:rate=1 -vcodec libx264 -g 20 -crf 17 -pix_fmt yuv420p -t 6000 {video_path}'))

# ffmpeg command
command = [ 'ffmpeg',
            '-ss', '00:00:11',    # Seek to the 11th second.
            '-i', video_path,
            '-pix_fmt', 'bgr24',  # bgr24 for matching OpenCV
            '-f', 'rawvideo',
            '-t', '5000',         # Play 5000 seconds long (5000 frames).
            'pipe:' ]



# Load all frames into numpy array
################################################################################
t = time.time()

# run ffmpeg and load all frames into numpy array (num_frames, H, W, 3)
process = sp.run(command, stdout=sp.PIPE, bufsize=10**8)
video = np.frombuffer(process.stdout, dtype=np.uint8).reshape(-1, H, W, 3)

elapsed1 = time.time() - t
################################################################################


# Load individual frames in a loop
################################################################################
t = time.time()

# Execute FFmpeg as sub-process with stdout as a pipe
process = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)

# Read decoded video frames from the PIPE until no more frames to read
while True:
    # Read decoded video frame (in raw video format) from stdout process.
    buffer = process.stdout.read(W*H*3)

    # Break the loop if buffer length is not W*H*3 (when FFmpeg streaming ends).
    if len(buffer) != W*H*3:
        break

    img = np.frombuffer(buffer, np.uint8).reshape(H, W, 3)

elapsed2 = time.time() - t

process.wait()


################################################################################

print(f'Read all frames at once elapsed time: {elapsed1}')
print(f'Read frame by frame elapsed time: {elapsed2}')

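Results:

Read all frames at once elapsed time: 7.371837854385376

Read frame by frame elapsed time: 10.089557886123657

The results show that there is some overhead when reading frame by frame.

The overhead is relatively small.
It is possible that the overhead is related to Python rather than to FFmpeg itself.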
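
If the overhead does come from the Python side, one thing that might be worth trying is reading each frame straight into a preallocated numpy array with `readinto`, which avoids creating a new bytes object per frame. This is an untested idea as far as timing goes: no measurement was made here, so it may or may not close the gap. The helper name `read_frame_into` is hypothetical, and a small Python child process stands in for FFmpeg so the sketch is runnable on its own.

```python
import subprocess as sp
import sys

import numpy as np


def read_frame_into(stream, frame):
    """Fill `frame` (a C-contiguous uint8 array) with bytes from `stream`.

    Returns True when a full frame was read, False at end of stream.
    Hypothetical helper for illustration.
    """
    assert frame.flags.c_contiguous and frame.dtype == np.uint8
    view = memoryview(frame.reshape(-1))  # flat, writable view of the array
    filled = 0
    while filled < len(view):
        n = stream.readinto(view[filled:])
        if not n:  # 0 or None: no more data available
            return False
        filled += n
    return True


# Stand-in producer (an assumption, replaces FFmpeg): emits five 4x2 frames,
# where every byte of frame i has the value i.
W, H = 4, 2
producer = [sys.executable, '-c',
            'import sys\n'
            f'for i in range(5): sys.stdout.buffer.write(bytes([i]) * {W * H * 3})']

process = sp.Popen(producer, stdout=sp.PIPE)
frame = np.empty((H, W, 3), np.uint8)  # allocated once, reused for every frame
firsts = []
while read_frame_into(process.stdout, frame):
    firsts.append(int(frame[0, 0, 0]))
process.stdout.close()
process.wait()

print(firsts)  # [0, 1, 2, 3, 4]
```

Note that `frame` is overwritten on every iteration, so any frame that must outlive the loop body needs an explicit `frame.copy()`.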