Multiprocessing pool much slower than sequential processing when calling an external module.

My script calls the librosa module to compute mel-frequency cepstral coefficients (MFCCs) for short pieces of audio. After loading the audio, I'd like to compute these (and some other audio features) as fast as possible - hence the multiprocessing.

Problem: the multiprocessing variant is much slower than sequential processing. Profiling shows that more than 90% of my code's time is spent in <method 'acquire' of '_thread.lock' objects>. That wouldn't be surprising with many small tasks, but in one test case I split the audio into 4 chunks and process them in separate processes. I'd expect the overhead to be minimal there, yet it is almost as bad as with many small tasks.

As I understand it, the multiprocessing module should copy pretty much everything, so there should be no lock contention. Yet the results seem to show otherwise. Could it be that the underlying librosa module keeps some kind of internal lock?

My profiling results in plain text: https://drive.google.com/open?id=17DHfmwtVOJOZVnwIueeoWClUaWkvhTPc As an image: https://drive.google.com/open?id=1KuZyo0CurHd9GjXge5CYQhdWn2Q6OG8Z Code to reproduce the "problem":
import time
import numpy as np
import librosa
from functools import partial
from multiprocessing import Pool

n_proc = 4

y, sr = librosa.load(librosa.util.example_audio_file(), duration=60) # load audio sample
y = np.repeat(y, 10) # repeat signal so that we can get more reliable measurements
sample_len = int(sr * 0.2) # We will compute MFCC for short pieces of audio

def get_mfcc_in_loop(audio, sr, sample_len):
    # We split the long array into small ones of length sample_len
    y_windowed = np.array_split(audio, np.arange(sample_len, len(audio), sample_len))
    for sample in y_windowed:
        mfcc = librosa.feature.mfcc(y=sample, sr=sr)

start = time.time()
get_mfcc_in_loop(y, sr, sample_len)
print('Time single process:', time.time() - start)

# Let's test now feeding these small arrays to pool of 4 workers. Since computing
# MFCCs for these small arrays is fast, I'd expect this to be not that fast
start = time.time()
y_windowed = np.array_split(y, np.arange(sample_len, len(y), sample_len))
with Pool(n_proc) as pool:
    func = partial(librosa.feature.mfcc, sr=sr)
    result = pool.map(func, y_windowed)
print('Time multiprocessing (many small tasks):', time.time() - start)

# Here we split the audio into 4 chunks and process them separately. This I'd expect
# to be fast and somehow it isn't. What could be the cause? Anything to do about it?
start = time.time()
y_split = np.array_split(y, n_proc)
with Pool(n_proc) as pool:
    func = partial(get_mfcc_in_loop, sr=sr, sample_len=sample_len)
    result = pool.map(func, y_split)
print('Time multiprocessing (a few large tasks):', time.time() - start)

Results on my machine:
  • Time single process: 8.48 s
  • Time multiprocessing (many small tasks): 44.20 s
  • Time multiprocessing (a few large tasks): 41.99 s
Any ideas what's causing this? Better yet, how to make it better?

Your Google Drive link to the profiling results is not public. I'm very interested in this problem and your solution - could you make the profiler output public so we can look at it? - someone
1 Answer


To find out what was going on, I ran top -H and noticed that 60+ threads had been spawned! That turned out to be the problem: librosa and its dependencies spawn many extra threads that together ruin the parallelism.
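A common mitigation for this kind of oversubscription - a sketch, assuming the extra threads come from the BLAS/OpenMP thread pools used by librosa's numeric dependencies (numpy/scipy), which is not confirmed in the profile above - is to cap those pools through environment variables before the numeric libraries are first imported:

```python
import os

# Assumption: the 60+ threads come from BLAS/OpenMP pools inside
# librosa's dependencies. Cap each pool to one thread per process;
# these variables are only read when the libraries initialize, so
# this must run BEFORE the first "import numpy" / "import librosa".
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS",
            "MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ[var] = "1"

# import numpy / librosa only after the caps are in place
```

With the caps in place, each of the 4 worker processes uses one compute thread instead of a full BLAS pool, so the workers no longer fight over cores.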

Solution

The oversubscription problem is nicely described in the joblib documentation. Let's use it.

import time
import numpy as np
import librosa
from joblib import Parallel, delayed

n_proc = 4

y, sr = librosa.load(librosa.util.example_audio_file(), duration=60) # load audio sample
y = np.repeat(y, 10) # repeat signal so that we can get more reliable measurements
sample_len = int(sr * 0.2) # We will compute MFCC for short pieces of audio

def get_mfcc_in_loop(audio, sr, sample_len):
    # We split the long array into small ones of length sample_len
    y_windowed = np.array_split(audio, np.arange(sample_len, len(audio), sample_len))
    for sample in y_windowed:
        mfcc = librosa.feature.mfcc(y=sample, sr=sr)

start = time.time()
y_windowed = np.array_split(y, np.arange(sample_len, len(y), sample_len))
Parallel(n_jobs=n_proc, backend='multiprocessing')(delayed(get_mfcc_in_loop)(audio=data, sr=sr, sample_len=sample_len) for data in y_windowed)
print('Time multiprocessing with joblib (many small tasks):', time.time() - start)


y_split = np.array_split(y, n_proc)
start = time.time()
Parallel(n_jobs=n_proc, backend='multiprocessing')(delayed(get_mfcc_in_loop)(audio=data, sr=sr, sample_len=sample_len) for data in y_split)
print('Time multiprocessing with joblib (a few large tasks):', time.time() - start)

Results:

  • Time multiprocessing with joblib (many small tasks): 2.66
  • Time multiprocessing with joblib (a few large tasks): 2.65

15 times faster than with the multiprocessing module.
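For completeness, newer joblib versions can enforce the thread cap themselves: the default loky backend accepts an inner_max_num_threads limit (available since roughly joblib 0.14; this is a sketch of that API with a toy workload, not code from the answer above):

```python
from joblib import Parallel, delayed, parallel_backend

def work(x):
    # Stand-in for the real per-chunk MFCC computation
    return x * x

# loky can cap the thread pools of nested numeric libraries itself,
# which is the mechanism the joblib oversubscription docs describe.
with parallel_backend("loky", inner_max_num_threads=1):
    result = Parallel(n_jobs=2)(delayed(work)(i) for i in range(4))
print(result)  # [0, 1, 4, 9]
```

This avoids having to set the environment variables manually before importing numpy in every worker.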

