Why do MFCC extraction libraries return different values?

Question

python voice-recognition voice speech mfcc

6

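I am extracting MFCC features using two different libraries:

The python_speech_features library
The BOB library

However, the two outputs differ, and even the shapes are not the same. Is that normal, or am I missing a parameter?

Here is the relevant part of my code: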

import bob.ap
import numpy as np
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc, delta, logfbank

def bob_extract_features(audio, rate):
    #get MFCC
    rate              = 8000  # rate
    win_length_ms     = 30    # The window length of the cepstral analysis in milliseconds
    win_shift_ms      = 10    # The window shift of the cepstral analysis in milliseconds
    n_filters         = 26    # The number of filter bands
    n_ceps            = 13    # The number of cepstral coefficients
    f_min             = 0.    # The minimal frequency of the filter bank
    f_max             = 4000. # The maximal frequency of the filter bank
    delta_win         = 2     # The integer delta value used for computing the first and second order derivatives
    pre_emphasis_coef = 0.97  # The coefficient used for the pre-emphasis
    dct_norm          = True  # A factor by which the cepstral coefficients are multiplied
    mel_scale         = True  # Tell whether cepstral features are extracted on a linear (LFCC) or Mel (MFCC) scale

    c = bob.ap.Ceps(rate, win_length_ms, win_shift_ms, n_filters, n_ceps, f_min,
                    f_max, delta_win, pre_emphasis_coef, mel_scale, dct_norm)
    c.with_delta       = False
    c.with_delta_delta = False
    c.with_energy      = False

    signal = np.asarray(audio, dtype=np.float64)  # vector should be in **float** (np.cast was removed in NumPy 2.0)
    example_mfcc = c(signal)                   # MFCC only: deltas and energy are disabled above
    return example_mfcc


def psf_extract_features(audio, rate):
    signal = np.asarray(audio, dtype=np.float64)  # vector should be in **float** (np.cast was removed in NumPy 2.0)
    mfcc_feature = mfcc(signal, rate, winlen = 0.03, winstep = 0.01, numcep = 13,
                        nfilt = 26, nfft = 512,appendEnergy = False)

    #mfcc_feature = preprocessing.scale(mfcc_feature)
    deltas       = delta(mfcc_feature, 2)
    fbank_feat   = logfbank(audio, rate)
    combined     = np.hstack((mfcc_feature, deltas))
    return mfcc_feature



track = 'test-sample.wav'
rate, audio = read(track)

features1 = psf_extract_features(audio, rate)
features2 = bob_extract_features(audio, rate)

print("--------------------------------------------")
t = (features1 == features2)
print(t)

- SuperKogito

2 Answers

2

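Have you tried comparing the two with a tolerance? I believe both MFCC outputs are arrays of floats, so testing for exact equality is probably unwise. Try numpy.testing.assert_allclose with some tolerance, then decide whether that tolerance is good enough.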
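As a minimal sketch of that suggestion: the helper below checks the shapes first and then compares values with numpy.testing.assert_allclose. The function name compare_features and the default tolerances are assumptions chosen for illustration, not values taken from either library.

```python
import numpy as np


def compare_features(a, b, rtol=1e-5, atol=1e-8):
    """Compare two feature matrices with a tolerance.

    Hypothetical helper: returns (ok, message) instead of raising.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # A shape mismatch cannot be fixed by a tolerance, so report it first.
    if a.shape != b.shape:
        return False, f"shape mismatch: {a.shape} vs {b.shape}"

    try:
        np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
    except AssertionError as err:
        return False, str(err)
    return True, "match within tolerance"


if __name__ == "__main__":
    x = np.ones((3, 13))
    print(compare_features(x, x + 1e-9))
    print(compare_features(x, np.ones((4, 13))))
```

In the question's script this would replace the `features1 == features2` line, for example `ok, msg = compare_features(features1, features2)`.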

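However, I missed the part where you said the shapes don't match, and I don't have enough experience with bob.ap to comment on it confidently. In general, though, some libraries pad the start or end of the input array with zeros for windowing, and if one of the libraries does this differently, that could be the cause.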
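To illustrate how padding alone can change the output shape, here is a small sketch of two common framing conventions. Both functions are hypothetical and are not claimed to match what either library actually does: one drops the trailing partial frame, the other zero-pads it into a full frame.

```python
import math


def n_frames_truncate(n_samples, frame_len, frame_step):
    """Number of frames when the trailing partial frame is dropped."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // frame_step


def n_frames_zero_pad(n_samples, frame_len, frame_step):
    """Number of frames when the trailing partial frame is zero-padded."""
    if n_samples <= frame_len:
        return 1
    return 1 + math.ceil((n_samples - frame_len) / frame_step)


if __name__ == "__main__":
    rate = 8000                    # sampling rate used in the question
    frame_len = int(0.03 * rate)   # 30 ms window -> 240 samples
    frame_step = int(0.01 * rate)  # 10 ms shift  -> 80 samples
    n_samples = 8100               # assumed signal length, for illustration

    print(n_frames_truncate(n_samples, frame_len, frame_step))
    print(n_frames_zero_pad(n_samples, frame_len, frame_step))
```

With these assumed numbers the two conventions differ by one frame, which is enough to make an element-wise comparison of the two feature matrices fail.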

- motjuste

2

Not part of the answer, but if you are looking for a library for MFCC, librosa might also be an option. - motjuste

- Nikolay Shmyrev · Accepted Answer

However, the two outputs differ, and even the shapes are not the same. Is that normal?

Yes. There are different variants of the algorithm, and each implementation chooses its own flavor.

Or am I missing a parameter?

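It is not only about parameters but also about differences in the algorithms: the window shape (Hamming vs. Hanning), the shape of the mel filters, where the mel filters start, the mel filter normalization, liftering, the DCT flavor, and so on.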
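As a rough illustration of just the first item in that list, the sketch below applies a Hamming and a Hanning window to the same frame and compares the resulting log power spectra. The frame is random noise and the helper log_power_spectrum is hypothetical; this is not the pipeline of either library, only a demonstration that one design choice already changes the numbers that feed the later mel filtering and DCT.

```python
import numpy as np


def log_power_spectrum(frame, window, nfft=512):
    """Log power spectrum of one windowed frame (illustrative helper)."""
    spectrum = np.fft.rfft(frame * window, n=nfft)
    # Small constant avoids log(0) for empty bins.
    return np.log(np.abs(spectrum) ** 2 + 1e-12)


rng = np.random.default_rng(0)
frame = rng.standard_normal(240)  # one 30 ms frame at 8 kHz, random test data

ham = log_power_spectrum(frame, np.hamming(len(frame)))
han = log_power_spectrum(frame, np.hanning(len(frame)))

if __name__ == "__main__":
    print("bins per frame:", ham.shape[0])
    print("max abs difference:", np.max(np.abs(ham - han)))
    print("equal within tolerance:", np.allclose(ham, han))
```

Every later stage (mel filter shape and normalization, liftering, DCT variant) can add a further discrepancy of the same kind.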

If you want identical results, use a single library for the extraction; getting two implementations to agree is very hard.