How to get BPM and tempo audio features in Python

24

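I'm working on a project that requires extracting song features such as beats per minute (BPM), tempo, and so on. However, I haven't found a suitable Python library that can detect these features accurately.

Does anyone have any advice?

(In Matlab, I know of a project called Mirtoolbox, which can provide BPM and tempo information after processing a local mp3 file.)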


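What's the encoding format? I've never heard of Python audio libraries... then again, I'm far from omnipotent and omniscient. Fire up your Google search engine and search for "Python audio libraries". - Eric Johnson
6 Answers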

18

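This answer comes a bit late, but for the record: I found three audio libraries with Python bindings that can extract features from audio. They are not easy to install, since they are actually written in C; you need to compile the Python bindings correctly and add them to your path before you can import them. Here are the libraries: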


4
These days I would recommend Essentia (http://essentia.upf.edu/), a great library that I contributed to a while ago. - wizbcn

7

http://echonest.github.io/remix/

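The Python bindings are very rich, but installing Echo Nest can be a pain, because the team does not seem able to build a stable installer. Note, though, that it does not process audio locally. Instead, it computes an audio fingerprint and uploads the song to the Echo Nest servers, where the information is extracted with their undisclosed algorithms.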

3
Is there any local-processing project that can extract BPM features from just a local mp3/wav file? - MaiTiano
1
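I researched this about a year ago, and Echo Nest was the easiest solution for Python. I am not sure whether other libraries are available now; if you find one, please post an answer here. - Mikko Ohtamaa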
1
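I came to the same conclusion as you. There is no library available that can extract music features. - MaiTiano
Or is there any other library similar to Echo Nest, even one that only includes a few feature-extraction functions? - MaiTiano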
2
Echo Nest no longer hands out API keys... https://developer.echonest.com/account/register - alexvicegrab
Their registration page no longer even loads. - dionyziz

2

I found this code by @scaperot, which may help you:

import argparse, array, sys, wave
import numpy, pywt
from scipy import signal
import matplotlib.pyplot as plt

def read_wav(filename):
    # Open the file and read the audio metadata
    try:
        wf = wave.open(filename, 'rb')
    except (IOError, wave.Error) as e:
        print(e)
        return None, None

    # typ = choose_type( wf.getsampwidth() ) #TODO: implement choose_type
    nsamps = wf.getnframes()
    assert nsamps > 0

    fs = wf.getframerate()
    assert fs > 0

    # Read the entire file into a list of samples.
    # NOTE: typecode 'i' is typically 4 bytes wide, so the sample count only
    # matches the frame count for files with 4-byte frames (see the TODO above).
    samps = list(array.array('i', wf.readframes(nsamps)))
    wf.close()
    if nsamps != len(samps):
        print(nsamps, "not equal to", len(samps))

    return samps, fs

# Print an error when no data can be found
def no_audio_data():
    print("No audio data for sample, skipping...")
    return None, None

# Simple peak detection: indices of the largest absolute value
def peak_detect(data):
    max_val = numpy.amax(abs(data))
    peak_ndx = numpy.where(data == max_val)
    if len(peak_ndx[0]) == 0:  # nothing found, so the max must be negative
        peak_ndx = numpy.where(data == -max_val)
    return peak_ndx

def bpm_detector(data, fs):
    levels = 4
    max_decimation = 2 ** (levels - 1)
    # Lag range (in decimated samples) that corresponds to 220..40 BPM
    min_ndx = int(60. / 220 * (fs / max_decimation))
    max_ndx = int(60. / 40 * (fs / max_decimation))

    cA = data
    for loop in range(levels):
        # 1) DWT: split the current approximation into approximation + detail
        cA, cD = pywt.dwt(cA, 'db4')
        if loop == 0:
            cD_minlen = len(cD) // max_decimation + 1
            cD_sum = numpy.zeros(cD_minlen)
        # 2) One-pole low-pass filter. The original wrote [1 -0.99] without
        #    the comma, which evaluates to [0.01] and disables the filter.
        cD = signal.lfilter([0.01], [1, -0.99], cD)
        # 3) Decimate so every level ends up at the same rate, then rectify
        cD = abs(cD[::(2 ** (levels - loop - 1))])
        # 4) Subtract out the mean
        cD = cD - numpy.mean(cD)
        # 5) Recombine the signal before the ACF: add each level's detail
        #    coefficients (i.e. the HPF values) into a running sum
        cD_sum = cD[0:cD_minlen] + cD_sum

    if not numpy.any(cA):
        return no_audio_data()
    # Add in the approximation coefficients as well
    cA = signal.lfilter([0.01], [1, -0.99], cA)
    cA = abs(cA)
    cA = cA - numpy.mean(cA)
    cD_sum = cA[0:cD_minlen] + cD_sum

    # ACF (autocorrelation); keep only the non-negative lags
    correl = numpy.correlate(cD_sum, cD_sum, 'full')
    midpoint = len(correl) // 2
    correl_midpoint_tmp = correl[midpoint:]
    peaks = peak_detect(correl_midpoint_tmp[min_ndx:max_ndx])[0]
    if len(peaks) != 1:  # no peak, or an ambiguous one
        return no_audio_data()

    peak_ndx_adjusted = peaks[0] + min_ndx
    bpm = 60. / peak_ndx_adjusted * (fs / max_decimation)
    print(bpm)
    return bpm, correl


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Process .wav file to determine the Beats Per Minute.')
    parser.add_argument('--filename', required=True,
                   help='.wav file for processing')
    parser.add_argument('--window', type=float, default=3,
                   help='size of the window (seconds) that will be scanned to determine the bpm.  Typically less than 10 seconds. [3]')

    args = parser.parse_args()
    samps, fs = read_wav(args.filename)
    if samps is None:
        sys.exit(1)

    correl = []
    nsamps = len(samps)
    window_samps = int(args.window * fs)
    max_window_ndx = nsamps // window_samps
    bpms = numpy.zeros(max_window_ndx)

    # Iterate through all windows
    for window_ndx in range(max_window_ndx):
        # Get a new set of samples. The offset is derived from the window
        # index, so a skipped window no longer stalls the loop.
        samps_ndx = window_ndx * window_samps
        data = samps[samps_ndx:samps_ndx + window_samps]
        if len(data) % window_samps != 0:
            raise AssertionError(str(len(data)))

        bpm, correl_temp = bpm_detector(data, fs)
        if bpm is None:
            continue
        bpms[window_ndx] = bpm
        correl = correl_temp

    bpm = numpy.median(bpms)
    print('Completed.  Estimated Beats Per Minute:', bpm)

    if len(correl) > 0:
        n = range(0, len(correl))
        plt.plot(n, abs(correl))
        plt.show(block=False)  # plot non-blocking
        plt.pause(10)          # keep the window responsive for 10 seconds
        plt.close()
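
To make the core idea of that script easier to see, here is a minimal standard-library sketch of just the last step: autocorrelate an onset envelope and convert the strongest lag into BPM. It is not @scaperot's code; the wavelet and filtering stages are left out, and the helper names (click_train, autocorr, estimate_bpm) and the synthetic click-train input are my own assumptions for illustration.

```python
def click_train(bpm, sample_rate, seconds):
    """Hypothetical test signal: 1.0 on every beat, 0.0 elsewhere."""
    n = int(sample_rate * seconds)
    period = 60.0 * sample_rate / bpm  # samples per beat
    env = [0.0] * n
    k = 0
    while round(k * period) < n:
        env[round(k * period)] = 1.0
        k += 1
    return env


def autocorr(x, lag):
    """Unnormalised autocorrelation of x at a single lag."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))


def estimate_bpm(env, sample_rate, min_bpm=40, max_bpm=220):
    """Pick the lag with the largest autocorrelation inside the BPM range."""
    # A fast tempo means a short lag, so max_bpm gives the smallest lag.
    min_lag = int(60.0 / max_bpm * sample_rate)
    max_lag = min(int(60.0 / min_bpm * sample_rate), len(env) - 1)
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorr(env, lag))
    return 60.0 * sample_rate / best_lag


env = click_train(bpm=120, sample_rate=100, seconds=10)
print(estimate_bpm(env, sample_rate=100))  # prints 120.0
```

On real audio the autocorrelation also peaks at multiples of the beat period, so an estimator like this can report half or double the true tempo. That is one reason the script above takes the median over several windows.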

This is cool, but I'm curious how well it works, given that simple BPM detectors really are simple. Have you tried an alternative like this one? - DrDeadKnee

1
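"librosa" is the package you are looking for; it offers a wide range of audio analysis features. The functions librosa.beat.beat_track() and librosa.beat.tempo() will extract the features you need. With the functions available in librosa you can also obtain spectral features such as chroma, MFCC, and zero-crossing rate, as well as rhythm features such as the tempogram.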
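
A rough sketch of how the beat tracker might be called, assuming librosa and NumPy are installed and that librosa can decode your file (mp3 support depends on the audio backend available on your system). The wrapper name estimate_tempo is hypothetical; librosa.load, librosa.beat.beat_track and librosa.frames_to_time are the library calls. I believe newer librosa releases expose the standalone tempo estimator as librosa.feature.tempo rather than librosa.beat.tempo, so check the documentation for your version.

```python
def estimate_tempo(path, start_bpm=120.0):
    """Return (tempo in BPM, beat times in seconds) for an audio file.

    Hypothetical convenience wrapper around librosa's beat tracker.
    """
    # Imported lazily so this module still loads where librosa is missing.
    import numpy as np
    import librosa

    # Decode to mono float samples at librosa's default sampling rate.
    y, sr = librosa.load(path)
    # start_bpm is only the tracker's initial guess, not a hard constraint.
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr, start_bpm=start_bpm)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    # Depending on the librosa version, tempo is a float or a small array.
    return float(np.atleast_1d(tempo)[0]), beat_times


# Example call (the file name is a placeholder):
#   tempo, beats = estimate_tempo("song.wav")
#   print(tempo, beats[:4])
```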

1
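Librosa has the librosa.beat.beat_track() method, which takes an initial BPM estimate through its "start_bpm" parameter (it defaults to 120 if you do not supply one). I'm not sure how accurate it is, but it may be worth a try.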

-1

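I recently came across Vampy, a wrapper plugin that lets you use Vamp plugins written in Python in any Vamp host. Vamp is an audio processing plugin system for plugins that extract descriptive information from audio data. Hope this helps.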


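The Vamp site is not very clear about how to install Vampy. They suggest using tools such as SonicAnnotator, but that site seems to be down... http://www.omras2.org/SonicAnnotator It would be much more useful if Vampy were a Python package that could be installed easily via pip/conda or cloned with git, and if there were a simple way to use it as a command-line tool. - alexvicegrab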
