Detecting taps from a live microphone using PyAudio

35

How would I use pyaudio to detect a sudden tapping noise from a live microphone?

2 Answers

108

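Here is one way I've done it:

  • Read one block of samples at a time, say 0.05 seconds' worth
  • Compute the RMS amplitude of the block (the square root of the mean of the squares of the individual samples)
  • If the block's RMS amplitude is greater than a threshold, it's a "noisy block"; otherwise it's a "quiet block"
  • A sudden tap is a quiet block, followed by a small number of noisy blocks, followed by a quiet block
  • If you never get a quiet block, your threshold is too low
  • If you never get a noisy block, your threshold is too high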
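The steps above can be sketched without any audio hardware by running them over a list of RMS values that have already been computed. This is only an illustration: the name find_taps, the max_tap_blocks parameter, and the use of None to ignore noise that comes before the first quiet block are assumptions made for the sketch, not part of the approach as described.

```python
def find_taps(rms_values, threshold, max_tap_blocks=3):
    """Return the indices of the quiet blocks that close a tap.

    A tap is counted when a quiet block follows a run of 1..max_tap_blocks
    noisy blocks that was itself preceded by a quiet block.
    """
    taps = []
    noisy_run = None  # None until the first quiet block has been seen
    for i, rms in enumerate(rms_values):
        if rms > threshold:
            # noisy block: extend the current run, if one can be counted
            if noisy_run is not None:
                noisy_run += 1
        else:
            # quiet block: a short enough noisy run just ended, so it is a tap
            if noisy_run is not None and 1 <= noisy_run <= max_tap_blocks:
                taps.append(i)
            noisy_run = 0
    return taps


# quiet, quiet, two noisy blocks, quiet, five noisy blocks, quiet, quiet
levels = [0.001, 0.002, 0.3, 0.2, 0.001, 0.5, 0.5, 0.5, 0.5, 0.5, 0.001, 0.001]
print(find_taps(levels, threshold=0.01))
```

With these made-up levels only the two-block burst qualifies; the five-block burst is too long to count as a tap.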

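My application recorded "interesting" noises unattended, so it kept recording for as long as there were noisy blocks. It multiplied the threshold by 1.1 if there was a 15-second noisy period ("covering its ears") and by 0.9 if there was a 15-minute quiet period ("listening harder"). Your application will probably have different needs.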
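As a rough sketch of that adjustment rule, assuming 0.05-second blocks and the 15-second and 15-minute figures from the paragraph above: the names BLOCK_TIME, NOISY_LIMIT, QUIET_LIMIT and adapt_threshold are made up for illustration. The sample further down uses 120 seconds for the quiet period instead.

```python
BLOCK_TIME = 0.05                            # seconds per block (assumed)
NOISY_LIMIT = round(15 / BLOCK_TIME)         # blocks in 15 seconds of noise
QUIET_LIMIT = round(15 * 60 / BLOCK_TIME)    # blocks in 15 minutes of quiet


def adapt_threshold(threshold, noisy_count, quiet_count):
    """Return the threshold to use for the next block.

    noisy_count and quiet_count are the lengths of the current runs of
    consecutive noisy and quiet blocks.
    """
    if noisy_count > NOISY_LIMIT:
        return threshold * 1.1   # too much noise: become less sensitive
    if quiet_count > QUIET_LIMIT:
        return threshold * 0.9   # too much quiet: become more sensitive
    return threshold


print(adapt_threshold(0.010, NOISY_LIMIT + 1, 0))
```

Note that once a run passes its limit, this version rescales on every further block of the run, which matches how the counters are used in the sample below.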

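Also, I just noticed some comments in my code about observed RMS values. On the built-in mic of a MacBook Pro, with audio data normalized to the +/- 1.0 range and the input volume set to maximum, here are some data points:

  • 0.003-0.006 (-50 dB to -44 dB): an obnoxiously loud central heating fan in my house
  • 0.010-0.40 (-40 dB to -8 dB): typing on the same laptop
  • 0.10 (-20 dB): snapping fingers softly at a distance of 1 foot
  • 0.60 (-4.4 dB): snapping fingers loudly at 1 foot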
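The dB figures in that list are consistent with the usual amplitude-ratio formula, dB = 20 * log10(rms), taking full scale (1.0) as 0 dB. A small sketch for converting in both directions follows; the function names are made up for illustration.

```python
import math


def amplitude_to_db(amplitude):
    """Convert a normalized amplitude (1.0 = full scale) to dB."""
    return 20.0 * math.log10(amplitude)


def db_to_amplitude(db):
    """Convert dB relative to full scale back to a normalized amplitude."""
    return 10.0 ** (db / 20.0)


for rms in (0.003, 0.006, 0.010, 0.10, 0.40, 0.60):
    print("%.3f -> %.1f dB" % (rms, amplitude_to_db(rms)))
```

The second function is handy for choosing a fixed threshold from a level read off a meter, provided the meter is also referenced to full scale.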

Update: here's a sample to get you started.

#!/usr/bin/python

# open a microphone in pyAudio and listen for taps

import pyaudio
import struct
import math

INITIAL_TAP_THRESHOLD = 0.010
FORMAT = pyaudio.paInt16 
SHORT_NORMALIZE = (1.0/32768.0)
CHANNELS = 2
RATE = 44100  
INPUT_BLOCK_TIME = 0.05
INPUT_FRAMES_PER_BLOCK = int(RATE*INPUT_BLOCK_TIME)
# if we get this many noisy blocks in a row, increase the threshold
OVERSENSITIVE = 15.0/INPUT_BLOCK_TIME                    
# if we get this many quiet blocks in a row, decrease the threshold
UNDERSENSITIVE = 120.0/INPUT_BLOCK_TIME 
# if the noise was longer than this many blocks, it's not a 'tap'
MAX_TAP_BLOCKS = 0.15/INPUT_BLOCK_TIME

def get_rms( block ):
    # RMS amplitude is defined as the square root of the 
    # mean over time of the square of the amplitude.
    # so we need to convert this string of bytes into 
    # a string of 16-bit samples...

    # we will get one short out for each 
    # two chars in the string.
    # integer division, so count stays an int on Python 3
    count = len(block) // 2
    format = "%dh"%(count)
    shorts = struct.unpack( format, block )

    # iterate over the block.
    sum_squares = 0.0
    for sample in shorts:
        # sample is a signed short in +/- 32768. 
        # normalize it to 1.0
        n = sample * SHORT_NORMALIZE
        sum_squares += n*n

    return math.sqrt( sum_squares / count )

class TapTester(object):
    def __init__(self):
        self.pa = pyaudio.PyAudio()
        self.stream = self.open_mic_stream()
        self.tap_threshold = INITIAL_TAP_THRESHOLD
        self.noisycount = MAX_TAP_BLOCKS+1 
        self.quietcount = 0 
        self.errorcount = 0

    def stop(self):
        # stop and close the stream, then release PortAudio
        self.stream.stop_stream()
        self.stream.close()
        self.pa.terminate()

    def find_input_device(self):
        device_index = None            
        for i in range( self.pa.get_device_count() ):     
            devinfo = self.pa.get_device_info_by_index(i)   
            print( "Device %d: %s"%(i,devinfo["name"]) )

            for keyword in ["mic","input"]:
                if keyword in devinfo["name"].lower():
                    print( "Found an input: device %d - %s"%(i,devinfo["name"]) )
                    device_index = i
                    return device_index

        if device_index is None:
            print( "No preferred input found; using default input device." )

        return device_index

    def open_mic_stream( self ):
        device_index = self.find_input_device()

        stream = self.pa.open(   format = FORMAT,
                                 channels = CHANNELS,
                                 rate = RATE,
                                 input = True,
                                 input_device_index = device_index,
                                 frames_per_buffer = INPUT_FRAMES_PER_BLOCK)

        return stream

    def tapDetected(self):
        print("Tap!")

    def listen(self):
        try:
            block = self.stream.read(INPUT_FRAMES_PER_BLOCK)
        except IOError as e:
            # dammit. 
            self.errorcount += 1
            print( "(%d) Error recording: %s"%(self.errorcount,e) )
            self.noisycount = 1
            return

        amplitude = get_rms( block )
        if amplitude > self.tap_threshold:
            # noisy block
            self.quietcount = 0
            self.noisycount += 1
            if self.noisycount > OVERSENSITIVE:
                # turn down the sensitivity
                self.tap_threshold *= 1.1
        else:            
            # quiet block.

            if 1 <= self.noisycount <= MAX_TAP_BLOCKS:
                self.tapDetected()
            self.noisycount = 0
            self.quietcount += 1
            if self.quietcount > UNDERSENSITIVE:
                # turn up the sensitivity
                self.tap_threshold *= 0.9

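if __name__ == "__main__":
    tt = TapTester()

    try:
        # 1000 blocks of 0.05 s each is about 50 seconds of listening
        for i in range(1000):
            tt.listen()
    finally:
        # always release the audio stream, even after an error
        tt.stop()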

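Could you post a simple code sample? I've never worked with audio before. - a sandwhich
Thanks so much! This helps me a lot and is very informative. Would it be possible, though, to drop the whole automatic threshold concept and calibrate it manually? For example, if I record taps, noise, snaps and claps into the mic and look at them in software, the sound clearly has a level of up to -12 dB, while taps are a lot louder than -12 dB, more like 0 dB or even higher. So I want to set my threshold to -12 dB. How can I do that? - user576922
@Dhruv - Just remove the logic that changes self.tap_threshold. Depending on what your "-12dB" is relative to, it may or may not correspond to a threshold of 0.25, so try initializing tap_threshold to that value instead of the 0.01 in my sample. - Russell Borogove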
3
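Python comes with a standard way of calculating RMS amplitude, believe it or not: audioop. You can replace the get_rms function above with this: def get_rms(block): return audioop.rms(block, 2) - John Wiseman
Whoa, I didn't know about audioop. That's some serious import antigravity action. (You'd still need to rescale it for full compatibility, but yeah.) - Russell Borogove
This is the only example I could find that shows how to pull sample values out of pyaudio. Cheers! - JeffThompson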
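A note on the audioop suggestion in the comments above: that module was deprecated in Python 3.11 and removed in 3.13, so it is not available on current interpreters. As a sketch of a standard-library alternative, the array module can decode the block instead. The name get_rms_array is made up for illustration, and the sketch assumes the block holds native-endian signed 16-bit samples, which is what paInt16 delivers.

```python
import array
import math
import struct

SHORT_NORMALIZE = 1.0 / 32768.0


def get_rms_array(block):
    """RMS of a bytes block of signed 16-bit samples, normalized to +/- 1.0."""
    samples = array.array("h", block)  # decode the bytes as native shorts
    if not samples:
        return 0.0                     # avoid dividing by zero on empty input
    sum_squares = sum((s * SHORT_NORMALIZE) ** 2 for s in samples)
    return math.sqrt(sum_squares / len(samples))


# four samples: silence, half scale, negative half scale, silence
demo_block = struct.pack("4h", 0, 16384, -16384, 0)
print(get_rms_array(demo_block))
```

The result is already on the same +/- 1.0 scale as get_rms above, so the thresholds do not need rescaling.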

18

Here is a simplified version of the code above...

import pyaudio
import struct
import math

INITIAL_TAP_THRESHOLD = 0.010
FORMAT = pyaudio.paInt16 
SHORT_NORMALIZE = (1.0/32768.0)
CHANNELS = 2
RATE = 44100  
INPUT_BLOCK_TIME = 0.05
INPUT_FRAMES_PER_BLOCK = int(RATE*INPUT_BLOCK_TIME)

OVERSENSITIVE = 15.0/INPUT_BLOCK_TIME                    

UNDERSENSITIVE = 120.0/INPUT_BLOCK_TIME # if we get this many quiet blocks in a row, decrease the threshold

MAX_TAP_BLOCKS = 0.15/INPUT_BLOCK_TIME # if the noise was longer than this many blocks, it's not a 'tap'

def get_rms(block):

    # RMS amplitude is defined as the square root of the 
    # mean over time of the square of the amplitude.
    # so we need to convert this string of bytes into 
    # a string of 16-bit samples...

    # we will get one short out for each 
    # two chars in the string.
    # integer division, so count stays an int on Python 3
    count = len(block) // 2
    format = "%dh"%(count)
    shorts = struct.unpack( format, block )

    # iterate over the block.
    sum_squares = 0.0
    for sample in shorts:
        # sample is a signed short in +/- 32768. 
        # normalize it to 1.0
        n = sample * SHORT_NORMALIZE
        sum_squares += n*n

    return math.sqrt( sum_squares / count )

pa = pyaudio.PyAudio()                                 #]
                                                       #|
stream = pa.open(format = FORMAT,                      #|
         channels = CHANNELS,                          #|---- You always use this in pyaudio...
         rate = RATE,                                  #|
         input = True,                                 #|
         frames_per_buffer = INPUT_FRAMES_PER_BLOCK)   #]

tap_threshold = INITIAL_TAP_THRESHOLD                  #]
noisycount = MAX_TAP_BLOCKS+1                          #|---- Variables for noise detector...
quietcount = 0                                         #|
errorcount = 0                                         #]         

for i in range(1000):
    try:                                                    #]
        block = stream.read(INPUT_FRAMES_PER_BLOCK)         #|
    except IOError as e:                                    #|---- just in case there is an error!
        errorcount += 1                                     #|
        print( "(%d) Error recording: %s"%(errorcount,e) )  #|
        noisycount = 1                                      #|
        continue  # no block was read, so skip the analysis #]

    amplitude = get_rms(block)
    if amplitude > tap_threshold: # if it's too loud...
        quietcount = 0
        noisycount += 1
        if noisycount > OVERSENSITIVE:
            tap_threshold *= 1.1 # turn down the sensitivity

    else: # if it's too quiet...

        if 1 <= noisycount <= MAX_TAP_BLOCKS:
            print('tap!')
        noisycount = 0
        quietcount += 1
        if quietcount > UNDERSENSITIVE:
            tap_threshold *= 0.9 # turn up the sensitivity

With pyaudio.PyAudio().open(...) and no input device index, do you get silence, or does pyaudio automatically locate a usable microphone? - Mr Purple
