Python: record audio when sound is detected

7
I want a Python script that runs in the background and uses pyaudio to record a sound file whenever the microphone level exceeds a certain threshold. This is a monitor for a two-way radio network, so we only want to record audio that is actually being transmitted.
Planned tasks:

- Record audio input above an n% threshold
- Stop recording after a few seconds of silence
- Carry on recording for a few seconds after the audio ends
- Phase 2: feed the input data into a MySQL database to make the recordings searchable
I had in mind a file structure along the lines of
/home/Recodings/2013/8/23/12-33.wav, which would be the recording of a transmission on 23/08/2013 @ 12:33.
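(A timestamped path in that layout can be built with the standard library; `transmission_path` below is just an illustrative helper, not part of any existing code:

```python
import datetime
import os

def transmission_path(root, when):
    # year/month/day directories without zero padding, hour-minute filename,
    # matching the /home/Recodings/2013/8/23/12-33.wav layout
    d = os.path.join(root, str(when.year), str(when.month), str(when.day))
    return os.path.join(d, '%02d-%02d.wav' % (when.hour, when.minute))

print(transmission_path('/home/Recodings', datetime.datetime(2013, 8, 23, 12, 33)))
# → /home/Recodings/2013/8/23/12-33.wav
```
)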
I have used the code from Detect and record a sound with python.
I'm now a bit out of my depth, so any guidance would be much appreciated.

Are you still looking? - sliders_alpha
6 Answers

24
The currently top-voted answer is a bit dated and only works for Python 2. Here is a version updated for Python 3. It wraps the functions into a class and packages everything into an easy-to-use form. Note one key difference between the top-rated script and mine:
the top script records one file and then stops, while mine keeps listening and dumps a recording into the directory every time noise is detected.
The main idea of both scripts is very similar:
Step 1: "Listen" until the rms rises above the threshold
Step 2: Start recording and set a timer for when to stop, == TIMEOUT_LENGTH
Step 3: If the rms breaks the threshold again before the timer runs out, reset the timer
Step 4: Once the timer has expired, write the recording to the directory and go back to step 1
import pyaudio
import math
import struct
import wave
import time
import os

Threshold = 10

SHORT_NORMALIZE = (1.0/32768.0)
chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
swidth = 2

TIMEOUT_LENGTH = 5

f_name_directory = r'C:\Users\Jason\PyCharmProjects\AutoRecorder\records'

class Recorder:

    @staticmethod
    def rms(frame):
        count = len(frame) / swidth
        format = "%dh" % (count)
        shorts = struct.unpack(format, frame)

        sum_squares = 0.0
        for sample in shorts:
            n = sample * SHORT_NORMALIZE
            sum_squares += n * n
        rms = math.pow(sum_squares / count, 0.5)

        return rms * 1000

    def __init__(self):
        self.p = pyaudio.PyAudio()
        self.stream = self.p.open(format=FORMAT,
                                  channels=CHANNELS,
                                  rate=RATE,
                                  input=True,
                                  output=True,
                                  frames_per_buffer=chunk)

    def record(self):
        print('Noise detected, recording beginning')
        rec = []
        current = time.time()
        end = time.time() + TIMEOUT_LENGTH

        while current <= end:

            data = self.stream.read(chunk)
            if self.rms(data) >= Threshold: end = time.time() + TIMEOUT_LENGTH

            current = time.time()
            rec.append(data)
        self.write(b''.join(rec))

    def write(self, recording):
        n_files = len(os.listdir(f_name_directory))

        filename = os.path.join(f_name_directory, '{}.wav'.format(n_files))

        wf = wave.open(filename, 'wb')
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(self.p.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(recording)
        wf.close()
        print('Written to file: {}'.format(filename))
        print('Returning to listening')



    def listen(self):
        print('Listening beginning')
        while True:
            input = self.stream.read(chunk)
            rms_val = self.rms(input)
            if rms_val > Threshold:
                self.record()

a = Recorder()

a.listen()
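As a side note, the per-sample Python loop in rms() does a lot of work at higher sample rates; if numpy is available, an equivalent vectorized version (an optional sketch, not part of the answer's code) looks like this:

```python
import math
import struct

import numpy as np

SHORT_NORMALIZE = 1.0 / 32768.0

def rms_numpy(frame):
    # Interpret the raw buffer as little-endian 16-bit signed samples,
    # normalize to [-1, 1), and return the root-mean-square scaled by
    # 1000, matching the loop-based rms() above.
    samples = np.frombuffer(frame, dtype='<i2').astype(np.float64) * SHORT_NORMALIZE
    return math.sqrt(np.mean(samples * samples)) * 1000

# quick check against the loop version on a synthetic 4-sample frame
frame = struct.pack('<4h', 0, 16384, -16384, 32767)
shorts = struct.unpack('<4h', frame)
loop_rms = math.sqrt(sum((s * SHORT_NORMALIZE) ** 2 for s in shorts) / 4) * 1000
print(abs(rms_numpy(frame) - loop_rms) < 1e-9)  # → True
```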

@primusa How can I save the audio the script picks up without the timeout pause? - Juliette
You're a star! This will help me build a case against my neighbour's noisy dog! - Monty

12

I wrote this up in a few steps a while ago

  • Record audio input above an n% threshold

Answer: Start with a boolean variable Silence, compute the RMS to decide whether Silence is true or false, and set an RMS threshold.

  • Stop recording after a few seconds of silence

Answer: You need to compute a timeout; for that, take the frame rate, the chunk size and the number of seconds you want, and compute the timeout as (FrameRate / chunk * Max_Seconds).

  • Carry on recording for a few seconds after the audio

Answer: If Silence is false == (RMS > Threshold), take the last block of audio data (LastBlock) and keep recording :-).

  • Phase 2: feed the input data into a MySQL database to make the recordings searchable

Answer: This step is up to you.
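The timeout arithmetic from the second step can be checked quickly; note that the snippet below uses integer division (//), because on Python 3 a bare / yields a float, which would break range(TimeoutSignal) in the code that follows:

```python
RATE = 16000
chunk = 1024
Max_Seconds = 10

# number of chunk-sized reads spanning Max_Seconds of audio, plus slack
TimeoutSignal = (RATE // chunk * Max_Seconds) + 2
print(TimeoutSignal)  # → 152
```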

Source code:

import pyaudio
import math
import struct
import wave
import sys

# Assuming an energy threshold above 30 dB
Threshold = 30

SHORT_NORMALIZE = (1.0/32768.0)
chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
swidth = 2
Max_Seconds = 10
TimeoutSignal=((RATE / chunk * Max_Seconds) + 2)
silence = True
FileNameTmp = '/home/Recodings/2013/8/23/12-33.wav'
Time=0
all =[]

def GetStream(chunk):
    return stream.read(chunk)
def rms(frame):
    count = len(frame)/swidth
    format = "%dh"%(count)
    # short is 16 bit int
    shorts = struct.unpack( format, frame )

    sum_squares = 0.0
    for sample in shorts:
        n = sample * SHORT_NORMALIZE
        sum_squares += n*n
    # compute the rms 
    rms = math.pow(sum_squares/count, 0.5)
    return rms * 1000

def WriteSpeech(WriteData):
    stream.stop_stream()
    stream.close()
    p.terminate()
    wf = wave.open(FileNameTmp, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(WriteData)
    wf.close()

def KeepRecord(TimeoutSignal, LastBlock):
    all.append(LastBlock)
    for i in range(0, TimeoutSignal):
        try:
            data = GetStream(chunk)
        except:
            continue
        # I changed this here (new indent)
        all.append(data)

    print "end record after timeout"
    data = ''.join(all)
    print "write to File";
    WriteSpeech(data)
    silence = True
    Time=0
    listen(silence,Time)     

def listen(silence,Time):
    print "waiting for Speech"
    while silence:
        try:
            input = GetStream(chunk)
        except:
            continue
        rms_value = rms(input)
        if (rms_value > Threshold):
            silence=False
            LastBlock=input
            print "hello ederwander I'm Recording...."
            KeepRecord(TimeoutSignal, LastBlock)
        Time = Time + 1
        if (Time > TimeoutSignal):
            print "Time Out No Speech Detected"
            sys.exit()

p = pyaudio.PyAudio()

stream = p.open(format = FORMAT,
    channels = CHANNELS,
    rate = RATE,
    input = True,
    output = True,
    frames_per_buffer = chunk)

listen(silence,Time)

OK, I'm making progress, but I need the program to run indefinitely and create the files itself. At the moment I'm getting a file-not-found problem with ederwander's code. The program detects sound correctly... do we have to set a timeout for the whole program, or just a timeout for the recording, after which it starts listening again? Sorry, I'm getting a bit tangled up. - ZeroG
OK, now I'm not getting any errors and the file output works. The problem is I only get 1 second of the triggered audio; I need it to record until it goes silent again, then write the file, then start listening again. - ZeroG
Make a function called "listen" and call it from the KeepRecord function, it's very easy! - ederwander
Sorry, I'm new to all this and trying to learn. Where in the code is the "listen" function? I understand how to create a function, but I'm really struggling to work out where to put it, since I think technically the whole code is the "listen" function? - ZeroG
New update. PS: I haven't tested it. - ederwander

0

I wanted a buffer on either side of the recording so it doesn't start or stop abruptly. That let me drop the "listen" approach, so it simply records all the time.

import pyaudio
import math
import struct
import wave
import time
import datetime
import os

TRIGGER_RMS = 5
#RATE = 44100 # = 300MB/hour
RATE = 22050 # = 150MB/hour
TIMEOUT_SECS = 5
FRAME_SECS = 0.25 # length of frame in secs
CUSHION_SECS = 1 # amount of recording before and after sound

SHORT_NORMALIZE = (1.0/32768.0)
FORMAT = pyaudio.paInt16
CHANNELS = 1
SHORT_WIDTH = 2
CHUNK = int(RATE * FRAME_SECS)
CUSHION_FRAMES = int(CUSHION_SECS / FRAME_SECS)
TIMEOUT_FRAMES = int(TIMEOUT_SECS / FRAME_SECS)

f_name_directory = '.'

class Recorder:
    @staticmethod
    def rms(frame):
        count = len(frame) / SHORT_WIDTH
        format = "%dh" % (count)
        shorts = struct.unpack(format, frame)

        sum_squares = 0.0
        for sample in shorts:
            n = sample * SHORT_NORMALIZE
            sum_squares += n * n
        rms = math.pow(sum_squares / count, 0.5)

        return rms * 1000

    def __init__(self):
        self.p = pyaudio.PyAudio()
        self.stream = self.p.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=RATE,
                        input=True,
                        output=True,
                        frames_per_buffer=CHUNK)
        self.time = time.time()
        self.quiet = []
        self.quiet_idx = -1
        self.timeout = 0

    def record(self):
        sound = []
        start = time.time()
        begin_time = None

        while True:
            data = self.stream.read(CHUNK)
            rms_val = self.rms(data)

            if self.inSound(data):
                sound.append(data)
                if begin_time == None:
                    begin_time = datetime.datetime.now()
            else:
                self.queueQuiet(data)
                if len(sound) > 0:
                    self.write(sound, begin_time)
                    sound.clear()
                    begin_time = None

            curr = time.time()
            secs = int(curr - start)
            tout = 0 if self.timeout == 0 else int(self.timeout - curr)
            label = 'listening' if self.timeout == 0 else 'recording'
            print('%s: level=%4.2f secs=%d timeout=%d            ' % (label, rms_val, secs, tout), end='\r')
        
    # quiet is a circular buffer of size cushion 
    def queueQuiet(self, data):
        self.quiet_idx += 1
        if self.quiet_idx == CUSHION_FRAMES:
            self.quiet_idx = 0
        
        if len(self.quiet) < CUSHION_FRAMES:
            self.quiet.append(data)
        else:            
            self.quiet[self.quiet_idx] = data

    def dequeueQuiet(self, sound):
        if len(self.quiet) == 0:
            return sound
        
        ret = []
        
        # either quiet not full or full and in order
        if len(self.quiet) < CUSHION_FRAMES or self.quiet_idx == 0:
            ret.extend(self.quiet)
            ret.extend(sound)

        else:
            ret.extend(self.quiet[self.quiet_idx:])
            ret.extend(self.quiet[0:self.quiet_idx])
            ret.extend(sound)

        return ret
    
    def inSound(self, data):
        rms = self.rms(data)
        curr = time.time()

        if rms >= TRIGGER_RMS:
            self.timeout = curr + TIMEOUT_SECS
            return True
        
        if curr < self.timeout:
            return True

        self.timeout = 0
        return False

    def write(self, sound, begin_time):
        # insert the pre-sound quiet frames into sound
        sound = self.dequeueQuiet(sound)

        # sound ends with TIMEOUT_FRAMES of quiet
        # remove all but CUSHION_FRAMES
        keep_frames = len(sound) - TIMEOUT_FRAMES + CUSHION_FRAMES
        recording = b''.join(sound[0:keep_frames])

        filename = begin_time.strftime('%Y-%m-%d_%H.%M.%S')
        pathname = os.path.join(f_name_directory, '{}.wav'.format(filename))

        wf = wave.open(pathname, 'wb')
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(self.p.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(recording)
        wf.close()
        print('')
        print('writing: {}'.format(pathname))
        print('')

a = Recorder()

a.record()

0

So you just need the getLevel(data) function? A quick fix would be:

def getLevel(data):
   # Python 2: data is a str, so each byte needs ord();
   # on Python 3, iterating over bytes yields ints directly.
   sqrsum = 0
   for b in data:
      b = ord(b)
      sqrsum += b*b
   return sqrsum

This should increase with volume. Set an appropriate threshold by trial and error.
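As a rough illustration of that trial-and-error calibration (a Python 3 sketch, where iterating over bytes already yields ints so ord() is not needed), louder frames produce larger levels:

```python
import struct

def get_level(data):
    # sum of squared raw byte values, as in getLevel above (Python 3 form)
    return sum(b * b for b in data)

quiet = struct.pack('<100h', *([50] * 100))    # low-amplitude 16-bit samples
loud = struct.pack('<100h', *([20000] * 100))  # high-amplitude samples

print(get_level(quiet) < get_level(loud))  # → True
```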


Thanks ejk314, where do you think I should put it in the code? Also, is getLevel a pyaudio function? - ZeroG

0

For anyone who can't install pyaudio because of a missing portaudio.h, you can do the following:

sudo apt-get install portaudio19-dev python-pyaudio python3-pyaudio

Answer from: portaudio.h: No such file or directory


0
I fixed the code written by Mike Schultz above. I also tried to set the rms threshold automatically based on the microphone noise, but failed miserably, so you have to set the threshold to your microphone's noise level manually.
import pyaudio
import math
import struct
import wave
import time
import datetime
import os

TRIGGER_RMS = 10 # start recording above 10
RATE = 16000 # sample rate
TIMEOUT_SECS = 1 # silence time after which recording stops
FRAME_SECS = 0.25 # length of frame(chunks) to be processed at once in secs
CUSHION_SECS = 1 # amount of recording before and after sound

SHORT_NORMALIZE = (1.0/32768.0)
FORMAT = pyaudio.paInt16
CHANNELS = 1
SHORT_WIDTH = 2
CHUNK = int(RATE * FRAME_SECS)
CUSHION_FRAMES = int(CUSHION_SECS / FRAME_SECS)
TIMEOUT_FRAMES = int(TIMEOUT_SECS / FRAME_SECS)

f_name_directory = './'

class Recorder:
    @staticmethod
    def rms(frame):
        count = len(frame) / SHORT_WIDTH
        format = "%dh" % (count)
        shorts = struct.unpack(format, frame)

        sum_squares = 0.0
        for sample in shorts:
            n = sample * SHORT_NORMALIZE
            sum_squares += n * n
        rms = math.pow(sum_squares / count, 0.5)

        return rms * 1000

    def __init__(self):
        self.p = pyaudio.PyAudio()
        self.stream = self.p.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=RATE,
                        input=True,
                        output=True,
                        frames_per_buffer=CHUNK)
        self.time = time.time()
        self.quiet = []
        self.quiet_idx = -1
        self.timeout = 0

    def record(self):
        print('')
        sound = []
        start = time.time()
        begin_time = None
        while True:
            data = self.stream.read(CHUNK)
            rms_val = self.rms(data)
            if self.inSound(data):
                sound.append(data)
                if begin_time == None:
                    begin_time = datetime.datetime.now()
            else:
                if len(sound) > 0:
                    self.write(sound, begin_time)
                    sound.clear()
                    begin_time = None
                else:
                    self.queueQuiet(data)

            curr = time.time()
            secs = int(curr - start)
            tout = 0 if self.timeout == 0 else int(self.timeout - curr)
            label = 'Listening' if self.timeout == 0 else 'Recording'
            print('[+] %s: Level=[%4.2f] Secs=[%d] Timeout=[%d]' % (label, rms_val, secs, tout), end='\r')
        
    # quiet is a circular buffer of size cushion
    def queueQuiet(self, data):
        self.quiet_idx += 1
        # start over again on overflow
        if self.quiet_idx == CUSHION_FRAMES:
            self.quiet_idx = 0
        
        # fill up the queue
        if len(self.quiet) < CUSHION_FRAMES:
            self.quiet.append(data)
        # replace the element on the index in a cicular loop like this 0 -> 1 -> 2 -> 3 -> 0 and so on...
        else:            
            self.quiet[self.quiet_idx] = data

    def dequeueQuiet(self, sound):
        if len(self.quiet) == 0:
            return sound
        
        ret = []
        
        if len(self.quiet) < CUSHION_FRAMES:
            ret.extend(self.quiet)
            ret.extend(sound)
        else:
            ret.extend(self.quiet[self.quiet_idx + 1:])
            ret.extend(self.quiet[:self.quiet_idx + 1])
            ret.extend(sound)

        return ret
    
    def inSound(self, data):
        rms = self.rms(data)
        curr = time.time()

        if rms > TRIGGER_RMS:
            self.timeout = curr + TIMEOUT_SECS
            return True
        
        if curr < self.timeout:
            return True

        self.timeout = 0
        return False

    def write(self, sound, begin_time):
        # insert the pre-sound quiet frames into sound
        sound = self.dequeueQuiet(sound)

        # sound ends with TIMEOUT_FRAMES of quiet
        # remove all but CUSHION_FRAMES
        keep_frames = len(sound) - TIMEOUT_FRAMES + CUSHION_FRAMES
        recording = b''.join(sound[0:keep_frames])

        filename = begin_time.strftime('%Y-%m-%d_%H.%M.%S')
        pathname = os.path.join(f_name_directory, '{}.wav'.format(filename))

        wf = wave.open(pathname, 'wb')
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(self.p.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(recording)
        wf.close()
        print('[+] Saved: {}'.format(pathname))

a = Recorder()

a.record()

Apart from this, if anyone wants to detect human speech rather than sound in general, look up something called a voice activity detector (VAD), like this, which provides SDKs for multiple platforms and is well suited to app development. There is also something called webrtc, but it is comparatively slow and less accurate.

Finally, you can train your own neural-network model to detect speech, noise, exact words, or whatever you want, although that will take considerably more time and effort.


Content provided by Stack Overflow.