How to process in_data in PyAudio callback mode?

Question

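I'm working on a signal processing project in Python. So far I've had some success with non-blocking mode, but it introduced a considerable amount of latency and clipping in the output.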

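I want to implement a simple real-time audio filter with PyAudio and scipy.signal, but in the callback function from the PyAudio example, I can't process in_data when I try to read it. I've tried various conversions, without success.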

Here is the code I'm trying to get working (read data from the microphone, filter it, and output it as fast as possible):

import pyaudio
import time
import numpy as np
import scipy.signal as signal
WIDTH = 2
CHANNELS = 2
RATE = 44100

p = pyaudio.PyAudio()
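# elliptic low-pass: passband edge 0.03, stopband edge 0.07 (fraction of Nyquist),
# at most 5 dB passband loss, at least 40 dB stopband attenuation
b, a = signal.iirdesign(0.03, 0.07, 5, 40)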
fulldata = np.array([])

def callback(in_data, frame_count, time_info, status):
    data=signal.lfilter(b,a,in_data)
    return (data, pyaudio.paContinue)

stream = p.open(format=pyaudio.paFloat32,
                channels=CHANNELS,
                rate=RATE,
                output=True,
                input=True,
                stream_callback=callback)

stream.start_stream()

while stream.is_active():
    time.sleep(5)
    stream.stop_stream()
stream.close()

p.terminate()

What is the correct way to do this?

- function_store

2 Answers

I ran into a similar problem trying to work with PyAudio callback mode, but my requirements were:

Use stereo output (2 channels).
Process in real time.
Process the input signal with an arbitrary impulse response that may change during processing.

After a few attempts I got it working. Here is a snippet of my code (based on the PyAudio example found here):

import time
import pyaudio
import scipy.signal as ss
import numpy as np
import librosa



track1_data, track1_rate = librosa.load('path/to/wav/track1', sr=44.1e3, dtype=np.float64)
track2_data, track2_rate = librosa.load('path/to/wav/track2', sr=44.1e3, dtype=np.float64)
track3_data, track3_rate = librosa.load('path/to/wav/track3', sr=44.1e3, dtype=np.float64)

# instantiate PyAudio (1)
p = pyaudio.PyAudio()
count = 0
IR_left = first_IR_left  # placeholder: replace with the actual left-channel IR (1-D numpy array)
IR_right = first_IR_right  # placeholder: replace with the actual right-channel IR

# define callback (2)
def callback(in_data, frame_count, time_info, status):
    global count

    track1_frame = track1_data[frame_count*count : frame_count*(count+1)]
    track2_frame = track2_data[frame_count*count : frame_count*(count+1)]
    track3_frame = track3_data[frame_count*count : frame_count*(count+1)]

    track1_left = ss.fftconvolve(track1_frame, IR_left)
    track1_right = ss.fftconvolve(track1_frame, IR_right)
    track2_left = ss.fftconvolve(track2_frame, IR_left)
    track2_right = ss.fftconvolve(track2_frame, IR_right)
    track3_left = ss.fftconvolve(track3_frame, IR_left)
    track3_right = ss.fftconvolve(track3_frame, IR_right)

    track_left = 1/3 * track1_left + 1/3 * track2_left + 1/3 * track3_left
    track_right = 1/3 * track1_right + 1/3 * track2_right + 1/3 * track3_right

    # interleave both channels into a single buffer, then convert to float32 bytes
    ret_data = np.empty((track_left.size + track_right.size), dtype=track1_left.dtype)
    ret_data[1::2] = track_left
    ret_data[0::2] = track_right
    ret_data = ret_data.astype(np.float32).tobytes()
    count += 1
    return (ret_data, pyaudio.paContinue)

# open stream using callback (3)
stream = p.open(format=pyaudio.paFloat32,
                channels=2,
                rate=int(track1_rate),
                output=True,
                stream_callback=callback,
                frames_per_buffer=2**16)

# start the stream (4)
stream.start_stream()

# wait for stream to finish (5)
while_count = 0
while stream.is_active():
    while_count += 1
    if while_count % 3 == 0:
        IR_left = first_IR_left  # replace with the actual IR
        IR_right = first_IR_right  # replace with the actual IR
    elif while_count % 3 == 1:
        IR_left = second_IR_left  # replace with the actual IR
        IR_right = second_IR_right  # replace with the actual IR
    elif while_count % 3 == 2:
        IR_left = third_IR_left  # replace with the actual IR
        IR_right = third_IR_right  # replace with the actual IR

    time.sleep(10)

# stop stream (6)
stream.stop_stream()
stream.close()

# close PyAudio (7)
p.terminate()

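Some important notes about the code above:

Using librosa instead of wave lets me work with numpy arrays, which is much more convenient than the raw byte chunks returned by wave.readframes.
The data type set in p.open(format=...) must match the sample format of the ret_data bytes. PyAudio has no 64-bit float format (paFloat32 is the widest), so the data must be converted to float32 before it is returned.
In ret_data, the even-indexed samples went to the right earphone and the odd-indexed samples to the left earphone in my setup.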
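A minimal sketch of the interleaved float32 layout these notes rely on, assuming a 2-channel `paFloat32` stream. Each frame holds one sample per channel, with channel 0 at the even indices and channel 1 at the odd indices, so a callback has to return `frame_count * channels * 4` bytes. The helper names `interleave` and `deinterleave` are hypothetical and not part of PyAudio.

```python
import numpy as np


def interleave(ch0, ch1):
    """Pack two equal-length mono arrays into interleaved float32 bytes.

    Sample i of ch0 ends up at index 2*i, sample i of ch1 at index 2*i + 1.
    """
    frames = np.column_stack((ch0, ch1)).astype(np.float32)  # shape (n_frames, 2)
    return frames.tobytes()  # C order: ch0[0], ch1[0], ch0[1], ch1[1], ...


def deinterleave(buf, channels=2):
    """Turn raw paFloat32 bytes (e.g. in_data) into an (n_frames, channels) array."""
    return np.frombuffer(buf, dtype=np.float32).reshape(-1, channels)


# usage: round-trip four stereo frames
frame_count = 4
left = np.linspace(0.0, 0.3, frame_count)
right = -left
buf = interleave(left, right)
assert len(buf) == frame_count * 2 * 4  # frames * channels * bytes per float32 sample
back = deinterleave(buf)
```

Inside a callback, `deinterleave(in_data)` gives a `(frame_count, 2)` array to work on, and `interleave(...)` produces a buffer of the size PyAudio expects. Note that `np.frombuffer` returns a read-only view, so copy it before modifying it in place.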

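Just to clarify: this code sends a mix of three tracks to the stereo output and changes the impulse response, and therefore the applied filter, every 10 seconds. I use it to test a 3D audio application I'm developing, so the impulse responses are head-related impulse responses (HRIRs), and the sound position changes every 10 seconds.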

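EDIT:
There is a problem with this code: the output contains noise at a frequency that corresponds to the frame size (the smaller the frame, the higher the frequency). I solved it by overlapping and adding the frames manually. Basically, ss.oaconvolve returns an array of size track_frame.size + IR.size - 1, so I split that array into the first track_frame.size elements (which then go into ret_data) and the last IR.size - 1 elements, which are saved for later. The saved elements are added to the first IR.size - 1 elements of the next frame's convolution result. For the first frame, zeros are added.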
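The overlap-add fix described in the edit can be sketched as follows, assuming mono float blocks and one impulse response per instance. The class name `OverlapAdd` is hypothetical; `scipy.signal.oaconvolve` (SciPy 1.4+) computes each block's full linear convolution. Each call returns exactly `len(block)` samples, which also keeps the callback output at `frame_count` frames.

```python
import numpy as np
import scipy.signal as ss


class OverlapAdd:
    """Block-wise convolution with an impulse response using overlap-add."""

    def __init__(self, ir):
        self.ir = np.asarray(ir, dtype=np.float64)
        # tail carried over to the next block; zeros before the first block
        self.tail = np.zeros(self.ir.size - 1)

    def process(self, block):
        block = np.asarray(block, dtype=np.float64)
        full = ss.oaconvolve(block, self.ir)  # length block.size + ir.size - 1
        full[: self.tail.size] += self.tail  # add the previous block's tail
        out = full[: block.size]  # first block.size samples are output now
        self.tail = full[block.size:].copy()  # last ir.size - 1 samples are saved
        return out


# usage: filter a long signal in 64-sample blocks
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
ir = rng.standard_normal(150)
ola = OverlapAdd(ir)
y = np.concatenate([ola.process(x[i:i + 64]) for i in range(0, x.size, 64)])
```

Because the tail is added before the result is split, this also works when the IR is longer than a block (here 150 vs. 64). Concatenating the blocks matches convolving the whole signal at once. If the IR is swapped mid-stream, as in the answer's code, the tail computed with the old IR still gets added to the next block, which is one reasonable choice among several.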

- Facundo Farall

Would it be possible to get the full code? I'd find it very useful. - Mattia Surricchio

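Sure! Here is the link to the GitHub repository where I use it. It's a bit messy because the project ended up going in a different direction, but in that folder you'll find a file called convolutioner.py, which does the processing, and a test file where I use Convolutioner to spatialize audio with HRIRs as the impulse responses. - Facundo Farall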

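Farall, this looks like very interesting work. Where can I add you or write to you? I'm writing my master's thesis, and I think your code would be very useful to me (if I may use it, with proper citation, of course). - Mattia Surricchio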

No problem, you can contact me via LinkedIn. - Facundo Farall

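By the way, I tried running this code (removing all the unnecessary processing, such as the FFTs) just to copy a simple input audio file to the output, but it doesn't seem to work. The callback is called only once, and then the program stops. I don't know where the problem is. - Mattia Surricchio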

Content provided by Stack Overflow.

- function_store · Accepted Answer

In the meantime, I found the answer to my question. The callback is as follows:

def callback(in_data, frame_count, time_info, flag):
    global b, a, fulldata  # global variables for filter coefficients and array
    # in_data is raw bytes: read it as interleaved float32 samples, one column per channel
    audio_data = np.frombuffer(in_data, dtype=np.float32).reshape(-1, CHANNELS)
    # do whatever with the data; in my case I want to hear my data filtered in real time
    # axis=0 filters each channel on its own rather than the interleaved sample stream
    audio_data = signal.filtfilt(b, a, audio_data, axis=0, padlen=200).astype(np.float32)
    fulldata = np.append(fulldata, audio_data)  # saves the filtered samples in an array
    return (audio_data.tobytes(), pyaudio.paContinue)
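The accepted answer uses `filtfilt`, which is zero-phase but non-causal and re-pads every block independently, so block boundaries can be audible. A causal alternative, offered here as an assumption rather than as the asker's solution, is `scipy.signal.lfilter` with its `zi` filter state carried from one callback to the next. Filtering block by block then gives the same result as filtering the whole signal at once. The `StreamingFilter` class is hypothetical.

```python
import numpy as np
import scipy.signal as signal


class StreamingFilter:
    """Causal IIR filtering of interleaved float32 blocks, keeping state between calls."""

    def __init__(self, b, a, channels):
        self.b, self.a = b, a
        self.channels = channels
        n_state = max(len(a), len(b)) - 1
        # one state vector per channel; zeros mean the filter starts at rest
        self.zi = np.zeros((n_state, channels))

    def process(self, in_data):
        # raw paFloat32 bytes -> (frames, channels)
        x = np.frombuffer(in_data, dtype=np.float32).reshape(-1, self.channels)
        # filter along time (axis 0) and keep the final state for the next block
        y, self.zi = signal.lfilter(self.b, self.a, x, axis=0, zi=self.zi)
        return y.astype(np.float32).tobytes()


# usage without audio hardware: feed 1024-frame stereo blocks as bytes
b, a = signal.iirdesign(0.03, 0.07, 5, 40)  # same filter as in the question
rng = np.random.default_rng(1)
samples = rng.standard_normal((4096, 2)).astype(np.float32)
filt = StreamingFilter(b, a, channels=2)
out = b"".join(filt.process(samples[i:i + 1024].tobytes()) for i in range(0, 4096, 1024))
streamed = np.frombuffer(out, dtype=np.float32).reshape(-1, 2)
reference = signal.lfilter(b, a, samples, axis=0)
```

In a PyAudio callback this would be used as `return (filt.process(in_data), pyaudio.paContinue)`, with `filt` created once before the stream is opened. The trade-off is that `lfilter` adds the filter's phase delay, while `filtfilt` avoids it at the cost of being non-causal.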