Filtering audio signals in TensorFlow

Question

python tensorflow scipy dataset signal-processing

7

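I am building an audio-based deep learning model. As part of the preprocessing, I want to augment the audio in my datasets. One augmentation I want to apply is a RIR (room impulse response) function. I am working with Python 3.9.5 and TensorFlow 2.8.

In Python, if the RIR is given as a finite impulse response (FIR) of n taps, the standard way is to use SciPy lfilter.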

import numpy as np
from scipy import signal
import soundfile as sf

h = np.load("rir.npy")
x, fs = sf.read("audio.wav")

y = signal.lfilter(h, 1, x)

Looping over all the files can take a long time. Doing it with the TensorFlow map utility on a TensorFlow dataset:

# define filter function
def h_filt(audio, label):
    h = np.load("rir.npy")
    x = audio.numpy()
    y = signal.lfilter(h, 1, x)
    return tf.convert_to_tensor(y, dtype=tf.float32), label

# apply it via TF map on dataset
aug_ds = ds.map(h_filt)

Using tf.numpy_function:

tf_h_filt = tf.numpy_function(h_filt, [audio, label], [tf.float32, tf.string])

# apply it via TF map on dataset
aug_ds = ds.map(tf_h_filt)

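I have two questions:

Is this approach correct and fast enough (less than a minute for 50,000 files)?
Is there a faster way to do it? For example, replacing the SciPy function with a built-in TensorFlow function. I didn't find an equivalent of lfilter or SciPy's convolve.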

- Triceratops

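I don't have an answer to your question, but I may have some helpful hints: you can't use TF eager execution (.numpy()) inside the map of a tf.data dataset. You need to wrap the function in tf.numpy_function. You may also want to take a look at tf.nn.conv1d for one-dimensional convolution. - Lescurel

As for the speed requirement, the tf.data model is a streaming model, so the data will be processed in batches while the model trains. That may or may not be fast enough, depending on your needs. - Lescurel

@Lescurel I added an example using tf.numpy_function. Is it correct? Will it work? - Triceratops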
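
The tf.numpy_function wrapping suggested in the comments can be sketched roughly as follows. This is only a minimal illustration under assumptions: the impulse response h, the random audio, the integer labels and the helper names np_filt and tf_filt are hypothetical stand-ins, not taken from the question. The idea is that tf.numpy_function is called inside the function passed to ds.map, rather than being passed to map itself.

```python
import numpy as np
import tensorflow as tf
from scipy import signal

# Hypothetical stand-in for np.load("rir.npy"): a short decaying response.
h = (0.5 ** np.arange(8)).astype(np.float32)

def np_filt(audio):
    # Runs as plain NumPy/SciPy code: `audio` arrives as a NumPy array.
    return signal.lfilter(h, 1, audio).astype(np.float32)

def tf_filt(audio, label):
    # Wrap the NumPy code so that it can be called from inside ds.map.
    y = tf.numpy_function(np_filt, [audio], tf.float32)
    # The static shape is lost inside numpy_function, so restore it.
    y.set_shape(audio.shape)
    return y, label

# Hypothetical toy dataset: 4 clips of 1000 samples with integer labels.
audio = np.random.randn(4, 1000).astype(np.float32)
labels = np.arange(4)
ds = tf.data.Dataset.from_tensor_slices((audio, labels))
aug_ds = ds.map(tf_filt)
```

Note that the wrapped function still executes in Python, so this mainly makes the SciPy code usable in the pipeline; it should not be expected to speed up the filtering itself.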

1 Answer

- Bob · Accepted Answer

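Here is one way you could do it:

Note that the TensorFlow function is designed to receive batches of inputs with multiple channels, and the filter can have multiple input channels and multiple output channels. Let N be the batch size, I the number of input channels, F the filter width, L the input width, and O the number of output channels. Using padding='SAME', it maps an input of shape (N, L, I) and a filter of shape (F, I, O) to an output of shape (N, L, O).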
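
To make the shape convention concrete, here is a minimal sketch with arbitrary sizes. The values chosen for N, L, I, F and O are illustrative assumptions only.

```python
import tensorflow as tf

# Arbitrary sizes chosen for illustration.
N, L, I, F, O = 4, 100, 2, 11, 3

x = tf.random.normal((N, L, I))  # batch of inputs, layout NWC
w = tf.random.normal((F, I, O))  # filter bank

# With padding='SAME' and stride 1 the time axis keeps its length L.
y = tf.nn.conv1d(x, w, stride=1, padding='SAME', data_format='NWC')
print(y.shape)  # (4, 100, 3), i.e. (N, L, O)
```

A single mono signal filtered by a single FIR is the special case N = I = O = 1, which is why the comparison with lfilter reshapes x to (1, -1, 1) and h to (-1, 1, 1).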

import numpy as np
from scipy import signal
import tensorflow as tf

# data to compare the two approaches
x = np.random.randn(100)
h = np.random.randn(11)

# filter x with h using lfilter
y_lfilt = signal.lfilter(h, 1, x)

# Since the denominator of your filter transfer function is 1,
# the output of lfilter matches the convolution
y_np = np.convolve(h, x)
assert np.allclose(y_lfilt, y_np[:len(y_lfilt)])

# now let's do the convolution using tensorflow
y_tf = tf.nn.conv1d(
    # x must be padded with half of the size of h
    # to use padding 'SAME'
    np.pad(x, len(h) // 2).reshape(1, -1, 1), 
    # the time axis of h must be flipped
    h[::-1].reshape(-1, 1, 1), # a 1x1 matrix of filters
    stride=1, 
    padding='SAME', 
    data_format='NWC')

assert np.allclose(y_lfilt, np.squeeze(y_tf)[:len(y_lfilt)])
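
To connect this back to the dataset in the question, one possible sketch of applying the same filter inside ds.map on batches is shown below. It is a variation rather than the exact code above: instead of padding by half the filter length with padding='SAME' and truncating, it left-pads by len(h) - 1 and uses padding='VALID', which gives the causal lfilter output directly. The random h, the toy dataset and the name tf_fir are hypothetical, and all clips are assumed to have the same length.

```python
import numpy as np
import tensorflow as tf
from scipy import signal

# Hypothetical stand-in for the RIR loaded from "rir.npy".
h = np.random.randn(11).astype(np.float32)
# Flip the time axis once and shape the kernel as (F, I, O) = (len(h), 1, 1).
kernel = tf.constant(h[::-1].copy().reshape(-1, 1, 1))

def tf_fir(audio, label):
    # audio: (batch, length) float32. Left-pad with len(h) - 1 zeros so that
    # padding='VALID' returns a causal output with the input's length.
    padded = tf.pad(audio, [[0, 0], [len(h) - 1, 0]])
    y = tf.nn.conv1d(padded[..., tf.newaxis], kernel, stride=1, padding='VALID')
    return tf.squeeze(y, axis=-1), label

# Hypothetical toy dataset: 8 clips of 1000 samples with integer labels.
audio = np.random.randn(8, 1000).astype(np.float32)
labels = np.arange(8)
ds = tf.data.Dataset.from_tensor_slices((audio, labels))
aug_ds = ds.batch(4).map(tf_fir, num_parallel_calls=tf.data.AUTOTUNE)
```

Because the function uses only TensorFlow ops, it runs inside the tf.data graph without tf.numpy_function. Whether it meets the timing target in the question would have to be measured on the actual data and RIR length.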