How to convert pyaudio frames to WAV format without writing to a file?

Question

9

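I want to build a simple speech-to-text tool using pyaudio and the IBM Bluemix service. Currently I have to record the audio, save it to disk, and then load it again in order to send it to Bluemix.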

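import numpy as np
import pyaudio
import requests
# `wav` is assumed to be scipy.io.wavfile; its write(filename, rate, data)
# signature matches the call used further down
import scipy.io.wavfile as wav

RATE = 44100
RECORD_SECONDS = 10
CHUNKSIZE = 1024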

# initialize portaudio
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True, frames_per_buffer=CHUNKSIZE)

frames = [] # A python-list of chunks(numpy.ndarray)
print("Please speak!")

for _ in range(0, int(RATE / CHUNKSIZE * RECORD_SECONDS)):
    data = stream.read(CHUNKSIZE)
    # np.frombuffer replaces np.fromstring, which is deprecated for binary data
    frames.append(np.frombuffer(data, dtype=np.int16))

# Concatenate the list of 1-D numpy arrays into a single 1-D array
numpydata = np.hstack(frames)

# close stream
stream.stop_stream()
stream.close()
p.terminate()

# save audio to disk
wav.write('out.wav',RATE,numpydata)

# Reopen the saved .wav file in binary mode
audio = open('/home/dolorousrtur/Documents/Projects/Capstone/out.wav', 'rb') 

# send audio to bluemix service
headers={'Content-Type': 'audio/wav'} 
r = requests.post(url, data=audio, headers=headers, auth=(username, password))

How can I convert the pyaudio audio frames to WAV format without writing them to disk?

- Arthur Grigorev

1

I found code that does this. The 'AudioData' class can be found here: https://github.com/Uberi/speech_recognition/blob/master/speech_recognition/__init__.py. It has a method named get_wav_data() that returns the audio converted to WAV format. - Arthur Grigorev

If the solution works, could you add it as an answer? - Ananth Kamath

2 Answers

0

I think you can use a BytesIO object to write the data to an in-memory file.

import io
import wave

# sample_rate, sample_width and raw_data are expected to be defined by the caller
with io.BytesIO() as wav_file:
    wav_writer = wave.open(wav_file, "wb")
    try:
        wav_writer.setframerate(sample_rate)
        wav_writer.setsampwidth(sample_width)
        wav_writer.setnchannels(1)
        wav_writer.writeframes(raw_data)
        wav_data = wav_file.getvalue()
    finally:
        wav_writer.close()

Following @Adrian Pope's answer, I took this code from speech_recognition; see here. The library is BSD-licensed, which permits reuse under certain conditions.

I haven't tested this.
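
The in-memory approach can also be wrapped in a small helper that relies only on the standard library. This is a minimal sketch, not the speech_recognition implementation: the function name `pcm_to_wav_bytes` and its `channels` parameter are made up for illustration, and it assumes the input is raw little-endian PCM such as the byte strings returned by pyaudio's `stream.read()`.

```python
import io
import wave


def pcm_to_wav_bytes(raw_data, sample_rate, sample_width, channels=1):
    """Wrap raw PCM samples in a WAV container held entirely in memory.

    Hypothetical helper for illustration; sample_width is in bytes
    (2 for pyaudio.paInt16).
    """
    buffer = io.BytesIO()
    # wave.open accepts a writable file-like object instead of a path
    with wave.open(buffer, "wb") as wav_writer:
        wav_writer.setnchannels(channels)
        wav_writer.setsampwidth(sample_width)
        wav_writer.setframerate(sample_rate)
        wav_writer.writeframes(raw_data)
    # closing the writer finalizes the header but leaves the BytesIO open
    return buffer.getvalue()


# One second of 16-bit mono silence at 44100 Hz stands in for recorded audio
silence = b"\x00\x00" * 44100
wav_bytes = pcm_to_wav_bytes(silence, sample_rate=44100, sample_width=2)
```

If the chunks from `stream.read()` are kept as plain byte strings in a list, `b"".join(frames)` should give the `raw_data` argument without going through numpy.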

- dfrankow

- Adrian Pope · Accepted Answer

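Here is an example that worked for me. If you put the recorded audio into a speech_recognition AudioData object, you can use its audio format conversion methods (for example get_wav_data(), get_aiff_data(), get_flac_data(), etc.). See here: speech_recognition AudioData.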

import pyaudio
import speech_recognition
from time import sleep


class Recorder():

    sampling_rate = 44100
    num_channels = 1  # AudioData represents mono audio, so record a single channel
    sample_width = 2  # bytes per sample; paInt16 samples are 16 bits, i.e. 2 bytes

    def pyaudio_stream_callback(self, in_data, frame_count, time_info, status):
        self.raw_audio_bytes_array.extend(in_data)
        return (in_data, pyaudio.paContinue)

    def start_recording(self):

        self.raw_audio_bytes_array = bytearray()

        pa = pyaudio.PyAudio()
        self.pyaudio_stream = pa.open(format=pyaudio.paInt16,
                                      channels=self.num_channels,
                                      rate=self.sampling_rate,
                                      input=True,
                                      stream_callback=self.pyaudio_stream_callback)

        self.pyaudio_stream.start_stream()

    def stop_recording(self):

        self.pyaudio_stream.stop_stream()
        self.pyaudio_stream.close()

        speech_recognition_audio_data = speech_recognition.AudioData(self.raw_audio_bytes_array,
                                                                     self.sampling_rate,
                                                                     self.sample_width)
        return speech_recognition_audio_data


if __name__ == '__main__':

    recorder = Recorder()

    # start recording
    recorder.start_recording()

    # say something interesting...
    sleep(3)

    # stop recording
    speech_recognition_audio_data = recorder.stop_recording()

    # convert the audio represented by the ``AudioData`` object to
    # a byte string representing the contents of a WAV file
    wav_data = speech_recognition_audio_data.get_wav_data()
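
One way to check that a byte string like `wav_data` really is a well-formed WAV file, still without touching disk, is to read its header back with the standard `wave` module. This is only a sketch: `describe_wav_bytes` is a hypothetical helper name, and the short silent clip built at the bottom is a stand-in for a real recording.

```python
import io
import wave


def describe_wav_bytes(wav_data):
    """Read the WAV header back from an in-memory byte string.

    Hypothetical helper for illustration only.
    """
    with wave.open(io.BytesIO(wav_data), "rb") as wav_reader:
        return {
            "channels": wav_reader.getnchannels(),
            "sample_width": wav_reader.getsampwidth(),
            "sample_rate": wav_reader.getframerate(),
            "frames": wav_reader.getnframes(),
        }


# Build a tiny in-memory WAV (160 frames of 16-bit mono silence at 16 kHz)
# as a stand-in for the bytes returned by get_wav_data()
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_writer:
    wav_writer.setnchannels(1)
    wav_writer.setsampwidth(2)
    wav_writer.setframerate(16000)
    wav_writer.writeframes(b"\x00\x00" * 160)

info = describe_wav_bytes(buffer.getvalue())
```

If the header looks right, the same bytes can presumably be passed as `data=wav_data` in the `requests.post` call from the question, since requests accepts a bytes body as well as an open file.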