Unable to get results from the Google Speech-to-Text API when streaming audio from the web

Question

django speech-to-text google-speech-api django-channels google-cloud-speech

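I want to stream audio from the web and convert it to text using the Python google-cloud-speech API. I have integrated it into my Django Channels code. For the frontend I copied this code directly, and the backend has the code shown below. The problem is that I get no exceptions or errors, but I also get no results from the Google API. What I have tried: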

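I set a breakpoint inside the loop in the process function, but control never enters the loop.
I read the Java code here and tried to understand it. I also set it up locally and debugged it. One thing I noticed is that in the Java code the onWebSocketBinary method receives an integer array, which we send from the frontend like this: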

  socket.send(Int16Array.from(floatSamples.map(function (n) {return n * MAX_INT;})));

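In Java they convert it to a ByteString and then send it to Google. In Django, I set a breakpoint and noticed that I receive the data as a binary string (bytes), so I assumed I did not need to process it further. I still tried several ways of converting it to an integer array, but none of them worked, because Google expects the bytes themselves (you can see these attempts in the commented-out code below).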
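To check that the binary frames really contain 16-bit samples, without changing what is forwarded to Google, the bytes can be decoded for inspection only. This is a minimal sketch that assumes the browser sends little-endian Int16Array data (true on common platforms, but an assumption here). The helper name pcm16_bytes_to_samples is hypothetical and not part of any library.

```python
import array
import struct
import sys
from typing import List


def pcm16_bytes_to_samples(payload: bytes) -> List[int]:
    """Decode little-endian 16-bit PCM bytes into signed integer samples.

    Intended for debugging only: the raw bytes, not this list, are what
    a LINEAR16 audio_content field expects.
    """
    if len(payload) % 2:
        raise ValueError("16-bit PCM payload must have an even number of bytes")
    samples = array.array("h")  # "h" = signed 16-bit integer
    samples.frombytes(payload)
    if sys.byteorder == "big":
        # The payload is assumed little-endian; swap on big-endian hosts.
        samples.byteswap()
    return samples.tolist()


# Simulate a frame as the frontend might send it: three Int16 samples.
frame = struct.pack("<3h", 0, 16384, -32768)
print(pcm16_bytes_to_samples(frame))  # [0, 16384, -32768]
```

If the decoded values are all zero or look like noise, the problem is probably on the capture side rather than in the request to Google.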

I have read this sample code and this one from Google, and I am doing the same thing. I don't understand what I am doing wrong here.

Django code:

import json

from channels.generic.websocket import WebsocketConsumer

# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

# Instantiates a client
client = speech.SpeechClient()
language_code = "en-US"
streaming_config = None


class SpeechToTextConsumer(WebsocketConsumer):
    def connect(self):
        self.accept()

    def disconnect(self, close_code):
        pass

    def process(self, streaming_recognize_response: types.StreamingRecognitionResult):
        for response in streaming_recognize_response:
            if not response.results:
                continue
            result = response.results[0]
            self.send(text_data=json.dumps(result))

    def receive(self, text_data=None, bytes_data=None):
        global streaming_config
        if text_data:
            data = json.loads(text_data)
            rate = data["sampleRate"]
            config = types.RecognitionConfig(
                encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
                sample_rate_hertz=rate,
                language_code=language_code,
            )
            streaming_config = types.StreamingRecognitionConfig(
                config=config, interim_results=True, single_utterance=False
            )
            types.StreamingRecognizeRequest(streaming_config=streaming_config)
            self.send(text_data=json.dumps({"message": "processing..."}))
        if bytes_data:
            # bytes_data = bytes_data[math.floor(len(bytes_data) / 2) :]
            # bytes_data = bytes_data.lstrip(b"\x00")
            # bytes_data = int.from_bytes(bytes_data, "little")
            stream = [bytes_data]
            requests = (
                types.StreamingRecognizeRequest(audio_content=chunk) for chunk in stream
            )
            responses = client.streaming_recognize(streaming_config, requests)
            self.process(responses)

- Lokesh Sanapalli

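It is important that you also post the server-side code that receives the WebSocket request... Providing a minimal, complete server and client example will encourage more people to help... I have built something like this and it is not too hard. - Scott Stensland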

@ScottStensland Thanks for your help. The backend code is basically working fine... no issues there... I just need to know how to stream the audio... - Lokesh Sanapalli

How are you testing the backend code? Does it work with your voice stream? - Ali Asgari

@Lokesh Sanapalli, did you find a solution? - Mišel Ademi

@MišelAdemi No, still waiting. - Lokesh Sanapalli

1 Answer

- Mason Choi · Accepted Answer

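I ran into a similar problem while building a virtual AI assistant, and I think I can offer some help. I am not an expert, but I found a way to use Google's speech-to-text engine. I used Python's speech_recognition library (the package is published on PyPI as SpeechRecognition, so you can install it with pip install SpeechRecognition) and imported it as "sr". From there, you create a recognizer with r = sr.Recognizer() and call r.recognize_google(audio) to use Google's API. You do not need an account, because the library already includes a default key, and it is easy to set up and integrate into something like Django. Here is a very useful link that I recommend you look at. Here is the link to the documentation. Here is a useful program that takes an audio file and transcribes it with every available speech recognition service. The code is below; you can use whichever service you like. Sphinx runs offline, and Google's API needs no sign-up because the library already ships with a key and password.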

#!/usr/bin/env python3

import speech_recognition as sr

# obtain path to "english.wav" in the same folder as this script
from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
# AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
# AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")

# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file

# recognize speech using Sphinx
try:
    print("Sphinx thinks you said " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))

# recognize speech using Google Speech Recognition
try:
    # for testing purposes, we're just using the default API key
    # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
    # instead of `r.recognize_google(audio)`
    print("Google Speech Recognition thinks you said " + r.recognize_google(audio))
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

# recognize speech using Google Cloud Speech
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE"""
try:
    print("Google Cloud Speech thinks you said " + r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS))
except sr.UnknownValueError:
    print("Google Cloud Speech could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Cloud Speech service; {0}".format(e))

# recognize speech using Wit.ai
WIT_AI_KEY = "INSERT WIT.AI API KEY HERE"  # Wit.ai keys are 32-character uppercase alphanumeric strings
try:
    print("Wit.ai thinks you said " + r.recognize_wit(audio, key=WIT_AI_KEY))
except sr.UnknownValueError:
    print("Wit.ai could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Wit.ai service; {0}".format(e))

# recognize speech using Microsoft Azure Speech
AZURE_SPEECH_KEY = "INSERT AZURE SPEECH API KEY HERE"  # Microsoft Speech API keys are 32-character lowercase hexadecimal strings
try:
    print("Microsoft Azure Speech thinks you said " + r.recognize_azure(audio, key=AZURE_SPEECH_KEY))
except sr.UnknownValueError:
    print("Microsoft Azure Speech could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Microsoft Azure Speech service; {0}".format(e))

# recognize speech using Microsoft Bing Voice Recognition
BING_KEY = "INSERT BING API KEY HERE"  # Microsoft Bing Voice Recognition API keys are 32-character lowercase hexadecimal strings
try:
    print("Microsoft Bing Voice Recognition thinks you said " + r.recognize_bing(audio, key=BING_KEY))
except sr.UnknownValueError:
    print("Microsoft Bing Voice Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))

# recognize speech using Houndify
HOUNDIFY_CLIENT_ID = "INSERT HOUNDIFY CLIENT ID HERE"  # Houndify client IDs are Base64-encoded strings
HOUNDIFY_CLIENT_KEY = "INSERT HOUNDIFY CLIENT KEY HERE"  # Houndify client keys are Base64-encoded strings
try:
    print("Houndify thinks you said " + r.recognize_houndify(audio, client_id=HOUNDIFY_CLIENT_ID, client_key=HOUNDIFY_CLIENT_KEY))
except sr.UnknownValueError:
    print("Houndify could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Houndify service; {0}".format(e))

# recognize speech using IBM Speech to Text
IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE"  # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE"  # IBM Speech to Text passwords are mixed-case alphanumeric strings
try:
    print("IBM Speech to Text thinks you said " + r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD))
except sr.UnknownValueError:
    print("IBM Speech to Text could not understand audio")
except sr.RequestError as e:
    print("Could not request results from IBM Speech to Text service; {0}".format(e))
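The program above reads from a file on disk, whereas the question deals with raw 16-bit PCM bytes arriving over a WebSocket. One possible way to bridge the two is to wrap the raw bytes in an in-memory WAV container using only the standard library. This is a rough sketch, not a tested integration. The helper name pcm16_to_wav is hypothetical, and mono audio at the sample rate reported by the frontend is an assumption.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int, channels: int = 1) -> io.BytesIO:
    """Wrap raw 16-bit PCM bytes in a WAV container held in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    # wave does not close a file object it did not open itself,
    # so the buffer stays usable; rewind it for the next reader.
    buffer.seek(0)
    return buffer


# Example: one second of silence at 16 kHz, mono.
wav_buffer = pcm16_to_wav(b"\x00\x00" * 16000, sample_rate=16000)
print(len(wav_buffer.getvalue()) > 32000)  # True: 32000 audio bytes plus a header
```

Since sr.AudioFile accepts a file-like object as well as a path, the returned buffer could presumably be used in place of AUDIO_FILE in the program above. Note that this transcribes a complete recording in one request, so it does not give interim results the way a streaming API does.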

Hope this helps in some way!