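I want to evaluate the accuracy of Azure's Speech service, specifically speech-to-text from an audio file. I have read the documentation at https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/?view=azure-python and tried the code suggested on the MS quickstart page. The code runs and produces a transcription, but it only transcribes the beginning of the audio (the first utterance):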
import azure.cognitiveservices.speech as speechsdk
speechKey = 'xxx'
service_region = 'westus'
speech_config = speechsdk.SpeechConfig(subscription=speechKey, region=service_region, speech_recognition_language="es-MX")
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=False, filename='lala.wav')
sr = speechsdk.SpeechRecognizer(speech_config, audio_config)
es = speechsdk.EventSignal(sr.recognized, sr.recognized)
result = sr.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
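According to the documentation, it looks like I have to use signals and events to capture the full audio, via the method start_continuous_recognition (which is not documented for Python, although the method and its related classes appear to be implemented). I tried to follow other examples written for C# and Java, but could not get them working in Python.
Has anyone managed to do this and can offer some pointers? Many thanks!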
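For what it's worth, here is a minimal sketch of how continuous recognition from a file might look in Python. It is untested against a live subscription. It assumes the SDK's `recognized`, `session_stopped` and `canceled` signals and the `start_continuous_recognition` / `stop_continuous_recognition` methods behave the same way as in the C# and Java samples. The names `TranscriptCollector` and `transcribe_file` are made up for illustration and are not part of the SDK. The SDK import sits inside the function only so that the collector can be tried out without the package installed.

```python
import threading


class TranscriptCollector:
    """Hypothetical helper: gathers the final text of every utterance."""

    def __init__(self):
        self.segments = []
        self.done = threading.Event()

    def on_recognized(self, evt):
        # The `recognized` signal fires once per finished utterance;
        # evt.result.text carries its final text (empty when nothing matched).
        if evt.result.text:
            self.segments.append(evt.result.text)

    def on_stopped(self, evt):
        # Wired to both `session_stopped` and `canceled`; reaching the end
        # of the file is expected to trigger one of these.
        self.done.set()

    def text(self):
        return " ".join(self.segments)


def transcribe_file(filename, key, region, language="es-MX"):
    """Transcribe a whole audio file instead of only the first utterance."""
    # Imported here (an assumption of this sketch) so the collector above
    # can be exercised without the Azure package.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=key,
        region=region,
        speech_recognition_language=language,
    )
    audio_config = speechsdk.audio.AudioConfig(filename=filename)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    collector = TranscriptCollector()
    # Connect the callbacks before starting so no event is missed.
    recognizer.recognized.connect(collector.on_recognized)
    recognizer.session_stopped.connect(collector.on_stopped)
    recognizer.canceled.connect(collector.on_stopped)

    recognizer.start_continuous_recognition()
    collector.done.wait()  # block until the service reports it has finished
    recognizer.stop_continuous_recognition()
    return collector.text()
```

With the values from the question, the call would be something like `print(transcribe_file('lala.wav', speechKey, service_region))`. The difference from `recognize_once()` is that recognition keeps running in the background and every utterance arrives through the `recognized` callback, so the script has to wait for the stop or cancel signal before reading the collected text.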
speech_recognizer.recognized.connect(handle_final_result)
placed on its own line, not inside the function above. Thanks for your help! - thomassantosh