Azure Speech to Text - continuous recognition

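I'd like to evaluate the accuracy of Azure's Speech service, specifically speech-to-text on audio files. I've read the documentation at https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/?view=azure-python and tried the code suggested on Microsoft's quickstart page. The code runs and returns a transcription, but only of the beginning of the audio (the first utterance):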
import azure.cognitiveservices.speech as speechsdk

speechKey = 'xxx'
service_region = 'westus'

speech_config = speechsdk.SpeechConfig(subscription=speechKey, region=service_region, speech_recognition_language="es-MX")
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=False, filename='lala.wav')

sr = speechsdk.SpeechRecognizer(speech_config, audio_config)

# recognize_once() returns after a single utterance (it stops at the first pause),
# which is why only the beginning of the file gets transcribed.
result = sr.recognize_once()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))

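According to the documentation, it looks like I have to use signals and events to capture the whole audio, via the method start_continuous_recognition (which isn't documented for Python, although the method and its related classes seem to be implemented). I tried to follow other examples in C# and Java, but couldn't get it working in Python.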
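The "signals and events" model mentioned above is an observer pattern: each signal (recognizing, recognized, session_stopped, canceled) keeps a list of callbacks, and the recognizer calls every connected callback each time it fires that event, once per utterance rather than once overall. The sketch below is SDK-free so it can run without a key; `Signal` and `FakeRecognizer` are hypothetical names that only mimic the `connect`/`disconnect_all` shape and are not part of azure-cognitiveservices-speech.

```python
from typing import Callable, List


class Signal:
    """Hypothetical stand-in for an SDK event signal: a list of callbacks."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[str], None]] = []

    def connect(self, callback: Callable[[str], None]) -> None:
        self._callbacks.append(callback)

    def disconnect_all(self) -> None:
        self._callbacks.clear()

    def fire(self, evt: str) -> None:
        # Every connected callback sees every event, in connection order.
        for callback in list(self._callbacks):
            callback(evt)


class FakeRecognizer:
    """Hypothetical recognizer that 'hears' a fixed list of utterances."""

    def __init__(self, utterances: List[str]) -> None:
        self.recognized = Signal()
        self.session_stopped = Signal()
        self._utterances = utterances

    def run(self) -> None:
        # One recognized event per utterance, then a single session_stopped.
        for text in self._utterances:
            self.recognized.fire(text)
        self.session_stopped.fire("stopped")


texts: List[str] = []
recognizer = FakeRecognizer(["hola", "que tal", "adios"])
recognizer.recognized.connect(texts.append)
recognizer.run()
print(texts)  # ['hola', 'que tal', 'adios']
```

Unlike recognize_once(), which hands back only the first result, the callback here collects every utterance in order.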

Has anyone managed to do this and can give me some pointers? Thanks a lot!

4 Answers


Check out the Azure Python sample: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/python/console/speech_sample.py

or the samples for other languages: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples

Basically, it goes like this:

import time

import azure.cognitiveservices.speech as speechsdk

# Placeholders as in the sample file: replace with your own key, region and WAV file
speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
weatherfilename = "whatstheweatherlike.wav"


def speech_recognize_continuous_from_file():
    """performs continuous speech recognition with input from an audio file"""
    # <SpeechContinuousRecognitionWithFile>
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)

    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    done = False

    def stop_cb(evt):
        """callback that tells the main loop to stop upon receiving an event `evt`"""
        print('CLOSING on {}'.format(evt))
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start continuous speech recognition; callbacks fire on an SDK thread
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)

    # Stop from the main thread once a stop/cancel event has been seen
    speech_recognizer.stop_continuous_recognition()
    # </SpeechContinuousRecognitionWithFile>


if __name__ == "__main__":
    speech_recognize_continuous_from_file()
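The `while not done: time.sleep(.5)` loop works, but a `threading.Event` lets the main thread block until a stop/cancel callback fires, without polling. Because the real SDK needs a key and an audio file, this sketch simulates the recognizer's background thread with a plain `threading.Thread`; `simulate_session` is a hypothetical helper, not an SDK function. In real code, `done.set()` would go in the callback connected to `session_stopped` and `canceled`, and `stop_continuous_recognition()` would be called after `wait()` returns.

```python
import threading
from typing import Callable, List


def simulate_session(on_recognized: Callable[[str], None],
                     on_stopped: Callable[[], None],
                     utterances: List[str]) -> threading.Thread:
    """Hypothetical stand-in for the SDK's worker thread firing events."""

    def worker() -> None:
        for text in utterances:
            on_recognized(text)
        on_stopped()

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


done = threading.Event()
results: List[str] = []
lock = threading.Lock()


def handle_recognized(text: str) -> None:
    # Callbacks run on another thread, so guard the shared list.
    with lock:
        results.append(text)


def stop_cb() -> None:
    # Plays the role of stop_cb connected to session_stopped and canceled.
    done.set()


session_thread = simulate_session(handle_recognized, stop_cb, ["uno", "dos", "tres"])
finished = done.wait(timeout=5)  # blocks until stop_cb runs (or 5 s pass)
session_thread.join()
print(finished, results)  # True ['uno', 'dos', 'tres']
```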


To build further on @manyways' solution, here is a way to collect the data.

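# Add inside speech_recognize_continuous_from_file(), before start_continuous_recognition()
all_results = []

def handle_final_result(evt):
    # Called once per final (recognized) utterance
    all_results.append(evt.result.text)

# connect() must be its own statement, not part of the handler body
speech_recognizer.recognized.connect(handle_final_result)  # to collect data at the end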

Just to expand on this: what worked was putting speech_recognizer.recognized.connect(handle_final_result) on its own line, outside the callback, rather than inside the function above. Thanks for the help! - thomassantosh
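Why the placement matters: if the `connect` call sits inside `handle_final_result`, it only runs when the handler itself is called, which never happens because nothing has connected the handler yet, so `all_results` stays empty. The sketch uses a hypothetical minimal `Signal` class (not the SDK's) to show both variants side by side.

```python
from typing import Callable, List


class Signal:
    """Hypothetical minimal event signal with a connect() method."""

    def __init__(self) -> None:
        self.callbacks: List[Callable[[str], None]] = []

    def connect(self, callback: Callable[[str], None]) -> None:
        self.callbacks.append(callback)

    def fire(self, text: str) -> None:
        for callback in list(self.callbacks):
            callback(text)


# Broken: connect() is inside the handler body, so it never executes.
broken_signal = Signal()
broken_results: List[str] = []


def broken_handler(text: str) -> None:
    broken_results.append(text)
    broken_signal.connect(broken_handler)


# Fixed: connect() is a separate statement outside the handler.
fixed_signal = Signal()
fixed_results: List[str] = []


def fixed_handler(text: str) -> None:
    fixed_results.append(text)


fixed_signal.connect(fixed_handler)

for utterance in ["primera frase", "segunda frase"]:
    broken_signal.fire(utterance)
    fixed_signal.fire(utterance)

print(broken_results)  # []
print(fixed_results)   # ['primera frase', 'segunda frase']
```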

You can try this:

import time

import azure.cognitiveservices.speech as speechsdk

speech_key, service_region = "xyz", "WestEurope"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region, speech_recognition_language="it-IT")
# No audio_config is passed, so the default microphone is used
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('\nSESSION STOPPED {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('\n{}'.format(evt.result.text)))

print('Say a few words\n\n')
speech_recognizer.start_continuous_recognition()
time.sleep(10)  # listen for a fixed 10 seconds
speech_recognizer.stop_continuous_recognition()

# Detach the callbacks once recognition has stopped
speech_recognizer.session_started.disconnect_all()
speech_recognizer.recognized.disconnect_all()
speech_recognizer.session_stopped.disconnect_all()

Remember to set your preferred language. It's not much, but it's a good starting point, and it works. I'll keep experimenting.
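`time.sleep(10)` always waits exactly ten seconds: it keeps waiting after a session has already ended and cuts off speech that runs longer. One alternative is to wait on a `threading.Event` with a timeout, so you stop on whichever comes first and then call `stop_continuous_recognition()` either way. In this sketch, `wait_for_stop` is a hypothetical helper and the `threading.Timer` stands in for the SDK invoking a `session_stopped` callback that calls `set()`.

```python
import threading
import time


def wait_for_stop(stopped: threading.Event, max_seconds: float) -> bool:
    """Return True if the session stopped on its own, False on timeout."""
    return stopped.wait(timeout=max_seconds)


# Case 1: the "session" stops after 0.1 s, well before the 2 s limit.
stopped_early = threading.Event()
threading.Timer(0.1, stopped_early.set).start()
start = time.monotonic()
ended_by_session = wait_for_stop(stopped_early, max_seconds=2.0)
elapsed = time.monotonic() - start

# Case 2: nothing ever sets the event, so the timeout wins.
never_stopped = threading.Event()
ended_by_timeout = not wait_for_stop(never_stopped, max_seconds=0.2)

print(ended_by_session, ended_by_timeout, elapsed < 2.0)  # True True True
```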



To build further on @David Beauchemin's solution, the following block worked for me to get the final results as a tidy list:

# Replaces the connect() block inside speech_recognize_continuous_from_file();
# stop_cb is the callback defined in the first answer
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING:{}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED:{}'.format(evt)))

all_results = []

def handle_final_result(evt):
    all_results.append(evt.result.text)

speech_recognizer.recognized.connect(handle_final_result)
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED:{}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))

speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)
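Once recognition has stopped, `all_results` holds one string per recognized utterance. In continuous mode the `recognized` event can also deliver results whose reason is `ResultReason.NoMatch` and whose text is empty; you can filter those in the handler by checking `evt.result.reason == speechsdk.ResultReason.RecognizedSpeech`, or afterwards with a small helper like the hypothetical `build_transcript` below, which drops blanks and joins the rest into one transcript (handy when comparing against a reference text to judge accuracy).

```python
from typing import Iterable


def build_transcript(texts: Iterable[str], separator: str = " ") -> str:
    """Join recognized utterances, skipping empty or whitespace-only ones."""
    return separator.join(t.strip() for t in texts if t and t.strip())


all_results = ["Hola, buenos dias.", "", "Como estas?", "   "]
print(build_transcript(all_results))  # Hola, buenos dias. Como estas?
```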

Page content provided by Stack Overflow.