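I have been trying to generate subtitles with the Microsoft Azure Speech Recognition service in Python, but I can't figure it out. Following tips other people have posted here, I managed to get the individual words, but even formatting those into .srt or .vtt seems convoluted.
Here is the code: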
import json
import time

import azure.cognitiveservices.speech as speechsdk


def speech_recognize_continuous_from_file():
    """Performs continuous speech recognition with input from an audio file."""
    # <SpeechContinuousRecognitionWithFile>
    speech_key, service_region = "{api-key}", "{service-region}"
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_filename = "{for example: video.wav}"
    audio_config = speechsdk.audio.AudioConfig(filename=audio_filename)
    speech_config.speech_recognition_language = "en-US"
    speech_config.request_word_level_timestamps()
    speech_config.enable_dictation()
    # Detailed output is what adds the NBest list with per-word timings
    speech_config.output_format = speechsdk.OutputFormat.Detailed
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    done = False
    transcript = []
    words = []

    def handle_final_result(evt):
        """Collects the display text and the word timings of each final result."""
        # Skip events without recognized speech (they carry no usable NBest list)
        if evt.result.reason != speechsdk.ResultReason.RecognizedSpeech:
            return
        result = json.loads(evt.result.json)
        transcript.append(result['DisplayText'])
        # Keep the words of the highest-confidence alternative
        best = max(result['NBest'], key=lambda item: item.get('Confidence', 0))
        words.extend(best['Words'])

    def stop_cb(evt):
        """Callback that stops continuous recognition upon receiving an event `evt`."""
        print('CLOSING on {}'.format(evt))
        speech_recognizer.stop_continuous_recognition()
        nonlocal done
        done = True
        print("Transcript display list:\n")
        print(transcript)
        print("\nWords\n")
        print(words)
        print("\n")

    speech_recognizer.recognized.connect(handle_final_result)
    # Connect callbacks to the events fired by the speech recognizer
    # (these only format the event and discard it; use print to see them)
    speech_recognizer.recognizing.connect(lambda evt: format(evt))
    speech_recognizer.recognized.connect(lambda evt: format(evt))
    speech_recognizer.session_started.connect(lambda evt: format(evt))
    speech_recognizer.session_stopped.connect(lambda evt: format(evt))
    speech_recognizer.canceled.connect(lambda evt: format(evt))
    # Stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)
    # Start continuous speech recognition
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)

    # Write the collected sentences, one per line
    with open('Azure_Raw.txt', 'w') as f:
        f.write('\n'.join(transcript))
    return words


speech_recognize_continuous_from_file()
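The only other subtitle "tutorial" I could find is one for Google Cloud, which produces exactly the result I want (yes, I have tested it myself), but Azure is clearly nothing like Google Cloud: https://medium.com/searce/generate-srt-file-subtitles-using-google-clouds-speech-to-text-api-402b2f1da3bd
So basically: how do I turn roughly 3 seconds of recognized speech at a time into .srt format, like this: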
1
00:00:00,000 --> 00:00:03,000
This is the first sentence that

2
00:00:03,000 --> 00:00:06,000
continues after 3 seconds or so
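
For what it's worth, the grouping being asked for can be sketched roughly as follows. This is only a sketch under some assumptions: each entry of the `words` list is taken to be a dict with `Word`, `Offset` and `Duration` keys, with the two time values in 100-nanosecond ticks (which is how Azure's detailed JSON output reports them, as far as I can tell). The helper names `ticks_to_srt_time` and `words_to_srt` and the `max_seconds` parameter are made up for illustration and are not part of the Azure SDK.

```python
TICKS_PER_MS = 10_000          # 100-nanosecond ticks per millisecond
TICKS_PER_SECOND = 10_000_000  # 100-nanosecond ticks per second


def ticks_to_srt_time(ticks):
    """Convert 100-ns ticks to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = ticks // TICKS_PER_MS
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def words_to_srt(words, max_seconds=3.0):
    """Group word entries into cues of at most max_seconds; return SRT text."""
    max_ticks = int(max_seconds * TICKS_PER_SECOND)
    cues = []  # each cue is [start_ticks, end_ticks, [word, ...]]
    for w in words:
        start = w["Offset"]
        end = w["Offset"] + w["Duration"]
        # Open a new cue when there is none yet, or when adding this word
        # would stretch the current cue beyond the allowed length
        if not cues or end - cues[-1][0] > max_ticks:
            cues.append([start, end, [w["Word"]]])
        else:
            cues[-1][1] = end
            cues[-1][2].append(w["Word"])

    blocks = []
    for index, (start, end, tokens) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{ticks_to_srt_time(start)} --> {ticks_to_srt_time(end)}\n"
            f"{' '.join(tokens)}\n"
        )
    # SRT cues are separated by a blank line
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hand-written sample entries, not real recognizer output
    sample = [
        {"Word": "this", "Offset": 0, "Duration": 5_000_000},
        {"Word": "is", "Offset": 10_000_000, "Duration": 5_000_000},
        {"Word": "next", "Offset": 31_000_000, "Duration": 4_000_000},
    ]
    print(words_to_srt(sample))
```

Note that the cue boundaries here follow the actual word timings rather than a fixed 3-second grid, so the timestamps will not land exactly on 00:00:03,000 as in the example above. The word entries also appear to hold the plain lexical form (lowercase, no punctuation), so the cue text will differ from `DisplayText`.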