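I am trying to write a method that converts a subtitle file so that each subtitle contains exactly one sentence.
My idea is the following:
- For each subtitle:
1.1 -> I get the subtitle duration
1.2 -> I compute the characters per second
1.3 -> I use this to store (in dict_times_word_subtitle) the time it takes to say word i
- I extract the sentences from the entire text
- For each sentence:
3.1 I store (in dict_sentences_subtitle) the time it takes to say each of its words, in order to compute the time it takes to say the sentence
- I create a new srt file (subtitle file) that starts at the same time as the original srt file, with subtitle timings derived from the durations it takes to say the sentences.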
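Steps 1.1 to 1.3 can be sketched on plain values, without pysrt, which makes the arithmetic easy to check. This is only an illustration: `word_durations` is a hypothetical helper name, and the cue is passed as start and end times in seconds plus its text rather than as a pysrt item.

```python
def word_durations(start_s, end_s, text):
    """Split one cue's duration across its words in proportion to character count."""
    words = text.split()
    if not words or end_s <= start_s:
        return []
    # Every word except the last also "owns" the separator that follows it.
    weights = [len(w) + 1 for w in words[:-1]] + [len(words[-1])]
    seconds_per_char = (end_s - start_s) / sum(weights)
    return [(w, n * seconds_per_char) for w, n in zip(words, weights)]


# Hypothetical cue: 2 seconds long, two short words.
durations = word_durations(0.0, 2.0, "ab cd")
```

Within a single cue the word durations add up to the cue duration, so any drift has to come from how the words are later regrouped into sentences.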
So far, I have written the following code:
#---------------------------------------------------------
import pysrt
import re
from datetime import datetime, date, time, timedelta
#---------------------------------------------------------
def convert_subtitle_one_sentence(file_name):
    sub = pysrt.open(file_name)
    ### ----------------------------------------------------------------------
    ### Store Each Word and the Average Time it Takes to Say it in a dictionary
    ### ----------------------------------------------------------------------
    dict_times_word_subtitle = {}
    running_variable = 0
    for i in range(len(sub)):
        subtitle_text = sub[i].text
        subtitle_duration = (datetime.combine(date.min, sub[i].duration.to_time()) - datetime.min).total_seconds()
        # Compute characters per second
        characters_per_second = len(subtitle_text)/subtitle_duration
        # Store Each Word and the Average Time (seconds) it Takes to Say in a Dictionary
        for j,word in enumerate(subtitle_text.split()):
            if j == len(subtitle_text.split())-1:
                time = len(word)/characters_per_second
            else:
                time = len(word+" ")/characters_per_second
            dict_times_word_subtitle[str(running_variable)] = [word, time]
            running_variable += 1
    ### ----------------------------------------------------------------------
    ### Store Each Sentence and the Average Time to Say it in a Dictionary
    ### ----------------------------------------------------------------------
    total_number_of_words = len(dict_times_word_subtitle.keys())
    # Get the entire text
    entire_text = ""
    for i in range(total_number_of_words):
        entire_text += dict_times_word_subtitle[str(i)][0] +" "
    # Initialize the dictionary
    dict_times_sentences_subtitle = {}
    # Loop through all found sentences
    last_number_of_words = 0
    for i,sentence in enumerate(re.findall(r'([A-Z][^\.!?]*[\.!?])', entire_text)):
        number_of_words = len(sentence.split())
        # Compute the time it takes to speak the sentence
        time_sentence = 0
        for j in range(last_number_of_words, last_number_of_words + number_of_words):
            time_sentence += dict_times_word_subtitle[str(j)][1]
        # Store the sentence together with the time it takes to say the sentence
        dict_times_sentences_subtitle[str(i)] = [sentence, round(time_sentence,3)]
        ## Update last number_of_words
        last_number_of_words += number_of_words
    # Check if there is a non-sentence remaining at the end
    if j < total_number_of_words:
        remaining_string = ""
        remaining_string_time = 0
        for k in range(j+1, total_number_of_words):
            remaining_string += dict_times_word_subtitle[str(k)][0] + " "
            remaining_string_time += dict_times_word_subtitle[str(k)][1]
        dict_times_sentences_subtitle[str(i+1)] = [remaining_string, remaining_string_time]
    ### ----------------------------------------------------------------------
    ### Create a new Subtitle file with only 1 sentence at a time
    ### ----------------------------------------------------------------------
    # Initialize new srt file
    new_srt = pysrt.SubRipFile()
    # Loop through all sentences
    # get initial start time (seconds)
    # https://dev59.com/U1cP5IYBdhLWcg3wH28Z
    start_time = (datetime.combine(date.min, sub[0].start.to_time()) - datetime.min).total_seconds()
    for i in range(len(dict_times_sentences_subtitle.keys())):
        sentence = dict_times_sentences_subtitle[str(i)][0]
        print(sentence)
        time_sentence = dict_times_sentences_subtitle[str(i)][1]
        print(time_sentence)
        item = pysrt.SubRipItem(
            index=i,
            start=pysrt.SubRipTime(seconds=start_time),
            end=pysrt.SubRipTime(seconds=start_time+time_sentence),
            text=sentence)
        new_srt.append(item)
        ## Update Start Time
        start_time += time_sentence
    new_srt.save(file_name)
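Problem:
No error is raised, but when I apply this to a real subtitle file and watch the video, the subtitles start out correct and then drift further and further out of sync with what is actually being said as the video progresses.
Example: the speaker has already finished talking, but subtitles keep appearing.
A simple example to test: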
srt = """
1
00:00:13,100 --> 00:00:14,750
Dr. Martin Luther King, Jr.,

2
00:00:14,750 --> 00:00:18,636
in a 1968 speech where he reflects
upon the Civil Rights Movement,

3
00:00:18,636 --> 00:00:21,330
states, "In the end,

4
00:00:21,330 --> 00:00:24,413
we will remember not the words of our enemies

5
00:00:24,413 --> 00:00:27,280
but the silence of our friends."

6
00:00:27,280 --> 00:00:29,800
As a teacher, I've internalized this message.
"""
with open('test.srt', "w") as file:
    file.write(srt)
convert_subtitle_one_sentence("test.srt")
The output looks as follows (yes, the sentence detection still needs some work, e.g. for "Dr."):

0
00:00:13,100 --> 00:00:13,336
Dr.

1
00:00:13,336 --> 00:00:14,750
Martin Luther King, Jr.

2
00:00:14,750 --> 00:00:23,514
Civil Rights Movement, states, "In the end, we will remember not the words of our enemies but the silence of our friends.

3
00:00:23,514 --> 00:00:26,175
As a teacher, I've internalized this message.

4
00:00:26,175 --> 00:00:29,859
our friends." As a teacher, I've internalized this message.

As you can see, the original final timestamp is 00:00:29,800, while in the output file it is 00:00:29,859. This may not look like much at first, but the difference grows as the video gets longer. The full example video can be downloaded here: https://ufile.io/19nuvqb3 The full subtitle file: https://ufile.io/qracb7ai Note: the subtitle file will be overwritten, so you may want to store a copy under a different name for comparison.
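Judging from the sample output, one likely contributor to the misalignment is that the sentence regex only matches text that starts with a capital letter, so some words never end up in any sentence, while the word index used to look up durations keeps counting as if nothing had been skipped. The sketch below only counts words on the sample text; it does not touch pysrt, and `count_words` is a hypothetical helper name.

```python
import re

# The text of the six sample cues, joined with single spaces.
text = (
    'Dr. Martin Luther King, Jr., in a 1968 speech where he reflects '
    'upon the Civil Rights Movement, states, "In the end, we will remember '
    'not the words of our enemies but the silence of our friends." '
    "As a teacher, I've internalized this message."
)

# Same pattern as in convert_subtitle_one_sentence.
SENTENCE_RE = re.compile(r'([A-Z][^\.!?]*[\.!?])')


def count_words(full_text):
    """Return (total words, words covered by the regex matches)."""
    sentences = SENTENCE_RE.findall(full_text)
    matched = sum(len(s.split()) for s in sentences)
    return len(full_text.split()), matched


total_words, matched_words = count_words(text)
skipped_words = total_words - matched_words  # words that belong to no sentence
```

Because the skipped words ("in a 1968 speech where he reflects upon the") sit in the middle of the text, every later sentence is timed with the durations of earlier words, and the leftover words at the end are appended as an extra subtitle.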
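Way to fix it:
The words that start or end a subtitle have exact times in the original subtitles. This can be used to cross-check and adjust the timing accordingly.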
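One way this idea could be sketched: treat every cue boundary as an anchor with a known time, and interpolate linearly between the two nearest anchors for any position inside the joined text. This is an assumption-laden illustration, not a tested fix for the files linked in this post: cues are plain `(start_seconds, end_seconds, text)` tuples standing in for pysrt items, and `build_anchors`, `time_at` and `span_times` are hypothetical helper names.

```python
import bisect


def build_anchors(cues):
    """Join cue texts and record the known time at every cue boundary.

    cues: non-empty list of (start_seconds, end_seconds, text) tuples.
    Returns (full_text, offsets, times), where times[i] is the original
    time at character offset offsets[i] of full_text.
    """
    parts, offsets, times = [], [], []
    pos = 0
    for start_s, end_s, text in cues:
        clean = " ".join(text.split())  # collapse line breaks and extra spaces
        offsets += [pos, pos + len(clean)]
        times += [start_s, end_s]
        parts.append(clean)
        pos += len(clean) + 1           # +1 for the space that joins two cues
    return " ".join(parts), offsets, times


def time_at(offset, offsets, times):
    """Interpolate linearly between the two anchors surrounding `offset`."""
    i = bisect.bisect_right(offsets, offset) - 1
    i = max(0, min(i, len(offsets) - 2))
    o0, o1 = offsets[i], offsets[i + 1]
    t0, t1 = times[i], times[i + 1]
    if o1 == o0:                        # empty cue: nothing to interpolate
        return t0
    frac = min(1.0, max(0.0, (offset - o0) / (o1 - o0)))
    return t0 + frac * (t1 - t0)


def span_times(start_offset, end_offset, offsets, times):
    """Start and end time for a character span, e.g. one sentence."""
    return time_at(start_offset, offsets, times), time_at(end_offset, offsets, times)


# Hypothetical two-cue example.
cues = [(0.0, 2.0, "ab cd"), (2.0, 4.0, "efgh")]
full_text, offsets, times = build_anchors(cues)
sentence_start = full_text.index("cd")
start_s, end_s = span_times(sentence_start, len(full_text), offsets, times)
```

Since every time is derived from the nearest original timestamps rather than from a running sum, an error in one sentence cannot accumulate into the following ones, and the last subtitle cannot end after the last original cue.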
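Edit:
Here is code that creates a dictionary storing each character, the character's duration (subtitle average), and the original start or end timestamp (if one exists).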
sub = pysrt.open('video.srt')
running_variable = 0
dict_subtitle = {}
for i in range(len(sub)):
    # Extract Start Time Stamp
    timestamb_start = sub[i].start
    # Extract Text
    text =sub[i].text
    # Extract End Time Stamp
    timestamb_end = sub[i].end
    # Extract Characters per Second
    characters_per_second = sub[i].characters_per_second
    # Fill Dictionary
    for j,character in enumerate(" ".join(text.split())):
        character_duration = len(character)*characters_per_second
        dict_subtitle[str(running_variable)] = [character,character_duration,False, False]
        if j == 0: dict_subtitle[str(running_variable)] = [character, character_duration, timestamb_start, False]
        if j == len(text)-1 : dict_subtitle[str(running_variable)] = [character, character_duration, False, timestamb_end]
        running_variable += 1
More videos to try
Here you can download more videos and their corresponding subtitle files: https://filebin.net/kwygjffdlfi62pjs
Edit 3
4
00:00:18,856 --> 00:00:25,904
Je rappelle la définition de ce qu'est un produit scalaire, <i>dot product</i> dans <i>Ⅎ</i>.
5
00:00:24,855 --> 00:00:30,431
Donc je prends deux vecteurs dans <i>Ⅎ</i> et je définis cette opération-là , linéaire, <i>u