How can I convert a subtitle file so that each subtitle contains only one sentence?

Question

Tags: python, regex, python-3.x, regex-greedy, subtitle

7

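I am trying to write a method that converts a subtitle file so that each subtitle contains exactly one sentence.

My idea is as follows:

1. For each subtitle:

1.1 -> I get the subtitle duration

1.2 -> Compute the characters per second

1.3 -> Use this to store (in dict_times_word_subtitle) the time it takes to say word i

2. Extract the sentences from the entire text

3. For each sentence:

3.1 Store (in dict_times_sentences_subtitle) the time it takes to say the sentence, computed from the times of its individual words

4. Create a new srt file (subtitle file) that starts at the same time as the original srt file; the subtitle timings are then derived from the time it takes to say each sentence.

So far, I have written the following code: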

#---------------------------------------------------------
import pysrt
import re
from datetime import datetime, date, time, timedelta
#---------------------------------------------------------

def convert_subtitle_one_sentence(file_name):
    
    sub = pysrt.open(file_name)   

    ### ----------------------------------------------------------------------
    ### Store Each Word and the Average Time it Takes to Say it in a dictionary
    ### ----------------------------------------------------------------------

    dict_times_word_subtitle = {}
    running_variable = 0
    for i in range(len(sub)):

        subtitle_text = sub[i].text
        subtitle_duration = (datetime.combine(date.min, sub[i].duration.to_time()) - datetime.min).total_seconds()

        # Compute characters per second
        characters_per_second = len(subtitle_text)/subtitle_duration

        # Store Each Word and the Average Time (seconds) it Takes to Say in a Dictionary 
        
        for j,word in enumerate(subtitle_text.split()):
            if j == len(subtitle_text.split())-1:
                time = len(word)/characters_per_second
            else:
                time = len(word+" ")/characters_per_second

            dict_times_word_subtitle[str(running_variable)] = [word, time]
            running_variable += 1

            
    ### ----------------------------------------------------------------------
    ### Store Each Sentence and the Average Time to Say it in a Dictionary
    ### ----------------------------------------------------------------------  

    total_number_of_words = len(dict_times_word_subtitle.keys())

    # Get the entire text
    entire_text = ""
    for i in range(total_number_of_words):
        entire_text += dict_times_word_subtitle[str(i)][0] +" "


    # Initialize the dictionary 
    dict_times_sentences_subtitle = {}

    # Loop through all found sentences 
    last_number_of_words = 0
    for i,sentence in enumerate(re.findall(r'([A-Z][^\.!?]*[\.!?])', entire_text)):

        number_of_words = len(sentence.split())

        # Compute the time it takes to speak the sentence
        time_sentence = 0
        for j in range(last_number_of_words, last_number_of_words + number_of_words):
            time_sentence += dict_times_word_subtitle[str(j)][1] 

        # Store the sentence together with the time it takes to say the sentence
        dict_times_sentences_subtitle[str(i)] = [sentence, round(time_sentence,3)]

        ## Update last number_of_words
        last_number_of_words += number_of_words

    # Check if there is a non-sentence remaining at the end
    if j < total_number_of_words:
        remaining_string = ""
        remaining_string_time = 0
        for k in range(j+1, total_number_of_words):
            remaining_string += dict_times_word_subtitle[str(k)][0] + " "
            remaining_string_time += dict_times_word_subtitle[str(k)][1]

        dict_times_sentences_subtitle[str(i+1)] = [remaining_string, remaining_string_time]

    ### ----------------------------------------------------------------------
    ### Create a new Subtitle file with only 1 sentence at a time
    ### ----------------------------------------------------------------------  

    # Initalize new srt file
    new_srt = pysrt.SubRipFile()

    # Loop through all sentence
    # get initial start time (seconds)
    # https://dev59.com/U1cP5IYBdhLWcg3wH28Z
    start_time = (datetime.combine(date.min, sub[0].start.to_time()) - datetime.min).total_seconds()

    for i in range(len(dict_times_sentences_subtitle.keys())):


        sentence = dict_times_sentences_subtitle[str(i)][0]
        print(sentence)
        time_sentence = dict_times_sentences_subtitle[str(i)][1]
        print(time_sentence)
        item = pysrt.SubRipItem(
                        index=i,
                        start=pysrt.SubRipTime(seconds=start_time),
                        end=pysrt.SubRipTime(seconds=start_time+time_sentence),
                        text=sentence)

        new_srt.append(item)

        ## Update Start Time
        start_time += time_sentence

    new_srt.save(file_name)

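Problem:

No error is raised, but when I apply this to a real subtitle file and watch the video, the subtitles start out correctly and then drift further and further away from what is actually being said as the video progresses.

Example: the speaker has already finished talking, but subtitles keep appearing.

A simple example for testing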

srt = """
1
00:00:13,100 --> 00:00:14,750
Dr. Martin Luther King, Jr.,

2
00:00:14,750 --> 00:00:18,636
in a 1968 speech where he reflects
upon the Civil Rights Movement,

3
00:00:18,636 --> 00:00:21,330
states, "In the end,

4
00:00:21,330 --> 00:00:24,413
we will remember not the words of our enemies

5
00:00:24,413 --> 00:00:27,280
but the silence of our friends."

6
00:00:27,280 --> 00:00:29,800
As a teacher, I've internalized this message.

"""

with open('test.srt', "w") as file:
    file.write(srt)
    
    
convert_subtitle_one_sentence("test.srt")

The output looks like this (yes, the sentence detection still needs some work, e.g. for "Dr."):

0
00:00:13,100 --> 00:00:13,336
Dr.

1
00:00:13,336 --> 00:00:14,750
Martin Luther King, Jr.

2
00:00:14,750 --> 00:00:23,514
Civil Rights Movement, states, "In the end, we will remember not the words of our enemies but the silence of our friends.

3
00:00:23,514 --> 00:00:26,175
As a teacher, I've internalized this message.

4
00:00:26,175 --> 00:00:29,859
our friends." As a teacher, I've internalized this message.

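As you can see, the last original timestamp is 00:00:29,800, whereas in the output file it is 00:00:29,859. This may not look like much at first, but the difference grows as the video gets longer.

The full sample video can be downloaded here: https://ufile.io/19nuvqb3 Full subtitle file: https://ufile.io/qracb7ai Note: the subtitle file will be overwritten, so you may want to store a copy under a different name for comparison.

Possible fix:

The words that start or end an original subtitle have exact times. These can be used to cross-check and adjust the timing accordingly.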
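The cross-check idea above could be sketched roughly as follows. This is only a minimal illustration, not the asker's code: it assumes the cues are available as plain (start, end, text) tuples, and the helper names to_ms and word_times are made up for the example. Each word's time is interpolated inside its own cue, so the first and last word of every cue stay pinned to the original timestamps and no error can accumulate across cues.

```python
import re

def to_ms(stamp):
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to integer milliseconds."""
    h, m, s, ms = (int(x) for x in re.split(r"[:,]", stamp))
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def word_times(cues):
    """Yield (word, start_ms, end_ms), anchored to each cue's own timestamps.

    `cues` is a list of (start, end, text) tuples; this layout is an
    assumption made for the sketch, not a pysrt structure.
    """
    for start, end, text in cues:
        words = text.split()
        if not words:
            continue
        start_ms, end_ms = to_ms(start), to_ms(end)
        total_chars = sum(len(w) for w in words)
        elapsed_chars = 0
        for w in words:
            # Interpolate inside this cue only, in proportion to characters spoken
            w_start = start_ms + (end_ms - start_ms) * elapsed_chars // total_chars
            elapsed_chars += len(w)
            w_end = start_ms + (end_ms - start_ms) * elapsed_chars // total_chars
            yield w, w_start, w_end

cues = [
    ("00:00:13,100", "00:00:14,750", "Dr. Martin Luther King, Jr.,"),
    ("00:00:27,280", "00:00:29,800", "As a teacher, I've internalized this message."),
]
timed = list(word_times(cues))
```

A sentence's start is then the start of its first word and its end is the end of its last word, so gaps between cues are preserved rather than summed away.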

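Edit:

Here is code that creates a dictionary storing each character, the character's duration (averaged over the subtitle), and the original start or end timestamp, if one exists for that character.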

sub = pysrt.open('video.srt')

running_variable = 0
dict_subtitle = {}

for i in range(len(sub)):

    # Extract Start Time Stamp
    timestamb_start = sub[i].start

    # Extract Text
    text =sub[i].text

    # Extract End Time Stamp
    timestamb_end = sub[i].end

    # Extract Characters per Second 
    characters_per_second = sub[i].characters_per_second
    
    # Fill Dictionary 
    for j,character in enumerate(" ".join(text.split())):
        character_duration = len(character)*characters_per_second
        dict_subtitle[str(running_variable)] = [character,character_duration,False, False]
        if j == 0: dict_subtitle[str(running_variable)] = [character, character_duration, timestamb_start, False]
        if j == len(text)-1 : dict_subtitle[str(running_variable)] = [character, character_duration, False, timestamb_end]
        running_variable += 1

Edit 3

4
00:00:18,856 --> 00:00:25,904
Je rappelle la définition de ce qu'est un produit scalaire, <i>dot product</i> dans <i>ℝ²</i>.

5
00:00:24,855 --> 00:00:30,431
Donc je prends deux vecteurs dans <i>ℝ²</i> et je définis cette opération-là, linéaire, <i>u

- henry

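@tobias_k I have reposted my question with a simple example. Please take a look, thanks. - henry

Please see my newly edited answer. Hopefully this is the final version. - Rolf of Saxony

I have added something to my first answer that you may find useful, even if only as a modification to the accepted answer. It certainly helps with the French maths subtitle file you supplied. - Rolf of Saxony

@henry: if there are very long gaps between sentences, remember to trim the end time artificially (by adding another dummy entry between the two sentences); otherwise the subtitle will awkwardly stay on screen for the entire non-speech duration. - RARE Kpop Manifesto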
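The trimming suggested in the last comment might be sketched as below. This is a simplified variant that shortens the end time directly instead of inserting a dummy entry; the tuple layout and the 7000 ms default are assumptions chosen for illustration, not values from the thread.

```python
def cap_duration(cues, max_ms=7000):
    """Return cues whose display time never exceeds max_ms milliseconds.

    `cues` is a list of (start_ms, end_ms, text) tuples (an assumed layout).
    """
    return [(start, min(end, start + max_ms), text) for start, end, text in cues]

capped = cap_duration([(0, 20000, "Thank you."), (20000, 22000, "Hi.")])
```

A cap that scales with the text length would probably suit long sentences better than a single constant.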

2 Answers

2

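As requested, I have rewritten the code to use the pysrt package and a small amount of re. The idea is to build a dictionary keyed on start time.

If the start time already exists, the data is added to the entry for that time and the end time is updated as well, so the end time advances along with the text.

If the start time does not exist, a new dictionary entry is simply created.

The start time is only advanced once we know that a sentence has been completed.

In essence, we start building a sentence from a fixed start time. We keep building it by adding more text and updating the end time until the sentence is complete. At that point we advance the start time using the current record, because we know the next text begins a new sentence.

Subtitle entries containing multiple sentences are split up, and the start and end times of each part are calculated using pysrt's characters_per_second for the whole subtitle entry.

Finally, a new subtitle file is written to disk from the entries in the dictionary.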
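As a rough illustration of the multi-sentence case, the sketch below divides one entry's time span between its sentences in proportion to their character counts. It is a simplified stand-in rather than the code of this answer: it works on integer milliseconds, splits on whitespace after ., ? or !, and the function name split_cue is invented for the example.

```python
import re

def split_cue(start_ms, end_ms, text):
    """Split one cue into sentences, sharing its duration by character count."""
    # Split after ., ? or ! when followed by whitespace (a simplification)
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
    total = sum(len(s) for s in sentences)
    pieces, used, cursor = [], 0, start_ms
    for s in sentences:
        used += len(s)
        # The last piece always ends exactly at end_ms because used == total
        piece_end = start_ms + (end_ms - start_ms) * used // total
        pieces.append((cursor, piece_end, s))
        cursor = piece_end
    return pieces

parts = split_cue(202000, 206000, "It is charring. It is chains. It is pain.")
```

Because the pieces are computed as shares of the original span, they add up to the entry's duration exactly and cannot overrun into the next entry.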

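Obviously, with only one file to work with, I may well have missed some subtitle layout quirks, but it should at least give you a working starting point.

The code is commented throughout, so most of the how and why should be clear.

Edit: I have refined the check for existing dictionary start times and changed the method used to decide whether a sentence has ended, i.e. the full stop is put back into the text after the split. The subtitles for the second video you mentioned are indeed a bit off; note that they have no millisecond values at all to begin with.

The following code does a reasonable job on the second video and a good job on the first.

Edit 2: Added removal of consecutive full stops and of HTML < > tags.

Edit 3: It turns out that pysrt strips HTML tags when calculating characters per second. I now do the same, which means the formatting can be kept in the subtitles.

Edit 4: This version copes with full stops in mathematical and chemical formulae, IP addresses and so on; basically, places where a full stop does not mean the end of a sentence. It also allows sentences that end with ? and !.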
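The splitting behaviour described in Edit 4 can be tried in isolation with a small sketch like the one below. It assumes the intent is "not whitespace and not a digit" before the terminator and therefore writes the class as [^\s\d]; the helper name split_sentences is made up for the example.

```python
import re

# Split on ., ? or ! only when the preceding character is neither
# whitespace nor a digit, so "3.14" and "192.168.0.1" stay intact.
end_of_sentence = re.compile(r"([^\s\d][.?!])")

def split_sentences(text):
    parts = end_of_sentence.split(text)
    # parts alternates: text, captured "x.", text, captured "x.", ..., tail
    sentences = [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]
    tail = parts[-1].strip()
    if tail:
        sentences.append(tail)  # trailing text without a terminator
    return [s.strip() for s in sentences]

result = split_sentences("Pi is about 3.14 today. Is that enough? Yes")
```

As noted in the code comments of the answer, a sentence that genuinely ends in a number followed by a full stop is not split by this rule.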

import pysrt
import re

abbreviations = ['Dr.','Mr.','Mrs.','Ms.','etc.','Jr.','e.g.'] # You get the idea!
abbrev_replace = ['Dr','Mr','Mrs','Ms','etc','Jr','eg']
subs = pysrt.open('new.srt')
subs_dict = {}          # Dictionary to accumulate new sub-titles (start_time:[end_time,sentence])
start_sentence = True   # Toggle this at the start and end of sentences

# regex to remove html tags from the character count
tags = re.compile(r'<.*?>')

# regex to split on ".", "?" or "!" ONLY if it is preceded by something else
# which is not a digit and is not a space. (Not perfect but close enough)
# Note: ? and ! can be an issue in some languages (e.g. french) where both ? and !
# are traditionally preceded by a space ! rather than!
end_of_sentence = re.compile(r'([^\s\0-9][\.\?\!])')

# End of sentence characters
eos_chars = set([".","?","!"])

for sub in subs:
    if start_sentence:
        start_time = sub.start
        start_sentence = False
    text = sub.text

    #Remove multiple full-stops e.g. "and ....."
    text = re.sub(r'\.+', '.', text)

    # Optional
    for idx, abr in enumerate(abbreviations):
        if abr in text:
            text = text.replace(abr,abbrev_replace[idx])
    # A test could also be made for initials in names i.e. John E. Rotten - showing my age there ;)

    multi = re.split(end_of_sentence,text.strip())
    cps = sub.characters_per_second

    # Test for a sub-title with multiple sentences
    if len(multi) > 1:
        # regex end_of_sentence breaks sentence start and sentence end into 2 parts
        # we need to put them back together again.
        # hence the odd range because the joined end part is then deleted
        for cnt in range(divmod(len(multi),2)[0]): # e.g. len=3 give 0 | 5 gives 0,1  | 7 gives 0,1,2
            multi[cnt] = multi[cnt] + multi[cnt+1]
            del multi[cnt+1]

        for part in multi:
            if len(part): # Avoid blank parts
                pass
            else:
                continue
            # Convert start time to seconds
            h,m,s,milli = re.split(':|,',str(start_time))
            s_time = (3600*int(h))+(60*int(m))+int(s)+(int(milli)/1000)

            # test for existing data
            try:
                existing_data = subs_dict[str(start_time)]
                end_time = str(existing_data[0])
                h,m,s,milli = re.split(':|,',str(existing_data[0]))
                e_time = (3600*int(h))+(60*int(m))+int(s)+(int(milli)/1000)
            except KeyError:
                existing_data = []
                e_time = s_time

            # End time is the start time or existing end time + the time taken to say the current words
            # based on the calculated number of characters per second
            # use regex "tags" to remove any html tags from the character count.

            e_time = e_time + len(tags.sub('',part)) / cps

            # Convert start to a timestamp
            s,milli = divmod(s_time,1)
            m,s = divmod(int(s),60)
            h,m = divmod(m,60)
            start_time = "{:02d}:{:02d}:{:02d},{:03d}".format(h,m,s,round(milli*1000))

            # Convert end to a timestamp
            s,milli = divmod(e_time,1)
            m,s = divmod(int(s),60)
            h,m = divmod(m,60)
            end_time = "{:02d}:{:02d}:{:02d},{:03d}".format(h,m,s,round(milli*1000))

            # if text already exists add the current text to the existing text
            # if not use the current text to write/rewrite the dictionary entry
            if existing_data:
                new_text = existing_data[1] + " " + part
            else:
                new_text = part
            subs_dict[str(start_time)] = [end_time,new_text]

            # if sentence ends re-set the current start time to the end time just calculated
            if any(x in eos_chars for x in part):
                start_sentence = True
                start_time = end_time
                print ("Split",start_time,"-->",end_time,)
                print (new_text)
                print('\n')
            else:
                start_sentence = False

    else:   # This is Not a multi-part sub-title

        end_time = str(sub.end)

        # Check for an existing dictionary entry for this start time
        try:
            existing_data = subs_dict[str(start_time)]
        except KeyError:
            existing_data = []

        # if it already exists add the current text to the existing text
        # if not use the current text
        if existing_data:
            new_text = existing_data[1] + " " + text
        else:
            new_text = text
        # Create or Update the dictionary entry for this start time
        # with the updated text and the current end time
        subs_dict[str(start_time)] = [end_time,new_text]

        if any(x in eos_chars for x in text):
            start_sentence = True
            print ("Single",start_time,"-->",end_time,)
            print (new_text)
            print('\n')
        else:
            start_sentence = False

# Generate the new sub-title file from the dictionary
with open('video_new.srt','w') as outfile:
    for idx, (key, text) in enumerate(subs_dict.items(), start=1):
        outfile.write(str(idx)+"\n")
        outfile.write(key+" --> "+text[0]+"\n")
        outfile.write(text[1]+"\n\n")

After processing with the code above, the output for your video.srt file is as follows:

1
00:00:13,100 --> 00:00:27,280
Dr Martin Luther King, Jr, in a 1968 speech where he reflects
upon the Civil Rights Movement, states, "In the end, we will remember not the words of our enemies but the silence of our friends."

2
00:00:27,280 --> 00:00:29,800
As a teacher, I've internalized this message.

3
00:00:29,800 --> 00:00:39,701
Every day, all around us, we see the consequences of silence manifest themselves in the form of discrimination, violence, genocide and war.

4
00:00:39,701 --> 00:00:46,178
In the classroom, I challenge my students to explore the silences in their own lives through poetry.

5
00:00:46,178 --> 00:00:54,740
We work together to fill those spaces, to recognize them, to name them, to understand that they don't
have to be sources of shame.

6
00:00:54,740 --> 00:01:14,408
In an effort to create a culture within my classroom where students feel safe sharing the intimacies of their own silences, I have four core principles posted on the board that sits in the front of my class, which every student signs
at the beginning of the year: read critically, write consciously, speak clearly, tell your truth.

7
00:01:14,408 --> 00:01:18,871
And I find myself thinking a lot about that last point, tell your truth.

8
00:01:18,871 --> 00:01:28,848
And I realized that if I was going to ask my students to speak up, I was going to have to tell my truth and be honest with them about the times where I failed to do so.

9
00:01:28,848 --> 00:01:44,479
So I tell them that growing up, as a kid in a Catholic family in New Orleans, during Lent I was always taught that the most meaningful thing one could do was to give something up, sacrifice something you typically indulge in to prove to God you understand his sanctity.

10
00:01:44,479 --> 00:01:50,183
I've given up soda, McDonald's, French fries, French kisses, and everything in between.

11
00:01:50,183 --> 00:01:54,071
But one year, I gave up speaking.

12
00:01:54,071 --> 00:02:03,286
I figured the most valuable thing I could sacrifice was my own voice, but it was like I hadn't realized that I had given that up a long time ago.

13
00:02:03,286 --> 00:02:23,167
I spent so much of my life telling people the things they wanted to hear instead of the things they needed to, told myself I wasn't meant to be anyone's conscience because I still had to figure out being my own, so sometimes I just wouldn't say anything, appeasing ignorance with my silence, unaware that validation doesn't need words to endorse its existence.

14
00:02:23,167 --> 00:02:29,000
When Christian was beat up for being gay, I put my hands in my pocket and walked with my head
down as if I didn't even notice.

15
00:02:29,000 --> 00:02:39,502
I couldn't use my locker for weeks
because the bolt on the lock reminded me of the one I had put on my lips when the homeless man on the corner looked at me with eyes up merely searching for an affirmation that he was worth seeing.

16
00:02:39,502 --> 00:02:43,170
I was more concerned with
touching the screen on my Apple than actually feeding him one.

17
00:02:43,170 --> 00:02:46,049
When the woman at the fundraising gala said "I'm so proud of you.

18
00:02:46,049 --> 00:02:53,699
It must be so hard teaching
those poor, unintelligent kids," I bit my lip, because apparently
we needed her money more than my students needed their dignity.

19
00:02:53,699 --> 00:03:02,878
We spend so much time listening to the things people are saying that we rarely pay attention to the things they don't.

20
00:03:02,878 --> 00:03:06,139
Silence is the residue of fear.

21
00:03:06,139 --> 00:03:09,615
It is feeling your flaws gut-wrench guillotine your tongue.

22
00:03:09,615 --> 00:03:13,429
It is the air retreating from your chest because it doesn't feel safe in your lungs.

23
00:03:13,429 --> 00:03:15,186
Silence is Rwandan genocide.

24
00:03:15,186 --> 00:03:16,423
 Silence is Katrina.

25
00:03:16,553 --> 00:03:19,661
It is what you hear when there
aren't enough body bags left.

26
00:03:19,661 --> 00:03:22,062
It is the sound after the noose is already tied.

27
00:03:22,062 --> 00:03:22,870
It is charring.

28
00:03:22,870 --> 00:03:23,620
 It is chains.

29
00:03:23,620 --> 00:03:24,543
 It is privilege.

30
00:03:24,543 --> 00:03:25,178
 It is pain.

31
00:03:25,409 --> 00:03:28,897
There is no time to pick your battles when your battles have already picked you.

32
00:03:28,897 --> 00:03:31,960
I will not let silence wrap itself around my indecision.

33
00:03:31,960 --> 00:03:36,287
I will tell Christian that he is a lion, a sanctuary of bravery and brilliance.

34
00:03:36,287 --> 00:03:42,340
I will ask that homeless man what his name is and how his day was, because sometimes all people want to be is human.

35
00:03:42,340 --> 00:03:51,665
I will tell that woman that my students can talk about transcendentalism like their last name was Thoreau, and just because you watched
one episode of "The Wire" doesn't mean you know anything about my kids.

36
00:03:51,665 --> 00:04:03,825
So this year, instead of giving something up, I will live every day as if there were a microphone tucked under my tongue, a stage on the underside of my inhibition.

37
00:04:03,825 --> 00:04:10,207
Because who has to have a soapbox when all you've ever needed is your voice?

38
00:04:10,207 --> 00:04:12,712
Thank you.

39
00:04:12,712 --> 00:00:00,000
(Applause)

- Rolf of Saxony

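@henry, I do have a way! Remove them! They just try to apply two shades of grey as the subtitle colour, which has nothing to do with the subtitle text itself. The real problem with that file is that it does not contain a single full stop. - Rolf of Saxony

This worked well for me at first, but now I am running into wrong subtitle timings. The output subtitle times are entirely new rather than just a subset of the input times. I think the core problem is that this script and pysrt do a lot of manual time handling, for example the line s_time = (3600*int(h))+(60*int(m))+int(s)+(int(milli)/1000). I am rewriting it to use datetime.timedelta objects, which will hopefully make the timing more accurate. - Chris

@Chris I trust you will post the modified code and the results in an answer. - Rolf of Saxony

@RolfofSaxony I gave up building on this script because pysrt represents time in that odd way instead of using datetime. (This other Python srt library does, though: https://github.com/cdown/srt) I ended up modifying an mpv lua script that does the sentence conversion, adapting it into a general-purpose utility and making other improvements. The downside is that it is lua... so it is not a valid answer to this question. But so far I have been very happy with it. Here is my script: https://gist.github.com/varenc/1b117487f78836aa6a25c74cae4fbbed - Chris

@Chris For a posted question we have to work within the conditions given. Free of those constraints, the world is your oyster! Glad to see you posted the code for others to use or draw inspiration from. :) - Rolf of Saxony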
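For readers who want to try the timedelta approach mentioned in the comments, a minimal sketch could look like this. It is not Chris's code; the function names are invented for the example. Working in whole milliseconds also sidesteps a corner case of formatting the fractional part separately, where a value such as 1.9996 seconds rounds its fraction to 1000 and produces a four-digit millisecond field.

```python
import re
from datetime import timedelta

def parse_srt_time(stamp):
    """Parse 'HH:MM:SS,mmm' into a timedelta."""
    h, m, s, ms = (int(x) for x in re.split(r"[:,]", stamp))
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)

def format_srt_time(delta):
    """Format a non-negative timedelta as 'HH:MM:SS,mmm'."""
    total_ms = delta // timedelta(milliseconds=1)  # exact integer milliseconds
    h, rest = divmod(total_ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

end = parse_srt_time("00:00:13,100") + timedelta(milliseconds=1650)
formatted = format_srt_time(end)
```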

- Rolf of Saxony · Accepted Answer

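Maybe this is not what you want, but why not take the timings straight from the subtitle file instead of calculating them?
I knocked up an example. It is far from perfect, but it may help.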

import re

#Pre-process file to remove blank lines, line numbers and timestamp --> chars
with open('video.srt','r') as f:
    lines = f.readlines()
with open('video.tmp','w') as f:
    for line in lines:
        line = line.strip()
        if line.strip():
            if line.strip().isnumeric():
                continue
            else:
                line = line.replace(' --> ', ' ')
                line = line+" "
                f.write(line)

# Process pre-processed file
with open('video.tmp','r') as f:
    lines = f.readlines()

outfile = open('new_video.srt','w')
idx = 0

# Define the regex options we will need

#regex to look for the time stamps in each sentence using the first and last only
timestamps = re.compile(r'\d{1,2}(?::\d{2}){1,2}(?:,)\d{3}')

#regex to remove html tags from length calculations
tags = re.compile(r'<.*?>')

#re.split('([^\s\0-9]\.)',a)
# This is to cope with text that contains mathematical, chemical formulae, ip addresses etc
# where "." does not mean full-stop (end of sentence)
# This is used to split on a "." only if it is NOT preceded by space or a number
# this should catch most things but will fail to split the sentence if it genuinely
# ends with a number followed by a full-stop.
end_of_sentence = re.compile(r'([^\s\0-9]\.)')

#sentences = str(lines).split('.')
sentences = re.split(end_of_sentence,str(lines))

# Because the sentences were split on "x." we now have to add that back
# so we concatenate every other list item with the previous one.
idx = 0
joined =[]
while idx < (len(sentences) -1) :
    joined.append(sentences[idx]+sentences[idx+1])
    idx += 2
sentences = joined

previous_timings =["00:00:00,000","00:00:00,000"]
previous_sentence = ""

#Dictionary of timestamps that will require post-processing
registry = {}

loop = 0
for sentence in sentences:
    print(sentence)
    timings = timestamps.findall(sentence)
    idx+=1
    outfile.write(str(idx)+"\n")
    if timings:
        #There are timestamps in the sentence
        previous_timings = timings
        loop = 0
        start_time = timings[0]
        end_time = timings[-1]
        # Revert list item to a string
        sentence = ''.join(sentence)
        # Remove timestamps from the text
        sentence = ''.join(re.sub(timestamps,' ', sentence))
        # Get rid of multiple spaces and \ characters
        sentence = '  '.join(sentence.split())
        sentence = sentence.replace('  ', ' ')
        sentence = sentence.replace("\\'", "'")
        previous_sentence = sentence
        print("Starts at", start_time)
        print(sentence)
        print("Ends at", end_time,'\n')
        outfile.write(start_time+" --> "+end_time+"\n")
        outfile.write(sentence+"\n\n")

    else:
        # There are no timestamps in the sentence therefore this must
        # be a separate sentence cut adrift from an existing timestamp
        # We will have to estimate its start and end times using data
        # from the last time stamp we know of
        start_time = previous_timings[0]
        reg_end_time = previous_timings[-1]

        # Convert timestamp to  seconds
        h,m,s,milli = re.split(':|,',start_time)
        s_time = (3600*int(h))+(60*int(m))+int(s)+(int(milli)/1000)

        # Guess the timing for the previous sentence and add it
        # but only for the first adrift sentence as the start time will be adjusted
        # This number may well vary depending on the cadence of the speaker
        if loop == 0:
            registry[reg_end_time] = reg_end_time
            #s_time += 0.06 * len(previous_sentence)
            s_time += 0.06 * len(tags.sub('',previous_sentence))
        # Guess the end time
        e_time = s_time + (0.06 * len(tags.sub('',previous_sentence)))

        # Convert start to a timestamp
        s,milli = divmod(s_time,1)
        m,s = divmod(int(s),60)
        h,m = divmod(m,60)
        start_time = "{:02d}:{:02d}:{:02d},{:03d}".format(h,m,s,round(milli*1000))

        # Convert end to a timestamp
        s,milli = divmod(e_time,1)
        m,s = divmod(int(s),60)
        h,m = divmod(m,60)
        end_time = "{:02d}:{:02d}:{:02d},{:03d}".format(h,m,s,round(milli*1000))

        #Register new end time for previous sentence
        if loop == 0:
            loop = 1
            registry[reg_end_time] = start_time

        print("Starts at", start_time)
        print(sentence)
        print("Ends at", end_time,'\n')
        outfile.write(start_time+" --> "+end_time+"\n")
        outfile.write(sentence+"\n\n")
        try:
            # re-set the previous start time in case the following sentence
            # was cut adrift from its time stamp as well
            previous_timings[0] = end_time
        except:
            pass
outfile.close()

#Post processing
if registry:
    outfile = open('new_video.srt','r')
    text = outfile.read()
    new_text = text
    # Run through registered end times and replace them
    # if not the video player will not display the subtitles
    # correctly because they overlap in time
    for key, end in registry.items():
        new_text = new_text.replace(key, end, 1)
        print("replacing", key, "with", end)
    outfile.close()
    outfile = open('new_video.srt','w')
    outfile.write(new_text)
    outfile.close()

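Edit: I happily stuck with this code because the problem intrigued me.
While I know it is hacky and does not use the pysrt subtitle module, only re, I think it does a fair job in this case.
I have commented the edited code, so hopefully what I am doing and why is clear.
The regex looks for timestamp patterns such as 00:00:00,000, 0:00:00,000 or 00:00,000, i.e.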

\d{1,2}(?::\d{2}){1,2}(?:,)\d{3}

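1 or 2 digits, followed by a group of : plus 2 digits that occurs 1 or 2 times, followed by a comma and finally 3 digits

If a joined sentence contains several start and end times, we only need the first time in the whole sentence, which is the sentence start, and the last time, which is the sentence end. I hope that is clear.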
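The first-and-last rule can be seen on a small hand-made string, shaped like the pre-processed text in which timestamps and subtitle text are run together. The sample string below is assembled from the question's test subtitles purely for illustration.

```python
import re

timestamps = re.compile(r"\d{1,2}(?::\d{2}){1,2},\d{3}")

joined = ('00:00:18,636 00:00:21,330 states, "In the end, '
          "00:00:21,330 00:00:24,413 we will remember not the words of our enemies "
          '00:00:24,413 00:00:27,280 but the silence of our friends."')

found = timestamps.findall(joined)
start_time, end_time = found[0], found[-1]  # first start, last end
# Drop the timestamps and collapse the leftover whitespace
text = " ".join(timestamps.sub(" ", joined).split())
```

Here start_time is the start of the first fragment and end_time is the end of the last one; the four timestamps in between are simply discarded.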

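Edit 2: This version copes with full stops in mathematical and chemical formulae, IP addresses and the like; basically, cases where a full stop does not mean the end of a sentence.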
