I have a text that looks like this:
1
00:00:01,860 --> 00:00:31,210
Affil of fifth at fat at all the social ball and said, with all this little in the
2
00:00:31,210 --> 00:01:03,060
mid limited and will cost a lot, for want of a lot of it is I never do this or below are the innocent of fat in the annual own none will bit less often were a little the earth the oven for the area of some of them some of the atom in the long will recall the law, will cost you the ball a little less of Odessa and coal rule the Vikings in at a loss
3
00:01:03,980 --> 00:01:33,150
of our lady of one of the will of the wall routing visiting little sign of the limited use of a lot of wind up with a loss of 14 and uncivil will find a site to lop off call them into solid, a London, can we stop go to work as a gay sailor kissing a lot of that scene of the law that on them in this case
4
00:01:33,950 --> 00:02:03,190
will almost a kind wilkinson's, and that a settlement, or the fog collared of the unknown, some would call and all of this was a little, some of us up a lot of letters, union would quit them or not will be or will lend money to zoning and will open the door to that of the novel opens in
5
00:02:04,240 --> 00:02:24,180
it and solidity can cut later with boats can die to only see not open only to six and 0:50 and world go back a at the fat of that at that
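I want to extract only the sentences from the text, for example: "Affil of fifth at fat at all the social ball and said, with all this little in the mid limited and will cost a lot, for want of ..."
So the raw text is as follows: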
"1\r\n00:00:01,860 --> 00:00:31,210\r\nAffil of fifth at fat at all the social ball and said, with all this little in the\r\n\r\n2\r\n00:00:31,210 --> 00:01:03,060\r\nmid limited and will cost a lot, for want of a lot of it is I never do this or below are the innocent of fat in the annual own none will bit less often were a little the earth the oven for the area of some of them some of the atom in the long will recall the law, will cost you the ball a little less of Odessa and coal rule the Vikings in at a loss\r\n\r\n3\r\n00:01:03,980 --> 00:01:33,150\r\nof our lady of one of the will of the wall routing visiting little sign of the limited use of a lot of wind up with a loss of 14 and uncivil will find a site to lop off call them into solid, a London, can we stop go to work as a gay sailor kissing a lot of that scene of the law that on them in this case\r\n\r\n4\r\n00:01:33,950 --> 00:02:03,190\r\nwill almost a kind wilkinson's, and that a settlement, or the fog collared of the unknown, some would call and all of this was a little, some of us up a lot of letters, union would quit them or not will be or will lend money to zoning and will open the door to that of the novel opens in\r\n\r\n5\r\n00:02:04,240 --> 00:02:24,180\r\nit and solidity can cut later with boats can die to only see not open only to six and 0:50 and world go back a at the fat of that at that\r\n\r\n"
Looking at the raw text, we could split it on a delimiter such as "\r\n", but I don't know how to write the regular expression.
text.split('\n').strip()? - TigerhawkT3
text.splitlines()[2::4] looks more like it. - TigerhawkT3
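To make the slicing suggestion from the comments concrete, here is a minimal sketch next to a regex alternative, since the question asks for one. The names `raw`, `extract_by_slicing`, `extract_by_regex`, and `CUE_RE` are made up for illustration, and `raw` is a shortened stand-in for the full string. Both approaches assume every cue has exactly one line of text; a cue with two or more text lines would break the slicing and be truncated by the regex.

```python
import re

# Shortened sample in the same shape as the raw text above (CRLF line endings).
raw = (
    "1\r\n00:00:01,860 --> 00:00:31,210\r\nAffil of fifth at fat\r\n\r\n"
    "2\r\n00:00:31,210 --> 00:01:03,060\r\nmid limited and will cost a lot\r\n\r\n"
)


def extract_by_slicing(text):
    # Each cue occupies four lines: index, timestamp, text, blank line.
    # splitlines() handles "\r\n", so the text lines sit at indices 2, 6, 10, ...
    return text.splitlines()[2::4]


# A timestamp line, a line break, then capture the next line without its line ending.
CUE_RE = re.compile(
    r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\r?\n([^\r\n]*)"
)


def extract_by_regex(text):
    # findall returns only the captured group, i.e. the text line of each cue.
    return CUE_RE.findall(text)


sentences = extract_by_regex(raw)
joined = " ".join(sentences)  # one continuous string, as in the desired output
print(joined)
```

The slicing version is shorter but depends on the fixed four-line layout. The regex version keys off the timestamp line instead, so it tolerates extra blank lines between cues.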