如何将重复的字符串拆分为列表

Question

如何将重复的字符串拆分为列表

3

我有一些字符串，每个字符串都是某个字符串的一个或多个副本。例如：

L = "hellohellohello"
M = "good"
N = "wherewhere"
O = "antant"

我想将这样的字符串拆分为列表，使得每个元素只包含重复的部分。例如：

splitstring(L) ---> ["hello", "hello", "hello"]
splitstring(M) ---> ["good"]
splitstring(N) ---> ["where", "where"]
splitstring(O) ---> ["ant", "ant"]

由于每个字符串大约有1000个字符长，如果能够相对快速地完成这个任务就太好了。

请注意，在我的情况下，所有的重复都从字符串的开头开始，并且它们之间没有间隙，所以比起在一个字符串中找到最大重复的一般问题来说，这要简单得多。

那么，如何做到这一点呢？

- Simd

看一下这个问题，我认为你在寻找类似的东西？此外，该方法的复杂度为O(n)，因此根据您的要求应该非常快。 - Mridul Kashyap

@MridulKashyap 我的问题要简单得多，因为我的重复从字符串开头开始，它们之间没有任何间隙。 - Simd

5个回答

1

尝试这个。它不是削减列表，而是专注于找到最短的模式，然后通过适当的次数重复此模式来创建新列表。

def splitstring(s):
    # searching the number of characters to split on
    proposed_pattern = s[0]
    for i, c in enumerate(s[1:], 1):
        if proposed_pattern == s[i:(i+len(proposed_pattern))]:
            # found it
            break
        else:
            proposed_pattern += c
    else:
        print 'found no pattern'
        exit(1)
    # generating the list
    n = len(proposed_pattern)
    return [proposed_pattern]*(len(s)//n)


if __name__ == '__main__':
    L = 'hellohellohellohello'
    print splitstring(L)  # prints ['hello', 'hello', 'hello', 'hello']

- AdrienW

我之前不知道这三件事，谢谢您。我会测试并进行编辑。 - AdrienW

0

我会采用的方法：

import re

L = "hellohellohello"
N = "good"
N = "wherewhere"

cnt = 0
result = ''
for i in range(1,len(L)+1):
    if cnt <= len(re.findall(L[0:i],L)):
        cnt = len(re.findall(L[0:i],L))
        result = re.findall(L[0:i],L)[0]

print(result)

使用相应变量，输出以下内容：

hello
good
where

- Gábor Erdős

0

假设重复单词的长度大于1，这个就可以工作：

a = "hellohellohello"

def splitstring(string):
    for number in range(1, len(string)):
        if string[:number] == string[number:number+number]:
            return string[:number]
    #in case there is no repetition
    return string

splitstring(a)

- pawelty

2

它在"aabaab"上失败了。 - cogitovita

0

#_*_ coding:utf-8 _*_
import re
'''
refer to the code of Gábor Erds below
'''

N = "wherewhere"
cnt = 0
result = ''
countN = 0
showresult = []

for i in range(1,len(N)+1):
    if cnt <= len(re.findall(N[0:i],N)):
        cnt = len(re.findall(N[0:i],N))
        result = re.findall(N[0:i],N)[0]
        countN = len(N)/len(result)
for i in range(0,countN):
    showresult.append(result)
print showresult

- Fan.Dog

请解释一下你的代码与Gábor Erdős的帖子有何不同。 - Nander Speerstra

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Aran-Fey · Accepted Answer

使用正则表达式查找重复的单词，然后简单地创建一个长度适当的列表：

def splitstring(string):
    match= re.match(r'(.*?)(?:\1)*$', string)
    word= match.group(1)
    return [word] * (len(string)//len(word))