String splitting problem

Question

6

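Problem: split a string into a list of words, using a list of separator characters that is passed in as an argument.

String: "After the flood ... all the colors came out." Expected output: ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out'] I wrote the function below. Note that I know there are better ways to split a string using Python's built-in functions, but for the sake of learning I want to do it this way: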

def split_string(source, splitlist):
    result = []
    for e in source:
        if e in splitlist:
            end = source.find(e)
            result.append(source[0:end])
            tmp = source[end+1:]
            for f in tmp:
                if f not in splitlist:
                    start = tmp.find(f)
                    break
            source = tmp[start:]
    return result

out = split_string("After  the flood   ...  all the colors came out.", " .")

print out

['After', 'the', 'flood', 'all', 'the', 'colors', 'came out', '', '', '', '', '', '', '', '', '']

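I can't figure out why "came out" is not split into the two separate words "came" and "out". It is as if the space between the two words is being ignored. I think the rest of the output is junk caused by the "came out" problem. EDIT: Following @lvc's suggestion, I wrote the following code: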

def split_string(source,splitlist):
    result = []
    lasti = -1
    for i, e in enumerate(source):
        if e in splitlist:
            tmp = source[lasti+1:i]
            if tmp not in splitlist:
                result.append(tmp)
            lasti = i
        if e not in splitlist and i == len(source) - 1:
            tmp = source[lasti+1:i+1]
            result.append(tmp)
    return result

out = split_string("This is a test-of the,string separation-code!"," ,!-")
print out
#>>> ['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code']

out = split_string("After  the flood   ...  all the colors came out.", " .")
print out
#>>> ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']

out = split_string("First Name,Last Name,Street Address,City,State,Zip Code",",")
print out
#>>>['First Name', 'Last Name', 'Street Address', 'City', 'State', 'Zip Code']

out = split_string(" After  the flood   ...  all the colors came out...............", " .")
print out
#>>>['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']

- codingknob

6 Answers

2

I think that with regular expressions you can get them easily, provided all you want is the words contained in the input string.

>>> import re
>>> string="After the flood ... all the colors came out."
>>> re.findall(r'\w+', string)
['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
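
If the separators should come from the splitlist argument rather than from \w, one possible variation is to build a character class from it. This is only a sketch, assuming Python 3 and a non-empty splitlist; the function name split_with_regex is made up for illustration, while re.escape and re.split are standard library calls.

```python
import re

def split_with_regex(source, splitlist):
    # Hypothetical helper: build a character class from the separator
    # characters; re.escape keeps characters like '-' or ']' literal.
    # Assumes splitlist is not empty.
    pattern = '[' + re.escape(splitlist) + ']+'
    # Leading or trailing separators produce empty strings, so drop those.
    return [word for word in re.split(pattern, source) if word]

print(split_with_regex("After  the flood   ...  all the colors came out.", " ."))
```

Unlike \w+, this keeps "First Name" together when the only separator is a comma.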

- Ju chi chan

2

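You seem to expect:

source = tmp[start:]

to change the source that the outer for loop is iterating over. It won't: the loop keeps iterating over the string you originally gave it, not over whatever object the name now refers to. This means the character you are currently on may no longer be in what is left of source.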
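
A tiny experiment can make this concrete. The sketch below assumes Python 3, and the names text and visited are made up for illustration: the name is rebound inside the loop, yet the loop still visits every character of the original string.

```python
text = "abc"
visited = []
for ch in text:
    visited.append(ch)
    # Rebinding the name does not affect the iterator the for loop created.
    text = text[1:]

print(visited)     # ['a', 'b', 'c']
print(repr(text))  # ''
```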

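Instead, keep track of your current index in the string, like this:

for i, e in enumerate(source):
    ...

The piece you want to add is then always source[lasti+1:i], so you only need to keep track of lasti.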

- lvc

1

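Thanks everyone for the great solutions. I chose this answer because it forced me to learn the logic rather than use a prebuilt function. Obviously, if I were writing commercial code I wouldn't reinvent the wheel, but for learning purposes I'm going with this answer. Thanks, everyone, for your help. - codingknob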

0

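Why do so much work? It is as simple as this, give it a try:
str.split(strSplitter, intMaxSplitCount), where intMaxSplitCount is optional.
In your case you also need to do some cleanup if you want to get rid of the "...". One way is to replace it, e.g. str.replace(".", "", 3); the 3 is optional, and with it only the first 3 dots are replaced.

So, in short, you need to do the following:
print ((str.replace(".", "",3)).split(" ")), which will print the result you want.

I have run it; please check here, ...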
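
As a quick check, here is what that expression gives for the sentence from the question. This is a sketch assuming Python 3 (hence print as a function); the variable names are illustrative. Splitting on a single space leaves an empty string where the dots were and keeps the final period, whereas removing every dot and calling split() with no argument gives the expected list for this particular input.

```python
sentence = "After the flood ... all the colors came out."

# The expression from the answer: only the first three dots are removed.
first_try = sentence.replace(".", "", 3).split(" ")
print(first_try)
# ['After', 'the', 'flood', '', 'all', 'the', 'colors', 'came', 'out.']

# Remove every dot; split() with no argument collapses runs of whitespace.
cleaned = sentence.replace(".", "").split()
print(cleaned)
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
```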

- Gaurav Gandhi

0

[x for x in a.replace('.', '').split(' ') if len(x)>0]

Here 'a' is your input string.

- thavan

0

A simpler way, or at least one that looks simpler.

import string

def split_string(source, splitlist):
    # Map every separator character to a space, then split on whitespace.
    table = string.maketrans(splitlist, ' ' * len(splitlist))
    return string.translate(source, table).split()

You can look up string.maketrans and string.translate.
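
In current Python 3 releases the string module no longer provides maketrans and translate as functions; the str type offers them instead. A roughly equivalent sketch, assuming Python 3 and that every separator should simply become a space:

```python
def split_string(source, splitlist):
    # str.maketrans builds a table mapping each separator to a space.
    table = str.maketrans(splitlist, ' ' * len(splitlist))
    # split() with no argument then discards the runs of whitespace.
    return source.translate(table).split()

print(split_string("This is a test-of the,string separation-code!", " ,!-"))
```

Because the final split() breaks on any whitespace, this differs from the question's function when the text contains spaces that are not in splitlist: "First Name,Last Name" with "," as the only separator comes back as four words rather than two.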

- xvatar

Content provided by Stack Overflow.

- Kiet Tran · Accepted Answer

You don't need the inner loop. This alone is enough:

def split_string(source, splitlist):
    result = []
    for e in source:
        if e in splitlist:
            end = source.find(e)
            result.append(source[0:end])
            source = source[end+1:]
    return result

Before appending source[:end] to the list, you can get rid of the "junk" (i.e. the empty strings) by checking whether source[:end] is an empty string.
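
Putting that check in place might look like the sketch below, assuming Python 3. The final check that keeps whatever text follows the last separator is an addition made here for illustration, not part of the answer above; without it, a string that does not end in a separator would lose its last word.

```python
def split_string(source, splitlist):
    result = []
    # The loop iterates over the original string; the name `source` is
    # rebound below to whatever text is still unprocessed.
    for e in source:
        if e in splitlist:
            end = source.find(e)
            if source[:end]:  # skip the empty strings between separators
                result.append(source[:end])
            source = source[end + 1:]
    if source:  # assumption: keep trailing text after the last separator
        result.append(source)
    return result

print(split_string("After  the flood   ...  all the colors came out.", " ."))
```

This stays in step because each pass through the if branch consumes exactly one separator, so the first occurrence of e in the remaining text is always the separator the loop is currently on.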