Remove only the last occurrence of a word from a string

Question


3

I have a string and an array of phrases.

input_string = 'alice is a character from a fairy tale that lived in a wonder land. A character about whome no one knows much about'

phrases_to_remove = ['wonderland', 'character', 'no one']

What I want to do is remove the last occurrence of each phrase in the phrases_to_remove array from input_string.

output_string = 'alice is a character from a fairy tale that lived in a. A about whome knows much about'

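I have written a function that takes an input string and either an array of strings or a single string to replace. I used the rsplit() method to do the phrase replacement.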

def remove_words_from_end(actual_string: str, to_replace, occurrence: int):
    if isinstance(to_replace, list):
        output_string = actual_string
        for string in to_replace:
            output_string = ' '.join(output_string.rsplit(string, maxsplit=occurrence))
        return output_string.strip()
    elif isinstance(to_replace, str):
        return ' '.join(actual_string.rsplit(to_replace, maxsplit=occurrence)).strip()
    else:
        raise TypeError('the value "to_replace" must be a string or a list of strings')

The problem with this code is that I cannot remove words whose spacing does not match, for example wonder land and wonderland.
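To make the failure concrete, here is a minimal sketch (the helper name `remove_last_exact` is made up for illustration and joins with an empty string rather than a space). It shows that `rsplit` only splits on an exact substring match, so a phrase whose spacing differs from the text is silently left in place:

```python
def remove_last_exact(text: str, phrase: str) -> str:
    # rsplit with maxsplit=1 splits around the last exact occurrence only
    return ''.join(text.rsplit(phrase, maxsplit=1))

text = 'lived in a wonder land'
print(repr(remove_last_exact(text, 'wonder land')))  # exact match: phrase removed
print(repr(remove_last_exact(text, 'wonderland')))   # no exact match: text unchanged
```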

Is there a way to solve this without sacrificing too much performance?

- iam.Carrot

7

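If you want the code to remove wonder land when asked to remove wonderland, do you also want it to remove now here when asked to remove nowhere? How do you distinguish a "space mismatch" from a legitimate space? - John Coleman

@JohnColeman Yes, basically whitespace should not be a factor that prevents a string from being removed. So yes, exactly that. How can I ignore whitespace when removing the words in the input array? - iam.Carrot

And can't re be used? re.sub(phrases_to_remove[0], '', input_string) - jackotonye

@jackotonye How would re handle the spaces? - iam.Carrot

See https://dev59.com/npvga4cB1Zd3GeqP9O37 for handling the last occurrence. - jackotonye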


2 Answers

0

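Normally, when comparing two strings s1 and s2, you check whether they are equal (same size and every character the same, which is the standard approach). The part you need to implement is the other case: they differ in size, and the only difference is a space. Below is an example function that does this. In terms of performance, this is an O(n) check, where n is the length of the string, but the plain equality check is O(n) as well.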

def almost_match(s1, s2):
  # Only applies when the lengths differ by exactly one character (the extra space)
  if len(s1) != len(s2) + 1 and len(s2) != len(s1) + 1:
    return False
  i = 0 # index into s1
  j = 0 # index into s2

  while i < len(s1) and j < len(s2):
    if s1[i] != s2[j]:
      if s1[i] == ' ':
        # extra space in s1: skip it
        i = i + 1
        continue
      elif s2[j] == ' ':
        # extra space in s2: skip it
        j = j + 1
        continue
      else:
        return False
    i = i + 1
    j = j + 1

  # allow the extra space to be the trailing character of either string
  if j < len(s2) and s2[j] == ' ':
    j = j + 1

  if i < len(s1) and s1[i] == ' ':
    i = i + 1

  return i == len(s1) and j == len(s2) # require that both strings matched fully

Regarding the last line, note that it prevents "abc" from matching "abcd".

This can be optimized, but that is the general idea.
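As a sketch of that general idea without the one-space limit (assuming that every space character on either side should be ignored; `match_ignoring_spaces` is a hypothetical name), the same two-pointer walk can skip spaces wherever they appear:

```python
def match_ignoring_spaces(s1: str, s2: str) -> bool:
    i = j = 0
    while True:
        # skip any run of spaces on either side
        while i < len(s1) and s1[i] == ' ':
            i += 1
        while j < len(s2) and s2[j] == ' ':
            j += 1
        if i == len(s1) or j == len(s2):
            break
        if s1[i] != s2[j]:
            return False
        i += 1
        j += 1
    # both strings must be fully consumed, so "abc" does not match "abcd"
    return i == len(s1) and j == len(s2)

print(match_ignoring_spaces('wond er  land', 'wonderland'))
print(match_ignoring_spaces('abc', 'abcd'))
```

This only compares two candidate strings. Locating the candidate inside a longer text is a separate step.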

- Pani

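No. This does not work if there are two spaces inside the word, e.g. wond er land for wonderland. I cannot hard-code the number of spaces. - iam.Carrot

If you remove the check for 1 space, it will work. I only added it because I thought you meant 1 space. Where the strings differ, the code ignores the spaces. - Pani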


- Zoltan DeWitt · Accepted Answer

Using re to handle the possible whitespace is one way:

import re

def remove_last(word, string):
    # Allow an optional space between every pair of characters;
    # re.escape keeps characters such as '.' or '?' literal.
    pattern = ' ?'.join(re.escape(ch) for ch in word)
    matches = list(re.finditer(pattern, string))
    if not matches:
        return string
    last_m = matches[-1]
    # Cut the last match out of the string.
    return string[:last_m.start()] + string[last_m.end():]

def remove_words_from_end(words, string):
    # Strip spaces from the phrases so that 'wonder land' also matches 'wonderland'.
    words_whole = [word.replace(' ', '') for word in words]
    string_out = string
    for word in words_whole:
        string_out = remove_last(word, string_out)
    return string_out

And running some tests:

>>> input_string = 'alice is a character from a fairy tale that lived in a wonder land. A character about whome no one knows much about'
>>> phrases_to_remove = ['wonderland', 'character', 'no one']
>>> remove_words_from_end(phrases_to_remove, input_string)
'alice is a character from a fairy tale that lived in a . A  about whome  knows much about'
>>> phrases_to_remove = ['wonder land', 'character', 'noone']
>>> remove_words_from_end(phrases_to_remove, input_string)
'alice is a character from a fairy tale that lived in a . A  about whome  knows much about'

In this example, the regex search pattern is just the word with an optional space ' ?' allowed between each pair of characters.
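Because the asker mentions that the number of spaces cannot be hard-coded, one possible variation (a sketch only, not part of the answer above; `remove_last_flexible` is a hypothetical name) drops whitespace from the phrase, escapes each character, and joins them with `\s*` so that any amount of whitespace between characters is tolerated:

```python
import re

def remove_last_flexible(phrase: str, text: str) -> str:
    # Ignore whitespace in the phrase itself, then allow any whitespace between characters
    chars = [re.escape(c) for c in phrase if not c.isspace()]
    if not chars:
        return text
    pattern = r'\s*'.join(chars)
    last = None
    for last in re.finditer(pattern, text):
        pass  # iterate to the end so that `last` holds the final match
    if last is None:
        return text
    return text[:last.start()] + text[last.end():]

print(repr(remove_last_flexible('wonderland', 'in a wond er  land. The end')))
```

As John Coleman's comment points out, this kind of matching cannot tell a stray space from a legitimate one, so asking for nowhere would also remove now here.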