Remove a substring only at the end of a string

Question

83

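I have a bunch of strings, some of which end in ' rec'. I want to remove it only when the last 4 characters of the string are ' rec'.

In other words, I have:

somestring = 'this is some string rec'

and I want it to become:

somestring = 'this is some string'

How should I go about this in Python?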

- Alex Gordon

11 Answers

31

Starting with Python 3.9, you can use removesuffix:

'this is some string rec'.removesuffix(' rec')
# 'this is some string'

- Xavier Guihot

3

To add to this: it was introduced by PEP 616 (along with str.removeprefix). - Conchylicultor
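For code that still has to run on interpreters older than 3.9, a minimal fallback sketch might look like the following. The helper name `remove_suffix` is made up for illustration; the fallback branch mirrors the pure-Python equivalent given in the PEP 616 specification, including its guard against an empty suffix.

```python
import sys

def remove_suffix(text, suffix):
    """Hypothetical helper: use str.removesuffix where available."""
    if sys.version_info >= (3, 9):
        return text.removesuffix(suffix)
    # Fallback for older versions. The `suffix` check avoids text[:-0],
    # which would return an empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

print(remove_suffix('this is some string rec', ' rec'))
# this is some string
```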

30

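Since you have to get len(trailing) anyway (where trailing is the string you want to remove if it is at the end), I'd recommend avoiding the slight duplication of work that .endswith would cause in this case. Of course, the proof of the code is in the timing, so let's do some measurements (naming each function after the respondent who proposed it):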

import re

astring = 'this is some string rec'
trailing = ' rec'

def andrew(astring=astring, trailing=trailing):
    regex = r'(.*)%s$' % re.escape(trailing)
    return re.sub(regex, r'\1', astring)

def jack0(astring=astring, trailing=trailing):
    if astring.endswith(trailing):
        return astring[:-len(trailing)]
    return astring

def jack1(astring=astring, trailing=trailing):
    regex = r'%s$' % re.escape(trailing)
    return re.sub(regex, '', astring)

def alex(astring=astring, trailing=trailing):
    thelen = len(trailing)
    if astring[-thelen:] == trailing:
        return astring[:-thelen]
    return astring

Say we've named this Python file a.py and it's in the current directory; now, ...

$ python2.6 -mtimeit -s'import a' 'a.andrew()'
100000 loops, best of 3: 19 usec per loop
$ python2.6 -mtimeit -s'import a' 'a.jack0()'
1000000 loops, best of 3: 0.564 usec per loop
$ python2.6 -mtimeit -s'import a' 'a.jack1()'
100000 loops, best of 3: 9.83 usec per loop
$ python2.6 -mtimeit -s'import a' 'a.alex()'
1000000 loops, best of 3: 0.479 usec per loop

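As you can see, the RE-based solutions are "hopelessly outclassed", as often happens when one "overkills" a problem (possibly one of the reasons REs have such a bad reputation in the Python community), though the suggestion in @Jack's comment is far better than @Andrew's original. The string-based solutions shine, as expected, with my endswith-avoiding one holding a small advantage over @Jack's (0.479 vs. 0.564 usec per loop, about 15% less time). So both pure-string ideas are good (and both are concise and clear). I prefer my variant a little only because I am, by character, a frugal (some might say stingy) person: "waste not, want not"!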

- Alex Martelli

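Why do you have a space in import a' 'a.xxx? - Blankman

@Blankman, it's a bash command running Python: the setup (-s) is one argument, and the code being timed is the other. Each is quoted so I don't have to worry about it containing spaces and/or special characters. You always separate arguments with spaces in bash (and in most other shells too, including Windows' own cmd.exe), so I'm pretty surprised at your question! Quoting the arguments of a shell command to preserve spaces and special characters within each argument is definitely not what I would call a peculiar, rare, or advanced usage...!-) - Alex Martelli

Oh, I see you already bypassed endswith, like I mentioned on Jack's answer. Caching the length also avoids Python's (and C's!) horrible call overhead. - Matt Joiner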

1

I wonder how the regex would perform if it were compiled once and reused. - Conchylicultor
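One way to explore that question is to time a precompiled pattern against plain slicing with the timeit module. The sketch below is not from the answer: the function names `compiled` and `sliced` are made up for illustration, and no timings are quoted because they depend on the machine and the Python version.

```python
import re
import timeit

astring = 'this is some string rec'
trailing = ' rec'

# Compile the pattern once, outside the timed function.
TRAILING_RE = re.compile(re.escape(trailing) + r'$')

def compiled(s=astring):
    # Reuses the precompiled pattern on every call.
    return TRAILING_RE.sub('', s)

def sliced(s=astring, suffix=trailing):
    # Slice comparison with the suffix length computed once per call.
    n = len(suffix)
    return s[:-n] if n and s[-n:] == suffix else s

if __name__ == '__main__':
    loops = 100_000
    for fn in (compiled, sliced):
        seconds = timeit.timeit(fn, number=loops)
        print(f'{fn.__name__}: {seconds / loops * 1e6:.3f} usec per loop')
```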

27

If speed is not important, use a regular expression:

import re

somestring='this is some string rec'

somestring = re.sub(' rec$', '', somestring)

- Per Mejdal Rasmussen
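Two details are worth keeping in mind with the regex route: a suffix containing regex metacharacters needs escaping, and `$` also matches just before a newline at the very end of the string, whereas `\Z` matches only at the absolute end. A minimal sketch, where `strip_suffix_re` is an illustrative name rather than anything from the answer:

```python
import re

def strip_suffix_re(text, suffix):
    # re.escape makes the suffix match literally; \Z anchors at the
    # absolute end of the string.
    pattern = re.escape(suffix) + r'\Z'
    return re.sub(pattern, '', text)

print(repr(strip_suffix_re('this is some string rec', ' rec')))
# 'this is some string'
print(repr(strip_suffix_re('axb', '.b')))      # 'axb' (the dot is literal)
print(repr(re.sub(' rec$', '', 'x rec\n')))    # 'x\n' ($ matched before the newline)
print(repr(strip_suffix_re('x rec\n', ' rec')))  # 'x rec\n'
```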

10

Here is a one-line version of Jack Kelly's answer, along with its sibling:

def rchop(s, sub):
    return s[:-len(sub)] if sub and s.endswith(sub) else s

def lchop(s, sub):
    return s[len(sub):] if s.startswith(sub) else s

- cdiggins

It won't work if sub is an empty string. You forgot to check for that. - Iceflower S
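To make the empty-string problem concrete: every string ends with '', and s[:-0] is the same as s[:0], so an unguarded slice returns an empty string. The two function names below are made up for this illustration.

```python
def unguarded_rchop(s, sub):
    # No check for an empty `sub`: s[:-0] evaluates to ''.
    return s[:-len(sub)] if s.endswith(sub) else s

def guarded_rchop(s, sub):
    # The extra `sub and` skips the slice when `sub` is empty.
    return s[:-len(sub)] if sub and s.endswith(sub) else s

print(repr(unguarded_rchop('abc', '')))  # ''
print(repr(guarded_rchop('abc', '')))    # 'abc'
```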

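See https://peps.python.org/pep-0616/#specification for similar code. It is best to use `removesuffix()` and `removeprefix()` for compatibility; see Jack Kelly's answer. The API differs between the versions: the built-ins are called as methods on the string object, whereas these functions take the string as an argument. - artless noise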

4

You can use a regular expression as well:

from re import sub

str = r"this is some string rec"
regex = r"(.*)\srec$"
print(sub(regex, r"\1", str))

- Andrew Hare

10

The capturing group is probably overkill here. sub(' rec$', '', str) would do. - Jack Kelly

0

Taking inspiration from @David Foster's answer, I would do this:

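def _remove_suffix(text, suffix):
    # Leave text unchanged when either argument is None or the suffix is
    # empty (text[:-0] would otherwise return an empty string).
    if text is not None and suffix:
        return text[:-len(suffix)] if text.endswith(suffix) else text
    else:
        return text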

Reference: Python string slicing

- y2k-shubham

0

Using more_itertools, we can rstrip the items of an iterable that satisfy a predicate.

Installation

> pip install more_itertools

Code

import more_itertools as mit


iterable = "this is some string rec".split()
" ".join(mit.rstrip(iterable, pred=lambda x: x in {"rec", " "}))
# 'this is some string'

Here we pass all the trailing items we want to strip from the end.

See the more_itertools docs for details.

- pylang
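If adding a dependency is not an option, the same word-by-word idea can be sketched with the standard library alone. `rstrip_tokens` is a hypothetical helper, not part of more_itertools. Because this approach works on whitespace-separated words, it removes every trailing matching word and collapses runs of whitespace, which is not quite the same as removing a literal ' rec' suffix once.

```python
def rstrip_tokens(tokens, pred):
    """Hypothetical helper: drop trailing items for which pred is true."""
    tokens = list(tokens)
    while tokens and pred(tokens[-1]):
        tokens.pop()
    return tokens

words = "this is some string rec rec".split()
print(" ".join(rstrip_tokens(words, lambda w: w == "rec")))
# this is some string
```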

0

As a sort of one-liner, joining a generator expression:

test = """somestring='this is some string rec'
this is some string in the end word rec
This has not the word."""
match = 'rec'
print('\n'.join((line[:-len(match)] if line.endswith(match) else line)
      for line in test.splitlines()))
""" Output:
somestring='this is some string rec'
this is some string in the end word 
This has not the word.
"""

- Tony Veijalainen

0


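def remove_trailing_string(content, trailing):
    """
    Strip trailing component `trailing` from `content` if it exists.

    `content` is returned unchanged when it equals `trailing` or when
    `trailing` is empty.
    """
    # The `trailing` check guards against content[:-0], which would return ''.
    if trailing and content.endswith(trailing) and content != trailing:
        return content[:-len(trailing)]
    return content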

- Ehsan Ahmadi

- Jack Kelly · Accepted Answer

def rchop(s, suffix):
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

somestring = 'this is some string rec'
rchop(somestring, ' rec')  # returns 'this is some string'
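Since the question mentions a whole bunch of strings, here is a short usage sketch that applies rchop across a list. The sample data is invented for illustration. The last lines also show why str.rstrip is not a substitute: it strips any trailing characters from the given set rather than a literal suffix.

```python
def rchop(s, suffix):
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

# Made-up sample data.
names = ['this is some string rec', 'soccer rec', 'no suffix here', 'record']
cleaned = [rchop(name, ' rec') for name in names]
print(cleaned)
# ['this is some string', 'soccer', 'no suffix here', 'record']

# rstrip treats ' rec' as a set of characters to strip, not as a suffix.
print('soccer rec'.rstrip(' rec'))
# so
```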