Remove characters from the beginning and end, or only the end, of a line

Question

8

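I want to use regular expressions to remove some symbols from a string, for example:

== (which appears at both the beginning and the end of a line),

* (which appears only at the beginning of a line).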

def some_func():
    clean = re.sub(r'= {2,}', '', clean) #Removes 2 or more occurrences of = at the beg and at the end of a line.
    clean = re.sub(r'^\* {1,}', '', clean) #Removes 1 or more occurrences of * at the beginning of a line.

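Something is wrong with my code; it looks like the expressions are incorrect. How do I remove a character/symbol if it occurs one or more times at the beginning or end of a line?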

- Gusto

5 Answers

3

You have extra spaces in your regular expressions. Even a single space counts as a character to match.

r'^(?:\*|==)|==$'
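
To see why the stray space matters, here is a small sketch. The sample strings and the helper name `strip_markers` are made up for illustration. In `r'= {2,}'` the quantifier binds to the space, so the pattern looks for an `=` followed by two or more spaces and never matches `==`.

```python
import re

# With the stray space, {2,} quantifies the space, not the '='.
broken = re.compile(r'= {2,}')
# Pattern from this answer: one '*' or '==' at the start, or '==' at the end.
fixed = re.compile(r'^(?:\*|==)|==$')

def strip_markers(line):
    """Hypothetical helper: apply the fixed pattern to a single line."""
    return fixed.sub('', line)

print(broken.search('==title=='))  # None
print(strip_markers('==title=='))  # title
print(strip_markers('*item'))      # item
```

As written, the pattern removes exactly one `*` or one `==` at each end; longer runs would need a quantifier such as `\*+` or `={2,}`.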

- Ignacio Vazquez-Abrams

0

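First, note the space before the "{"... spaces are significant, so the quantifiers in your example apply to the space.

To remove "=" (two or more) only at the beginning or the end, you need a different regular expression... for example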

clean = re.sub(r'^(==+)?(.*?)(==+)?$', r'\2', s)

If you don't include "^" or "$", the expression can match anywhere (i.e. even in the middle of the string).
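
A minimal sketch of the difference the anchors make, using a made-up sample string that also has `==` in the middle:

```python
import re

s = '==a==b=='  # made-up sample with '==' in the middle too

# Anchored: only the leading and trailing runs are dropped.
anchored = re.sub(r'^(==+)?(.*?)(==+)?$', r'\2', s)

# Unanchored: every run of two or more '=' is removed, wherever it is.
unanchored = re.sub(r'==+', '', s)

print(anchored)    # a==b
print(unanchored)  # ab
```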

- 6502

0

Rather than substituting, why not keep the part you want?

tu = ('======constellation==' , '==constant=====' ,
      '=flower===' , '===bingo=' ,
      '***seashore***' , '*winter*' ,
      '====***conditions=**' , '=***trees====***' , 
      '***=information***=' , '*=informative***==' )

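import re

# Group 3 captures the text left after skipping a leading run of '=' (two or more)
# or of '*', and stopping before a trailing run of '=' (two or more).
RE = r'((===*)|\**)?(([^=]|=(?!=+\Z))+)'
pat = re.compile(RE)

for ch in tu:
    print(ch, '  ', pat.match(ch).group(3))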

Result:

======constellation==    constellation
==constant=====    constant
=flower===    =flower
===bingo=    bingo=
***seashore***    seashore***
*winter*    winter*
====***conditions=**    ***conditions=**
=***trees====***    =***trees====***
***=information***=    =information***=
*=informative***==    =informative***

What exactly do you want?

Should ====***conditions=** give conditions=**?

Should ***====hundred====*** give hundred====***?

And what about the beginning?

- eyquem

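The result is exactly what I wanted, but I would like to write it to a file (with UTF-8 encoding) instead of printing it. Do you have any suggestions? - Gusto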

0

I think the following code does the job:

tu = ('======constellation==' , '==constant=====' ,
      '=flower===' , '===bingo=' ,
      '***seashore***' , '*winter*' ,
      '====***conditions=**' , '=***trees====***' , 
      '***=information***=' , '*=informative***==' )

import re, codecs

# Write the aligned "original / cleaned" pairs to a UTF-8 file and print them.
with codecs.open('testu.txt', encoding='utf-8', mode='w') as f:
    pat = re.compile(r'(?:==+|\*+)?(.*?)(?:==+)?\Z')
    xam = max(map(len, tu)) + 3  # column width for the original strings
    res = '\n'.join(ch.ljust(xam) + pat.match(ch).group(1)
                    for ch in tu)
    f.write(res)
    print(res)

Where was my head when I wrote the regex in my previous post?! The non-greedy quantifier .*? in front of ==+\Z is the real solution.
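
A short sketch of why the non-greedy quantifier matters here. The `greedy` variant is not from the answer; it is included only for contrast. With a greedy `.*`, the captured group swallows the trailing `=` run before the optional `(?:==+)?` gets a chance to match it.

```python
import re

lazy = re.compile(r'(?:==+|\*+)?(.*?)(?:==+)?\Z')   # pattern from this answer
greedy = re.compile(r'(?:==+|\*+)?(.*)(?:==+)?\Z')  # contrast: greedy .*

sample = '==constant====='

print(lazy.match(sample).group(1))    # constant
print(greedy.match(sample).group(1))  # constant=====
```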

- eyquem

- mdeous · Accepted Answer

If you only want to remove characters from the beginning and the end, you can use the str.strip() method. That gives code like the following:

>>> s1 = '== foo bar =='
>>> s1.strip('=')
' foo bar '
>>> s2 = '* foo bar'
>>> s2.lstrip('*')
' foo bar'

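The strip method removes the characters contained in the given argument from both the beginning and the end of the string; lstrip removes them only from the beginning, and rstrip only from the end.

If you really want to use regular expressions, they would look like this: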

clean = re.sub(r'(^={2,})|(={2,}$)', '', clean)
clean = re.sub(r'^\*+', '', clean)

But in my opinion, strip/lstrip/rstrip is the most appropriate tool for what you want to do.
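
One difference worth keeping in mind when choosing between the two: the regex version only removes runs of two or more `=`, while `strip('=')` also removes a single `=`. A rough comparison, where `clean_regex` and `clean_strip` are hypothetical helper names and the sample lines are made up:

```python
import re

def clean_regex(line):
    """Regex version from this answer, wrapped in a hypothetical helper."""
    line = re.sub(r'(^={2,})|(={2,}$)', '', line)
    return re.sub(r'^\*+', '', line)

def clean_strip(line):
    """strip-based version, also a hypothetical helper."""
    return line.strip('=').lstrip('*')

for line in ['==title==', '=flower===', '*item']:
    print(repr(line), repr(clean_regex(line)), repr(clean_strip(line)))
```

For `'=flower==='` the regex version keeps the single leading `=`, whereas the strip version drops it.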

Edit: following Nick's suggestion, here is a one-liner that does all of this:

clean = clean.lstrip('*').strip('= ')

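A common mistake is to think that these methods remove the characters in the order they are given in the argument. In fact, the argument is just a set of characters to remove, regardless of order, which is why .strip('= ') removes every '=' and every space from the beginning and the end, not just the string '= '.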
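
A quick illustration of that character-set behaviour, using made-up input strings:

```python
# The argument is a set of characters, so its order does not matter.
a = '= = title = ='.strip('= ')
b = '= = title = ='.strip(' =')
print(repr(a))  # 'title'
print(repr(b))  # 'title'

# The one-liner from this answer on a made-up line.
line = '*== foo bar =='
c = line.lstrip('*').strip('= ')
print(repr(c))  # 'foo bar'
```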