Python: Remove leading whitespace from all lines using a regular expression

Question

python regex

29

^(\s+) only removes the whitespace from the first line. How do I remove the leading whitespace from all lines?

- user469652

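Whitespace includes newlines, which means that if you use it on a multi-line string, everything will end up on one line. Please show us some input so we can understand the problem! - rdrey

@rdrey: Actually, in multi-line mode ^ matches the position after every newline, so that won't be a problem (except for "\n\n"). See my answer. - AndiDog

Thanks for the correction, learning something new every day :D - rdrey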

6 Answers

12

If you want to remove whitespace from both the front and the back of a string, you can use strip(); if you only want to remove it from the front, use lstrip().

>>> s="  string with front spaces and back   "
>>> s.strip()
'string with front spaces and back'
>>> s.lstrip()
'string with front spaces and back   '

for line in open("file"):
    print(line.lstrip())

If you really want to use a regular expression:

>>> import re
>>> re.sub(r"^\s+", "", s)  # remove the front
'string with front spaces and back   '
>>> re.sub(r"\s+\Z", "", s)  # remove the back
'  string with front spaces and back'
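
One thing to watch with the file loop above: lstrip() with no argument also strips "\n", so a line that contains only whitespace loses its line ending. Here is a small sketch of the difference. The io.StringIO object and its contents are made up for illustration and just stand in for a real file.

```python
import io

# Hypothetical in-memory stand-in for a file; the contents are made up.
fake_file = io.StringIO("  first\n   \n\tsecond\n")

# lstrip() with no argument also strips "\n", so the blank line vanishes.
plain = "".join(line.lstrip() for line in fake_file)

fake_file.seek(0)
# lstrip(" \t") strips only spaces and tabs, so every newline survives.
careful = "".join(line.lstrip(" \t") for line in fake_file)

print(repr(plain))    # 'first\nsecond\n'
print(repr(careful))  # 'first\n\nsecond\n'
```

If you need the blank lines kept, passing the characters to strip explicitly is probably the simpler choice.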

- ghostdog74

8

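@AndiDog acknowledges in his (currently accepted) answer that it swallows consecutive newlines.

Here is how to fix that deficiency, which is caused by the fact that \n is both whitespace and a line separator. What we need is a re character class that includes only the whitespace characters other than newline.

What we want is whitespace AND NOT newline, which cannot be expressed directly in a re character class. Let's rewrite it as NOT NOT (whitespace AND NOT newline), i.e. NOT (NOT whitespace OR NOT NOT newline) (thanks, Augustus), i.e. NOT (NOT whitespace OR newline), i.e. [^\S\n] in re notation.

So: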

>>> re.sub(r"(?m)^[^\S\n]+", "", "  a\n\n   \n\n b\n c\nd  e")
'a\n\n\n\nb\nc\nd  e'
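
To double-check the De Morgan rewrite, here is a minimal sketch that tests the class against every ASCII whitespace character and compares it with plain \s on a short sample. The variable names and the sample text are my own, not part of the answer.

```python
import re
import string

# Whitespace that is not a newline: NOT (non-whitespace OR newline).
not_newline_ws = re.compile(r"[^\S\n]")

# Check every ASCII whitespace character against the class.
matched = [c for c in string.whitespace if not_newline_ws.fullmatch(c)]
print(matched)  # [' ', '\t', '\r', '\x0b', '\x0c']

text = "  a\n\n   \n\n b"
greedy = re.sub(r"(?m)^\s+", "", text)         # \s also eats the newlines
careful = re.sub(r"(?m)^[^\S\n]+", "", text)   # newlines are left alone

print(repr(greedy))   # 'a\nb'
print(repr(careful))  # 'a\n\n\n\nb'
```

Note that "\r" is still inside the class, so text with "\r\n" line endings may need [^\S\r\n] instead; that variant is an assumption on my part and is not tested here.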

- John Machin

1

nowhite = ''.join(mytext.split())

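Note that no whitespace at all is preserved this way, as you asked (everything ends up as a single word). It is usually more useful to join everything with ' ' or '\n' to keep the words separate.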
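
A quick sketch of the three join variants, using a made-up sample string:

```python
# Made-up sample; split() with no argument splits on any run of whitespace.
text = "  one  two\n   three"
words = text.split()

glued = "".join(words)      # every whitespace character is gone
spaced = " ".join(words)    # words separated by single spaces
lined = "\n".join(words)    # one word per line

print(repr(glued))   # 'onetwothree'
print(repr(spaced))  # 'one two three'
print(repr(lined))   # 'one\ntwo\nthree'
```

None of these keeps the original line structure, so this only fits if you care about the words and not the lines.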

- Tony Veijalainen

1

You need to use the re.MULTILINE option:

re.sub(r"(?m)^\s+", "", text)

The "(?m)" part enables multi-line mode.

- tzot

0

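Most of the time you don't actually need a regular expression. If you only want to remove the common indentation across multiple lines, try the textwrap module: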

>>> import textwrap
>>> messy_text = " grrr\n whitespace\n everywhere"
>>> print(textwrap.dedent(messy_text))
grrr
whitespace
everywhere

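Note that only the indentation common to all lines is removed; if the indentation is irregular, the extra indentation is kept: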

>>> very_messy_text = " grrr\n \twhitespace\n everywhere"
>>> print(textwrap.dedent(very_messy_text))
grrr
        whitespace
everywhere
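
To make the difference from per-line stripping concrete, here is a small sketch with made-up strings. dedent() only removes the prefix shared by every line, while lstrip() on each line flattens everything.

```python
import textwrap

# Made-up sample: two spaces are common to all lines, "b" has two extra.
uneven = "  a\n    b\n  c"

dedented = textwrap.dedent(uneven)
flattened = "\n".join(line.lstrip() for line in uneven.split("\n"))

print(repr(dedented))   # 'a\n  b\nc'
print(repr(flattened))  # 'a\nb\nc'

# A space and a tab share no common prefix, so dedent() changes nothing.
mixed = " x\n\ty"
print(textwrap.dedent(mixed) == mixed)  # True
```

So dedent() suits code or text blocks whose relative indentation matters, and it is not a drop-in replacement for stripping every line.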

- Tim McNamara

- AndiDog · Accepted Answer

Python's regex module does not enable multi-line matching by default, so you need to specify that flag explicitly.

import re

r = re.compile(r"^\s+", re.MULTILINE)
r.sub("", "a\n b\n c") # "a\nb\nc"

# or without compiling (only possible for Python 2.7+ because the flags option
# didn't exist in earlier versions of re.sub)

re.sub(r"^\s+", "", "a\n b\n c", flags = re.MULTILINE)

# but mind that \s includes newlines:
r.sub("", "a\n\n\n\n b\n c") # "a\nb\nc"

You can also inline the flag into the pattern:

re.sub(r"(?m)^\s+", "", "a\n b\n c")
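
As a sanity check, here is a short sketch comparing the pattern with no flag, the inline (?m) form, and the flags keyword. The sample text is made up, and the flags keyword needs Python 2.7+ as noted above.

```python
import re

text = "  a\n  b\n  c"

# Without MULTILINE, ^ matches only at the very start of the string.
no_flag = re.sub(r"^\s+", "", text)

# These two spellings of multi-line mode should behave identically.
inline = re.sub(r"(?m)^\s+", "", text)
keyword = re.sub(r"^\s+", "", text, flags=re.MULTILINE)

print(repr(no_flag))  # 'a\n  b\n  c'
print(repr(inline))   # 'a\nb\nc'
print(inline == keyword)  # True
```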

A simpler solution is to avoid regular expressions altogether, because the original problem is very simple:

content = 'a\n b\n\n c'
stripped_content = ''.join(line.lstrip(' \t') for line in content.splitlines(True))
# stripped_content == 'a\nb\n\nc'
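
If you still prefer a regex but want blank lines kept, restricting the class to spaces and tabs should give the same result as the splitlines version. A minimal sketch follows; the two function names are hypothetical helpers of mine, not part of any library.

```python
import re


def strip_leading_regex(text):
    """Remove leading spaces and tabs from every line; newlines are untouched."""
    return re.sub(r"(?m)^[ \t]+", "", text)


def strip_leading_split(text):
    """Same idea without a regex, keeping each line's original ending."""
    return "".join(line.lstrip(" \t") for line in text.splitlines(True))


content = "a\n b\n\n \t c\n"
print(repr(strip_leading_regex(content)))  # 'a\nb\n\nc\n'
print(strip_leading_regex(content) == strip_leading_split(content))  # True
```

Both versions only treat spaces and tabs as indentation, which is an assumption about the input; other whitespace such as form feeds is left in place.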