How do I trim whitespace from a string?

Question


python string whitespace trim strip

1213

Is there a Python function that will trim whitespace (spaces and tabs) from a string?

So that the input " \t example string\t " becomes "example string".

- Chris

1

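Thanks for the heads-up. I had discovered the strip function earlier, but it doesn't seem to work for my input. - Chris

6

The characters that Python considers whitespace are stored in string.whitespace. - John Fouhy

2

By "strip function," do you mean the strip method? "It doesn't seem to work for my input": please provide your code, input, and output. - S.Lott

1

For everything? What about ignoring case? This is an unfortunate situation; it is much easier in almost every other language. - demongolem

5

Possible duplicate of Trimming a string in Python. - Breno Baiardi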


15 Answers

82

In Python, the trim methods are named strip:

str.strip()  # trim
str.lstrip()  # left trim
str.rstrip()  # right trim
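
As a minimal sketch, here are the three methods applied to the input from the question. The variable names (`raw`, `both`, `left`, `right`) are only illustrative.

```python
# Input taken from the question; `raw` is an illustrative name.
raw = " \t example string\t "

both = raw.strip()    # removes leading and trailing whitespace
left = raw.lstrip()   # removes leading whitespace only
right = raw.rstrip()  # removes trailing whitespace only

print(repr(both))   # 'example string'
print(repr(left))   # 'example string\t '
print(repr(right))  # ' \t example string'
```

None of these methods modifies the string in place; each returns a new string, so the result has to be assigned or used.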

- gcb

5

It's easy to remember because strip looks almost like trim. - isar

25

For leading and trailing whitespace:

s = '   foo    \t   '
print(s.strip())  # prints "foo"

Otherwise, a regular expression works:

import re
pat = re.compile(r'\s+')
s = '  \t  foo   \t   bar \t  '
print(pat.sub('', s))  # prints "foobar"
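
The replacement string passed to `sub` decides whether words get merged. A small sketch reusing the same pattern, assuming you may want to keep words separated (the names `squeezed` and `normalized` are illustrative):

```python
import re

pat = re.compile(r'\s+')
s = '  \t  foo   \t   bar \t  '

# An empty replacement deletes every whitespace run, merging the words.
squeezed = pat.sub('', s)

# A single-space replacement collapses each run to one space;
# strip() then removes the space left at each end.
normalized = pat.sub(' ', s).strip()

print(repr(squeezed))    # 'foobar'
print(repr(normalized))  # 'foo bar'
```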

- ars

1

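You didn't compile your regex. You need to make it pat = re.compile(r'\s+'). - Evan Fosmark

You generally want sub(" ", s) rather than "", since the latter merges the words and you can no longer tokenize with .split(" "). - user3467349

It would be nice to see the output of the print statements. - Ron Klein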

24

You can also use a very simple and basic function, str.replace(), which works with spaces and tabs:

>>> whitespaces = "   abcd ef gh ijkl       "
>>> tabs = "        abcde       fgh        ijkl"

>>> print(whitespaces.replace(" ", ""))
abcdefghijkl
>>> print(tabs.replace(" ", ""))
abcdefghijkl

Simple and easy.
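
One caveat worth checking: `replace(" ", "")` targets only the space character, so a literal tab needs its own call. A minimal sketch using the input from the question (variable names are illustrative):

```python
# Input from the question: contains both spaces and tab characters.
s = " \t example string\t "

# Only the space character is removed, including the interior one;
# the two tabs are left in place.
spaces_removed = s.replace(" ", "")
print(repr(spaces_removed))  # '\texamplestring\t'

# Chaining a second replace handles the tabs as well.
spaces_and_tabs_removed = s.replace(" ", "").replace("\t", "")
print(repr(spaces_and_tabs_removed))  # 'examplestring'
```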

- Lucas

2

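Unfortunately, though, this also removes interior whitespace, whereas the example in the original question leaves interior spaces alone. - Brandon Rhodes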

12

# How to trim a multi-line string or a file

s = """ line one
\tline two\t
line three """

# Line 1 starts with a space, line 2 starts and ends with a tab, line 3 ends with a space.

s1 = s.splitlines()
print(s1)
# [' line one', '\tline two\t', 'line three ']

print([i.strip() for i in s1])
# ['line one', 'line two', 'line three']

# More details:

# We could also have used a for loop from the beginning
# (process() is a placeholder for whatever you do with each line):
for line in s.splitlines():
    line = line.strip()
    process(line)

# We could also be reading a file line by line, e.g. my_file = open(filename), or with open(filename) as my_file:
for line in my_file:
    line = line.strip()
    process(line)

# Moot point: note that splitlines() removed the newline characters; we can keep them by passing True,
# although strip() will then remove them anyway.
s2 = s.splitlines(True)
print(s2)
# [' line one\n', '\tline two\t\n', 'line three ']
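
The loops above rely on an undefined `process` function and an already-open file. A self-contained sketch of the same idea follows; here `process` is a hypothetical stand-in that just upper-cases the line, and `io.StringIO` stands in for a real file object.

```python
import io

def process(line):
    # Hypothetical stand-in for whatever per-line work is needed.
    return line.upper()

s = " line one\n\tline two\t\nline three "

# io.StringIO behaves like an open text file: iterating over it yields
# lines that still carry their trailing newline, which strip() removes
# together with the other leading and trailing whitespace.
my_file = io.StringIO(s)
results = [process(line.strip()) for line in my_file]

print(results)  # ['LINE ONE', 'LINE TWO', 'LINE THREE']
```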

- Rusty Rob

4

Nobody has posted these regex solutions yet.

Matching:

>>> import re
>>> p=re.compile('\\s*(.*\\S)?\\s*')

>>> m=p.match('  \t blah ')
>>> m.group(1)
'blah'

>>> m=p.match('  \tbl ah  \t ')
>>> m.group(1)
'bl ah'

>>> m=p.match('  \t  ')
>>> print(m.group(1))
None

Searching (you need to handle the "whitespace only" input case differently):

>>> p1=re.compile('\\S.*\\S')

>>> m=p1.search('  \tblah  \t ')
>>> m.group()
'blah'

>>> m=p1.search('  \tbl ah  \t ')
>>> m.group()
'bl ah'

>>> m=p1.search('  \t  ')
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

If you use re.sub, you may remove interior whitespace, which may be undesirable.
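
The search variant can be wrapped so that whitespace-only input does not raise. This is a sketch under my own assumptions, not the answer's exact pattern: the helper name `regex_trim` is hypothetical, and the pattern is adjusted to `\S(?:.*\S)?` so that a single non-whitespace character also matches.

```python
import re

# One non-whitespace character, optionally followed by anything that ends
# in another non-whitespace character. DOTALL lets '.' span interior newlines.
_core = re.compile(r'\S(?:.*\S)?', re.DOTALL)

def regex_trim(text):
    """Return text without leading/trailing whitespace ('' if nothing is left)."""
    m = _core.search(text)
    return m.group() if m else ''

print(repr(regex_trim('  \tbl ah  \t ')))  # 'bl ah'
print(repr(regex_trim('  \t  ')))          # ''
print(repr(regex_trim(' x ')))             # 'x'
```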

- user1149913

4

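Whitespace includes spaces, tabs, and CR/LF. So an elegant one-liner string function we can use is translate (the two-argument form shown here is the Python 2 str.translate signature):

' hello apple'.translate(None, ' \n\t\r')

Or if you want to be thorough: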

import string
' hello  apple'.translate(None, string.whitespace)
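
On Python 3, `str.translate` takes a single translation table, so the two-argument call above raises a `TypeError`. A sketch of an equivalent, assuming Python 3 (the name `delete_whitespace` is illustrative):

```python
import string

# str.maketrans('', '', chars) builds a table that deletes every
# character listed in its third argument.
delete_whitespace = str.maketrans('', '', string.whitespace)

print(' hello  apple'.translate(delete_whitespace))  # helloapple
```

Like the original, this removes interior whitespace too, not just the ends.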

- MaK

3

(re.sub(' +', ' ',(my_str.replace('\n',' ')))).strip()

This removes all unnecessary spaces and newline characters. Hope this helps.

import re
my_str = '   a     b \n c   '
formatted_str = (re.sub(' +', ' ',(my_str.replace('\n',' ')))).strip()

This will result in:

'   a     b \n c   ' will be changed to 'a b c'

- Safvan CK

2

    something = "\t  please_     \t remove_  all_    \n\n\n\nwhitespaces\n\t  "

    something = "".join(something.split())

Output:

please_remove_all_whitespaces

Adding Le Droid's comment to the answer. To separate with a space:

    something = "\t  please     \t remove  all   extra \n\n\n\nwhitespaces\n\t  "
    something = " ".join(something.split())

Output:

please remove all extra whitespaces

- pbn

1

Simple and efficient. You could use " ".join(...) to keep the words separated by a space. - Le Droid

2

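I've looked at a lot of the solutions here, but I was wondering what to do with a comma-separated string...

The problem

While processing a csv of contact information, I needed a solution to this problem: trim extraneous whitespace and some junk, but preserve trailing commas and internal whitespace. Working with a field containing notes on the contacts, I wanted to remove the garbage and keep the useful parts. While trimming away all the punctuation and chaff, I didn't want to lose the whitespace between compound tokens, since I didn't want to rebuild it later.

Regex and pattern: `[\s_]+?\W+`

The pattern lazily matches one or more whitespace characters or underscores ('_') with [\s_]+?, followed by one or more non-word characters with \W+ (equivalent to [^a-zA-Z0-9_] for ASCII text). Specifically, this finds swaths of whitespace: tabs (\t), newlines (\n), form feeds (\f), and carriage returns (\r). A null character (\0) is not whitespace, but it is a non-word character and so is consumed by \W+.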
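
To make the matching behaviour concrete, here is a small sketch on a made-up string (the sample text and variable names are assumptions for illustration, not the author's data). A match needs at least two characters: one whitespace or underscore, then at least one non-word character.

```python
import re

trim_ws_punctn = re.compile(r'[\s_]+?\W+')

# Made-up sample: empty list items, a run of newlines, and an
# underscore joining two word characters.
sample = "a, , , b\n\n\nc, d_e"

cleaned = trim_ws_punctn.sub(' ', sample)
print(cleaned)  # a, b c, d_e
```

The run " , , " and the three newlines each collapse to a single space, while the lone space in ", d" and the underscore in "d_e" are untouched because a word character follows them directly.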

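I see two advantages to this approach:

1. It doesn't remove whitespace between complete words/tokens that you might want to keep together.
2. Python's built-in string method strip() doesn't look inside the string, only at the left and right ends, and with no argument it strips whitespace (see the example below: the text contains several newlines, and strip() does not remove them all, while the regex pattern does). text.strip(' \n\t\r')

This goes beyond the OP's question, but I think there are plenty of cases where the text data contains odd, pathological instances, as mine did (somehow escape characters ended up in some of the text). Moreover, in list-like strings, we don't want to eliminate the delimiter unless the delimiter separates two whitespace characters or some non-word character, like '-,' or '-, ,,,'.

NB: I'm not talking about the delimiter of the CSV itself, only about instances within the CSV where the data is list-like, i.e. a comma-separated string of substrings.

Full disclosure: I've only been manipulating text for about a month, and regex for only the last two weeks, so I'm sure I'm missing some nuances. That said, for smaller collections of strings (mine are in a dataframe of 12,000 rows and 40-odd columns), as a final step after a pass that removes extraneous characters, this works exceptionally well, especially if you introduce some additional whitespace where you want to separate text joined by a non-word character but don't want to add whitespace where there was none before.

An example: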

import re


text = "\"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, , , , \r, , \0, ff dd \n invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, \n i69rpofhfsp9t7c practice 20ignition - 20june \t\n .2134.pdf 2109                                                 \n\n\n\nklkjsdf\""

print(f"Here is the text as formatted:\n{text}\n")
print()
print("Trimming both the whitespaces and the non-word characters that follow them.")
print()
trim_ws_punctn = re.compile(r'[\s_]+?\W+')
clean_text = trim_ws_punctn.sub(' ', text)
print(clean_text)
print()
print("what about 'strip()'?")
print(f"Here is the text, formatted as is:\n{text}\n")
clean_text = text.strip(' \n\t\r')  # strip out whitespace?
print()
print(f"Here is the text, after stripping with 'strip':\n{clean_text}\n")

print()
print("Are 'text' and 'clean_text' unchanged?")
print(clean_text == text)

This outputs:

Here is the text as formatted:

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf" 

using regex to trim both the whitespaces and the non-word characters that follow them.

"portfolio, derp, hello-world, hello-, world, founders, mentors, ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, ff, series a, exit, general mailing, fr, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk,  jim.somedude@blahblah.com, dd invites,subscribed,, master, dd invites,subscribed, ff dd invites, subscribed, alumni spring 2012 deck: https: www.dropbox.com s, i69rpofhfsp9t7c practice 20ignition 20june 2134.pdf 2109 klkjsdf"

Very nice.
What about 'strip()'?

Here is the text, formatted as is:

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf"


Here is the text, after stripping with 'strip':


"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf"
Are 'text' and 'clean_text' unchanged? 'True'

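So strip() only removes whitespace from the two ends of the string, and here the text begins and ends with a double quote, so nothing was removed. In the OP's case, then, strip() is fine. But if things get more complex, regex and a similar pattern may be of some value for more general settings.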
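
The effect is easy to reproduce on a short string. In this sketch (both sample strings are made up for illustration), the quoted version comes back unchanged because its first and last characters are quotes, not whitespace:

```python
# Made-up samples: the same content with and without surrounding quotes.
quoted = '"alpha,\n\n beta  "'
bare = ' alpha,\n\n beta  \n'

# Nothing to strip: the string starts and ends with a double quote.
print(quoted.strip() == quoted)  # True

# Whitespace at the ends is removed; the interior newlines stay.
print(repr(bare.strip()))  # 'alpha,\n\n beta'
```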


- joshua fiddler


- James Thompson · Accepted Answer

For whitespace on both sides, use str.strip:

s = "  \t a string example\t  "
s = s.strip()

For whitespace on the right side, use str.rstrip:

s = s.rstrip()

For whitespace on the left side, use str.lstrip:

s = s.lstrip()

You can pass an argument to any of these functions to strip arbitrary characters, like this:

s = s.strip(' \t\n\r')

This will strip any space, \t, \n, or \r characters from both sides of the string.
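
Note that the argument is treated as a set of characters, not as a prefix or suffix to match. A short sketch (the sample strings are illustrative):

```python
# Every leading/trailing character found in the set is removed,
# in any order, until a character outside the set is reached.
url = "www.example.com"
print(url.strip("cmowz."))  # example

padded = "xyxhello worldyyx"
print(padded.strip("xy"))   # hello world
```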

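The examples above only remove characters from the left-hand and right-hand sides of the string. If you also want to remove characters from the middle of the string, try re.sub:

import re
print(re.sub(r'\s+', '', s))

That should print out: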

astringexample
