How to replace all special characters with spaces in Python?

Question


python replace special-characters whitespace text-files

6

How can I replace all special characters with spaces in Python?

I have a list of company names...

For example: [myfiles.txt]

MY company.INC Old Wine pvt master-minds ltd "apex-labs ltd" "India-New corp" Indo-American pvt/ltd

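In the example above, I need to replace every special character [-,",/,.] in the file myfiles.txt with a single space and save the result to another text file, myfiles1.txt.

Can anyone help me with this?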

- Tyler Durden

12

Every character is special in its own way. - Ignacio Vazquez-Abrams

2

There are no non-special characters. If there were one, there would be a smallest non-special character, and that would make it special. - Has QUIT--Anony-Mousse

5 Answers

5

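specials = '-"/.'  # extend with any other characters to replace
# Python 3 spelling; Python 2 used string.maketrans after `import string`
trans = str.maketrans(specials, ' ' * len(specials))
# for each line in the file:
cleanline = line.translate(trans)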

e.g.

>>> line = "Indo-American pvt/ltd"
>>> line.translate(trans)
'Indo American pvt ltd'

- dabhaid

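Great! But I would like it to save to a text file automatically... that is, read each line from myfile.txt and save it to myfiles1.txt after the replacement. - Tyler Durden

Then add a line of code after the translation that does exactly that! - Don Question

@Yeshu91 If f is your file handle (e.g. f = open('cleanfile.txt', 'w')), then just add f.write(cleanline) at the end. - dabhaid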
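
Putting the comments above together, the whole read, replace and write loop might look roughly like the sketch below. It assumes Python 3 (where the table is built with str.maketrans) and UTF-8 files; the helper name clean_file is made up for illustration and does not appear in the original answer.

```python
SPECIALS = '-"/.'  # characters from the question; extend as needed
# Python 3 translation table: each special character maps to one space
TRANS = str.maketrans(SPECIALS, ' ' * len(SPECIALS))


def clean_file(src_path, dst_path):
    """Hypothetical helper: copy src_path to dst_path with SPECIALS replaced by spaces."""
    with open(src_path, encoding='utf-8') as src, \
         open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            # translate() leaves the trailing newline untouched
            dst.write(line.translate(TRANS))
```

Calling clean_file('myfiles.txt', 'myfiles1.txt') would then produce the cleaned copy the question asks for, one line at a time.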

4

import re
strs = "how much for the maple syrup? $20.99? That's ridiculous!!!"
strs = re.sub(r'[?$.!]', '', strs)  # remove particular special characters
strs = re.sub(r'[^a-zA-Z0-9 ]', '', strs)  # remove everything except letters, digits and spaces
strs = ''.join(c for c in strs if not c.isdigit())  # remove digits
strs = re.sub(' +', ' ', strs)  # collapse runs of spaces into one
print(strs)

Output: how much for the maple syrup Thats ridiculous

- PrabhuPrakash

1

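Although maketrans is the fastest approach, I can never remember its syntax. Since speed is rarely an issue and I know regular expressions, I tend to do this instead: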

>>> line = "-[myfiles.txt] MY company.INC"
>>> import re
>>> re.sub(r'[^a-zA-Z0-9]', ' ',line)
'  myfiles txt  MY company INC'

This has the added benefit of declaring the characters you accept rather than the ones you reject, which is easier in this case.
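
Since the question asks for a single space, one possible variation is to add + to the character class so that a run of rejected characters collapses into one space. This is an assumption about what is wanted, not something the answer itself shows, and the name to_single_spaces is made up for illustration.

```python
import re

# one or more consecutive characters that are not ASCII letters or digits
NON_ALNUM_RUN = re.compile(r'[^a-zA-Z0-9]+')


def to_single_spaces(text):
    """Replace each run of non-alphanumeric characters with one space."""
    return NON_ALNUM_RUN.sub(' ', text)


print(to_single_spaces('-[myfiles.txt] MY company.INC'))
# prints ' myfiles txt MY company INC' (note the leading space)
```

Newlines also match this class, so for a multi-line file it is probably safer to apply the function line by line, or to add \n to the accepted characters.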

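Of course, if you are working with non-ASCII characters, you have to go back to removing the characters you reject. If that is only punctuation, you can do: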

>>> import string
>>> chars = re.escape(string.punctuation)
>>> re.sub(r'['+chars+']', ' ',line)
'  myfiles txt  MY company INC'

But you will notice

- Bite code

0

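At first I wanted to provide a string.maketrans/translate example, but you may be working with UTF-8 encoded strings, and the ord()-sorted translation table would blow up in your face, so I came up with another solution: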

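conversion = '-"/.'
with open('myfiles.txt') as f:  # file name taken from the question
    text = f.read()
newtext = ''
for c in text:
    # swap each character listed in conversion for a space
    newtext += ' ' if c in conversion else c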

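This may not be the fastest approach, but it is easy to understand and modify.

So if your text is non-ASCII, you can decode both the conversion and text strings to unicode, and afterwards re-encode the result to whatever encoding you want.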
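
In Python 3 the explicit decode and encode steps can usually be left to open(), because str is already Unicode. The following is only a sketch under that assumption; the function name replace_chars is hypothetical.

```python
def replace_chars(text, conversion='-"/.'):
    """Return text with every character found in conversion replaced by a space."""
    # str.translate accepts a dict keyed by code point, so non-ASCII
    # text passes through unchanged
    table = {ord(c): ' ' for c in conversion}
    return text.translate(table)


print(replace_chars('Café-Münch "pvt/ltd"'))
```

Reading with open(path, encoding='utf-8') and writing with the same encoding argument would then cover the decoding and re-encoding described above, assuming the files really are UTF-8.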

- Don Question


- Stephen Emslie · Accepted Answer

Assuming you mean to change everything that is not alphanumeric, you can do this on the command line:

cat foo.txt | sed "s/[^A-Za-z0-9]/ /g" > bar.txt

Or in Python, using the re module:

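import re
with open('foo.txt') as src:
    original_string = src.read()
# replace everything except letters, digits, newlines and dots with a space
new_string = re.sub(r'[^a-zA-Z0-9\n.]', ' ', original_string)
with open('bar.txt', 'w') as dst:
    dst.write(new_string)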