Best way to strip punctuation from a string

829

It seems like there should be a simpler way than this:

import string
s = "string. With. Punctuation?" # Sample string 
out = s.translate(string.maketrans("",""), string.punctuation)

Is there?
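For readers on Python 3, where string.maketrans no longer exists, the same idea can be sketched as follows. This assumes "punctuation" means the ASCII set in string.punctuation; the helper name strip_punctuation is made up for illustration.

```python
import string

# In Python 3, the third argument of str.maketrans lists characters to delete.
_DELETE_PUNCT = str.maketrans("", "", string.punctuation)


def strip_punctuation(text: str) -> str:
    """Return text with every ASCII punctuation character removed."""
    return text.translate(_DELETE_PUNCT)


s = "string. With. Punctuation?"  # Sample string from the question
out = strip_punctuation(s)
print(out)  # string With Punctuation
```

Building the table once and reusing it avoids rebuilding it on every call.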


4
That seems pretty straightforward to me. Why do you want to change it? If you want it to be easier, just wrap what you wrote in a function. - Hannes Ovrén
3
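Well, relying on what is essentially a side effect of str.translate to do the work seemed a bit hackish. I was thinking there might be something more like str.strip(chars), which I had missed, that works on the whole string rather than just its boundaries. - Redwood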
64
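It depends on what you mean by punctuation. "The temperature in the O'Reilly & Arbuthnot-Smythe server's main rack is 40.5 degrees." contains exactly one punctuation character, the second period. Take care not to change the meaning. - John Machin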
43
I'm surprised nobody has mentioned that string.punctuation doesn't include non-English punctuation at all. I'm thinking of 。, !, ?, :, ×, “, ”, 〟, and so on. - Clément
2
@JohnMachin You forgot that ' ' is punctuation. - Wayne Werner
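The point about non-English punctuation can be illustrated with a rough sketch that uses Unicode general categories instead of string.punctuation. It assumes "punctuation" means any character whose category starts with "P"; the function name strip_unicode_punctuation is hypothetical. Symbols such as × belong to a symbol category (Sm) rather than a punctuation category, so this sketch leaves them in place.

```python
import unicodedata


def strip_unicode_punctuation(text: str) -> str:
    """Drop every character whose Unicode general category is punctuation (P*)."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P")
    )


print(strip_unicode_punctuation("你好。世界!"))  # 你好世界
print(strip_unicode_punctuation("3×4"))  # 3×4 (× is a math symbol, not punctuation)
```

Note that, like the ASCII approach, this removes apostrophes, hyphens, and decimal points inside words and numbers, so it does not address the concern about changing the meaning of the text.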
32 Answers

-1
Remove stop words from a text file using Python.
print('====THIS IS HOW TO REMOVE STOP WORDS====')

with open('one.txt', 'r') as myFile:
    str1 = myFile.read()

    # Stop words to filter out (matching is case-sensitive).
    stop_words = "not", "is", "it", "By","between","This","By","A","when","And","up","Then","was","by","It","If","can","an","he","This","or","And","a","i","it","am","at","on","in","of","to","is","so","too","my","the","and","but","are","very","here","even","from","them","then","than","this","that","though","be","But","these"

    myList = []
    myList.extend(str1.split(" "))

    # Print every word that is not a stop word, preceded by a separator line.
    for i in myList:
        if i not in stop_words:
            print("____________")
            print(i, end='\n')

-3
I like to use a function like this:
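import string

def scrub(abc):
    # Trim punctuation from the end, then from the start. Checking that
    # abc is non-empty avoids an IndexError on all-punctuation input.
    while abc and abc[-1] in string.punctuation:
        abc = abc[:-1]
    while abc and abc[0] in string.punctuation:
        abc = abc[1:]
    return abc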

1
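This strips characters from the start and end only; use abc.strip(string.punctuation) for that instead. It will not remove such characters in the middle. - Martijn Pieters
Never mind whether the function does its job or not; unnecessarily using side effects and modifying a list while iterating over it is a recipe for disaster. - Dexter Legaspi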
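The difference between trimming the ends and removing punctuation everywhere can be seen in a short comparison. This is only an illustrative sketch in Python 3 syntax; the sample string and the variable names are invented.

```python
import string

s = "...Hello, world!..."  # made-up sample input

# str.strip removes the given characters from both ends only.
ends_only = s.strip(string.punctuation)

# str.translate with a deletion table removes them wherever they occur.
everywhere = s.translate(str.maketrans("", "", string.punctuation))

print(ends_only)   # Hello, world
print(everywhere)  # Hello world
```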
