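I want to use R's gsub function to remove all punctuation from a text except apostrophes. I'm fairly new to regular expressions and still learning.
Example: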
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[[:punct:]]", "", as.character(x))
Current output (no apostrophe in "don't")
[1] "I like to chew gum but dont like bubble gum"
Desired output (I want the apostrophe in "don't" to stay)
[1] "I like to chew gum but don't like bubble gum"
gsub("[^[:alnum:][:space:]'\"]", "", x) - Tyler Rinker
gsub("[^[:alnum:][:space:]']", "", x) would do. (By the way, no backslash is needed inside the regular expression.) - Josh O'Brien
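For anyone reading along, here is a minimal sketch of the negated-class approach from the comments, run on the question's string. The second variant is an assumption not taken from the comments: it uses a negative lookahead with perl = TRUE, which should remove punctuation only and leave other non-alphanumeric characters alone. The variable names are just illustrative.

```r
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"

# Negated class: delete every character that is NOT a letter, digit,
# whitespace, or apostrophe.
keep_apostrophes <- gsub("[^[:alnum:][:space:]']", "", x)
print(keep_apostrophes)
# [1] "I like to chew gum but don't like bubble gum"

# Alternative (assumption: PCRE engine via perl = TRUE): the lookahead (?!')
# rejects an apostrophe at the current position, so [[:punct:]] only
# matches the remaining punctuation characters.
keep_apostrophes_pcre <- gsub("(?!')[[:punct:]]", "", x, perl = TRUE)
print(keep_apostrophes_pcre)
```

On this input both calls give the same result. They can differ on other text: the negated class drops anything outside the listed classes, while the lookahead version only touches characters in [:punct:].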