Removing words that contain the special characters "\" and "/"

3
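While analyzing tweets, I ran into "words" that contain \ or / (a single "word" can contain more than one of them). I want to remove such words entirely, but I have not been able to get it fully working.
Here is what I tried: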
import re

# Note: these literals are not raw strings, so '\r' in sen is a carriage return.
sen = 'this is \re\store and b\\fre'
sen1 = 'this i\s /re/store and b//fre/'

slash_back =  r'(?:[\w_]+\\[\w_]+)'
slash_fwd = r'(?:[\w_]+/+[\w_]+)'
slash_all = r'(?<!\S)[a-z-]+(?=[,.!?:;]?(?!\S))'

strt = re.sub(slash_back,"",sen)
strt1 = re.sub(slash_fwd,"",sen1)
strt2 = re.sub(slash_all,"",sen1)
print(strt)
print(strt1)
print(strt2)

I want to get:

this is and
this i\s and
this and

However, I get:

and 
this i\s / and /
i\s /re/store  b//fre/

Clarification: in this case, a "word" is a string delimited by spaces or punctuation, as in ordinary text.
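Part of the unexpected output may come from the string literals rather than the patterns. The sketch below compares a non-raw literal with a raw one and shows what the question's `slash_back` pattern finds in each. The variable names `plain` and `raw` are introduced here for illustration only.

```python
import re

# Same characters the question's non-raw literal 'this is \re\store and b\\fre'
# evaluates to: '\r' becomes a carriage return, '\s' stays a backslash plus 's'.
plain = 'this is \re\\store and b\\fre'
# Raw literal: every backslash is kept exactly as typed.
raw = r'this is \re\store and b\\fre'

slash_back = r'(?:[\w_]+\\[\w_]+)'  # pattern from the question

print(repr(plain))                    # shows the carriage return as \r
print(repr(raw))                      # shows every backslash doubled
print(re.findall(slash_back, plain))  # matches in the non-raw string
print(re.findall(slash_back, raw))    # matches in the raw string
```

In the raw string the pattern only finds `re\store`, because it requires word characters on both sides of a single backslash, so a leading backslash or a doubled one is left behind.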

1
Beautifully stated question. I wish there were a question template for askers that looked something like this. - d0nut
1
@iismathwizard I had to reload the page to make sure my eyes weren't deceiving me. - kyle heitman
2 Answers

1
How about this? I added some examples with punctuation:
import re

sen = r'this is \re\store and b\\fre'
sen1 = r'this i\s /re/store and b//fre/'
sen2 = r'this is \re\store, and b\\fre!'
sen3 = r'this i\s /re/store, and b//fre/!'

slash_back =  r'\s*(?:[\w_]*\\(?:[\w_]*\\)*[\w_]*)'
slash_fwd = r'\s*(?:[\w_]*/(?:[\w_]*/)*[\w_]*)'
slash_all = r'\s*(?:[\w_]*[/\\](?:[\w_]*[/\\])*[\w_]*)'

strt = re.sub(slash_back,"",sen)
strt1 = re.sub(slash_fwd,"",sen1)
strt2 = re.sub(slash_all,"",sen1)
strt3 = re.sub(slash_back,"",sen2)
strt4 = re.sub(slash_fwd,"",sen3)
strt5 = re.sub(slash_all,"",sen3)
print(strt)
print(strt1)
print(strt2)
print(strt3)
print(strt4)
print(strt5)

Output:

this is and
this i\s and
this and
this is, and!
this i\s, and!
this, and!
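For readers who find the one-line pattern hard to parse, here is a minimal sketch of the same idea as `slash_all`, written with `re.VERBOSE` so each piece can carry a comment. The names `SLASH_WORD` and `drop_slash_words` are hypothetical, and `[\w_]` is shortened to `\w` on the assumption that the two are equivalent, since `\w` already includes the underscore.

```python
import re

# Hypothetical name; the pattern mirrors slash_all from the answer above.
SLASH_WORD = re.compile(r"""
    \s*            # leading whitespace, removed together with the word
    \w*            # optional word characters before the first slash
    [/\\]          # one forward slash or backslash
    (?:\w*[/\\])*  # any further segments that end in a slash
    \w*            # optional trailing word characters
""", re.VERBOSE)

def drop_slash_words(text):
    """Remove every word that contains at least one / or \\."""
    return SLASH_WORD.sub("", text)

print(drop_slash_words(r'this i\s /re/store, and b//fre/!'))  # this, and!
```

Because the leading whitespace is consumed with each match, a slash word at the very start of the text leaves the following space in place; calling `.strip()` on the result is one way to handle that case.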

Awesome! Works like a dream! Thank you so much!! - Toly

1

Without re, one approach is to use join with a comprehension.

sen = r'this is \re\store and b\\fre'
sen1 = r'this i\s /re/store and b//fre/'

remove_back = lambda s: ' '.join(i for i in s.split() if '\\' not in i)
remove_forward = lambda s: ' '.join(i for i in s.split() if '/' not in i)

>>> print(remove_back(sen))
this is and
>>> print(remove_forward(sen1))
this i\s and
>>> print(remove_back(remove_forward(sen1)))
this and
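The two lambdas can be folded into one function that takes the set of unwanted characters as a parameter. This is only a sketch: `remove_words_containing` and its `chars` parameter are hypothetical names, not part of the answer.

```python
def remove_words_containing(text, chars='\\/'):
    """Drop whitespace-separated tokens that contain any character in chars."""
    return ' '.join(
        token for token in text.split()
        if not any(c in token for c in chars)
    )

print(remove_words_containing(r'this i\s /re/store and b//fre/'))    # this and
print(remove_words_containing(r'this i\s /re/store, and b//fre/!'))  # this and
```

Unlike the regex approach, splitting on whitespace treats attached punctuation as part of the token, so the comma and exclamation mark in the second call are dropped along with the word.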

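Interesting approach! I think it is a specific solution for a specific case, whereas I am looking for a general one. Mark's solution has so far handled the most complex strings in my tweet collection. Thanks! - Toly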
