We all know that sed is great at finding and replacing every occurrence of a word in a file:
sed -i 's/original_word/new_word/g' file.txt
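But can someone show me how to feed sed a list of "original_words" from a file (similar to grep -f)? I just want to replace every one of them with '' (that is, delete them).
The word list file is just a set of stop words, one per line (wordlist.txt):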
a
about
above
according
across
after
afterwards
This is meant as a simple way to strip a list of stop words from a corpus (for data cleaning). file.txt looks like this:
05ricardo RT @shakira: Immigration reform isn't about politics. It's about people mothers, kids. Obama is working for all of them. http://t.co/rAW ... 0
05ricardo ?@ItsReginaG: Don't vote Obama. Because you will lose jobs, and die.? Lol 0
05ricardo ?@shakira: Obama doubles Pell Grants - 700,000 more Latinos get help to go to college. Meet Johanny Adames http://t.co/EMg8NLGl Shak?. ? -1
05rodriguez_a My Comm teacher gave me a copy of Obama's speech that he gave the other night and I cried while reading it. It was that moving. -3
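
To make the goal concrete, here is a rough sketch of the kind of thing I have in mind: turn each line of wordlist.txt into its own s/\bword\b//g command and hand the result to sed -f. The function name remove_stopwords is made up for illustration, and the sketch assumes GNU sed (\b is a GNU extension) and a word list that contains no regex metacharacters or slashes.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only):
#   remove_stopwords WORDLIST INPUT
# Prints INPUT with every whole-word occurrence of a listed word deleted.
# Assumes GNU sed, because \b (word boundary) is not POSIX.
remove_stopwords() {
    script=$(mktemp) || return 1
    # Turn each non-empty line "word" into the sed command s/\bword\b//g.
    # '|' is used as the delimiter so the slashes in the output stay literal.
    sed -n '/./s|.*|s/\\b&\\b//g|p' "$1" > "$script"
    # Apply the generated script to the input file.
    sed -f "$script" "$2"
    rm -f "$script"
}
```

It would be called as remove_stopwords wordlist.txt file.txt > cleaned.txt. Matching is case-sensitive here, so "About" would survive, and the deleted words leave their surrounding spaces behind.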