I am trying to remove certain words (in addition to the usual stopwords) from a list of text strings, but for some reason it does not work.
documents = ["Human machine interface for lab abc computer applications",
"A survey of user opinion of computer system response time",
"The EPS user interface management system",
"System and human system engineering testing of EPS",
"Relation of user perceived response time to error measurement",
"The generation of random binary unordered trees",
"The intersection graph of paths in trees",
"Graph minors IV Widths of trees and well quasi ordering",
"Graph minors A survey"]
exclude = ['am', 'there','here', 'for', 'of', 'user']
new_doc = [word for word in documents if word not in exclude]
print new_doc
Output:
['Human machine interface for lab abc computer applications', 'A survey of user opinion of computer system response time', 'The EPS user interface management system', 'System and human system engineering testing of EPS', 'Relation of user perceived response time to error measurement', 'The generation of random binary unordered trees', 'The intersection graph of paths in trees', 'Graph minors IV Widths of trees and well quasi ordering', 'Graph minors A survey']
As you can see, the words in exclude are not removed from documents ("for" is a good example).
It does work with this statement:
new_doc = [word for word in str(documents).split() if word not in exclude]
But after this "cleaning", how do I get the original elements (now cleaned) back into documents, one string per document?
Thanks a lot for your help!
word is not a word; it is a whole line (e.g. "Human machine interface for lab abc computer applications"), so it will never be in exclude. - jonrsharpe
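A minimal sketch of what the comment implies, assuming words are separated by whitespace and matching is case-sensitive: split each document into words, drop the excluded ones, and join the rest back together, so the result still has one string per original document. The helper name remove_words is hypothetical, the list is shortened to three of the documents, and exclude is turned into a set here only to make lookups faster.

```python
documents = [
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
    "Graph minors A survey",
]
exclude = {'am', 'there', 'here', 'for', 'of', 'user'}


def remove_words(doc, excluded):
    # Hypothetical helper: split one document on whitespace, keep only the
    # words that are not excluded, and rebuild a single string.
    return " ".join(word for word in doc.split() if word not in excluded)


# Filter inside each document instead of comparing whole lines to exclude.
new_docs = [remove_words(doc, exclude) for doc in documents]
print(new_docs)
```

Because the comparison is case-sensitive, a capitalized "For" would survive; comparing word.lower() against exclude is one way around that if it matters for your data.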