How do I remove words from my word cloud? (Python 3)

Question

import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import wordcloud
from wordcloud import WordCloud,STOPWORDS

# Read the whole text.
remarks = open(r'C:\Users\marmar\Documents\Remarks.txt').read()

#Create words over an image
mask = np.array(Image.open(r'C:\users\marmar\Documents\cloud.png'))

#set the stopwords list
stopwords= set(STOPWORDS)

#append new words to the stopwords list
new_words =open(r'C:\Users\marmar\comments.txt').read()
new_stopwords=stopwords.union(new_words)

#generate the word cloud with parameters
wc = WordCloud(background_color="white", 
               max_words=2000, 
               mask=mask,
               min_font_size =12, 
               max_font_size=20, 
               relative_scaling = 0.5, 
               stopwords=new_stopwords,
               normalize_plurals= True)
wc.generate(remarks)
plt.figure(figsize=(25,25))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")

#Show the wordcloud
plt.show()

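Basically, I'm using Python 3 (Jupyter Notebook) to create a word cloud shaped by a picture of an actual cloud. The WordCloud package has its own stopwords feature. However, I want to add some words of my own to the stopwords list, words I don't want to see in my cloud. I tried putting those words in a text file, but they still show up in the cloud. For example, the text file looks like this: customer, CSR Customer, satisfied, Item Completed

How can I add more words to the list? I tried both add and append, but neither works.

Thanks in advance for your answers.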

- marmar

I tried stopwords.add('CSR Comment'), but I can still see it in the cloud! - marmar

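In your call to the WordCloud constructor you seem to be passing stopwords=stopwords. Don't you want stopwords=new_stopwords? - RagingRoosevelt

Also make sure you tokenize the file so that it is broken up word by word. You can use something like open(...).read().split(). - RagingRoosevelt

Hey, good catch, but I can still see the word in the cloud! I don't get it... - marmar

Added a new answer. - RagingRoosevelt

Yeah, it just doesn't want to change. I can still see those words in the cloud. :( - marmar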
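
To make the tokenizing point in the comments concrete, here is a minimal sketch of what set.union does with a raw string versus a split string. The names file_text and base are hypothetical stand-ins (for the contents of comments.txt and for wordcloud's STOPWORDS), so the snippet runs without the files from the question.

```python
# Hypothetical stand-in for the contents of comments.txt
file_text = "customer, CSR Customer, satisfied, Item Completed"

# Hypothetical stand-in for set(STOPWORDS) from the wordcloud package
base = {"the", "and"}

# union() iterates over its argument; iterating a str yields single
# characters, so this adds 'c', 'u', 's', ... and never "customer"
by_char = base.union(file_text)

# split() with no arguments breaks on whitespace only, so the commas
# stay attached to the words ("customer," rather than "customer")
by_split = base.union(file_text.split())

print(sorted(by_char - base))
print(sorted(by_split - base))
```

Neither set contains the bare word customer, which would be consistent with the word still appearing in the cloud even after split() is added.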

1 Answer

- marmar · Accepted Answer

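Aha! It was because the words in my text file were separated by commas.

For anyone building a word cloud: just separate the words with spaces, with no punctuation. @RagingRoosevelt was right about using the "split" function.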
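
If editing the text file is not convenient, one alternative is to split on commas and whitespace together. This is only a sketch: parse_stopwords is a hypothetical helper, not part of the wordcloud package, and the lowercasing assumes that stopword matching is case-insensitive, which is worth checking against your installed version.

```python
import re


def parse_stopwords(text):
    """Split text on commas and/or whitespace and return lowercase words."""
    tokens = re.split(r"[,\s]+", text)
    # Drop the empty strings that leading/trailing separators produce
    return {token.lower() for token in tokens if token}


# Hypothetical file contents: comma-separated vs. space-separated
comma_file = "customer, CSR Customer, satisfied, Item Completed"
space_file = "customer CSR Customer satisfied Item Completed"

extra = parse_stopwords(comma_file)

# Both layouts give the same set of single words
assert extra == parse_stopwords(space_file)

print(sorted(extra))
# ['completed', 'csr', 'customer', 'item', 'satisfied']
```

The result can then be merged with the defaults, for example set(STOPWORDS) | extra, and passed as stopwords= to the WordCloud constructor as in the question. Note that phrases such as "CSR Customer" end up as two separate single-word entries, which matches the space-separated layout recommended above.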