I have the following code:
train_set = ("The sky is blue.", "The sun is bright.")
test_set = ("The sun in the sky is bright.",
            "We can see the shining sun, the bright sun.")
Now I am trying to count word frequencies like this:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
Next, I want to print the vocabulary, so I do the following:
vectorizer.fit_transform(train_set)
print vectorizer.vocabulary
Currently the output I get is "None". However, I expected something like this:
{'blue': 0, 'sun': 1, 'bright': 2, 'sky': 3}
Any ideas what is going wrong?
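
One likely cause, sketched here for recent scikit-learn releases rather than the exact version used above: `vocabulary` is the constructor argument (None unless a vocabulary is passed in), while the mapping learned during fitting is stored in `vocabulary_`, with a trailing underscore. The names `constructor_vocabulary` and `learned_vocabulary` are introduced only for this illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_set = ("The sky is blue.", "The sun is bright.")

vectorizer = CountVectorizer()
vectorizer.fit(train_set)

# `vocabulary` only echoes the constructor argument; none was passed, so it is None.
constructor_vocabulary = vectorizer.vocabulary

# `vocabulary_` (trailing underscore) holds the term -> column index mapping learned by fit.
learned_vocabulary = vectorizer.vocabulary_

print(constructor_vocabulary)
print(learned_vocabulary)
```

With the default settings the learned mapping also contains "the" and "is". Passing `stop_words="english"` to `CountVectorizer` should drop them, although the indices assigned to the remaining words may still differ from the expected output shown above.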