我正在尝试理解以下代码
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
corpus = ['This is the first document.','This is the second second document.','And the third one.','Is this the first document?']
X = vectorizer.fit_transform(corpus)
当我尝试打印X以查看返回结果时,我得到了以下结果:
(0, 1) 1
(0, 2) 1
(0, 6) 1
(0, 3) 1
(0, 8) 1
(1, 5) 2
(1, 1) 1
(1, 6) 1
(1, 3) 1
(1, 8) 1
(2, 4) 1
(2, 7) 1
(2, 0) 1
(2, 6) 1
(3, 1) 1
(3, 2) 1
(3, 6) 1
(3, 3) 1
(3, 8) 1
然而,我不理解这个结果的含义?
vectorizer.get_feature_names()
来检查实际使用索引的词汇单词。 - Vivek Kumar