Suppose the output you see looks something like this:
(0, 18) 0.424688479366
(0, 6) 0.424688479366
(0, 4) 0.424688479366
(0, 14) 0.239262081323
(0, 17) 0.202366335916
(0, 5) 0.424688479366
(0, 1) 0.424688479366
(1, 17) 0.184426607226
(1, 8) 0.387039944282
(1, 15) 0.387039944282
(1, 0) 0.387039944282
(1, 2) 0.387039944282
(1, 13) 0.387039944282
(1, 7) 0.387039944282
(1, 11) 0.259205161463
(2, 14) 0.313686744222
(2, 17) 0.530628478217
(2, 9) 0.556791722552
(2, 16) 0.556791722552
(3, 14) 0.346483013718
(3, 17) 0.293053113789
(3, 11) 0.411875926253
(3, 10) 0.61500486583
(3, 3) 0.496182053366
(4, 14) 0.346483013718
(4, 17) 0.293053113789
(4, 11) 0.411875926253
(4, 3) 0.496182053366
(4, 12) 0.61500486583
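Take the general form to be: (A, B) C
A: the document index
B: the index of a particular term in the term vector
C: the TF-IDF score of term B in document A
This is a sparse matrix. It lists the TF-IDF score of every term that has a nonzero value in each document; any (document, term) pair that is not listed has a score of 0.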
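
To make the (A, B) C reading concrete, here is a minimal sketch in plain Python that parses a few of the printed lines into a dictionary keyed by (document index, term index) and then expands it into dense rows. The helper names parse_triplets and to_dense are hypothetical and only for illustration; this is not how any particular library stores the matrix, and the matrix width is inferred from the largest term index in the sample, which is an assumption.

```python
import re

# A few lines copied from the output above, in "(A, B) C" form.
printed = """\
(0, 18) 0.424688479366
(0, 14) 0.239262081323
(1, 17) 0.184426607226
(2, 9) 0.556791722552
"""

def parse_triplets(text):
    """Return {(doc_index, term_index): score} for each '(A, B) C' line."""
    pattern = re.compile(r"\((\d+),\s*(\d+)\)\s+([0-9.eE+-]+)")
    entries = {}
    for line in text.splitlines():
        match = pattern.match(line.strip())
        if match:
            doc, term, score = match.groups()
            entries[(int(doc), int(term))] = float(score)
    return entries

def to_dense(entries):
    """Expand the sparse entries into rows, filling unlisted cells with 0.0.

    Assumption: the matrix is only as large as the largest indices seen.
    """
    n_docs = max(doc for doc, _ in entries) + 1
    n_terms = max(term for _, term in entries) + 1
    dense = [[0.0] * n_terms for _ in range(n_docs)]
    for (doc, term), score in entries.items():
        dense[doc][term] = score
    return dense

entries = parse_triplets(printed)
dense = to_dense(entries)

# Score of term 14 in document 0, and a cell that was never listed.
print(entries[(0, 14)])
print(dense[2][0])
```

Looking up entries[(A, B)] gives C directly, while the dense form shows why the sparse listing is preferred: most cells are simply zero.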