Using LDA topic modeling information as features for text classification with an SVM

Question

Tags: python, classification, svm, lda

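I would like to use topic modeling information as input features for text classification with an SVM classifier. I am therefore wondering how to run LDA on both the training and test partitions of the dataset to generate the topic features. Since the two partitions have different corpora, is this even feasible?

Am I making a wrong assumption?

Could you provide an example of how to do this with scikit-learn?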

- asterix

1 Answer

Content provided by Stack Overflow.

- Ash · Accepted Answer

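Your assumption is right. You need to fit the LDA model on the training data only, and then transform both the training and the test data with that fitted model.

So you would end up with something like this: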

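from sklearn.decomposition import LatentDirichletAllocation as LDA

# training_data and testing_data are document-term count matrices (e.g. from CountVectorizer)
lda = LDA(n_components=10)  # n_topics was renamed to n_components; other arguments were elided ("...") in the original
lda.fit(training_data)  # learn the topics from the training split only
training_features = lda.transform(training_data)
testing_features = lda.transform(testing_data)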
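To make the fit-on-train, transform-both pattern concrete, here is a minimal end-to-end sketch. The toy documents, labels, the choice of `CountVectorizer` for the counts, `LinearSVC` as the SVM, and `n_components=2` are all assumptions made for illustration; none of them come from the question or the answer.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy corpus and labels, made up for illustration only.
train_docs = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "the stock market fell sharply today",
    "investors sold shares as prices dropped",
]
train_labels = [0, 0, 1, 1]
test_docs = ["my cat chased the dog", "share prices rose on the market"]

# Build the vocabulary from the training split only, then reuse it for the test split.
vectorizer = CountVectorizer(stop_words="english")
train_counts = vectorizer.fit_transform(train_docs)
test_counts = vectorizer.transform(test_docs)

# Fit LDA on the training counts only; transform both splits with the fitted model.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
train_topics = lda.fit_transform(train_counts)
test_topics = lda.transform(test_counts)

# Train the SVM on the document-topic distributions and predict on the test split.
clf = LinearSVC()
clf.fit(train_topics, train_labels)
predictions = clf.predict(test_topics)
```

Because the vectorizer and the LDA model are both fitted on the training split, test documents are mapped into the same vocabulary and the same topic space, which is why the differing corpora are not a problem.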

If I were you, I would concatenate the LDA features with the bag-of-words features, using numpy.hstack, or scipy.sparse.hstack if your BoW features are sparse.
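The concatenation could look roughly like the sketch below. The small matrices are hypothetical stand-ins for real bag-of-words counts and LDA output, and converting the dense LDA block to sparse before stacking is just one reasonable choice.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Hypothetical stand-ins: a sparse bag-of-words matrix (3 docs x 5 terms)
# and a dense LDA document-topic matrix (3 docs x 2 topics).
bow_features = csr_matrix(np.array([
    [1, 0, 2, 0, 0],
    [0, 1, 0, 0, 3],
    [0, 0, 0, 1, 1],
]))
lda_features = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
])

# Stack column-wise: each row keeps its BoW columns followed by its topic columns.
combined = hstack([bow_features, csr_matrix(lda_features)], format="csr")
```

The result stays sparse and has one row per document, so it can be passed to an SVM in place of either feature set alone. If both blocks are dense arrays, `np.hstack([bow_dense, lda_features])` does the same job.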