Topic modeling - Assign a document the top 2 topics as category labels - sklearn Latent Dirichlet Allocation

Question


python python-2.7 scikit-learn lda topic-modeling


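I am working with the LDA (Latent Dirichlet Allocation) topic modeling method to extract topics from a set of documents. From what I understand from the link below, it is an unsupervised learning approach that categorizes/labels each document with the extracted topics.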

The sample code given in that link defines a function that gets the top words associated with each identified topic.

sklearn.__version__

Out[41]: '0.17'

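from sklearn.decomposition import LatentDirichletAllocation


def print_top_words(model, feature_names, n_top_words):
    # Each row of model.components_ holds one topic's weights over the vocabulary.
    for topic_idx, topic in enumerate(model.components_):
        print("Topic #%d:" % topic_idx)
        # argsort() is ascending, so the reversed slice picks the n_top_words largest weights.
        print(" ".join([feature_names[i]
                        for i in topic.argsort()[:-n_top_words - 1:-1]]))
    print()

# tf_vectorizer, lda and n_top_words come from the rest of the linked example (not shown here).
print("\nTopics in LDA model:")
tf_feature_names = tf_vectorizer.get_feature_names()
print_top_words(lda, tf_feature_names, n_top_words)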

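My question is: is there any component or matrix of the fitted LDA model that gives the document-topic association? For example, I need to find the top 2 topics associated with each document and use them as that document's label/category. Is there a component that gives the distribution of topics in a document, similar to model.components_, which gives the distribution of words within each topic?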

- Bala

1 Answer


- clemgaut · Accepted Answer

You can compute the association between documents and topics with the transform(X) function of the LDA class. For example:

doc_topic_distrib = lda.transform(tf)

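Here lda is the fitted LDA model and tf is the input data you want to transform. The result has one row per document and one column per topic.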
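To get from that document-topic matrix to the "top 2 topics per document" asked about in the question, one option is to rank the topic indices in each row by weight. The following is only a minimal sketch: the helper name top_topics and the toy matrix are assumptions made for illustration, not part of scikit-learn. Only the order of the weights within a row matters, so it works whether or not the rows are normalized to sum to 1.

```python
def top_topics(doc_topic_distrib, n_top=2):
    """Return, for each document, the indices of its n_top highest-weighted topics.

    doc_topic_distrib: one row per document, one column per topic, e.g. the
    array returned by lda.transform(tf). Plain lists of lists work as well.
    (top_topics is a hypothetical helper, not a scikit-learn function.)
    """
    labels = []
    for row in doc_topic_distrib:
        # Rank topic indices by weight, largest first, and keep the first n_top.
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        labels.append(ranked[:n_top])
    return labels


# Toy doc-topic matrix with made-up numbers: 3 documents, 4 topics.
example = [
    [0.10, 0.60, 0.05, 0.25],
    [0.70, 0.05, 0.20, 0.05],
    [0.02, 0.08, 0.50, 0.40],
]
print(top_topics(example))  # [[1, 3], [0, 2], [2, 3]]
```

With the fitted model, top_topics(lda.transform(tf)) would give two topic indices per document. Those indices refer to the same topics as the rows of model.components_, so they can be matched to the word lists printed by print_top_words.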