在Python中,有没有一种方法可以将属于某个主题的文档映射起来。例如,一个主要是“主题0”的文档列表。我知道有办法列出每个文档的主题,但如何做到相反呢?
编辑:
我正在使用以下脚本进行LDA:
doc_set = []
for file in files:
newpath = (os.path.join(my_path, file))
newpath1 = textract.process(newpath)
newpath2 = newpath1.decode("utf-8")
doc_set.append(newpath2)
texts = []
for i in doc_set:
raw = i.lower()
tokens = tokenizer.tokenize(raw)
stopped_tokens = [i for i in tokens if not i in stopwords.words()]
stemmed_tokens = [p_stemmer.stem(i) for i in stopped_tokens]
texts.append(stemmed_tokens)
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=2, random_state=0, id2word = dictionary, passes=1)