How to monitor convergence of Gensim LDA model?

Question


python lda gensim convergence


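I can't seem to find how to do this, or perhaps my knowledge of statistics and its terminology is the problem, but I want to produce something similar to the graph at the bottom of the PyPI page for the LDA library and observe how the lines level off and converge. How can I achieve this with Gensim LDA?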

- ZeferiniX

1 Answer


- groceryheist · Accepted Answer

You are right to want to plot the convergence of your model fit. Unfortunately, Gensim does not seem to make this very straightforward.

Run the model in such a way that you can analyze the output of the model-fitting function. I like to set up a log file.

```
import logging

# Configure logging before fitting so that Gensim's INFO messages go to the file.
logging.basicConfig(filename='gensim.log',
                    format="%(asctime)s:%(levelname)s:%(message)s",
                    level=logging.INFO)
```

Set the eval_every parameter in LdaModel. The lower this value, the finer the resolution of your plot. However, computing the perplexity can slow down your fit a lot!
```
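from gensim.models import LdaModel

# corpus and id2word are assumed to have been built already.
lda_model = LdaModel(corpus=corpus,
                     id2word=id2word,
                     num_topics=30,
                     eval_every=10,   # log perplexity every 10 updates
                     passes=40,       # the keyword is `passes`; `pass` is a reserved word
                     iterations=5000)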
```

Parse the log file and make your plot.

```
import re
import matplotlib.pyplot as plt

# Capture the per-word likelihood bound and the perplexity from each log line.
p = re.compile(r"(-?\d+\.\d+) per-word .* (\d+\.\d+) perplexity")
with open('gensim.log') as f:
    matches = [p.findall(line) for line in f]
matches = [m for m in matches if len(m) > 0]
tuples = [t[0] for t in matches]
perplexity = [float(t[1]) for t in tuples]
likelihood = [float(t[0]) for t in tuples]
# One measurement was logged per eval_every=10 updates.
iterations = list(range(0, len(tuples) * 10, 10))
plt.plot(iterations, likelihood, c="black")
plt.ylabel("log likelihood")
plt.xlabel("iteration")
plt.title("Topic Model Convergence")
plt.grid()
plt.savefig("convergence_likelihood.pdf")
plt.close()
```
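
If you want to check the parsing step without fitting a model first, it can be wrapped in a small function and run on a few hand-written lines. This is only a sketch: `parse_convergence` and `sample_log` are names I made up for illustration, and the sample lines are invented values that imitate the shape of the perplexity messages the regular expression above expects, not output from a real run.

```python
import re

# Same pattern as in the answer: per-word bound first, perplexity second.
PATTERN = re.compile(r"(-?\d+\.\d+) per-word .* (\d+\.\d+) perplexity")


def parse_convergence(lines, eval_every=10):
    """Return (steps, bounds, perplexities) extracted from log lines.

    `eval_every` should match the value passed to LdaModel so that the
    x-axis reflects how often the perplexity was logged.
    """
    bounds, perplexities = [], []
    for line in lines:
        match = PATTERN.search(line)
        if match:  # skip lines that are not perplexity reports
            bounds.append(float(match.group(1)))
            perplexities.append(float(match.group(2)))
    steps = [i * eval_every for i in range(len(bounds))]
    return steps, bounds, perplexities


# Hypothetical sample lines (made-up numbers, for illustration only).
sample_log = [
    "2020-01-01 00:00:00,000:INFO:-9.000 per-word bound, 512.0 perplexity "
    "estimate based on a held-out corpus of 100 documents with 5000 words",
    "2020-01-01 00:00:01,000:INFO:PROGRESS: pass 0, at document #100/100",
    "2020-01-01 00:00:02,000:INFO:-8.000 per-word bound, 256.0 perplexity "
    "estimate based on a held-out corpus of 100 documents with 5000 words",
]

steps, bounds, perplexities = parse_convergence(sample_log)
print(steps, bounds, perplexities)
```

With a real log you would pass the open file object instead of `sample_log`, then hand `steps` and `bounds` (or `perplexities`) to `plt.plot` exactly as above.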