Trouble running Gensim LDA

4
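I am trying to run the distributed LDA example described here: https://radimrehurek.com/gensim/dist_lda.html. I built the set of documents by following the tutorial here: https://radimrehurek.com/gensim/dist_lsi.html, repeating the corpus many times to inflate it to 1 million documents, as it suggests. I am using Python 3.3 and numpy 1.9.2, but I keep getting the following error: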
Exception in thread oneway-call:
Traceback (most recent call last):
  File "/usr/lib64/python3.3/threading.py", line 901, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.3/site-packages/Pyro4/core.py", line 1484, in run
    super(_OnewayCallThread, self).run()
  File "/usr/lib64/python3.3/threading.py", line 858, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.3/site-packages/gensim/models/lda_worker.py", line 71, in requestjob
    self.processjob(job)
  File "/usr/lib64/python3.3/site-packages/gensim/utils.py", line 98, in _synchronizer
    result = func(self, *args, **kwargs)
  File "/usr/lib64/python3.3/site-packages/gensim/models/lda_worker.py", line 80, in processjob
    self.model.do_estep(job)
  File "/usr/lib64/python3.3/site-packages/gensim/models/ldamodel.py", line 480, in do_estep
    gamma, sstats = self.inference(chunk, collect_sstats=True)
  File "/usr/lib64/python3.3/site-packages/gensim/models/ldamodel.py", line 423, in inference
    if doc and not isinstance(doc[0][0], six.integer_types):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I ran the distributed LSI example and it works fine, but for some reason I cannot get LDA to work.

I tried changing line 423 in "/usr/lib64/python3.3/site-packages/gensim/models/ldamodel.py" to:

if doc is not None and not isinstance(doc[0][0], six.integer_types):

The error went away, but now I get a warning:
FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.

Can someone explain what I am doing wrong? Is my change to this file correct, or should I be running LDA in a different way?

1 Answer

0

This is a bug in gensim; it is reported here.

Edit: this issue has now been fixed in this pull request.
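
For anyone wondering why the original check fails: the error message suggests that `doc` reaches `inference` as a numpy array rather than a list of tuples, and `if doc` asks numpy for a single truth value, which it refuses to give for an array with more than one element. The sketch below reproduces that outside gensim. The sample `doc` and the helper names are made up for illustration, and the length-based check is only one possible guard, not necessarily what the pull request does. Note also that `doc is not None` on its own would let an empty document through to `doc[0][0]`.

```python
import numpy as np

# Hypothetical stand-in for one bag-of-words document: rows of (token_id, count),
# stored as a 2-D numpy array instead of a list of tuples.
doc = np.array([[0, 1], [3, 2]])


def truthiness_fails(d):
    """Return True if evaluating `d` in a boolean context raises ValueError."""
    try:
        bool(d)
        return False
    except ValueError:
        return True


def is_nonempty(d):
    """Emptiness check that works for lists and numpy arrays alike (illustrative helper)."""
    return d is not None and len(d) > 0


ambiguous = truthiness_fails(doc)          # `if doc` is what triggers the error
nonempty = is_nonempty(doc)                # length-based check avoids it
empty_ok = is_nonempty(np.empty((0, 2)))   # an empty document is still rejected
```

Using `len()` keeps the original intent of the line (skip empty documents) without relying on the truth value of the array.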
