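Is it possible to find the texts that belong to a specific topic (determined by LDA)?
I used LDA to find five topics, each listed with its 10 top words.
The texts I analyzed are in one column of a DataFrame. I would like to select/filter the rows/texts that belong to a specific topic.
If you need more information, I can provide it.
The step I am referring to returns this output: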
[(0,
'0.207*"house" + 0.137*"apartment" + 0.118*"sold" + 0.092*"beach" + '
'0.057*"kitchen" + 0.049*"rent" + 0.033*"landlord" + 0.026*"year" + '
'0.024*"bedroom" + 0.023*"home"'),
(1,
'0.270*"school" + 0.138*"homeworks" + 0.117*"students" + 0.084*"teacher" + '
'0.065*"pen" + 0.038*"books" + 0.022*"maths" + 0.020*"exercise" + '
'0.020*"friends" + 0.020*"college"'),
... ]
which was created by
# LDA Model
# (corpus, id2word and num_topics are defined earlier and are not shown here)
import gensim

lda_model = gensim.models.ldamodel.LdaModel(corpus=corpus,
                                            id2word=id2word,
                                            num_topics=num_topics,
                                            random_state=100,
                                            update_every=1,
                                            chunksize=100,
                                            passes=10,
                                            alpha='auto',
                                            # alpha=[0.01]*num_topics,
                                            per_word_topics=True,
                                            eta=[0.01]*len(id2word.keys()))
# Print the top 10 keywords of each topic
from pprint import pprint
pprint(lda_model.print_topics())
doc_lda = lda_model[corpus]
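A note on what doc_lda holds, as a hedged sketch: because the model above is built with per_word_topics=True, each element of lda_model[corpus] should be a 3-tuple (document topics, per-word topics, phi values) rather than a plain list of (topic_id, probability) pairs. The helper below accepts either shape. Its name, topic_distribution, is made up for illustration, and the sample values are invented, not real model output.

```python
def topic_distribution(entry):
    """Return the list of (topic_id, probability) pairs for one document.

    Assumption: `entry` is one element of `lda_model[corpus]`. With
    per_word_topics=True it is a 3-tuple whose first item is the topic
    distribution; otherwise it is already the list of pairs.
    """
    if isinstance(entry, tuple) and len(entry) == 3:
        return list(entry[0])
    return list(entry)


# Invented examples of both shapes (not real model output).
with_word_topics = ([(0, 0.9), (1, 0.1)], [(5, [0])], [(5, [(0, 0.99)])])
plain = [(0, 0.2), (1, 0.8)]

print(topic_distribution(with_word_topics))
print(topic_distribution(plain))
```

Indexing an element of doc_lda as if it were a list of pairs, without this unpacking step, is one plausible way to end up with unexpected shapes or index errors.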
The original column with the analyzed texts is called Texts, and it looks like:
Texts
"Children are happy to go to school..."
"The average price for buying a house is ... "
"Our children love parks so we should consider to buy an apartment nearby"
etc etc...
My expected output is
Texts Topic
"Children are happy to go to school..." 2
"The average price for buying a house is ... " 1
"Our children love parks so we should consider to buy an apartment nearby" 2
Thank you for any help.
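One possible way to get such a Topic column, as a minimal sketch rather than a definitive answer: take the most probable topic for each document and then filter on it. The helper name dominant_topic and all sample probabilities below are invented for illustration. Plain lists stand in for the DataFrame column and for the model output so that the snippet runs on its own.

```python
def dominant_topic(doc_topics):
    """Return the id of the most probable topic, or None if the list is empty."""
    if not doc_topics:
        return None
    return max(doc_topics, key=lambda pair: pair[1])[0]


# Invented stand-ins: `texts` mirrors the Texts column, `doc_topics_list`
# holds one list of (topic_id, probability) pairs per text.
texts = [
    "Children are happy to go to school...",
    "The average price for buying a house is ... ",
    "Our children love parks so we should consider to buy an apartment nearby",
]
doc_topics_list = [
    [(0, 0.10), (1, 0.90)],
    [(0, 0.85), (1, 0.15)],
    [(0, 0.70), (1, 0.30)],
]

topics = [dominant_topic(d) for d in doc_topics_list]

# Keep only the texts whose dominant topic is topic 0 (the "house" topic above).
wanted = 0
selected = [text for text, topic in zip(texts, topics) if topic == wanted]

print(topics)
print(selected)
```

With the real model, something like df["Topic"] = [dominant_topic(d) for d in lda_model.get_document_topics(corpus)] followed by df[df["Topic"] == 0] should give the filtered rows, assuming df is the DataFrame holding Texts and its rows are in the same order as corpus. Topic ids here are 0-based, as in the print_topics output. Add 1 if 1-based labels are wanted. An empty topic list yields None instead of raising an error.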
Comments:
With topic_score[1] in "for topic_score in sent" and "for sent in doc_lda", I get IndexError: list index out of range. What is sent in your code? Is it just a variable? - user13623188
[...] the HdpModel answer, but I got something like [0.7355845586174742],[],[0.6889279786058412]. - Ali A. Jalil