Simple Python implementation of collaborative topic modeling?

Question

python machine-learning lda topic-modeling collaborative-filtering

32

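I came across two papers that combine collaborative filtering (matrix factorization) with topic modeling (LDA) to recommend articles/posts to users, based on the topic terms of the articles/posts they have shown interest in.

The papers (PDFs) are: "Collaborative Topic Modeling for Recommending Scientific Articles" and "Collaborative Topic Modeling for Recommending GitHub Repositories"

The new algorithm is called collaborative topic regression. I was hoping to find some Python code that implements it, but had no luck. This may be a lot to ask, but could someone show a simple Python example?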

- jxn

6

Several Python packages for topic modeling are listed here: https://www.cs.princeton.edu/~blei/topicmodeling.html - user4322779

In C++, there is ctr. - kamalbanga

2

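The repository kamalbanga mentions above implements the first paper you cited. Although it is written in C++, you can call it from Python. - jtitusj

Please see the link in the answer below; there is a Python code sample there, provided on scikit-learn.org, that fits your needs exactly. Regards - A. STEFANI

The best package is gensim, and it is very easy to install with pip install. Here is the topics page: https://radimrehurek.com/gensim/tut2.html As for your actual question, it looks like... oh no, wait, I found it. - Eugene

Did the answer below solve your problem? - Eugene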

2 Answers

0

Here is a very simple LDA implemented with gensim. You can find more information here: https://radimrehurek.com/gensim/tutorial.html Hope this helps.

# Requires NLTK data: nltk.download('punkt'), nltk.download('stopwords'), nltk.download('rslp')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import RSLPStemmer
from gensim import corpora, models

# Note: RSLPStemmer targets Portuguese; for English text,
# nltk.stem.PorterStemmer is the more usual choice.
st = RSLPStemmer()
stop_words = set(stopwords.words('english'))  # build the stop-word set once
texts = []

doc1 = "Veganism is both the practice of abstaining from the use of animal products, particularly in diet, and an associated philosophy that rejects the commodity status of animals"
doc2 = "A follower of either the diet or the philosophy is known as a vegan."
doc3 = "Distinctions are sometimes made between several categories of veganism."
doc4 = "Dietary vegans refrain from ingesting animal products. This means avoiding not only meat but also egg and dairy products and other animal-derived foodstuffs."
doc5 = "Some dietary vegans choose to wear clothing that includes animal products (for example, leather or wool)."

docs = [doc1, doc2, doc3, doc4, doc5]

for doc in docs:
    # lowercase and tokenize (word_tokenize keeps punctuation as tokens)
    tokens = word_tokenize(doc.lower())
    stopped_tokens = [w for w in tokens if w not in stop_words]
    stemmed_tokens = [st.stem(w) for w in stopped_tokens]
    texts.append(stemmed_tokens)

# map each stem to an integer id, then convert every document to bag-of-words
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# generate LDA model using gensim
ldamodel = models.ldamodel.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)
print(ldamodel.print_topics(num_topics=2, num_words=4))

[(0, u'0.066*animal + 0.065*, + 0.047*product + 0.028*philosophy'), (1, u'0.085*. + 0.047*product + 0.028*dietary + 0.028*vegan')]

- Vinicius Woloszyn

How do you interpret the resulting topics? - Bhaskar Dhariyal
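On reading the output above: each tuple is a topic id followed by its highest-weighted tokens, and each weight is that token's probability under the topic. The `,` and `.` entries rank highly only because punctuation was never filtered out before training. Below is a minimal sketch of a cleanup step that could run before building the dictionary. It assumes a plain regex tokenizer and a tiny hand-written stop-word set; both `STOP_WORDS` and `clean_tokens` are illustrative stand-ins, not part of the answer's code or of NLTK.

```python
import re

# Illustrative stop-word set (an assumption); a real pipeline would more
# likely use a full list such as NLTK's English stop words.
STOP_WORDS = {
    "a", "an", "the", "of", "is", "are", "as", "or", "and", "in", "from",
    "to", "that", "this", "not", "but", "also", "for", "some", "both",
    "either", "between",
}


def clean_tokens(text):
    """Lowercase the text, keep alphabetic runs only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


doc = "A follower of either the diet or the philosophy is known as a vegan."
tokens = clean_tokens(doc)
print(tokens)
```

With punctuation gone, the top words of each topic are easier to read as themes, though with only five short documents the topics will still be noisy and can change from run to run.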

- Eugene · Accepted Answer

This should get you started (though I'm not sure why it hasn't been posted yet): https://github.com/arongdari/python-topic-model More specifically: https://github.com/arongdari/python-topic-model/blob/master/ptm/collabotm.py

class CollaborativeTopicModel:
    """
    Wang, Chong, and David M. Blei. "Collaborative topic modeling for
    recommending scientific articles." Proceedings of the 17th ACM SIGKDD
    international conference on Knowledge discovery and data mining. ACM, 2011.

    Attributes
    ----------
    n_item: int
        number of items
    n_user: int
        number of users
    R: ndarray, shape (n_user, n_item)
        user x item rating matrix
    """

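It looks nice and straightforward. I would still suggest at least taking a look at gensim. Radim has done an excellent job optimizing that software.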
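For a sense of what such a model computes, here is a minimal NumPy sketch of the alternating updates for collaborative topic regression as described in the Wang and Blei paper, under one simplifying assumption: the per-item topic proportions `theta` are taken as given (for example, from an LDA model) and held fixed instead of being re-estimated. This is not the code from the repository above; `fit_ctr`, `ctr_objective`, the toy matrices, and the hyperparameter values are all illustrative.

```python
import numpy as np


def ctr_objective(R, theta, U, V, C, lambda_u, lambda_v):
    """Loss minimized by the updates below (theta fixed, constants dropped)."""
    err = R - U @ V.T
    return (0.5 * np.sum(C * err ** 2)
            + 0.5 * lambda_u * np.sum(U ** 2)
            + 0.5 * lambda_v * np.sum((V - theta) ** 2))


def fit_ctr(R, theta, lambda_u=0.01, lambda_v=10.0, a=1.0, b=0.01,
            n_iter=20, seed=0):
    """Alternate closed-form updates of user vectors U and item vectors V.

    R:     (n_user, n_item) binary interaction matrix
    theta: (n_item, K) topic proportions per item, held fixed (assumption)
    """
    n_user, n_item = R.shape
    K = theta.shape[1]
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_user, K))
    V = theta.copy()                      # start item vectors at their topics
    C = np.where(R > 0, a, b)             # confidence: a if observed, else b
    eye = np.eye(K)

    for _ in range(n_iter):
        # u_i = (V^T C_i V + lambda_u I)^-1 V^T C_i r_i
        for i in range(n_user):
            VC = V * C[i][:, None]        # rows of V scaled by c_ij
            U[i] = np.linalg.solve(VC.T @ V + lambda_u * eye, VC.T @ R[i])
        # v_j = (U^T C_j U + lambda_v I)^-1 (U^T C_j r_j + lambda_v theta_j)
        for j in range(n_item):
            UC = U * C[:, j][:, None]     # rows of U scaled by c_ij
            rhs = UC.T @ R[:, j] + lambda_v * theta[j]
            V[j] = np.linalg.solve(UC.T @ U + lambda_v * eye, rhs)
    return U, V


# Toy data (made up): 4 users, 5 items, 2 topics.
R = np.array([[1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 1, 1, 0]], dtype=float)
theta = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.7, 0.3],
                  [0.2, 0.8],
                  [0.1, 0.9]])

U, V = fit_ctr(R, theta)
scores = U @ V.T                          # predicted preference per user/item
print(np.round(scores, 2))
```

Items a user has not interacted with can then be ranked by that user's row of `scores`. A larger `lambda_v` keeps each item vector closer to its topic proportions, which is what lets the model score items that have few or no ratings.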