Sentiment analysis in R

Question

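I am very new to sentiment analysis and have no idea how to do it in R, so I would appreciate some help and guidance.

I have a data set of opinions that I would like to analyse.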

Title      Date            Content    
Boy        May 13 2015     "She is pretty", Tom said. 
Animal     June 14 2015    The penguin is cute, lion added.
Human      March 09 2015   Mr Koh predicted that every human is smart..
Monster    Jan 22 2015     Ms May, a student, said that John has $10.80.

Thank you.

- poppp

How is this different from your previous question (http://stackoverflow.com/questions/32576046/text-mining-with-r)? - user3710546

1 Answer

- Ken Benoit · Accepted Answer

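Sentiment analysis is a broad class of methods for measuring positive and negative sentiment in text, which makes this a fairly hard question. Here is a simple answer, though: you can apply a dictionary to your document-term matrix and then combine the dictionary's positive and negative key categories into a sentiment measure.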
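To make the idea concrete before turning to a dedicated package, here is a minimal base R sketch of dictionary counting on three of the question's documents. The word lists are hypothetical and chosen only for illustration, and matching is exact (no wildcards), so treat this as a toy version of the technique rather than a substitute for a real dictionary.

```r
# Three of the question's documents, keyed by their Title column
docs <- c(
  Boy    = '"She is pretty", Tom said.',
  Animal = "The penguin is cute, lion added.",
  Human  = "Mr Koh predicted that every human is smart.."
)

# Hypothetical word lists, chosen only for this illustration
dict <- list(
  positive = c("pretty", "cute", "smart", "good"),
  negative = c("bad", "awful", "ugly")
)

# Lowercase, then split on anything that is not a letter
tokens <- strsplit(tolower(docs), "[^a-z]+")

# Count, per document and per category, the tokens found in the word list
count_matches <- function(tok, words) sum(tok %in% words)
counts <- sapply(dict, function(words) sapply(tokens, count_matches, words = words))
dimnames(counts) <- list(names(docs), names(dict))

# One simple sentiment measure: positive minus negative matches
sentiment <- counts[, "positive"] - counts[, "negative"]
print(counts)
print(sentiment)
```

The `counts` matrix plays the role of a document-term matrix that has been collapsed to dictionary categories.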

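I suggest trying this in the text analysis package quanteda, which handles a variety of existing dictionary formats and lets you create very flexible custom dictionaries.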

For example:

require(quanteda)
mycorpus <- subset(inaugCorpus, Year>1980)
mydict <- dictionary(list(negative = c("detriment*", "bad*", "awful*", "terrib*", "horribl*"),
                          positive = c("good", "great", "super*", "excellent")))
myDfm <- dfm(mycorpus, dictionary = mydict)
## Creating a dfm from a corpus ...
##    ... lowercasing
##    ... tokenizing
##    ... indexing documents: 9 documents
##    ... indexing features: 3,113 feature types
##    ... applying a dictionary consisting of 2 keys
##    ... created a 9 x 2 sparse dfm
##    ... complete. 
## Elapsed time: 0.057 seconds.
myDfm
## Document-feature matrix of: 9 documents, 2 features.
## 9 x 2 sparse Matrix of class "dfmSparse"
##               features
## docs           negative positive
##   1981-Reagan         0        6
##   1985-Reagan         0        6
##   1989-Bush           0       18
##   1993-Clinton        1        2
##   1997-Clinton        2        8
##   2001-Bush           1        6
##   2005-Bush           0        8
##   2009-Obama          2        3
##   2013-Obama          1        3

# use a LIWC dictionary - obviously you need this file
liwcdict <- dictionary(file = "LIWC2001_English.dic", format = "LIWC")
myDfmLIWC <- dfm(mycorpus, dictionary = liwcdict)
## Creating a dfm from a corpus ...
##    ... lowercasing
##    ... tokenizing
##    ... indexing documents: 9 documents
##    ... indexing features: 3,113 feature types
##    ... applying a dictionary consisting of 68 keys
##    ... created a 9 x 68 sparse dfm
##    ... complete. 
## Elapsed time: 1.844 seconds.
myDfmLIWC[, grep("^Pos|^Neg", features(myDfmLIWC))]
## Document-feature matrix of: 9 documents, 4 features.
## 9 x 4 sparse Matrix of class "dfmSparse"
##               features
## docs           Negate Posemo Posfeel Negemo
##   1981-Reagan      46     89       5     24
##   1985-Reagan      28    104       7     33
##   1989-Bush        40    102      10      8
##   1993-Clinton     25     51       3     23
##   1997-Clinton     27     64       5     22
##   2001-Bush        40     80       6     27
##   2005-Bush        25    117       5     31
##   2009-Obama       40     83       5     46
##   2013-Obama       42     80      13     22
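The output above stops at counts per category. As a rough sketch of the "combine" step, the positive and negative counts from the first (two-key) table can be copied into a plain matrix and turned into a score. The two formulas below, a raw difference and a difference scaled by total matches, are common simple choices and are assumptions of this sketch, not something prescribed by the package.

```r
# Counts copied by hand from the two-key dictionary table above
counts <- matrix(
  c(0, 0, 0, 1, 2, 1, 0, 2, 1,    # negative matches
    6, 6, 18, 2, 8, 6, 8, 3, 3),  # positive matches
  ncol = 2,
  dimnames = list(
    c("1981-Reagan", "1985-Reagan", "1989-Bush", "1993-Clinton", "1997-Clinton",
      "2001-Bush", "2005-Bush", "2009-Obama", "2013-Obama"),
    c("negative", "positive")
  )
)

# Net sentiment: positive minus negative matches per document
net <- counts[, "positive"] - counts[, "negative"]

# Scaled version in [-1, 1]: net sentiment as a share of all dictionary matches
share <- net / rowSums(counts)

print(cbind(counts, net = net, share = round(share, 2)))
```

The scaled version is less sensitive to document length, since longer speeches tend to match more dictionary words of both kinds.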

Assuming your data are already stored in a data frame named data, you can create a quanteda corpus with:

mycorpus <- corpus(data$Content, docvars = data[, 1:2])
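If the data are not in R yet, a data frame with the same three columns can be typed in directly. The sketch below uses the question's sample rows and only base R; the name `data` is kept so that it lines up with the call above, and the dates are left as plain strings.

```r
# The question's sample, entered by hand; stringsAsFactors = FALSE keeps
# the text columns as character vectors on older R versions too
data <- data.frame(
  Title = c("Boy", "Animal", "Human", "Monster"),
  Date = c("May 13 2015", "June 14 2015", "March 09 2015", "Jan 22 2015"),
  Content = c(
    '"She is pretty", Tom said.',
    "The penguin is cute, lion added.",
    "Mr Koh predicted that every human is smart..",
    "Ms May, a student, said that John has $10.80."
  ),
  stringsAsFactors = FALSE
)

# Columns 1:2 (Title and Date) are what gets attached as document variables
str(data[, 1:2])
```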

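See also ?textfile for a way to load content from files with one simple command. It works with .csv files, although you may run into problems with this particular file because the Content field contains text that includes commas.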
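The comma problem goes away when the text field is quoted in the file. As a small base R illustration (independent of quanteda), `write.csv` quotes character fields and doubles any embedded quotes, and `read.csv` reverses both, so the text survives a round trip. The two rows and the temporary file path are just for this sketch.

```r
# Two rows whose Content contains commas, and in one case double quotes
data <- data.frame(
  Title = c("Boy", "Animal"),
  Content = c('"She is pretty", Tom said.',
              "The penguin is cute, lion added."),
  stringsAsFactors = FALSE
)

# Write to a temporary file; character fields are quoted by default
path <- tempfile(fileext = ".csv")
write.csv(data, path, row.names = FALSE)

# Show the raw file, then read it back
print(readLines(path))
data_back <- read.csv(path, stringsAsFactors = FALSE)
unlink(path)

# TRUE when the text came back unchanged
print(identical(data_back$Content, data$Content))
```

If the file comes from elsewhere and the Content field is not quoted, the commas inside the text will be read as column separators, which is the problem described above.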

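There are of course many other ways to measure sentiment, but if you are new to both sentiment mining and R, this should get you started. You can read more about sentiment mining methods in the following references (apologies if you have already come across them):

Liu, Bing. 2010. "Sentiment Analysis and Subjectivity." Handbook of Natural Language Processing 2: 627-66.
Liu, Bing. 2015. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University Press.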