R sentiment analysis; 'lexicon' not found; 'sentiments' corrupted?

3

I am trying to follow this online tutorial (link) on sentiment analysis. Here is the code:

new_sentiments <- sentiments %>% #From the tidytext package
  filter(lexicon != "loughran") %>% #Remove the finance lexicon
  mutate( sentiment = ifelse(lexicon == "AFINN" & score >= 0, "positive",
                         ifelse(lexicon == "AFINN" & score < 0,
                                "negative", sentiment))) %>%
  group_by(lexicon) %>%
  mutate(words_in_lexicon = n_distinct(word)) %>%
  ungroup()

It produces the error:

>Error in filter_impl(.data, quo) : 
>Evaluation error: object 'lexicon' not found.

Perhaps relatedly, the 'sentiments' table looks abnormal to me (corrupted?). Here is the head of 'sentiments':

> head(sentiments,3)
>   element_id sentence_id word_count sentiment                                  chapter
> 1          1           1          7         0 The First Book of Moses:  Called Genesis
> 2          2           1         NA         0 The First Book of Moses:  Called Genesis
> 3          3           1         NA         0 The First Book of Moses:  Called Genesis
>                                  category
> 1 The First Book of Moses:  Called Genesis
> 2 The First Book of Moses:  Called Genesis
> 3 The First Book of Moses:  Called Genesis

If I use get_sentiments() to get the bing, AFINN, or NRC sentiments, I get what looks like an appropriate response:
>  get_sentiments("bing")
> # A tibble: 6,788 x 2
>   word        sentiment
>   <chr>       <chr>
> 1 2-faced     negative 
> 2 2-faces     negative 
> 3 a+          positive 
> 4 abnormal    negative 

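I tried removing (remove.packages) and reinstalling tidytext, but the behavior did not change. I am running R 3.5.

Even if I have completely misunderstood the problem, I would appreciate any insight anyone can give me.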


1
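I see this error when I use stats::filter instead of dplyr::filter. Perhaps library(dplyr) would help? - r2evans
So I was wrong, that error is slightly different, sorry about that. The filter_impl error is probably because sentiments is not a data.frame, or because it has no column named lexicon. What does the output of str(sentiments) look like? - r2evans
Your code works on my machine with the dplyr and tidytext libraries loaded and gives the expected output. Try restarting your R session and then running the code again. - phiver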
Classes 'sentiment', 'data.table' and 'data.frame': 104880 obs. of 6 variables:
 $ element_id : int 1 2 3 4 5 6 7 7 8 9 ...
 $ sentence_id: int 1 1 1 1 1 1 1 2 1 1 ...
 $ word_count : int 7 NA NA 10 NA 12 5 11 1 NA ...
 $ sentiment  : num 0 0 0 0.253 0 ...
 $ chapter    : chr "The First Book of Moses:  Called Genesis" ...
 $ category   : chr "The First Book of Moses:  Called Genesis" ...
 - attr(*, "sorted")= chr "element_id" "sentence_id"
 - attr(*, ".internal.selfref")=<externalptr>
 - attr(*, "sentences")=<environment: 0x000000000d8c2fe8>
- KEAppleby
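I tried running directly from R rather than from R-Studio and saw this message: The following object is masked by '.GlobalEnv': sentiments. I am guessing this is part of the problem, if not all of it. How do I fix it? - KEAppleby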
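
The "masked by '.GlobalEnv'" message in the comment above suggests that a workspace object named sentiments is hiding the dataset that ships with tidytext. Below is a minimal sketch of how that kind of masking can be inspected and undone. It uses base R's built-in iris dataset as a stand-in so that it runs without tidytext; the stand-in and all variable names are assumptions for illustration, not part of the original thread.

```r
# Stand-in for the situation in the thread: a workspace object that shares
# its name with a packaged dataset (iris here, sentiments in the question).
iris <- data.frame(element_id = 1:3)

# find() lists every attached environment that defines the name.
find("iris")

# The workspace copy is found first, so the packaged columns seem to be missing.
masked_has_species <- "Species" %in% names(iris)

# A namespace-qualified reference bypasses the workspace copy.
qualified_has_species <- "Species" %in% names(datasets::iris)

# Removing the workspace copy makes the packaged dataset visible again.
rm(iris)
restored_has_species <- "Species" %in% names(iris)

print(c(masked = masked_has_species,
        qualified = qualified_has_species,
        restored = restored_has_species))
```

In the situation described here, the analogous calls would presumably be find("sentiments"), rm(sentiments), and tidytext::sentiments. If the workspace is restored from a saved .RData file at startup, the masking object may reappear after a restart. Whether the tutorial code then runs still depends on the installed tidytext version, which the answers discuss.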
3 Answers

2
The following code fixes the new_sentiments dataset as shown in the Data Camp tutorial.
bing <- get_sentiments("bing") %>% 
     mutate(lexicon = "bing", 
            words_in_lexicon = n_distinct(word))    

nrc <- get_sentiments("nrc") %>% 
     mutate(lexicon = "nrc", 
            words_in_lexicon = n_distinct(word))

afinn <- get_sentiments("afinn") %>% 
     mutate(lexicon = "afinn", 
            words_in_lexicon = n_distinct(word))

new_sentiments <- bind_rows(bing, nrc, afinn)
names(new_sentiments)[names(new_sentiments) == 'value'] <- 'score'
new_sentiments %>% 
     group_by(lexicon, sentiment, words_in_lexicon) %>% 
     summarise(distinct_words = n_distinct(word)) %>% 
     ungroup() %>% 
     spread(sentiment, distinct_words) %>% 
     mutate(lexicon = color_tile("lightblue", "lightblue")(lexicon), 
            words_in_lexicon = color_bar("lightpink")(words_in_lexicon)) %>% 
     my_kable_styling(caption = "Word Counts per Lexicon")

The plots that follow also work!
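
One detail of the combined table built above: the AFINN lexicon supplies only a numeric value per word, so after bind_rows() its rows carry NA in the sentiment column. The code in the question handled this by labelling AFINN scores by sign. Below is a minimal base R sketch of that step on made-up toy rows; the words and scores are invented for illustration and are not the real lexicon contents.

```r
# Toy stand-ins for two lexicons (made-up rows, NOT the real lexicon contents).
bing <- data.frame(word = c("good", "bad"),
                   sentiment = c("positive", "negative"),
                   score = NA_real_,
                   lexicon = "bing",
                   stringsAsFactors = FALSE)
afinn <- data.frame(word = c("good", "bad", "awful"),
                    sentiment = NA_character_,
                    score = c(3, -3, -3),
                    lexicon = "afinn",
                    stringsAsFactors = FALSE)

new_sentiments <- rbind(bing, afinn)

# Only the AFINN rows lack a label, so derive one from the sign of the score
# (score >= 0 counts as positive, mirroring the code in the question).
is_afinn <- new_sentiments$lexicon == "afinn"
new_sentiments$sentiment[is_afinn] <-
  ifelse(new_sentiments$score[is_afinn] >= 0, "positive", "negative")

print(new_sentiments)
```

In a dplyr pipeline the same ifelse() call could sit inside mutate(), as the next answer does for AFINN alone.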

1
It looks like tidytext had to be changed, and that broke some of the code in the tutorial.
To make the code work, replace:
new_sentiments <- sentiments %>% #From the tidytext package
  filter(lexicon != "loughran") %>% #Remove the finance lexicon
  mutate( sentiment = ifelse(lexicon == "AFINN" & score >= 0, "positive",
                              ifelse(lexicon == "AFINN" & score < 0,
                                     "negative", sentiment))) %>%
  group_by(lexicon) %>%
  mutate(words_in_lexicon = n_distinct(word)) %>%
  ungroup()

with

new_sentiments <- get_sentiments("afinn")
names(new_sentiments)[names(new_sentiments) == 'value'] <- 'score'
new_sentiments <- new_sentiments %>%
  mutate(lexicon = "afinn",
         sentiment = ifelse(score >= 0, "positive", "negative"),
         words_in_lexicon = n_distinct(word))

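The next few plots may not make as much sense (since we are now using only one lexicon), but the rest of the tutorial still works. UPDATE: here is a good explanation from one of the tidytext package authors.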

0

I ran into a similar problem and tried the code below. I hope it helps.

library(tm)
library(tidyr)
library(ggthemes)
library(ggplot2)
library(dplyr)
library(tidytext)
library(textdata)

# Load each lexicon (afinn and nrc are fetched through textdata on first use)
get_sentiments("bing")
get_sentiments("afinn")
get_sentiments("nrc")

#define new
afinn=get_sentiments("afinn")
bing=get_sentiments("bing")
nrc=get_sentiments("nrc")

#check
head(afinn)
head(bing)
head(nrc)
head(sentiments) #from tidytext packages

#merging dataframe
merge_sentiments=rbind(sentiments,get_sentiments('bing'),get_sentiments('nrc'))
head(merge_sentiments) #check

merge2_sentiments=merge(merge_sentiments,afinn,by=1,all=T)
head(merge2_sentiments) #check

#make new data frame with column lexicon added
new_sentiments <- merge2_sentiments
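new_sentiments <- new_sentiments %>% 
  # Test is.na() first: comparing NA with == yields NA, so the original
  # sentiment=='NA' branch could never assign 'afinn'.
  # Note: nrc also has 'positive'/'negative' entries, which this rule labels as bing.
  mutate(lexicon=ifelse(is.na(sentiment),'afinn',
                 ifelse(sentiment %in% c('positive','negative'),'bing','nrc')))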

colnames(new_sentiments)[colnames(new_sentiments)=='value']='score'

#check
head(new_sentiments)

Page content provided by Stack Overflow.