R sentiment analysis; "lexicon" not found; "sentiments" corrupted?

Question

3

I am trying to follow this online tutorial on sentiment analysis (link). Here is the code:

new_sentiments <- sentiments %>% #From the tidytext package
  filter(lexicon != "loughran") %>% #Remove the finance lexicon
  mutate( sentiment = ifelse(lexicon == "AFINN" & score >= 0, "positive",
                         ifelse(lexicon == "AFINN" & score < 0,
                                "negative", sentiment))) %>%
  group_by(lexicon) %>%
  mutate(words_in_lexicon = n_distinct(word)) %>%
  ungroup()

It produces this error:

>Error in filter_impl(.data, quo) : 
>Evaluation error: object 'lexicon' not found.

Perhaps related: the sentiments table looks strange to me (corrupted?). Here is the head of sentiments:

> head(sentiments,3)
>   element_id sentence_id word_count sentiment                                  chapter
> 1          1           1          7         0 The First Book of Moses:  Called Genesis
> 2          2           1         NA         0 The First Book of Moses:  Called Genesis
> 3          3           1         NA         0 The First Book of Moses:  Called Genesis
>                                    category
> 1 The First Book of Moses:  Called Genesis
> 2 The First Book of Moses:  Called Genesis
> 3 The First Book of Moses:  Called Genesis

If I use get_sentiments() to fetch the bing, AFINN, or NRC lexicons, I get what looks like a sensible response:

>  get_sentiments("bing")
> # A tibble: 6,788 x 2
>   word        sentiment
>   <chr>       <chr>
> 1 2-faced     negative
> 2 2-faces     negative
> 3 a+          positive
> 4 abnormal    negative

I tried removing (remove.packages) and reinstalling tidytext, but the behavior did not change. I am running R 3.5.

Even if I have completely misunderstood the problem, I would appreciate any insight anyone can offer.

- KEAppleby

1

I have seen this error when stats::filter was being used instead of dplyr::filter. Perhaps library(dplyr) would help? - r2evans

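So I was mistaken, that error is slightly different, sorry about that. The filter_impl error is probably because sentiments is not a data.frame, or because it has no column named lexicon. What does the output of str(sentiments) look like? - r2evans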
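A minimal sketch of the check suggested in the comment above, using only base R. The helper `has_columns` and the toy table `suspect` are hypothetical names introduced for illustration; `suspect` only imitates the shape of the table printed in the question.

```r
# Hypothetical helper (not part of dplyr or tidytext): TRUE only when `x`
# is a data frame that contains every column named in `cols`.
has_columns <- function(x, cols) {
  is.data.frame(x) && all(cols %in% names(x))
}

# Toy stand-in shaped like the head(sentiments, 3) output in the question.
suspect <- data.frame(
  element_id  = 1:3,
  sentence_id = 1L,
  word_count  = c(7L, NA, NA),
  sentiment   = 0
)

# The tutorial code filters on `lexicon` and `score`; are they present?
ok <- has_columns(suspect, c("lexicon", "score"))
print(ok)  # FALSE

# List exactly which of the expected columns are missing.
missing_cols <- setdiff(c("word", "sentiment", "lexicon", "score"), names(suspect))
print(missing_cols)
```

Running the same two checks on the real sentiments object shows at a glance whether filter() can find a lexicon column at all.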

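Your code works on my machine with the dplyr and tidytext libraries loaded, and it gives the expected output. Try restarting your R session and running the code again. - phiver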

Classes 'sentiment', 'data.table' and 'data.frame': 104880 obs. of 6 variables:
 $ element_id : int 1 2 3 4 5 6 7 7 8 9 ...
 $ sentence_id: int 1 1 1 1 1 1 1 2 1 1 ...
 $ word_count : int 7 NA NA 10 NA 12 5 11 1 NA ...
 $ sentiment  : num 0 0 0 0.253 0 ...
 $ chapter    : chr "The First Book of Moses:  Called Genesis" ...
 $ category   : chr "The First Book of Moses:  Called Genesis" ...
 - attr(*, "sorted")= chr "element_id" "sentence_id"
 - attr(*, ".internal.selfref")=<externalptr>
 - attr(*, "sentences")=<environment: 0x000000000d8c2fe8>

- KEAppleby

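I tried running directly from R rather than from RStudio and saw this message: The following object is masked _by_ '.GlobalEnv': sentiments. I am guessing this is part of the problem, if not all of it. How do I fix it? - KEAppleby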
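One way to deal with the masking message above, sketched under an assumption: the masking object is an unrelated data frame left in the global environment (for example, restored from a saved workspace). The first line only simulates that situation so the example runs on its own; the last step is skipped if tidytext is not installed.

```r
# Simulate the problem: an unrelated object named `sentiments` sits in the
# global environment (assumption: it was left over from earlier work).
assign("sentiments",
       data.frame(element_id = 1:3, sentence_id = 1L),
       envir = globalenv())

# 1. Check whether the global environment holds its own `sentiments`.
#    inherits = FALSE restricts the search to .GlobalEnv itself.
masked <- exists("sentiments", envir = globalenv(), inherits = FALSE)
print(masked)

# 2. Remove the masking copy so that a plain `sentiments` resolves to the
#    dataset of whichever attached package provides one.
if (masked) rm("sentiments", envir = globalenv())

# 3. Alternatively, sidestep masking by naming the package explicitly.
if (requireNamespace("tidytext", quietly = TRUE)) {
  print(names(tidytext::sentiments))
}
```

Restarting R with a clean workspace, as suggested in an earlier comment, has the same effect as step 2 as long as the old workspace is not restored on startup.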


3 Answers

1

It looks like tidytext had to change, and the change broke some of the code in the tutorial.

To make the code work, replace:

new_sentiments <- sentiments %>% #From the tidytext package
  filter(lexicon != "loughran") %>% #Remove the finance lexicon
  mutate( sentiment = ifelse(lexicon == "AFINN" & score >= 0, "positive",
                              ifelse(lexicon == "AFINN" & score < 0,
                                     "negative", sentiment))) %>%
  group_by(lexicon) %>%
  mutate(words_in_lexicon = n_distinct(word)) %>%
  ungroup()

with

new_sentiments <- get_sentiments("afinn")
names(new_sentiments)[names(new_sentiments) == 'value'] <- 'score'
new_sentiments <- new_sentiments %>% mutate(lexicon = "afinn", sentiment = ifelse(score >= 0, "positive", "negative"),
                                                     words_in_lexicon = n_distinct((word)))

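The next few plots may not make as much sense (since we are now using only one lexicon), but the rest of the tutorial will still work. Update: here is a good explanation from one of the authors of the tidytext package.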

- stevec

0

I ran into a similar problem and tried the code below; I hope it helps.

library(tm)
library(tidyr)
library(ggthemes)
library(ggplot2)
library(dplyr)
library(tidytext)
library(textdata)

# Preview the bing, afinn, and nrc lexicons
get_sentiments("bing")
get_sentiments("afinn")
get_sentiments("nrc")

#define new
afinn=get_sentiments("afinn")
bing=get_sentiments("bing")
nrc=get_sentiments("nrc")

#check
head(afinn)
head(bing)
head(nrc)
head(sentiments) #from tidytext packages

#merging dataframe
merge_sentiments=rbind(sentiments,get_sentiments('bing'),get_sentiments('nrc'))
head(merge_sentiments) #check

merge2_sentiments=merge(merge_sentiments,afinn,by=1,all=T)
head(merge2_sentiments) #check

#make new data frame with column lexicon added
new_sentiments <- merge2_sentiments
new_sentiments <- new_sentiments %>% 
  # use is.na() for missing values; comparing with the string 'NA' never matches a real NA
  mutate(lexicon=ifelse(is.na(sentiment),'afinn',ifelse(sentiment %in% c('positive','negative'),'bing','nrc')))

colnames(new_sentiments)[colnames(new_sentiments)=='value']='score'

#check
head(new_sentiments)

- Hera Masri'an


- ricsam · Accepted Answer

The following instructions will fix the new_sentiments dataset as shown in the DataCamp tutorial.

bing <- get_sentiments("bing") %>% 
     mutate(lexicon = "bing", 
            words_in_lexicon = n_distinct(word))    

nrc <- get_sentiments("nrc") %>% 
     mutate(lexicon = "nrc", 
            words_in_lexicon = n_distinct(word))

afinn <- get_sentiments("afinn") %>% 
     mutate(lexicon = "afinn", 
            words_in_lexicon = n_distinct(word))

new_sentiments <- bind_rows(bing, nrc, afinn)
names(new_sentiments)[names(new_sentiments) == 'value'] <- 'score'
new_sentiments %>% 
     group_by(lexicon, sentiment, words_in_lexicon) %>% 
     summarise(distinct_words = n_distinct(word)) %>% 
     ungroup() %>% 
     spread(sentiment, distinct_words) %>% 
     mutate(lexicon = color_tile("lightblue", "lightblue")(lexicon), 
            words_in_lexicon = color_bar("lightpink")(words_in_lexicon)) %>% 
     my_kable_styling(caption = "Word Counts per Lexicon")

The plots that follow also work!
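A side note on combining lexicons this way, as a sketch rather than a correction: AFINN supplies a numeric value and no sentiment label, so after bind_rows() its rows carry sentiment = NA. The question's original code derived positive/negative from the score, and the same step can be added back. The tables `bing_toy` and `afinn_toy` are made-up stand-ins for get_sentiments("bing") and get_sentiments("afinn") so the example runs without downloading any lexicon.

```r
library(dplyr)

# Made-up miniature lexicons; the real ones have thousands of rows.
bing_toy  <- tibble(word = c("good", "bad"),
                    sentiment = c("positive", "negative"))
afinn_toy <- tibble(word = c("good", "bad", "awful"),
                    value = c(3, -3, -3))

combined <- bind_rows(
  mutate(bing_toy,  lexicon = "bing"),
  mutate(afinn_toy, lexicon = "afinn")
) %>%
  rename(score = value) %>%
  # AFINN rows arrive with sentiment = NA; derive the label from the score,
  # mirroring the ifelse() logic in the question's code.
  mutate(sentiment = ifelse(lexicon == "afinn",
                            ifelse(score >= 0, "positive", "negative"),
                            sentiment)) %>%
  group_by(lexicon) %>%
  mutate(words_in_lexicon = n_distinct(word)) %>%
  ungroup()

print(combined)
```

With the label filled in, grouping by lexicon and sentiment no longer produces a separate NA group for the AFINN rows.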