How to set up a Stanford CoreNLP server on Windows to return sentiment for text

Question

java server stanford-nlp sentiment-analysis

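I am trying to set up a local server on Windows using Stanford CoreNLP to compute sentiment scores for over 1M articles and video transcripts. I don't know Java, so I need some help.

I successfully installed Stanford CoreNLP 3.6.0 and have a server running: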

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer

Running this HTTP POST request from another computer works, and I get the expected response (xxx.xxx.xxx.xxx is the server's IP address):

wget --post-data 'the quick brown fox jumped over the lazy dog' 'xxx.xxx.xxx.xxx:9000/?properties={"tokenize.whitespace": "true", "annotators": "tokenize,ssplit,pos,lemma,parse", "outputFormat": "json"}' -O -

However, the response contains no sentiment. The obvious solution is to add the annotator:

wget --post-data 'the quick brown fox jumped over the lazy dog' 'xxx.xxx.xxx.xxx:9000/?properties={"tokenize.whitespace": "true", "annotators": "tokenize,ssplit,pos,lemma,parse,sentiment", "outputFormat": "json"}' -O -

However, on the server side, I get this error:

java.lang.IllegalArgumentException: Unknown annotator: sentiment
at edu.stanford.nlp.pipeline.StanfordCoreNLP.ensurePrerequisiteAnnotators(StanfordCoreNLP.java:281)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.getProperties(StanfordCoreNLPServer.java:476)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:350)
at com.sun.net.httpserver.Filter$Chain.doFilter(Unknown Source)
at sun.net.httpserver.AuthFilter.doFilter(Unknown Source)
at com.sun.net.httpserver.Filter$Chain.doFilter(Unknown Source)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(Unknown Source)
at com.sun.net.httpserver.Filter$Chain.doFilter(Unknown Source)
at sun.net.httpserver.ServerImpl$Exchange.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

The next obvious solution is to add a parameter when starting the server, running it like this:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment"

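Running the same HTTP POST as before then gives exactly the same result and error.

Am I doing something wrong, or does the core code need to be modified to make this work? I don't know Java, so I can't make such changes myself.

Also, this similar command starts a console and appears to load sentiment correctly: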

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators "tokenize,ssplit,pos,lemma,parse,sentiment"

[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.5 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator sentiment

Entering interactive shell. Type q RETURN or EOF to quit.
NLP> _

- Eric

That sentiment analyzer isn't good enough. - Rusty

1 Answer

- Gabor Angeli · Accepted Answer

Try running with the GitHub version of the code. Your first solution was correct - the sentiment annotator not being found is a bug in the code.

wget --post-data 'the quick brown fox jumped over the lazy dog' 'xxx.xxx.xxx.xxx:9000/?properties={"annotators": "tokenize,ssplit,pos,lemma,parse,sentiment", "outputFormat": "json"}' -O -

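(One note: the tokenize.whitespace property in the documentation is there to show that you can pass arbitrary properties, but I would recommend against using it in production.)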
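
For scoring many documents from a script rather than with wget, here is a minimal Python sketch of the same request. It is not part of the original answer. The function names (build_url, annotate, extract_sentiments) are made up for illustration, and it assumes that the server's JSON reply has a "sentences" list whose entries carry a "sentiment" label. Check that against what your server version actually returns.

```python
import json
import urllib.parse
import urllib.request

# Same annotator list as in the wget command above.
DEFAULT_ANNOTATORS = "tokenize,ssplit,pos,lemma,parse,sentiment"


def build_url(host, port=9000, annotators=DEFAULT_ANNOTATORS):
    """Build the server URL with the properties JSON percent-encoded in the query string."""
    props = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return "http://{}:{}/?{}".format(host, port, query)


def annotate(text, host, port=9000, timeout=60):
    """POST raw text to the server and return the parsed JSON reply."""
    # Supplying data makes urllib send a POST, like wget --post-data.
    request = urllib.request.Request(build_url(host, port), data=text.encode("utf-8"))
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


def extract_sentiments(response):
    """Return one (sentence index, sentiment label) pair per sentence.

    Assumption: each sentence object has a "sentiment" key. The label is
    None for any sentence where that key is missing.
    """
    return [(s.get("index"), s.get("sentiment")) for s in response.get("sentences", [])]
```

With the server running, extract_sentiments(annotate("the quick brown fox jumped over the lazy dog", "xxx.xxx.xxx.xxx")) should give one pair per sentence. If every label comes back as None, the server is probably still on the build with the bug described above.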