Why does the Stanford parser in NLTK fail to parse a sentence correctly?

Question

I am using the Stanford parser with NLTK in Python, and I set up the Stanford NLP libraries with help from the question "Stanford Parser and NLTK".

from nltk.parse.stanford import StanfordParser
from nltk.parse.stanford import StanfordDependencyParser

# Both parsers load the same English PCFG model.
parser     = StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
dep_parser = StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")

one = "John sees Bill"

# Constituency parse: print each tree, then open it in the GUI viewer.
parsed_Sentence = parser.raw_parse(one)
for line in parsed_Sentence:
    print(line)
    line.draw()

# Dependency parse: convert each dependency graph to a tree.
parsed_Sentence = [parse.tree() for parse in dep_parser.raw_parse(one)]
print(parsed_Sentence)

# GUI
for line in parsed_Sentence:
    print(line)
    line.draw()

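Both the parse tree and the dependency tree come out wrong, as in the example below: the parser treats "sees" as a noun instead of a verb.

What should I do? When I change the sentence, for example to (one = 'John see Bill'), it works perfectly. The correct output for this sentence can be seen here: correct output of the parse tree.

Here is an example of the correct output: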

- Nomiluks

Please post the full code snippet so that others can see where dep_parser comes from =) - alvas

1 Answer

- alvas · Accepted Answer

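Once again, no model is perfect (see "Python NLTK pos_tag not returning the correct part-of-speech tag") ;P

You can try a "more accurate" parser, the NeuralDependencyParser.

First, set up the parser with the correct environment variables (see "Stanford Parser and NLTK" and https://gist.github.com/alvations/e1df0ba227e542955a8a), then: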

>>> from nltk.internals import find_jars_within_path
>>> from nltk.parse.stanford import StanfordNeuralDependencyParser
>>> parser = StanfordNeuralDependencyParser(model_path="edu/stanford/nlp/models/parser/nndep/english_UD.gz")
>>> stanford_dir = parser._classpath[0].rpartition('/')[0]
>>> slf4j_jar = stanford_dir + '/slf4j-api.jar'
>>> parser._classpath = list(parser._classpath) + [slf4j_jar]
>>> parser.java_options = '-mx5000m'
>>> sent = "John sees Bill"
>>> [parse.tree() for parse in parser.raw_parse(sent)]
[Tree('sees', ['John', 'Bill'])]
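
If the classpath lines in the session above look opaque, here is a minimal sketch of the same string manipulation on its own, without NLTK or Java. The jar paths and the helper name with_slf4j are hypothetical, made up for illustration; the real values come from parser._classpath on your machine, which may be a tuple (hence the list() conversion).

```python
# Hypothetical jar locations; substitute the paths from your own install.
classpath = (
    "/opt/stanford-parser/stanford-parser.jar",
    "/opt/stanford-parser/stanford-parser-models.jar",
)


def with_slf4j(classpath, jar_name="slf4j-api.jar"):
    """Return a new list: the classpath plus jar_name from the first jar's directory."""
    # rpartition('/') splits on the LAST '/', so element 0 is the directory part.
    stanford_dir = classpath[0].rpartition("/")[0]
    # Build a new list instead of mutating the input, which may be a tuple.
    return list(classpath) + [stanford_dir + "/" + jar_name]


new_classpath = with_slf4j(classpath)
print(new_classpath[-1])  # /opt/stanford-parser/slf4j-api.jar
```

This only builds the path; it does not check that slf4j-api.jar actually exists in that directory, so it is worth confirming that on disk before relying on it.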

Note that the NeuralDependencyParser only produces dependency trees: