Stanford CoreNLP: how to get the label, position, and typed dependencies from a parse tree

Question

stanford-nlp

4

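I am using Stanford CoreNLP to parse some text. I get multiple sentences. On these sentences I extract noun phrases using TregexPattern, so I get a subtree that is my noun phrase. I have also managed to find the head of the noun phrase.

How can I get the position of that head in the sentence, or even its token/CoreLabel?

Better still, how can I find the dependency relations between the head and the rest of the sentence?

Here is an example: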

public void doSomeTextKarate(String text){

    Properties props = new Properties();
    props.put("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    this.pipeline = pipeline;


    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);
    // run all Annotators on this text
    pipeline.annotate(document);

    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

    for (CoreMap sentence : sentences) {


        SemanticGraph basicDeps = sentence.get(BasicDependenciesAnnotation.class);
        Collection<TypedDependency> typedDeps = basicDeps.typedDependencies();
        System.out.println("typedDeps ==>  "+typedDeps);

        SemanticGraph collDeps = sentence.get(CollapsedDependenciesAnnotation.class);
        SemanticGraph collCCDeps = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);

        List<CoreMap> numerizedTokens = sentence.get(NumerizedTokensAnnotation.class);
        List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);

        Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);

        sentenceTree.percolateHeads(headFinder);
        Set<Dependency<Label, Label, Object> > sentenceDeps =   sentenceTree.dependencies();
        for (Dependency<Label, Label, Object> dependency : sentenceDeps) {
            System.out.println("sentence dep = " + dependency);

            System.out.println(dependency.getClass() +" ( " + dependency.governor() + ", " + dependency.dependent() +") " );
        }


        // find noun phrases in the sentence
        TregexPattern pat = TregexPattern.compile("@NP");
        TregexMatcher matcher = pat.matcher(sentenceTree);
        while (matcher.find()) {

            Tree nounPhraseTree = matcher.getMatch();
            System.out.println("Found noun phrase " + nounPhraseTree);

            nounPhraseTree.percolateHeads(headFinder);

            Set<Dependency<Label, Label, Object> > npDeps = nounPhraseTree.dependencies();
            for (Dependency<Label, Label, Object> dependency : npDeps ) {
                System.out.println("nounPhraseTree  dep = " + dependency);
            }


            Tree head = nounPhraseTree.headTerminal(headFinder);
            System.out.println("head " + head);


            Set<Dependency<Label, Label, Object> > headDeps = head.dependencies();
            for (Dependency<Label, Label, Object> dependency : headDeps) {
                System.out.println("head dep " + dependency);
            }


            //QUESTION : 
            //How do I get the position of "head" in tokens or numerizedTokens ?
            //How do I get the dependencies where "head" is involved in typedDeps ? 

        }
    }
}

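In other words, I want to query all the dependencies that involve the "head" word/token/label across the whole sentence. So I thought I needed to work out the position of that token in the sentence in order to relate it to the typed dependencies, but maybe there is a simpler way? Thanks in advance.

[EDIT]

So I may have found the answer, or at least the beginning of one. If I call .label() on head, I get a CoreLabel, which is pretty much what I need to find the rest. I can now iterate over the typed dependencies and search for those in which either the governor label or the dependent label has the same index as my headLabel.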

            Tree nounPhraseTree = matcher.getMatch();
            System.out.println("Found noun phrase " + nounPhraseTree);

            nounPhraseTree.percolateHeads(headFinder);
            Tree head = nounPhraseTree.headTerminal(headFinder);
            CoreLabel headLabel = (CoreLabel) head.label();

            System.out.println("tokens.contains(headLabel)" + tokens.contains(headLabel));

            System.out.println("");
            System.out.println("Iterating over typed deps");
            for (TypedDependency typedDependency : typedDeps) {
                System.out.println(typedDependency.gov().backingLabel());
                System.out.println("gov pos "+ typedDependency.gov() + " - " + typedDependency.gov().index());
                System.out.println("dep pos "+ typedDependency.dep() + " - " + typedDependency.dep().index());

                if(typedDependency.gov().index() == headLabel.index() ){

                    System.out.println("dep or gov backing label equals headlabel :" + (typedDependency.gov().backingLabel().equals(headLabel) ||
                            typedDependency.dep().backingLabel().equals(headLabel)));  //why does this return false all the time ? 


                    System.out.println(" !!!!!!!!!!!!!!!!!!!!!  HIT ON " + headLabel + " == " + typedDependency.gov());
                }
            }

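It seems that I can only match my head label against the labels in typedDeps by using the index. I wonder whether this is the proper way to do it. As you can see in the code, I also tried calling backingLabel() on the governor and the dependent of each TypedDependency to test for equality with my head label, but it always returns false. I have no idea why!

Any feedback is welcome.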

- azpublic

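depparse will only give you the dependency parse (a SemanticGraph object). It is a full dependency parser and supports the same collapsing operations as the standard parse annotator (if that is what you are interested in). - Jon Gauthier

But with depparse and SemanticGraph I cannot do TregexPattern matching as in my example, or can I, and if so how? - azpublic

There is no direct way to do that in a dependency parse representation. If you really want speed and want to use only the fast "depparse", you could look at your data and work out a way to match NP-like spans using the dependency parse output. It is certainly possible to approximate what the NP structures in a constituency parse represent; the representation just won't tell you directly. - Jon Gauthier

@hamid I used "SemanticHeadFinder headFinder = new SemanticHeadFinder();" but I guess you can instantiate any other head finder depending on your specific needs. - azpublic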

What about using the getHeadFinder() method on the TregexMatcher object itself? Would that work? - Sudhi


1 Answer


- Jon Gauthier · Accepted Answer

You can get the position of a CoreLabel within its containing sentence through the CoreAnnotations.IndexAnnotation annotation.
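As a rough illustration of turning that index into a position in the tokens list, here is a minimal sketch that uses only the Java standard library. It assumes the sentence-level token index is 1-based (so the first token has index 1), whereas a java.util.List is 0-based. The class HeadPosition, its methods, and the sample tokens are hypothetical stand-ins; plain strings play the role of CoreLabel tokens here.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in sketch: plain strings take the place of CoreLabel tokens.
final class HeadPosition {

    // Assumption: the token index is 1-based, while List positions are 0-based.
    static int toListPosition(int tokenIndex) {
        if (tokenIndex < 1) {
            throw new IllegalArgumentException("token index must be >= 1, got " + tokenIndex);
        }
        return tokenIndex - 1;
    }

    // Looks up the token that carries the given 1-based index.
    static String tokenAt(List<String> tokens, int tokenIndex) {
        return tokens.get(toListPosition(tokenIndex));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "quick", "fox", "jumps");
        int headIndex = 3; // hypothetical value, standing in for headLabel.index()
        System.out.println(tokenAt(tokens, headIndex));
    }
}
```

With real CoreNLP objects the same arithmetic would apply to the List<CoreLabel> from the question, provided the 1-based assumption holds for your version.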

Your method for finding all the dependents of a given word looks correct, and is probably the simplest way to do it.
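The index-matching idea can be sketched without CoreNLP on the classpath. In the following, the Dep class is a hypothetical stand-in for TypedDependency, and the four sample dependencies are written by hand for illustration; they are not parser output. Unlike the loop in the question, which only compares the governor index, this sketch checks both the governor and the dependent, so it returns every dependency in which the head takes part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HeadDependencies {

    // Hypothetical stand-in for a typed dependency: relation(gov-govIndex, dep-depIndex).
    static final class Dep {
        final String relation;
        final String gov;
        final int govIndex;
        final String dep;
        final int depIndex;

        Dep(String relation, String gov, int govIndex, String dep, int depIndex) {
            this.relation = relation;
            this.gov = gov;
            this.govIndex = govIndex;
            this.dep = dep;
            this.depIndex = depIndex;
        }

        @Override
        public String toString() {
            return relation + "(" + gov + "-" + govIndex + ", " + dep + "-" + depIndex + ")";
        }
    }

    // Keeps every dependency whose governor OR dependent carries the head's index.
    static List<Dep> involving(List<Dep> deps, int headIndex) {
        List<Dep> hits = new ArrayList<>();
        for (Dep d : deps) {
            if (d.govIndex == headIndex || d.depIndex == headIndex) {
                hits.add(d);
            }
        }
        return hits;
    }

    // Hand-written dependencies for "The quick fox jumps" (illustrative only).
    static List<Dep> sampleDeps() {
        return Arrays.asList(
                new Dep("det", "fox", 3, "The", 1),
                new Dep("amod", "fox", 3, "quick", 2),
                new Dep("nsubj", "jumps", 4, "fox", 3),
                new Dep("root", "ROOT", 0, "jumps", 4));
    }

    public static void main(String[] args) {
        int headIndex = 3; // hypothetical value, standing in for headLabel.index()
        for (Dep d : involving(sampleDeps(), headIndex)) {
            System.out.println(d);
        }
    }
}
```

Comparing integer indices sidesteps the label equality check that returned false in the question, which is why matching on the index is a reasonable default within a single sentence.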