使用antlr4从Java源代码创建AST并提取方法、变量和注释的简单示例是否存在?

20

能否提供一个详细的例子,说明我如何使用antlr4实现这个?从安装antlr4及其依赖项开始的指令将不胜感激。


我无法给你一个详细的例子,但如果你找到一些antlr4的java语法,你可以使用新的antlr4特性(访问者生成)。在他们出色的书中有很好的描述。你可以从这里开始 https://github.com/antlr/grammars-v4/tree/master/java - Leo
1
但是对于像我这样的新手来说,没有很好的文档示例。我知道如何下载Java.g4语法并创建令牌等。但是我不知道在那之后该做什么。我认为一个完整详细的示例会帮助我和许多其他人。 - user3266901
当然有啦!:-) http://leonotepad.blogspot.com.br/2014/01/playing-with-antlr4-primefaces.html - Leo
1
这本书也很棒,新手(像你和我)也能理解。http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference - Leo
谢谢Leo,我已将你的文章添加到antlr4 wiki主页上:https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Articles+and+Resources - Terence Parr
3个回答

29

这就是它。

首先,你需要购买ANTLR4书籍 ;-)

第二步,你需要下载antlr4 jar包和Java语法文件(http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference)

接下来,你可以稍微修改一下语法,在头部添加以下内容。

    (...)
grammar Java;

options 
{
    language = Java;
}

// starting point for parsing a java file
compilationUnit
    (...)

我会稍微改一下语法,只是为了说明一些东西。

/*
methodDeclaration
    :   (type|'void') Identifier formalParameters ('[' ']')*
        ('throws' qualifiedNameList)?
        (   methodBody
        |   ';'
        )
    ;
*/
methodDeclaration
    :   (type|'void') myMethodName formalParameters ('[' ']')*
        ('throws' qualifiedNameList)?
        (   methodBody
        |   ';'
        )
    ;

myMethodName
    :   Identifier
    ;

你看,原始语法无法让你区分方法标识符和其他标识符,因此我已将原始块注释掉,并添加了一个新的块,只是为了向你展示如何获取想要的内容。

如果您想检索其他元素,例如当前被忽略的注释,则必须对它们执行相同的操作。这取决于您 :-)

现在,创建一个像这样的类以生成所有的存根

package mypackage;

public class Gen {

    public static void main(String[] args) {
        String[] arg0 = { "-visitor", "/home/leoks/EclipseIndigo/workspace2/SO/src/mypackage/Java.g4", "-package", "mypackage" };
        org.antlr.v4.Tool.main(arg0);
    }

}

运行Gen,你将在mypackage中得到一些Java代码。

现在创建一个访问者。实际上,在这个例子中,访问者将解析自身。

package mypackage;

import java.io.FileInputStream;
import java.io.IOException;

import mypackage.JavaParser.MyMethodNameContext;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

/**
 * @author Leonardo Kenji Feb 4, 2014
 */
public class MyVisitor extends JavaBaseVisitor<Void> {

    /**
     * Main Method
     * 
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        ANTLRInputStream input = new ANTLRInputStream(new FileInputStream("/home/leoks/EclipseIndigo/workspace2/SO/src/mypackage/MyVisitor.java")); // we'll
                                                                                                                                                    // parse
                                                                                                                                                    // this
                                                                                                                                                    // file
        JavaLexer lexer = new JavaLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        JavaParser parser = new JavaParser(tokens);
        ParseTree tree = parser.compilationUnit(); // see the grammar ->
                                                    // starting point for
                                                    // parsing a java file



        MyVisitor visitor = new MyVisitor(); // extends JavaBaseVisitor<Void>
                                                // and overrides the methods
                                                // you're interested
        visitor.visit(tree);
    }

    /**
     * some attribute comment
     */
    private String  someAttribute;

    @Override
    public Void visitMyMethodName(MyMethodNameContext ctx) {
        System.out.println("Method name:" + ctx.getText());
        return super.visitMyMethodName(ctx);
    }

}

就是这样了。

你会得到类似这样的东西。

Method name:main
Method name:visitMyMethodName

附言:还有一件事。当我在Eclipse中编写这段代码时,出现了一个奇怪的异常。这是由Java 7引起的,只需向编译器添加这些参数即可解决(感谢这个链接 http://java.dzone.com/articles/javalangverifyerror-expecting

输入图像描述


1
此外,您到底为什么要使用自己的“Gen”类? 使用Ant或Maven构建脚本之一即可。 - Sam Harwell
1
我会保留Gen类,因为只需要一行命令就能完成工作,而不必向新手解释ant或maven。 - Leo
我该如何安装antrl4 jar?当我尝试运行Gen.java时,它会给我一个错误。 - user3266901
请从这里获取 http://www.antlr.org/download/antlr-4.2-complete.jar 的下载链接。 - Leo
尝试使用ParseTree tree = parser.compilationUnit(); - Leo
显示剩余6条评论

2
grammar Criteria;

@parser::header {
  import java.util.regex.Pattern;
}

options
{
  superClass = ReferenceResolvingParser;
}

@parser::members {

  public CriteriaParser(TokenStream input, Object object) {
    this(input);
    setObject(object);
  }

}

/* Grammar rules */

reference returns [String value]
          : '$.' IDENTIFIER { $value = resolveReferenceValue($IDENTIFIER.text); }
          ;

operand returns [String value]
        : TRUE { $value = $TRUE.text; }
        | FALSE { $value = $FALSE.text; }
        | DECIMAL { $value = $DECIMAL.text; }
        | QUOTED_LITERAL  { $value = $QUOTED_LITERAL.text.substring(1, $QUOTED_LITERAL.text.length() - 1); }
        | reference { $value = $reference.value; }
        ;

operand_list returns [List value]
             @init{ $value = new ArrayList(); }
             : LBPAREN o=operand { $value.add($o.value); } (',' o=operand { $value.add($o.value); })* RBPAREN
             ;

comparison_expression returns [boolean value]
                      : lhs=operand NEQ rhs=operand { $value = !$lhs.value.equals($rhs.value); }
                      | lhs=operand EQ rhs=operand { $value = $lhs.value.equals($rhs.value); }
                      | lhs=operand GT rhs=operand { $value = $lhs.value.compareTo($rhs.value) > 0; }
                      | lhs=operand GE rhs=operand { $value = $lhs.value.compareTo($rhs.value) >= 0; }
                      | lhs=operand LT rhs=operand { $value = $lhs.value.compareTo($rhs.value) < 0; }
                      | lhs=operand LE rhs=operand { $value = $lhs.value.compareTo($rhs.value) <= 0; }
                      ;

in_expression returns [boolean value]
              : lhs=operand IN rhs=operand_list { $value = $rhs.value.contains($lhs.value); };

rlike_expression returns [boolean value]
                 : lhs=operand RLIKE rhs=QUOTED_LITERAL { $value = Pattern.compile($rhs.text.substring(1, $rhs.text.length() - 1)).matcher($lhs.value).matches(); }
                 ;

logical_expression returns [boolean value]
                   : c=comparison_expression { $value = $c.value; }
                   | i=in_expression { $value = $i.value; }
                   | l=rlike_expression { $value = $l.value; }
                   ;

chained_expression returns [boolean value]
                   : e=logical_expression { $value = $e.value; } (OR  c=chained_expression { $value |= $c.value; })?
                   | e=logical_expression { $value = $e.value; } (AND c=chained_expression { $value &= $c.value; })?
                   ;

grouped_expression returns [boolean value]
                   : LCPAREN c=chained_expression { $value = $c.value; } RCPAREN ;

expression returns [boolean value]
           : c=chained_expression { $value = $c.value; } (OR  e=expression { $value |= $e.value; })?
           | c=chained_expression { $value = $c.value; } (AND e=expression { $value &= $e.value; })?
           | g=grouped_expression { $value = $g.value; } (OR  e=expression { $value |= $e.value; })?
           | g=grouped_expression { $value = $g.value; } (AND e=expression { $value &= $e.value; })?
           ;

criteria returns [boolean value]
         : e=expression { $value = $e.value; }
         ;


/* Lexical rules */

AND : 'and' ;
OR  : 'or' ;

TRUE  : 'true' ;
FALSE : 'false' ;

EQ    : '=' ;
NEQ   : '<>' ;
GT    : '>' ;
GE    : '>=' ;
LT    : '<' ;
LE    : '<=' ;
IN    : 'in' ;
RLIKE : 'rlike' ;

LCPAREN : '(' ;
RCPAREN : ')' ;
LBPAREN : '[' ;
RBPAREN : ']' ;

DECIMAL : '-'?[0-9]+('.'[0-9]+)? ;

IDENTIFIER : [a-zA-Z_][a-zA-Z_.0-9]* ;

QUOTED_LITERAL :
                 (  '\''
                    ( ('\\' '\\') | ('\'' '\'') | ('\\' '\'') | ~('\'') )*
                 '\''  )
                ;

WS : [ \r\t\u000C\n]+ -> skip ;



public class CriteriaEvaluator extends CriteriaBaseListener
{

    static class CriteriaEvaluatorErrorListener extends BaseErrorListener
    {

        Optional<String> error = Optional.empty();

        @Override
        public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine, String msg, RecognitionException e) {
            error = Optional.of(String.format("Failed to parse at line %d:%d due to %s", line, charPositionInLine + 1, msg));
        }

    }

    public static boolean evaluate(String input, Object argument)
    {
        CriteriaLexer lexer = new CriteriaLexer(new ANTLRInputStream(input));
        CriteriaParser parser = new CriteriaParser(new CommonTokenStream(lexer), argument);
        parser.removeErrorListeners();
        CriteriaEvaluatorErrorListener errorListener = new CriteriaEvaluatorErrorListener();
        lexer.removeErrorListeners();
        lexer.addErrorListener(errorListener);
        parser.removeErrorListeners();
        parser.addErrorListener(errorListener);
        CriteriaParser.CriteriaContext criteriaCtx = parser.criteria();
        if(errorListener.error.isPresent())
        {
            throw new IllegalArgumentException(errorListener.error.get());
        }
        else
        {
            return criteriaCtx.value;
        }
    }

}

3
欢迎来到Stack Overflow!虽然这段代码可能解决了问题,但包括解释真的有助于提高您的帖子质量。请记住,您正在为未来的读者回答问题,而这些人可能不知道您提出代码建议的原因。 - NathanOliver

-1

这里有一个详细的例子(借鉴自https://github.com/satnam-sandhu/ASTGenerator),我做了一些修改以获取行号。

helloworld.java

public class HelloWorld {
public static void main(String[] args) {
    System.out.println("Hello, World");
    }
}

JavaAstGeneratorDOT.java

import antlr.Java8Lexer;
import antlr.Java8Parser;

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.misc.Interval;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.ArrayList;


public class JavaAstGeneratorDOT {

    static ArrayList<String> LineNum = new ArrayList<String>();
    static ArrayList<String> Type = new ArrayList<String>();
    static ArrayList<String> Content = new ArrayList<String>();
    static ArrayList<String> RawLineNum = new ArrayList<String>();

    private static String readFile(String pathname) throws IOException {
        File file = new File(pathname);
        byte[] encoded = Files.readAllBytes(file.toPath());
        return new String(encoded, Charset.forName("UTF-8"));
    }

    public static void main(String args[]) throws IOException {
        String path = "helloworld.java";
        String inputString = readFile(path);
        ANTLRInputStream input = new ANTLRInputStream(inputString);
        Java8Lexer lexer = new Java8Lexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        Java8Parser parser = new Java8Parser(tokens);
        ParserRuleContext ctx = parser.compilationUnit();
//      ParserRuleContext ctx = parser.statementExpressionList();
//      ParserRuleContext ctx = parser.methodDeclaration();

        generateAST(ctx, false, 0, tokens);
        String filename = path.substring(path.lastIndexOf("\\") + 1, path.lastIndexOf("."));
        String save_dot_filename = String.format("ast_%s.dot", filename);
        PrintWriter writer = new PrintWriter(save_dot_filename);
        writer.println(String.format("digraph %s {", filename));
        printDOT(writer);
        writer.println("}");
        writer.close();
    }

    private static void generateAST(RuleContext ctx, boolean verbose, int indentation, CommonTokenStream tokens) {
        boolean toBeIgnored = !verbose && ctx.getChildCount() == 1 && ctx.getChild(0) instanceof ParserRuleContext;
        if (!toBeIgnored) {
            String ruleName = Java8Parser.ruleNames[ctx.getRuleIndex()];
            LineNum.add(Integer.toString(indentation));
            Type.add(ruleName);
            Content.add(ctx.getText());

            // get line number, added by tsmc.sumihui, 20190425
            Interval sourceInterval = ctx.getSourceInterval();
            Token firstToken = tokens.get(sourceInterval.a);
            int lineNum = firstToken.getLine();
            RawLineNum.add(Integer.toString(lineNum));
        }
        for (int i = 0; i < ctx.getChildCount(); i++) {
            ParseTree element = ctx.getChild(i);
            if (element instanceof RuleContext) {
                generateAST((RuleContext) element, verbose, indentation + (toBeIgnored ? 0 : 1), tokens);
            }
        }
    }

    private static void printDOT(PrintWriter writer) {
        printLabel(writer);
        int pos = 0;
        for (int i = 1; i < LineNum.size(); i++) {
            pos = getPos(Integer.parseInt(LineNum.get(i)) - 1, i);
            writer.println((Integer.parseInt(LineNum.get(i)) - 1) + Integer.toString(pos) + "->" + LineNum.get(i) + i);
        }
    }

    private static void printLabel(PrintWriter writer) {
        for (int i = 0; i < LineNum.size(); i++) {
//          writer.println(LineNum.get(i)+i+"[label=\""+Type.get(i)+"\\n "+Content.get(i)+" \"]");
            writer.println(LineNum.get(i) + i + "[label=\"" + Type.get(i) + "\", linenum=\"" + RawLineNum.get(i) + "\"]");
        }
    }

    private static int getPos(int n, int limit) {
        int pos = 0;
        for (int i = 0; i < limit; i++) {
            if (Integer.parseInt(LineNum.get(i)) == n) {
                pos = i;
            }
        }
        return pos;
    }
}

结果是这样的(ast_helloworld.dot):

digraph helloworld {
00[label="compilationUnit", linenum="1"]
11[label="normalClassDeclaration", linenum="1"]
22[label="classModifier", linenum="1"]
23[label="classBody", linenum="1"]
34[label="methodDeclaration", linenum="2"]
45[label="methodModifier", linenum="2"]
46[label="methodModifier", linenum="2"]
47[label="methodHeader", linenum="2"]
58[label="result", linenum="2"]
59[label="methodDeclarator", linenum="2"]
610[label="formalParameter", linenum="2"]
711[label="unannArrayType", linenum="2"]
812[label="unannClassType_lfno_unannClassOrInterfaceType", linenum="2"]
813[label="dims", linenum="2"]
714[label="variableDeclaratorId", linenum="2"]
415[label="block", linenum="2"]
516[label="expressionStatement", linenum="3"]
617[label="methodInvocation", linenum="3"]
718[label="typeName", linenum="3"]
819[label="packageOrTypeName", linenum="3"]
720[label="literal", linenum="3"]
00->11
11->22
11->23
23->34
34->45
34->46
34->47
47->58
47->59
59->610
610->711
711->812
711->813
610->714
34->415
415->516
516->617
617->718
718->819
617->720
}

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接