用堆栈实现LL(1)解析器: 如何构建AST？

Question

用堆栈实现LL(1)解析器: 如何构建AST？

parsingimplementationcontext-free-grammarll-grammar

8

我目前正在手动构建解析器。这是一个LL(1)解析器。目前，它是一个很好的识别器：它的函数parse(List tokens)决定tokens是否属于该语言。

现在，我想为该输入构建相应的AST。然而，我知道如何以递归下降的方式实现它（已经完成）。也就是说，在这个挑战中，我使用经典算法的堆栈来实现我的堆栈：

next <- first token of the input
stack <- START_SYMBOL
do {
    top <- stack.pop()
    if (top is a terminal and top == next) {
        next <- next token of the input
    } else if (top is a non terminal and PARSING_TABLE[top, next] exists) {
        stack.push(PARSING_TABLE[top, next]);
    } else {
         return invalid input;
    }
} while (stack is not empty);
return valid input;

其中 PARSING_TABLE 是 LL(1) 表。然而，我想知道如何在这种配置下实现构建 AST 的部分。我不是要求完整的实现，而是需要实现的思路。

谢谢！

- user1553136

2个回答

1

由于使用常见的方法——将栈上的非终结符替换为其匹配规则的右手边符号，会在预测时忘记语法结构而导致困难。但是生成抽象语法树需要在规则解析完成后保留该结构。

不要用匹配规则的rhs符号替换非终结符，而是保留它并将匹配的符号作为列表对象推入。这样，栈就保留了文法的层次结构。

解析过程消耗顶部列表中的符号。列表的用尽对应于规则的完成。非终结符在其规则完成后被从栈中移除，而不是在预测时移除。

当堆栈被消耗时，构建一个相关规则并存储已解析标记的副本AST结构。因此，堆栈就像一个流入解析AST的预测AST。

你可以将其视为使用符号列表堆栈作为调用帧堆栈来模拟递归下降解析器的调用层次结构。

在ruby中：

# g is the grammar; a list of rules
# s is a terminal sequence to parse
# note, this code does not tokenize EOF

def parse(g, s)

 tab = gen_table(g)
 stack = [[g.start_sym]]
 # intermediate ast node format: [rule-action, symbols...]
 ast = [[->(rhs){[:_S, rhs[0]]}]]

 loop do
  puts "PARSE\n #{s}\n #{stack}\n #{ast}"

  if stack.first.empty?
   raise "extraneous input" if not s.empty?
   break
  end

  if stack.last.empty? # rule complete
   stack.pop
   node = ast.pop
   # transform the node (eg to a class) using the associated rule-action
   node = node.first.(node.drop(1))
   ast.last.push(node)
   stack.last.shift # rm sym from stack after completing it
   next
  end

  raise "incomplete input" if s.empty?
  tok = s.first
  topsym = stack.last.first

  if topsym.is_a? String # terminal
   raise "mismatch #{tok} != #{topsym}" if tok != topsym
   stack.last.shift
   ast.last.push(s.shift)

  elsif topsym.is_a? Symbol # nonterminal
   ri = tab[topsym][tok]
   raise "no rule for #{topsym}, #{tok}" if ri.nil?
   stack.push(g[ri].rhs.clone)
   ast.push([g[ri].action])
  end

 end

 node = ast.first
 node.first.(node.drop(1))
end

- soliton_zero

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- armel · Accepted Answer

您的堆栈可以被注释，以便包含AST入口引用（即规则编号+规则中的位置+存储目标数据的位置）+（终端/非终端）。

您的initial stack <- START_SYMBOL被注释，以便将其结果存储在AST根节点中。

基本上，您的pop()选择当前的AST构造。然后next <- next token将该值保存在您的AST中。stack.push(PARSING_TABLE[top, next]);打开一个新的AST列表，并将其写入与top相应的构造中，并在堆栈的每个条目中生成“规则编号+位置+目标列表”。

当解析完成时，您就拥有了整个树。

在精确的AST中，您可能希望忽略某些标记。这可以通过在push()部分设置适当的堆栈注释来完成。典型的方法是为每个规则（A -> B C）附加一些元信息，例如要保留什么以及结果的性质。