PyParsing中的简单递归下降解析

18

我尝试着将这段代码转换为我正在进行的编程语言处理项目所需的形式,但是在一个简化版本中,我遇到了问题:

op = oneOf( '+ - / *')
lparen, rparen = Literal('('), Literal(')')

expr = Forward()
expr << ( Word(nums) | ( expr + op + expr ) | ( lparen + expr + rparen) )

我已经对这个简单的设置进行了许多不同的修改尝试,通常会尝试这样做:

print(expr.parseString('1+2'))

将返回['1']。而如果像这样陷入深度递归,我将会被卡住:

print(expr.parseString('(1+2)'))

相对于简单的递归,我错过了什么导致我无法解析任意算术表达式,比如1+(2 * 3-(4*(5+6)-(7))...

4个回答

27

哇,我想pyparsing确实很有名!感谢Alex和John对这个问题的介入。你们两个的回答都很到位。但是让我补充一下:

  1. If we suppress the opening and closing parenthesis symbols, and group the parenthesized expression using Group, pyparsing will a structured result that is closer to an AST.

    from pyparsing import Literal,Word,ZeroOrMore,Forward,nums,oneOf,Group
    
    def Syntax():
        op = oneOf('+ -')
        lpar  = Literal( '(' ).suppress()
        rpar  = Literal( ')' ).suppress()
        num = Word(nums)
        expr = Forward()
        atom = num | Group(lpar + expr + rpar)
        expr << atom + ZeroOrMore(op + atom)
        return expr
    
    if __name__ == "__main__":
        expr = Syntax()
        def test(s):
            results = expr.parseString(s)
            print s,'->', results
    
        test( "(9 + 3)" )
        test( "(9 + 3) * (4 / 5)" )
    

    Giving:

    (9 + 3) -> [['9', '+', '3']]
    (9 + 3) * (4 / 5) -> [['9', '+', '3'], '*', ['4', '/', '5']]
    

    Otherwise, pyparsing is just tokenizing, and you have to walk the list of parsed tokens to find the nested expressions.

  2. Since op is defined as just oneOf("+ - * /"), there is no precedence of operations. There are examples on the pyparsing repo at https://github.com/pyparsing/pyparsing/tree/master/examples of the manual way to define this (fourFn.py), or the more recent approach using the infixNotation helper (simpleArith.py). Again, this has pyparsing adding more value than just tokenizing.

对于楼主,请查看这些例子,我认为它们将有助于推进您的项目。

-- 保罗


9

这符合您的要求吗?

from pyparsing import Literal,Word,ZeroOrMore,Forward,nums,oneOf

def Syntax():
    op = oneOf( '+ - / *')
    lpar  = Literal( '(' )
    rpar  = Literal( ')' )
    num = Word(nums)

    expr = Forward()
    atom = num | ( lpar + expr + rpar )
    expr << atom + ZeroOrMore( op + expr )
    return expr


if __name__ == "__main__":

    expr = Syntax()

    def test(s):
        results = expr.parseString( s )
        print s,'->', results

    test( "(9 + 3)" )
    test( "(9 + 3) * (4 / 5)" )

发射
(9 + 3) -> ['(', '9', '+', '3', ')']
(9 + 3) * (4 / 5) -> ['(', '9', '+', '3', ')', '*', '(', '4', '/', '5', ')']

? 这个符号将一个“原子”(数字或括号表达式)与一个“表达式”(一组带有运算符的“原子”)分开,从而“锚定”递归。


4

一种类似于以下语法的语言:

expr :: expr op expr

这段代码很难处理,因为递归会不断地向左深入。

一个普通的算术语法应该长这样:

expr :: mulxp | mulxp '+' expr
mulxp :: atom | atom '*' expr
atom :: Word(nums) | '(' + expr + ')'

基本上,你永远不会得到S :: S;在语法中,如果一个非终结符出现在左右两侧的行中,必须有某个文字在中间供解析器消耗。


请问您能否提供一些关于如何将 expr :: expr op expr 转换为 Pyparsing 可以处理的其他形式的提示?例如,在我的情况下,参考 https://dev59.com/hInis4cB2Jgan1znAYW_。 - Nordlöw

0
使用operatorPrecedence来构建表达式。它将构建正确的表达式,并在此过程中处理运算符优先级:
num = Word(nums)
plusop = oneOf( '+ -')
multop = oneOf('/ *')
expr = operatorPrecedence(num,
                          [(multop, 2, opAssoc.LEFT),(plusop, 2, opAssoc.LEFT)])

例子:

>> print parsetime.expr.parseString("1+(2 * 3-(4*(5+6)-(7)))")
[['1', '+', [['2', '*', '3'], '-', [['4', '*', ['5', '+', '6']], '-', '7']]]]

Pyparsing不再托管在wikispaces.com上。请访问https://github.com/pyparsing/pyparsing。 - PaulMcG

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接