混合语法解析器错误:中缀右结合运算符失败。

3

介绍

嗨,

在深入讨论之前,我要承认这是一个非常具体的问题。我已经尽可能地删除了不必要的信息,以确保清晰度。无论如何,进入正题:

我正在为Haskell中的mixfix表达式编写解析器。我找到了this paper,并想基于它构建第一版实现。

为了试图总结这篇论文,其思想是:我们构建一个DAG,代表一个优先级图。一个节点包含一组具有相同优先级的运算符。这些运算符可以是中缀(左/右/非关联)、前缀、后缀或闭合的。边指向更高优先级的节点。我已经从论文中截取了示例图的屏幕截图。

precedence-graph

如果一个中缀运算符有多个“插槽”(例如if_then_else_),那么它将表示为“名称部分”的列表(即[if,then,else]),而固定性仅涉及最左和最右的名称部分,因此if_then_else_是前缀,但Python三元if _if_else_是中缀。

然后使用此图来生成解析器组合器。参考文献,我正在使用以下库:

在进入代码之前,我要提到我已经意识到我的实现可能非常低效(特别是我自由使用回溯),一旦它正常工作,我会担心这个问题。

代码

我在解析器中使用两种类型作为“输出”:RawCore 和 Telescope。RawCore 表示一个输出表达式,可以是变量或柯里化函数应用。Telescope 只是一种更方便地表示重复应用的方式。

-- Technically RawCore is more complex than this, but this definition contains the relevant bits
data RawCore = Var Text | App RawCore RawCore

-- Telescope e [a, b] represents (App (App e a) b)
data Telescope = Tel RawCore [RawCore]

type Parser = Parsec Text Text

为了表示运算符和优先级图,我使用以下类型:

data Associativity = LeftAssociative | RightAssociative | NonAssociative
  deriving (Eq, Ord, Show)
  
data Fixity = Closed | Prefix | Postfix | Infix Associativity 
  deriving (Eq, Ord, Show)

data Operator = Operator { fixity :: Fixity, name_parts :: Vector Text }
  deriving (Eq, Ord, Show)

-- adapted from the paper to use the Topograph library
type PrecedenceNode = (Set Operator)

type PrecedenceGraph i = G PrecedenceNode i

在我开始实现之前,我会简要描述一些我使用的辅助函数(完整定义就在解析器定义下面):

  • 解析器辅助函数

    • <||> :回溯版本的<|>
    • choice' :回溯版本的choice
    • many1 :重复一个解析器1次或更多次
    • betweenM :最好通过示例来解释 - 取解析器向量(例如[symbol "a", symbol "b", symbol "c"])和单个解析器(例如int :: Parser Int),并返回一个解析器,该解析器将解析a 2 b 3 c并返回[2, 3]。如果向量仅包含单个解析器[p],则等效于p >> pure []。如果向量为空,则等效于pure []
    • symbol :以字符串作为输入(例如"if"),并生成一个解析器,该解析器解析该字符串后跟任何尾随空格。
  • 其他辅助函数

    • opName :取名称部分和结合性的向量,返回带有下划线的运算符名称,即将(Prefix,[if,then,else])转换为"if_then_else"
    • unscope:展开望远镜,将其转换为表达式(RawCore)。
-- Note: I'm using the scoped type variables extension, so the 'i' is captured
mixfix :: forall i. PrecedenceGraph i -> Parser RawCore
mixfix G {..} = expr
  where
    expr :: Parser RawCore
    expr = precs gVertices

    -- precs will take a list of nodes (lowest -> highest precedence) and return the
    -- first one which successfully parses the string.
    precs :: [i] -> Parser RawCore
    precs (p:ps) = prec p <||> precs ps
    precs [] = customFailure "ran out of operators in precedence graph" 

    -- prec will build a parser originating at a single precedence node
    prec :: i -> Parser RawCore
    prec node = choice
      [ try (unscope <$> close Closed)
      , try (appn <$> psucs <*> close (Infix NonAssociative) <*> psucs)
      , try (appr <$> many1 preRight <*> psucs)
      , try (appl <$> psucs <*> many1 postLeft)
      ]

      where
        close :: Fixity -> Parser Telescope
        close = inner . ops current_ops
          where
            current_ops :: [Operator]
            current_ops = Set.toList $ gFromVertex node

        psucs :: Parser RawCore
        psucs = precs $ gEdges node

        preRight :: Parser (RawCore -> RawCore)
        preRight = 
              (\(Tel core lst) val -> unscope $ Tel core (lst <> [val])) <$> close Prefix
          <||> (\l (Tel core lst) r -> unscope $ Tel core (l : lst <> [r]))
               <$> psucs <*> close (Infix RightAssociative)

        postLeft :: Parser (RawCore -> RawCore)
        postLeft =
              (\(Tel core lst) val -> unscope $ Tel core (val : lst)) <$> close Postfix
          <||> (\(Tel core lst) l r -> unscope $ Tel core (r : lst <> [l]))
               <$> close (Infix LeftAssociative) <*> psucs

        appn e (Tel core lst) e' = unscope $ Tel core (e : lst <> [e'])
        appr fs e = foldr (\f e -> f e) e fs
        appl e fs = foldl (\e f -> f e) e fs

    -- inner : for a given operator, return a parser which parses that operator with 
    --         only expressions between name bits, so, e.g.
    --         inner if_then_else_ would return a parser parsing expressions if e1 then e2 else
    --         inner _+_ would return a parser parsing the symbol '+'
    inner :: [Operator] -> Parser Telescope
    inner [] = customFailure "inner ran out of operators"
    inner (op : ops) =
      Tel (Var (opName op))
        <$> betweenM (fmap symbol $ name_parts op) expr
      <||> inner ops

    -- ops  : get all operators in a given node with a specified fixity
    --        also, get all operators of successor nodes
    ops :: [Operator] -> Fixity -> [Operator]
    ops op f = filter ((== f) . fixity) op


-- For the time being, I'm liberally applying 'try' at each alternative so
-- I don't have to think about whether it's causing the bug! (I know this is inefficient)
infixl 3 <||>
(<||>) :: Parser a -> Parser a -> Parser a
l <||> r = try l <|> r   

choice' :: [Parser a] -> Parser a
choice' = choice . fmap try

betweenM :: Vector (Parser b) -> Parser a -> Parser [a]  
betweenM vec p = case length vec of 
  0 -> pure []
  1 -> head vec *> pure []
  2 -> between (head vec) (last vec) ((\x -> [x]) <$> p)
  _ -> (head vec) *> ((:) <$> p <*> betweenM (tail vec) p)

many1 :: Parser a -> Parser [a]
many1 p = (:) <$> p <*> many p 

opName :: Operator -> Text
opName (Operator {..}) = case fixity of
  Closed -> name
  Prefix -> "_" <> name
  Postfix -> name <> "_"
  Infix _ -> "_" <> name <> "_"
  where name = underscore (Vector.toList name_parts)
        underscore [] = ""
        underscore [x] = x
        underscore (x:y:[]) = x <> "_" <> y
        underscore (x:y:xs) = x <> "_" <> y <> "_" <> underscore xs

unscope :: Telescope -> RawCore
unscope (Tel core l) = go core l where
  go :: RawCore -> [RawCore] -> RawCore
  go core [] = core 
  go core (c:cs) = go (App core c) cs

sc :: Parser () 
sc = L.space
  space1
  (L.skipLineComment ";;")
  (L.skipBlockComment "(;;" ";;)")

symbol :: Text -> Parser Text
symbol = L.symbol sc

问题

我已经为上面的代码编写了一些测试,并发现由于某种原因,右结合运算符没有解析。 我有以下优先级图:

_&_ right -> _=_ non -> _+_ left -> _! -> if_then_else_ -> (_)
                        _-_ left                           false
                                                           true                        

在代码中:

ops :: Map PrecedenceNode (Set PrecedenceNode)
ops = Map.fromList
  [ (node_and,   Set.fromList [node_eq])
  , (node_eq,    Set.fromList [node_arith, node_fact])
  , (node_arith, Set.fromList [node_close])
  , (node_fact,  Set.fromList [node_close])
  , (node_if,    Set.fromList [node_close])
  , (node_close, Set.fromList []) ]

  where 
    node_and   = Set.fromList [and_op]
    node_eq    = Set.fromList [eq_op]
    node_arith = Set.fromList [add_op, sub_op]
    node_fact  = Set.fromList [fact_op]
    node_if    = Set.fromList [if_op]
    node_close = Set.fromList [paren_op, true, false]
    
    true     = Operator Closed $ Vec.singleton "true"
    false    = Operator Closed $ Vec.singleton "false"
    and_op   = Operator (Infix RightAssociative) $ Vec.fromList ["&"]
    eq_op    = Operator (Infix NonAssociative)   $ Vec.fromList ["="]
    add_op   = Operator (Infix LeftAssociative)  $ Vec.fromList ["+"]
    sub_op   = Operator (Infix LeftAssociative)  $ Vec.fromList ["-"]
    fact_op  = Operator Postfix $ Vec.fromList ["!"]
    if_op    = Operator Prefix $ Vec.fromList ["if", "then", "else"]
    paren_op = Operator Closed $ Vec.fromList ["(", ")"]


在运行解析器时,我使用此图的传递闭包,即runG ops (parse . closure)。如果我提供字符串"true & false",它将被解析为简单的true。如果我将and_op定义更改为NonAssociativeLeftAssociative,那么解析器就可以正常工作(即返回App 'true' 'false')。我的问题是:为什么会这样,我该如何解决?
以下是一些示例字符串及其解析结果 - 我使用作为左结合二元运算符,而不是App,以使事情更清晰。
"true" (var "true")
"false" (var "false")
"( true )" ((var "(_)") ⋅ (var "true"))
"true = false" ((var "_=_") ⋅ (var "true" ) ⋅ (var "false"))
"true + false" ((var "_+_") ⋅ (var "true" ) ⋅ (var "false"))
"true + false - false" ((var "_-_") ⋅ ((var "_+_") ⋅ (var "true" ) ⋅ (var "false")) ⋅ (var "false"))

对于任何看到这里的陌生人 - 即使您不回答这个问题,我也想让您知道我感谢您花时间阅读我的胡言乱语 :)

1个回答

0

我找到了自己问题的答案。原来是一个辅助函数出了问题。我的对称选择运算符:

infixl 3 <||>
(<||>) :: Parser a -> Parser a -> Parser a
l <||> r = try l <|> r   

应该已经定义:

infixl 3 <||>
(<||>) :: Parser a -> Parser a -> Parser a
l <||> r = try l <|> try r   

我想,如果有什么教训需要学习的话,那就是:在使用解析器组合器时,要注意回溯


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接