如何使用Haskell解析中缀表达式而不是前缀表达式?

3
我需要帮助写Haskell程序。我已经写了大部分内容,基本上我想要做的是:
当我在终端中输入 parse "a + b"
时,我希望得到以下输出:
Plus (Word "a") (Word "b")
当我在终端中输入 parse "a - 2 * b + c"
时,我希望输出为:
Minus (Word "a") (Plus (Mult (Num 2) (Word "b")) (Word "c"))
我的代码如下:
data Ast
    = Word String
    | Num Int
    | Mult Ast Ast
    | Plus Ast Ast
    | Minus Ast Ast
    deriving (Eq, Show)

tokenize :: [Char] -> [String]
tokenize [] = []
tokenize (' ' : s) = tokenize s
tokenize ('+' : s) = "+" : tokenize s
tokenize ('*' : s) = "*" : tokenize s
tokenize (c : s)
  | isDigit c =
    let (cs, s') = collectWhile isDigit s
     in (c : cs) : tokenize s'
  | isAlpha c =
    let (cs, s') = collectWhile isAlpha s
     in (c : cs) : tokenize s'
  | otherwise = error ("unexpected character " ++ show c)

collectWhile :: (Char -> Bool) -> String -> (String, String)
collectWhile p s = (takeWhile p s, dropWhile p s)

isDigit, isAlpha :: Char -> Bool
isDigit c = c `elem` ['0' .. '9']
isAlpha c = c `elem` ['a' .. 'z'] ++ ['A' .. 'Z']

parseU :: [String] -> (Ast, [String])
parseU ("+" : s0) =
  let (e1, s1) = parseU s0
      (e2, s2) = parseU s1
   in (Plus e1 e2, s2)
parseU ("*" : s0) =
  let (e1, s1) = parseU s0
      (e2, s2) = parseU s1
   in (Mult e1 e2, s2)
parseU (t : ts)
  | isNumToken t = (Num (read t), ts)
  | isWordToken t = (Word t, ts)
  | otherwise = error ("unrecognized token " ++ show t)
parseU [] = error "unexpected end of input"

isNumToken, isWordToken :: String -> Bool
isNumToken xs = takeWhile isDigit xs == xs
isWordToken xs = takeWhile isAlpha xs == xs

parse :: String -> Ast
parse s =
  case parseU (tokenize s) of
    (e, []) -> e
    (_, t : _) -> error ("unexpected token " ++ show t)

inn :: Ast -> String
inn (Plus x y) = innP x ++ " + " ++ innP y
inn (Mult x y) = innP x ++ " * " ++ innP y
inn ast = innP ast

innP :: Ast -> String
innP (Num n) = show n
innP (Plus x y) = "(" ++ innP x ++ " + " ++ innP y ++ ")"
innP (Mult x y) = "(" ++ innP x ++ " * " ++ innP y ++ ")"
innP (Word w) = w -- 

innfiks :: String -> String
innfiks s = inn (parse s)

目前我在终端中输入文本时遇到错误,但是如果我这样写:

parse "+ a b"

我会得到正确的输出:

Plus (Word "a") (Word "b")

我知道我必须更改代码,使其接受我以以下格式发送给解析函数的内容:

value operator value,

而不是这种形式:

operator value value

但我很难找到需要进行更改的内容和位置。


2
你手头的是一个波兰式解析器。关于如何进一步解析中缀表达式,有很多相关资料可供参考。 - leftaroundabout
1个回答

3
为了处理具有优先级的中缀运算符,一种方法是引入一系列与优先级水平对应的解析函数。如果您拥有可以相乘以创建“项”的“因子”,这些“项”可以相加或相减以创建“表达式”,则需要为每个级别创建解析器函数。解析“因子”(即单词或数字)很容易,因为您已经编写了代码:
parseFactor :: [String] -> (Ast, [String])
parseFactor (t : ts)
  | isNumToken t = (Num (read t), ts)
  | isWordToken t = (Word t, ts)
  | otherwise = error ("unrecognized token " ++ show t)
parseFactor [] = error "unexpected end of input"

解析一个表达式更为复杂。您需要从解析一个因子开始,然后可选地解析*和另一个因子,然后将其作为要进一步乘以另一个因子的项处理,以此类推。以下是一种实现方式:

parseTerm :: [String] -> (Ast, [String])
parseTerm ts
  =  let (f1, ts1) = parseFactor ts     -- parse first factor
     in  go f1 ts1
  where go acc ("*":ts2)                -- add a factor to an accumulating term
          = let (f2, ts3) = parseFactor ts2
            in go (Mult acc f2) ts3
        go acc rest = (acc, rest)       -- no more factors: return the term

如果您想尝试编写类似的parseExpr来解析由+字符分隔的项(暂时跳过减法),可以在以下内容上进行测试:
parseExpr (tokenize "2 + 3 * 6 + 4 * 8 * 12 + 1")

如果需要剧透,这里提供一种既可以处理+也可以处理-的版本。但请注意,你的标记器目前还无法正确处理减法,因此你需要先修复它。

parseExpr :: [String] -> (Ast, [String])
parseExpr ts
  =  let (f1, ts1) = parseTerm ts
     in  go f1 ts1
  where go acc (op:ts2)
          | op == "+" || op == "-"
          = let (f2, ts3) = parseTerm ts2
            in go ((astOp op) acc f2) ts3
        go acc rest = (acc, rest)
        astOp "+" = Plus
        astOp "-" = Minus

有了这个设置,您可以将 parse 指向正确的解析器:

parse :: String -> Ast
parse s =
  case parseExpr (tokenize s) of
    (e, []) -> e
    (_, t : _) -> error ("unexpected token " ++ show t)

你的示例应该可以工作:

λ> parse "a - 2 * b + c"
Plus (Minus (Word "a") (Mult (Num 2) (Word "b"))) (Word "c")

请注意,这与你所说的输出略有不同,但对于左结合运算符(对于正确处理 - 很重要)来说,这种顺序是正确的。也就是说,你需要:
5 - 4 + 1

要解析为:

(5 - 4) + 1  -- i.e., (Plus (Minus (Num 5) (Num 4)) (Num 1))

为了让评估器计算出正确答案2。如果您将其解析为:
5 - (4 + 1)  -- i.e., (Minus (Num 5) (Plus (Num 4) (Num 1)))

如果您的评估器使用左结合操作符,那么它将计算错误的答案0。

然而,如果您确实想要解析右结合操作符,请参见以下内容。

左结合操作符的完整修改代码:

data Ast
    = Word String
    | Num Int
    | Mult Ast Ast
    | Plus Ast Ast
    | Minus Ast Ast
    deriving (Eq, Show)

tokenize :: [Char] -> [String]
tokenize [] = []
tokenize (' ' : s) = tokenize s
tokenize ('-' : s) = "-" : tokenize s
tokenize ('+' : s) = "+" : tokenize s
tokenize ('*' : s) = "*" : tokenize s
tokenize (c : s)
  | isDigit c =
    let (cs, s') = collectWhile isDigit s
     in (c : cs) : tokenize s'
  | isAlpha c =
    let (cs, s') = collectWhile isAlpha s
     in (c : cs) : tokenize s'
  | otherwise = error ("unexpected character " ++ show c)

collectWhile :: (Char -> Bool) -> String -> (String, String)
collectWhile p s = (takeWhile p s, dropWhile p s)

isDigit, isAlpha :: Char -> Bool
isDigit c = c `elem` ['0' .. '9']
isAlpha c = c `elem` ['a' .. 'z'] ++ ['A' .. 'Z']

parseFactor :: [String] -> (Ast, [String])
parseFactor (t : ts)
  | isNumToken t = (Num (read t), ts)
  | isWordToken t = (Word t, ts)
  | otherwise = error ("unrecognized token " ++ show t)
parseFactor [] = error "unexpected end of input"

parseTerm :: [String] -> (Ast, [String])
parseTerm ts
  =  let (f1, ts1) = parseFactor ts
     in  go f1 ts1
  where go acc ("*":ts2)
          = let (f2, ts3) = parseFactor ts2
            in go (Mult acc f2) ts3
        go acc rest = (acc, rest)

parseExpr :: [String] -> (Ast, [String])
parseExpr ts
  =  let (f1, ts1) = parseTerm ts
     in  go f1 ts1
  where go acc (op:ts2)
          | op == "+" || op == "-"
          = let (f2, ts3) = parseTerm ts2
            in go ((astOp op) acc f2) ts3
        go acc rest = (acc, rest)
        astOp "+" = Plus
        astOp "-" = Minus

isNumToken, isWordToken :: String -> Bool
isNumToken xs = takeWhile isDigit xs == xs
isWordToken xs = takeWhile isAlpha xs == xs

parse :: String -> Ast
parse s =
  case parseExpr (tokenize s) of
    (e, []) -> e
    (_, t : _) -> error ("unexpected token " ++ show t)

对于右结合的运算符,修改这些定义:
parseTerm :: [String] -> (Ast, [String])
parseTerm ts
  =  let (fct, ts1) = parseFactor ts
     in  case ts1 of
           "*":ts2 -> let (trm, rest) = parseTerm ts2
                      in  (Mult fct trm, rest)
           _       -> (fct, ts1)

parseExpr :: [String] -> (Ast, [String])
parseExpr ts
  =  let (trm, ts1) = parseTerm ts
     in  case ts1 of
           op:ts2 | op == "+" || op == "-"
                   -> let (expr, rest) = parseExpr ts2
                      in  (astOp op trm expr, rest)
           _       -> (trm, ts1)
  where astOp "+" = Plus
        astOp "-" = Minus*

正如你在回答中所说的,加号打印在外面,减号打印在里面。有没有办法改变这个顺序?我尝试了多种方法,但似乎最后一个表达式总是先出现.. - Markiy99
1
我已经更新了我的答案,展示了如何做到这一点。但请注意,如果您想要解析数学表达式的解析器“正常方式”,实际上您需要在外部使用加号,在内部使用减号,就像我之前试图解释的那样。 - K. A. Buhr
非常感谢 :-) - Markiy99

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接