在Haskell中创建独特的标签

9
我是一名有用的助手,可以为您翻译文本。
我正在使用Haskell编写一个简单命令式语言的编译器,输出Java字节码。我已经完成了发出字节码的抽象表示的步骤。
在编写if语句的编译代码时,遇到了一些问题。为了实现if语句,我需要跳转到标签。因此,我需要为该标签生成一个名称,并且该名称需要是唯一的。
我的第一个想法是通过“编译语句”中传递一些状态,即
compileStatement :: Statement -> UniqueIDState -> [AbstractInstruction]

当然,compilerStatement是递归的,所以使用此方法需要将唯一ID生成器的状态从递归调用中传回:
compileStatement :: Statement -> UniqueIDState -> (UniqueIdState, [AbstractInstruction])

这似乎有些笨拙,特别是如果我意识到将来需要携带更多状态时; 有更优雅的方法吗?

8
线程状态是State Monad所发明的类型。 - dave4420
4
你刚刚发明了状态单子(State Monad) :-) - sigfpe
2个回答

7
你需要一份“唯一的供应品”。在Haskell中通常的做法是通过State单子来传递计数器,这样可以自动化你所描述的管道问题。

3
在haskell.org的维基上有一个例子,链接为http://www.haskell.org/haskellwiki/New_monads/MonadSupply。 - hammar
这解决了我的问题,State monad(以及一般的monad)看起来相当优雅。不过我使用的是mtl1而不是当前的mtl2。 - Viktor Dahl
"mtl 2.*" 现在是标准。它几乎具有相同的接口。 - Don Stewart

5

如果你手头唯一的工具是锤子,那么你可能会把所有东西都当成钉子。

亚伯拉罕·马斯洛。

那么我们来考虑一些不同的东西——一个非 Monad 类型的独特供应。恰好情况如此,您的原始类型签名已经接近了:

    compileStatement :: Statement -> UniqueIDState -> [AbstractInstruction]

如果唯一的要求是每个标签都是唯一的 - 不需要计算使用了多少个,只要在相同情况下给出相同的标识符即可 - 那么您可以使用一种不太侵入性的技术。
无IO的可分割供应品 中,Luke Palmer 展示了如何将 价值供应品 进行封装。
    runSupply :: (forall a . Eq a => Supply a -> b) -> b

这样可以避免单子类型IO污染使用它们的程序的大部分: 不错! 但这并不是唯一的问题 - 根据它们如何定义,你需要正确地使用这些供应品。例如,假设:
    data Statement =
        ... | If Statement Statement Statement | ...

然后如果:

    compileStatement (If c t e) s =
        case split s of
          s1 : s2 : s3 : _ -> buildCondJump (compileStatement c s1)
                                            (compileStatement t s2)
                                            (compileStatement e s3)

被错误地更改为:

    compileStatement (If c t e) s =
        case split s of
          s1 : s2 : s3 : _ -> buildCondJump (compileStatement c s)
                                            (compileStatement t s)
                                            (compileStatement e s)

不仅UniqueSupplyUnique的值被错误地重复使用,如果compileStatement的任何递归调用密集使用了supply,则存在空间泄漏的风险。


Clean不同,Haskell没有标准的方法将类型标记为单态(monousal)。这就只能在运行时进行检查:绝对是一个抽象数据类型的工作!
有个想法——如果该ADT也是可分割的(splittable),我们就可以使用它来定义一个替代的值供应类型。如果一切顺利,这种新类型的值将具有两个属性:可分割(splittable)和单态(monousal)。
查看Data.Supply可以发现使用了二叉树类型,该模块和定义似乎基于功能珠玑 [On generating unique names] 中的以下示例,由Lennart Augustsson、Mikael Rittri和Dan Synek编写-第4页共7页。
    module HideGensym(
        Name, NameSupply, initialNameSupply, getNameDeplete, splitNameSupply)
    where

        gensym :: a -> Int  -- implemented in assembler

        data Name = MkName Int deriving (Eq)
        data NameSupply = MkNameSupply Name NameSupply NameSupply

        initialNameSupply = gen ()
            where gen x = MkNameSupply (MkName (gensym x)) (gen x) (gen x)
        getNameDeplete (MkNameSupply n s1 _) = (n, s1)
        splitNameSupply (MkNameSupply _ s1 s2) = (s1, s2)

...gensym:我们暂且不管它。让我们看看如何将新的 ADT 插入到 NameSupply 中……在我们处理更加平凡的事情之前:- 两个相同的对 gen 的调用 - MkNameSupply ... (gen x) (gen x) - 对于一个优化的 Haskell 实现,它们是相同的值:

         -- same function, same argument, same result: what's the matter? 
        initialNameSupply = gen ()
            where gen x = let s = gen x in
                          MkNameSupply (MkName (gensym x)) s s

也许我们可以一举解决这两个问题:

        initialNameSupply = runUO gen
            where gen u = let !(u1, u2) = splitUO2 u in
                          MkNameSupply (MkName (gensym ())) (gen u1) (gen u2)

这里 UO 将成为我们的新的一次性拆分准备好的抽象数据类型:

    module UO(
        UO, initUO, splitUO, splitUO2, ...
    ) where

    data UO s ...
    runUO    :: (forall s . UO s -> a) -> a
    splitUO  :: UO s -> [UO s]
    splitUO2 :: UO s -> (UO s, UO s)   
               ⋮

...也可以被封装。

(毫无疑问,在英语中一定有比 spiltable splittable 更好的词语...)


现在让我们来谈谈关于“gensym”问题——让我们从第4页的“HideGensym”模块开始,先发出这样一个警告:

必须使用汇编语言编写gensym[物品],可能还要编写gen函数。

否则,那个单独调用gensym的命令可能会全部移除:还记得gen()吗?

         {- WRONG! -}
        initialNameSupply = runUO gen
            where gen u = let !(u1, u2) = splitUO2 u in
                          MkNameSupply (MkName x) (gen u1) (gen u2)
        x = gensym ()
 

由于gensym(据说!)接受任何类型的输入:

        gensym :: a -> Int  -- implemented in assembler

这不应该破坏任何东西:

        initialNameSupply = runUO gen
            where gen u = let !(u1:u2:u3:_) = splitUO u in
                          MkNameSupply (MkName (gensym u1)) (gen u2) (gen u3)

作为额外的奖励,我们可以制作一个稍微更通用的initialNameSupply版本:
        initialNameSupply = initialSupply gensym

        initialSupply :: (UO s -> Int) -> NameSupply
        initialSupply g = runUO gen
            where gen u = let !(u1:u2:u3:_) = splitUO u in
                          MkNameSupply (MkName (g u1)) (gen u2) (gen u3)

好的,所以gensym还在那里-至少现在它被隔离了。


到目前为止,您可能已经注意到另一个示例模块OneTimeSupplies,它有自己的警告语:

仅当每个供应品最多使用一次时,它才是引用透明的。

此外,在第3页的后面:

如果程序的编译时分析可以保证每个名称提供只被使用一次,无论是执行getNameDeplete还是splitNameSupply,则树就变得不必要了[...]

由于我们依赖UO来提供相同的保证,那么我们是否也可以在实现中去掉树并节省一些工作呢?
为了做到这一点,supplyValuesplit将需要升级:
  • the simplest option for supplyValue is to provide it with the generator (g in initialSupply).

      data NameSupply = forall s . Supply (UO s -> Int) ...
    
      supplyValue :: NameSupply -> Name 
      supplyValue (Supply g ...) = MkName (g ...)
    
  • as for the splits, they require an UO value so they can obtain the new UO values needed by the new supplies:

      data NameSupply = forall s . Supply (UO s) ...
    
      split  :: NameSupply -> [NameSupply]
      split  (Supply u ...) = [ Supply v ... |  v <- splitUO u ]
    
      split2 :: NameSupply -> (NameSupply, NameSupply)
      split2 (Supply u ...) = let !(u1, u2) = splitUO2 u in
                              (Supply u1 ..., Supply u2 ...)
    

那明显表明:

    data NameSupply = forall s . Supply (UO s -> Int) (UO s)

    supplyValue (Supply g u) = MkName (g u)

    split  (Supply g u) = [ Supply g v |  v <- splitUO u ]

    split2 (Supply g u) = let !(u1, u2) = splitUO2 u in
                          (Supply g u1, Supply g u2)

但是它也适用于initialNameSupply吗?

    initialNameSupply = initialSupply gensym

    initialSupply :: (UO s -> Int) -> NameSupply
    initialSupply = runUO . Supply

事情会变得更好:

    type NameSupply = Supply Name
    data Name = MkName Int deriving (Eq)
    initialNameSupply = initialSupply (MkName . gensym)

     --    NameSupply    --
     -- ================ --
     --      Supply      --

    data Supply a = forall s . Supply (UO s -> a) (UO s)      

    instance Functor Supply where
        fmap f (Supply g u) = Supply (f . g) u
    
    supplyValue :: Supply a -> a
    supplyValue (Supply g u) = g u

    split  :: Supply a -> [Supply a]
    split  (Supply g u) = [ Supply g v |  v <- splitUO u ]

    split2 :: Supply a -> (Supply a, Supply a)
    split2 (Supply g u) = let !(u1, u2) = splitUO2 u in
                          (Supply g u1, Supply g u2)

    initialSupply :: (UO s -> a) -> NameSupply
    initialSupply = runUO . Supply

这非常有前途,如果可以按照预期定义UO和其关联对象...


如果您已经阅读了Luke Palmer的帖子,那么您已经知道他使用一个丑陋的不安全的实体来定义runSupply。现在(2022年1月),runST以类似的方式定义
    runST :: (forall s. ST s a) -> a
    runST (ST st_rep) = case runRW# st_rep of (# _, a #) -> a

在哪里:

    newtype ST s a = ST (STRep s a)
    type STRep s a = State# s -> (# State# s, a #
    runRW# :: STRep RealWorld a -> (# State# RealWorld, a #)

不使用这些方法能定义 UO 吗?这可能值得单独回答 - 现在,我们只能容忍这种丑陋:

    {-# LANGUAGE BangPatterns, RankNTypes, UnboxedTuples, MagicHash #-}
    module UO(
        UO, runUO, splitUO, splitUO2,
        useUO, asUO,
    ) where

    import Prelude   (String, Eq(..))
    import Prelude   ((.), ($), (++), error, all)
    import Data.Char (isSpace)
    import GHC.Base  (State#, MutVar#)
    import GHC.Base  (runRW#, newMutVar#, noDuplicate#)
    import GHC.Exts  (atomicModifyMutVar#)
    import GHC.ST    (ST(..), STRep)

    data UO s               =  UO (UO# s)

    runUO                   :: (forall s . UO s -> a) -> a
    runUO g                 =  let (# _, r #) = runRW# (useUO# (g . UO)) in r

    splitUO                 :: UO s -> [UO s]
    splitUO u               =  let !(u1, u2) = splitUO2 u in u1 : splitUO u

    splitUO2                :: UO s -> (UO s, UO s)
    splitUO2 (UO h)         =  let (# h1, h2 #) = splitUO2# h in (UO h1, UO h2)

    useUO                   :: (UO s -> a) -> ST s a
    useUO g                 =  ST (\s -> useUO# (g . UO) s)

    asUO                    :: Eq a => String -> ST s a -> UO s -> a
    asUO name (ST act) (UO h)
                            =  asUO# name act h

     -- local definitions --
    type UO# s              =  String -> State# s

    splitUO2#               :: UO# s -> (# UO# s, UO# s #)
    splitUO2# h             =  let !s           = h "splitUO2"
                                   (# s', h1 #) = dispense# s
                                   (# _,  h2 #) = dispense# s'
                               in  (# h1, h2 #)

    useUO#                  :: (UO# s -> a) -> STRep s a
    useUO# g s              =  let (# s', h #) = dispense# s
                                   !r           = g h
                               in  (# s', r #)

    dispense#               :: STRep s (UO# s)
    dispense# s             =  let (# s', r #) = newMutVar# () s
                               in  (# s', expire# s' r #)

    expire#                 :: State# s -> MutVar# s () -> String -> State# s
    expire# s r name        =  let (# s', () #) = atomicModifyMutVar# r use s
                               in  s'
                               where
                                   use x   =  (error nowUsed, x)
                                   nowUsed =  name' ++ ": already expired"
                                   name'   =  if all isSpace name then "(unknown)"
                                              else name

    asUO#                   :: Eq a => String -> STRep s a -> UO# s -> a
    asUO# name act h        =  let (# _, t #) = act (noDuplicate# (h name)) in t

这可能比严格必要的复杂一点(例如,基本的重用错误报告),但作为交换,UO 基于定义现在可以操作本地状态...


还有一个在Data.Supply中需要实现的定义:

    newSupply :: a -> (a -> a) -> IO (Supply a)
    newSupply start next = gen =<< newIORef start
      where gen r = unsafeInterleaveIO
                  $ do v  <- unsafeInterleaveIO (atomicModifyIORef r upd)
                       ls <- gen r
                       rs <- gen r
                       return (Node v ls rs)
            upd a = let b = next a in seq b (b, a)

这将结束对 gensym 的需求。它与 initialSupply 有些相似 - 能否让这更明显?

  • gen in the original initialNameSupply doesn't have a reference parameter r:

        newSupply start next = do r <- newIORef start
                                  let gen = unsafeInterleaveIO $
                                            do v  <- unsafeInterleaveIO (atomicModifyIORef r upd)
                                               ls <- gen
                                               rs <- gen
                                               return (Node v ls rs)
                                  gen
          where upd a = let b = next a in seq b (b, a)
    
  • the value-action unsafeInterleaveIO (atomicModifyIORef r upd) performs the role of gensym in the original initialNameSupply:

        newSupply start next = do r <- newIORef start
                                  let gen    = unsafeInterleaveIO $
                                               do v  <- genval
                                                  ls <- gen
                                                  rs <- gen
                                                  return (Node v ls rs)
                                      genval = unsafeInterleaveIO (atomicModifyIORef r upd)
                                  gen
          where upd a = let b = next a in seq b (b, a)
    
  • gen in the original initialNameSupply had no need of do-notation:

        newSupply start next = do r <- newIORef start
                                  let gen    = unsafeInterleaveIO (liftM3 Node genval gen gen)
                                      genval = unsafeInterleaveIO (atomicModifyIORef r upd)
                                  gen
          where upd a = let b = next a in seq b (b, a)
    
  • does genval have to be in that let-binding?

        newSupply start next = do r <- newIORef start
                                  let gen = unsafeInterleaveIO (liftM3 Node (genval r) gen gen)
                                  gen
          where genval r = unsafeInterleaveIO (atomicModifyIORef r upd)
                upd a    = let b = next a in seq b (b, a)
    
  • upd is only used in genval:

        newSupply start next = do r <- newIORef start
                                  let gen = unsafeInterleaveIO (liftM3 Node (genval r) gen gen)
                                  gen
          where genval r = let upd a = let b = next a in seq b (b, a)
                           in unsafeInterleaveIO (atomicModifyIORef r upd)
    
  • can some content in genval be moved to a separate definition?

        newSupply start next = do r <- newIORef start
                                  let gen = unsafeInterleaveIO (liftM3 Node (genval r) gen gen)
                                  gen
          where genval r = unsafeInterleaveIO (nextValue r next)
    
        nextValue :: IORef a -> (a -> a) -> IO a
        nextValue r next = let upd a = let b = next a in seq b (b, a)
                           in atomicModifyIORef r upd
    

现在它更清晰地类似于原始的initialNameSupply,使用我们的新Supply类型重新实现newSupply相对简单 - 首先,需要改变单子类型:

    newSupply start next = do r <- newSTRef start
                              let gen = unsafeInterleaveST (liftM3 Node (genval r) gen gen)
                              gen
      where genval r = unsafeInterleaveST (nextValue r next)

    nextValue :: STRef s a -> (a -> a) -> ST s a
    nextValue r next = let upd a = let b = next a in seq b (b, a)
                       in atomicModifyST r upd

nextValue 不需要进行其他更改。至于 newSupply

    newSupply :: Eq a => a -> (a -> a) -> ST s (Supply a)
    newSupply start next = do r <- newSTRef start
                              let g = asUO "genval" (genval r)
                              useUO (Supply g)
      where genval r = nextValue r next

我们可以使用这个函数来定义我们的版本 runSupply:

    runSupply :: (forall a . Eq a => Supply a -> b) -> b
    runSupply f = f (runST (newSupply (0 :: Int) succ))

我们现在能否从NameSupply类型中最终删除gensym

    initialNameSupply :: NameSupply
    initialNameSupply = fmap MkName (initialSupply 0 succ)

    initialSupply :: Eq a => a -> (a -> a) -> Supply a
    initialSupply start next = runST (newSupply start next)

是的。


这里是所有相关定义,按模块排列:

  • ExpelGensym, the replacement for HideGensym on page 4 of 7:

      {-# LANGUAGE BangPatterns #-}
      module ExpelGensym(
          Name, NameSupply, initialNameSupply, getNameDeplete, splitNameSupply
      ) where
      import Control.Monad    (liftM)
      import Control.Monad.ST (runST)
      import Supply           (Supply, newSupply, supplyValue, split2)
    
      data Name = MkName Int deriving (Eq)
      type NameSupply = Supply Name
    
      initialNameSupply :: Supply Name
      initialNameSupply = fmap MkName (initialSupply 0 succ)
    
      getNameDeplete :: NameSupply -> (Name, NameSupply)    
      getNameDeplete s = let !(s1, s2) = split2 s in (supplyValue s1, s2)
    
      splitNameSupply :: NameSupply -> (NameSupply, NameSupply)
      splitNameSupply = split2
    
       -- local definitions --
      initialSupply :: Eq a => a -> (a -> a) -> Supply a
      initialSupply start next = runST (newSupply start next)
    
  • Supply, our miniature implementation of Data.Supply:

      {-# LANGUAGE BangPatterns, ExistentialQuantification, RankNTypes #-}
      module Supply(
          Supply, newSupply, runSupply, supplyValue, split, split2
      ) where
      import Control.Monad.ST
      import Data.STRef
      import UO
    
      data Supply a = forall s . Supply (UO s -> a) (UO s)      
    
      instance Functor Supply where
          fmap f (Supply g u) = Supply (f . g) u
    
      newSupply :: Eq a => a -> (a -> a) -> ST s (Supply a)
      newSupply start next = do r <- newSTRef start
                                let g = asUO "genval" (genval r)
                                useUO (Supply g)
        where genval r = nextValue r next
    
      runSupply :: (forall a . Eq a => Supply a -> b) -> b
      runSupply f = f (runST (newSupply (0 :: Int) succ))
    
      supplyValue :: Supply a -> a
      supplyValue (Supply g u) = g u
    
      split  :: Supply a -> [Supply a]
      split  (Supply g u) = [ Supply g v |  v <- splitUO u ]
    
      split2 :: Supply a -> (Supply a, Supply a)
      split2 (Supply g u) = let !(u1, u2) = splitUO2 u in
                            (Supply g u1, Supply g u2)
    
       -- local definitions --
      nextValue :: STRef s a -> (a -> a) -> ST s a
      nextValue r next = let upd a = let b = next a in seq b (b, a)
                         in atomicModifySTRef r upd
    
      {-
       -- if your Haskell installation doesn't define it --
      atomicModifySTRef :: STRef s a -> (a -> (a, b)) -> ST s b
      atomicModifySTRef r f = do x <- readSTRef r
                                 let !(x', y) = f x
                                 writeSTRef r x'
                                 return y
      -}
    
  • UO, that use-once split-ready abstract-data type:

      {-# LANGUAGE BangPatterns, RankNTypes, UnboxedTuples, MagicHash #-}
      module UO(
          UO, runUO, splitUO, splitUO2,
          useUO, asUO,
      ) where
    
      import Prelude   (String, Eq(..))
      import Prelude   ((.), ($), (++), error, all)
      import Data.Char (isSpace)
      import GHC.Base  (State#, MutVar#)
      import GHC.Base  (runRW#, newMutVar#, noDuplicate#)
      import GHC.Exts  (atomicModifyMutVar#)
      import GHC.ST    (ST(..), STRep)
    
      data UO s               =  UO (UO# s)
    
      runUO                   :: (forall s . UO s -> a) -> a
      runUO g                 =  let (# _, r #) = runRW# (useUO# (g . UO)) in r
    
      splitUO                 :: UO s -> [UO s]
      splitUO u               =  let !(u1, u2) = splitUO2 u in u1 : splitUO u
    
      splitUO2                :: UO s -> (UO s, UO s)
      splitUO2 (UO h)         =  let (# h1, h2 #) = splitUO2# h in (UO h1, UO h2)
    
      useUO                   :: (UO s -> a) -> ST s a
      useUO g                 =  ST (\s -> useUO# (g . UO) s)
    
      asUO                    :: Eq a => String -> ST s a -> UO s -> a
      asUO name (ST act) (UO h)
                              =  asUO# name act h
    
       -- local definitions --
      type UO# s              =  String -> State# s
    
      splitUO2#               :: UO# s -> (# UO# s, UO# s #)
      splitUO2# h             =  let !s           = h "splitUO2"
                                     (# s', h1 #) = dispense# s
                                     (# _,  h2 #) = dispense# s'
                                 in  (# h1, h2 #)
    
      useUO#                  :: (UO# s -> a) -> STRep s a
      useUO# g s              =  let (# s', h #) = dispense# s
                                     !r           = g h
                                 in  (# s', r #)
    
      dispense#               :: STRep s (UO# s)
      dispense# s             =  let (# s', r #) = newMutVar# () s
                                 in  (# s', expire# s' r #)
    
      expire#                 :: State# s -> MutVar# s () -> String -> State# s
      expire# s r name        =  let (# s', () #) = atomicModifyMutVar# r use s
                                 in  s'
                                 where
                                     use x   =  (error nowUsed, x)
                                     nowUsed =  name' ++ ": already expired"
                                     name'   =  if all isSpace name then "(unknown)"
                                                else name
    
      asUO#                   :: Eq a => String -> STRep s a -> UO# s -> a
      asUO# name act h        =  let (# _, t #) = act (noDuplicate# (h name)) in t
    

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接