在Scheme(R6RS)中表示代数数据类型构造函数的惯用方式是什么?

8

我想知道将类Haskell的数据类型翻译为Scheme的最佳方法是什么。我的当前计划是使用vector来表示构造器,第一个元素是代表变体的label。例如,下面是一个Haskell程序:

data Bits       = O Bits | I Bits | E deriving Show
data Nat        = S Nat  | Z          deriving Show
inc (O pred)    = I pred
inc (I pred)    = O (inc pred)
inc E           = E
dup (S pred)    = let (x,y) = dup pred in (S x, S y)
dup Z           = (Z, Z)
bus Z        bs = inc bs
bus (S pred) bs = let (x,y) = (pred,pred) in (bus pred (bus pred bs))
o32             = (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O E))))))))))))))))))))))))))))))))
n26             = (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S Z))))))))))))))))))))))))))
main            = print (bus n26 o32)

Would be translated as:

(define (O pred)   (vector 'O pred))
(define (I pred)   (vector 'I pred))
(define E          (vector 'E))
(define (S pred)   (vector 'S pred))
(define Z          (vector 'Z))
(define (Inc bits) (case (vector-ref bits 0) ('O (I (vector-ref bits 1))) ('I (O (Inc (vector-ref bits 1)))) ('E E)))
(define (Dup val)  (case (vector-ref val 0) ('S (let ((xy (Dup (vector-ref val 1)))) (cons (S (car xy)) (S (cdr xy))))) ('Z (cons Z Z))))
(define (Bus n bs) (case (vector-ref n 0) ('Z (Inc bs)) ('S (let ((xy (Dup (vector-ref n 1)))) (Bus (car xy) (Bus (cdr xy) bs))))))
(define O32        (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O E)))))))))))))))))))))))))))))))))
(define N26        (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S Z)))))))))))))))))))))))))))
(display (Bus N26 O32))

令我惊讶的是,这个方法表现得非常好(Scheme在这里比Haskell更快)。但我想知道这是否是最好的方法?这种做法合理吗,还是有更加“惯用”的翻译方式,能够表现得更好吗?

1
是的,标记向量是惯用语,例如在EOPL第一版(1992年)中使用。 - Will Ness
1个回答

4
总体而言,我认为您有两种方法:一种是“正向”编码,就像您在这里所做的那样,在其中使用向量表示包含(动态)标签的产品来区分和,另一种是“负向”编码(Böhm–Berarducci / visitor),其中您通过递归方案来表示数据类型以消耗它们。
以下是Haskell中BB编码版本的示例:
{-# Language RankNTypes #-}

-- The type of /folds/ that consume bits.
newtype Bits = Bits
  { matchBits
    :: forall r.  -- Caller-specified result type.
       (r -> r)   -- O: 1 recursive field
    -> (r -> r)   -- I: 1 recursive field
    -> r          -- E: no fields; could be ‘() -> r’
    -> r
  }

-- The type of one /recursive unrolling/ of bits.
newtype Bits' = Bits'
  { matchBits'
    :: forall r.    -- Also any result type.
       (Bits -> r)  -- But note! ‘Bits’ instead of ‘r’.
    -> (Bits -> r)
    -> r
    -> r
  }

-- Basic constructors retain their types.
mkI, mkO :: Bits -> Bits
mkE :: Bits

mkI', mkO' :: Bits' -> Bits'
mkE' :: Bits'

-- Constructor functions perform visitor dispatch.
-- This is where the recursion happens in ‘matchBits’.
mkO pred = Bits $ \  o  i e -> o (matchBits pred o i e)
mkI pred = Bits $ \  o  i e -> i (matchBits pred o i e)
mkE      = Bits $ \ _o _i e -> e

-- General recursive dispatch is similar.
mkO' pred = Bits' $ \  o  i e -> o (matchBits' pred mkO mkI mkE)
mkI' pred = Bits' $ \  o  i e -> i (matchBits' pred mkO mkI mkE)
mkE'      = Bits' $ \ _o _i e -> e

-- Recursive deconstruction, used below.
recurBits :: Bits -> Bits'
recurBits bits = matchBits bits mkO' mkI' mkE'

-- We only need a fold for nats here.
newtype Nat = Nat
  { matchNat
    :: forall r.  -- Result type.
       (r -> r)   -- S: 1 recursive field
    -> r          -- Z: no fields; also could be ‘() -> r’
    -> r
  }

mkS :: Nat -> Nat
mkZ :: Nat

mkS pred = Nat $ \  s z -> s (matchNat pred s z)
mkZ      = Nat $ \ _s z -> z

-- Case branches with ‘matchBits’ receive the /result/
-- of the recursive call on a recursive field. So this
-- is /not/ what we want:
--
-- > inc bits = matchBits bits mkI (mkO . inc) mkE
--
-- Instead, we want the field itself, so we must use
-- the recursive ‘matchBits'’.
inc :: Bits -> Bits
inc bits = matchBits' (recurBits bits) mkI (mkO . inc) mkE

-- Or: ‘dup nat = matchNat nat (mkS *** mkS) (mkZ, mkZ)’
-- Or: ‘dup nat = (nat, nat)’ = ‘dup = join (,)’
dup :: Nat -> (Nat, Nat)
dup nat = matchNat nat
  (\ (x, y) -> (mkS x, mkS y))  -- S
  (mkZ, mkZ)                    -- Z

-- NB: think of as ‘Nat -> (Bits -> Bits)’.
bus :: Nat -> Bits -> Bits
bus n = matchNat n
  (\ f -> f . f)  -- S
  inc             -- Z

你可以将其直接翻译成Scheme。这里是一个未经测试且可能不正确的草图,仅用于说明起点:

(define (O pred)  (lambda (o i e) (o (pred o i e))))
(define (I pred)  (lambda (o i e) (i (pred o i e))))
(define E         (lambda (o i e) (e)))

(define (O_ pred) (lambda (o i e) (o (pred O I E))))
(define (I_ pred) (lambda (o i e) (i (pred O I E))))
(define E_        (lambda (o i e) (e)))

(define (S pred)  (lambda (s z) (s (pred s z))))
(define Z         (lambda (s z) (z)))

(define (recurBits bits) (bits O_ I_ E_))

(define (Inc bits)
  ((recurBits bits)
     I
     (lambda (pred) (O (Inc pred)))
     E))

(define (Dup val)
  (val (lambda (p)
         (let ((x (car p))
               (y (cdr p)))
           (cons (S x) (S y))))
       (cons Z Z)))

(define (Bus n bs)
  ((Bus_ n) bs))
(define (Bus_ n)
  (n (lambda (pred) (lambda (bs) (pred (pred bs))))
     inc))

你可能需要在一些地方添加显式参数或额外的lambda函数来处理部分应用和惰性求值的差异,比如Bus_的不便。

总的来说,在许多应用中,我预计这种方法具有可比较或更好的性能特征。它不是依赖向量,而是依赖闭包,因为语言实现更了解它们的结构,所以可以更好地编译。它选择函数(进行尾调用)而不是动态分派值,从而避免了一些值的构造。

当学习Haskell中这种技术时,我也发现Oleg Kiselyov的BB编码笔记非常有帮助。


1
消除器在编译或JIT时避免跳转到未知代码的情况下,通常能够很好地工作。否则,与标签的情况分析相比,它们往往表现较差。 - dfeuer
@dfeuer:我很想看看在不同的非/优化Scheme实现中对此进行基准测试,但我预计结果会是相同的:如果编译器发现可以使用直接调用,则可以打开内联和删除分支以及其他所有操作,但如果无法使用,则“直接”样式几乎肯定更好。 - Jon Purdy
1
哦,有趣的是,我实际上正在执行整个编译过程,以避免编译到访问者模式,因为默认情况下Kind上的每种数据类型都是λ编码的(这导致了你所描述的情况)。虽然使用这种模式,Scheme表现得稍微差一些(比如说,2倍),但这仍然是一个很好的答案,很高兴在这里看到它! - MaiaVictor

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接