Haskell：生成组合技术的比较

Question

Haskell：生成组合技术的比较

5

我之前做了一些“99 Haskell 问题”的练习，其中我觉得第27个问题（“编写一个函数来枚举可能的组合”）很有意思，因为它是一个简单的概念，并且适用于多个实现方式。

我对相对效率感到好奇，所以我决定运行几种不同的实现——结果如下表所示。（参考信息：在VirtualBox上运行的LXDE（Ubuntu 14.04）中的Emacs bash ansi-term；Thinkpad X220；8GB RAM，i5 64位2.4GHz。）

(i) 为什么组合生成技术#7和#8（下表中的代码包含在帖子底部）比其他方法快这么多？

(ii) 此外，“字节”列中的数字实际上代表什么？

(i) 奇怪的是，函数#7通过过滤幂集（比组合列表要大得多）来工作；我认为这是惰性的结果，即这是最有效地利用我们只要求列表长度而不是列表本身的函数。（此外，它的“内存使用量”比其他函数低，但是，我也不确定显示的确切与内存相关的统计数据是什么。）

关于函数#8：向Bergi致敬，因为他实现了那个极快的实现，也感谢user5402提出了建议。仍在努力理解这一个的速度差异。

(ii) “字节”列中的数字是在运行:set +s命令后GHCi报告的；显然它们并不代表最大内存使用量，因为我只有约25GB的RAM和免费的硬盘空间。

import Data.List

--algorithms to generate combinations
--time required to compute the following: length $ 13 "abcdefghijklmnopqrstuvwxyz"

--(90.14 secs, 33598933424 bytes)
combDC1 :: (Eq a) => Int -> [a] -> [[a]]
combDC1 n xs = filter (/= []) $ combHelper n n xs []

combHelper :: Int -> Int -> [a] -> [a] -> [[a]]
combHelper n _ []        chosen           = if length chosen == n
                                               then [chosen]
                                               else [[]]
combHelper n i remaining chosen
  | length chosen == n = [chosen]
  | n - length chosen > length remaining = [[]]                     
  | otherwise = combHelper n (i-1) (tail remaining) ((head remaining):chosen) ++
                combHelper n i (tail remaining) chosen

--(167.63 secs, 62756587760 bytes)
combSoln1 :: Int -> [a] -> [([a],[a])]
combSoln1 0 xs = [([],xs)]
combSoln1 n [] = []
combSoln1 n (x:xs) = ts ++ ds
  where
    ts = [ (x:ys,zs) | (ys,zs) <- combSoln1 (n-1) xs ]
    ds = [ (ys,x:zs) | (ys,zs) <- combSoln1 n     xs ]

--(71.40 secs,  30480652480 bytes)
combSoln2 :: Int -> [a] -> [[a]]
combSoln2 0 _  = [ [] ]
combSoln2 n xs = [ y:ys | y:xs' <- tails xs
                           , ys <- combSoln2 (n-1) xs']

--(83.75 secs, 46168207528 bytes)
combSoln3 :: Int -> [a] -> [[a]]
combSoln3 0 _  = return []
combSoln3 n xs = do
  y:xs' <- tails xs
  ys <- combSoln3 (n-1) xs'
  return (y:ys)

--(92.34 secs, 40541644232 bytes)
combSoln4 :: Int -> [a] -> [[a]]
combSoln4 0 _ = [[]]
combSoln4 n xs = [ xs !! i : x | i <- [0..(length xs)-1] 
                               , x <- combSoln4 (n-1) (drop (i+1) xs) ]

--(90.63 secs, 33058536696 bytes)
combSoln5 :: Int -> [a] -> [[a]]
combSoln5 _ [] = [[]]
combSoln5 0 _  = [[]]
combSoln5 k (x:xs) = x_start ++ others
    where x_start = [ x : rest | rest <- combSoln5 (k-1) xs ]
          others  = if k <= length xs then combSoln5 k xs else []


--(61.74 secs, 33053297832 bytes)                                                                         
combSoln6 :: Int -> [a] -> [[a]]
combSoln6 0 _ = [[]]
combSoln6 _ [] = []
combSoln6 n (x:xs) = (map (x:) (combSoln6 (n-1) xs)) ++ (combSoln6 n xs)


--(8.41 secs, 10785499208 bytes)
combSoln7 k ns = filter ((k==).length) (subsequences ns)


--(3.15 secs, 2889815872 bytes)
subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
                          in if n>l then [] else subsequencesBySize xs !! (l-n)
 where
   subsequencesBySize [] = [[[]]]
   subsequencesBySize (x:xs) = let next = subsequencesBySize xs
                               in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])

- iceman

1

非常简短，几乎肯定不完整的答案是：关键在于，7和8都记忆了所有先前的结果——而其他方法会多次重新计算——请注意它们全部都有多个递归调用；并且足够懒惰——而其他方法可能会强制使用具有长度等属性的“中间”结果。 - user2407038

7不够快，试试这个：combSoln7 2 [1..30]；看起来很快，因为26!/(13! * 13!) = 10400600，这只比2^26 = 67108864小6倍。此外，6可能比8更快，请参见http://ideone.com/TdCWiK和http://ideone.com/ojnrB3（但我不知道ideone.com进行了哪些优化）。 - effectfully

也许你想把这个算法加入你的测试中。它能够像上面最好的解决方案一样快速地解决C（26,13），但结果是按字典顺序排列的。然而，当对C(100,5)进行测试时，它比其他解决方案更快且更省内存。 - Redu

1

不幸的是，这种比较并不是那么有用，因为性能关键代码通常会使用优化进行编译。然而，GHCi并没有这样做，因此实际性能将会变化很大。 - zabeltech

2个回答

1

fact :: (Integral a) => a -> a
fact n = product [1..n]

ncombs n k = -- to evaluate number of combinations
    let n' = toInteger n
        k' = toInteger k
    in div (fact n') ((fact k') * (fact (n' - k')))

combinations :: Int -> [a] -> [[a]]
combinations 0 xs = [[]]
combinations 1 xs = [[x] | x <- xs]
combinations n xs =
    let ps = reverse [0..n - 1]
        inc (p:[])
            | pn < length xs = pn:[]
            | otherwise = p:[]
            where pn = p + 1
        inc (p:ps)
            | pn < length xs = pn:ps
            | (head psn) < length xs = inc ((head psn):psn)
            | otherwise = (p:ps)
            where pn = p + 1
                  psn = inc ps
        amount = ncombs (length xs) n
        pointers = take (fromInteger amount) (iterate inc ps)
        c' xs ps = map (xs!!) (reverse ps)
    in map (c' xs) pointers

我正在学习Haskell，并找到了一个相对快速的实现。在某些函数需要整数、分数和整数的情况下，我遇到了类型系统的困难。在我的电脑上，这里提供的最快解决方案需要大约6.1秒才能运行，而我的解决方案只需要3.5到2.9秒。

- jogo3000

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- ErikR · Accepted Answer

你还应该测试在这个SO答案中找到的算法：

从列表中提取长度为n的子序列的性能（英文链接）。

subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
                          in if n>l then [] else subsequencesBySize xs !! (l-n)
 where
   subsequencesBySize [] = [[[]]]
   subsequencesBySize (x:xs) = let next = subsequencesBySize xs
                             in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])

在我的电脑上，我从ghci得到了以下的时间和内存使用情况：

ghci> length $ combSoln7   13 "abcdefghijklmnopqrstuvwxyz"
10400600
(13.42 secs, 10783921008 bytes)

ghci> length $ subsequencesOfSize  13 "abcdefghijklmnopqrstuvwxyz"
10400600
(6.52 secs, 2889807480 bytes)