我该如何进行记忆化？

Question

我该如何进行记忆化？

4

我写了一个计算Collatz序列的函数，但这个函数的执行时间会因为初始值不同而有很大的不同。显然这与"记忆化"有关，但我很难理解它是什么以及如何工作。不幸的是，HaskellWiki上相关文章以及它链接的论文都被证明不易克服。它们讨论高度晦涩难懂的树构造相对性能的细节，而我错过的可能只是这些源所忽略的一些非常基本、非常琐碎的点。

以下是代码，是一个完整的程序，可以直接编译和执行。

module Main where

import Data.Function
import Data.List (maximumBy)

size :: (Integral a) => a
size = 10 ^ 6

-- Nail the basics.

collatz :: Integral a => a -> a
collatz n | even n = n `div` 2
          | otherwise = n * 3 + 1

recollatz :: Integral a => a -> a
recollatz = fix $ \f x -> if (x /= 1) 
                          then f (collatz x)
                          else x

-- Now, I want to do the counting with a tuple monad.

mocollatz :: Integral b => b -> ([b], b)
mocollatz n = ([n], collatz n)

remocollatz :: Integral a => a -> ([a], a)
remocollatz = fix $ \f x -> if x /= 1
                            then f =<< mocollatz x
                            else return x

-- Trivialities.

collatzLength :: Integral a => a -> Int
collatzLength x = (length . fst $ (remocollatz x)) + 1

collatzPairs :: Integral a => a -> [(a, Int)]
collatzPairs n = zip [1..n] (collatzLength <$> [1..n])

longestCollatz :: Integral a => a -> (a, Int)
longestCollatz n = maximumBy order $ collatzPairs n
  where
    order :: Ord b => (a, b) -> (a, b) -> Ordering
    order x y = snd x `compare` snd y

main :: IO ()
main = print $ longestCollatz size

使用 ghc -O2 编译后，获取任何小于 size 的起始点所得到的最长 Collatz 序列的长度和种子需要大约 17 秒左右，如果不使用 ghc -O2 则需要约 22 秒。

现在，如果我进行以下更改：

diff --git a/Main.hs b/Main.hs
index c78ad95..9607fe0 100644
--- a/Main.hs
+++ b/Main.hs
@@ -1,6 +1,7 @@
 module Main where

 import Data.Function
+import qualified Data.Map.Lazy as M
 import Data.List (maximumBy)

 size :: (Integral a) => a
@@ -22,10 +23,15 @@ recollatz = fix $ \f x -> if (x /= 1)
 mocollatz :: Integral b => b -> ([b], b)
 mocollatz n = ([n], collatz n)

-remocollatz :: Integral a => a -> ([a], a)
-remocollatz = fix $ \f x -> if x /= 1
-                            then f =<< mocollatz x
-                            else return x
+remocollatz :: (Num a, Integral b) => b -> ([b], a)
+remocollatz 1 = return 1
+remocollatz x = case M.lookup x (table mutate) of
+    Nothing -> mutate x
+    Just y  -> y
+  where mutate x = remocollatz =<< mocollatz x
+
+table :: (Ord a, Integral a) => (a -> b) -> M.Map a b
+table f = M.fromList [ (x, f x) | x <- [1..size] ]

 -- Trivialities.

--然后使用ghc -O2编译只需大约4秒钟，但如果没有ghc -O2，我活不到看到它完成。

--使用ghc -prof -fprof-auto -O2查看成本中心的详细信息，发现第一个版本进入collatz大约一亿次，而修补后的版本只有大约一百五十万次。这一定是加速的原因，但我很难理解这个神奇内部工作方式。我的最佳想法是，我们用O(log n) map查找替换了一部分昂贵的递归调用，但我不知道它是否正确以及为什么它如此依赖某些可怜的编译器标志，而在我看来，这样的性能波动应该完全由语言决定。

--我可以解释一下这里发生了什么，以及为什么ghc -O2和普通的ghc构建之间的性能差异如此巨大吗？

--附言：Stack Overflow的其他地方强调了实现自动记忆化的两个要求：

--将要记忆化的函数制作为顶级名称。 --将要记忆化的函数制作为单态化函数。

--根据这些要求，我重新构建了remocollatz，如下所示：

remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = mutate x

mutate :: Int -> ([Int], Int)
mutate x = remocollatz =<< mocollatz x

现在它已经像顶层和单态化一样了。运行时间大约为11秒，而类似的单态化table版本：

remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = case M.lookup x (table mutate) of
    Nothing -> mutate x
    Just y  -> y

mutate :: Int -> ([Int], Int)
mutate = \x -> remocollatz =<< mocollatz x

table :: (Int -> ([Int], Int)) -> M.Map Int ([Int], Int)
table f = M.fromList [ (x, f x) | x <- [1..size] ]

-- 在不到4秒的时间内运行。我不明白为什么在这里表现良好的ghc记忆化功能比我的愚蠢表格慢了近3倍。

- Ignat Insarov

对不起，如果这篇文章太长或写得不好，请见谅。我会努力改善我的写作风格。 - Ignat Insarov

1

看起来你的问题不仅仅是理解Haskell或优化的重要性，而且还涉及到记忆化的概念（更广泛的编程概念），对吗？你是否查阅过一般的计算机科学、维基百科或书籍等来源，以了解记忆化的解释？ - Thomas M. DuBuisson

1

我曾经在Haskell中问过一个关于记忆化的问题，也许那个答案对你也有所帮助。https://stackoverflow.com/questions/11473130/memoization-pascals-triangle - Viktor Mellgren

你可以提供解释，但请为所有顶层函数添加类型签名。 - leftaroundabout

1

@Kindaro，你的问题涉及太多内容了，请集中精力解决一个问题，而不是一大堆。优化很重要，这是毋庸置疑的。我建议你单独提出关于优化的问题。记忆化斐波那契数列很容易实现，因为你总是知道输入参数 n 小于当前调用的每个数字。记忆化考拉兹猜想使用映射（通常）而不是列表，因为对考拉兹猜想的迭代调用不仅仅是 n-1 和 n-2，而是一种相当不可预测的形式。最后，不要认为记忆化斐波那契数列就是魔法。如果操作不清楚，首先要理解它们。 - Thomas M. DuBuisson

显示剩余2条评论

2个回答

2

另一种记忆化的方法适用于某些情况，比如这个例子，就是使用一个装箱向量，其元素是惰性计算的。用于初始化每个元素的函数可以在其计算中使用向量的其他元素。只要向量的元素评估不会循环并引用自身，只会评估它递归依赖的元素。一旦评估完成，元素就被有效地缓存了，这还有另外一个好处，即从未引用的向量元素永远不会被评估。

对于这种技术，Collatz序列是一个几乎理想的应用，但存在一个复杂性。从小于限制值的值开始的下一个Collatz值可能超出限制，这将导致在索引向量时出现范围错误。我通过迭代序列直到回到限制之下并计算所需步骤来解决了这个问题。

以下程序未经优化时运行时间为0.77秒，经过优化后为0.30秒：

import qualified Data.Vector as V

limit = 10 ^ 6 :: Int

-- The Collatz function, which given a value returns the next in the sequence.

nextCollatz val
  | odd val = 3 * val + 1
  | otherwise = val `div` 2

-- Given a value, return the next Collatz value in the sequence that is less
-- than the limit and the number of steps to get there. For example, the
-- sequence starting at 13 is: [13, 40, 20, 10, 5, 16, 8, 4, 2, 1], so if
-- limit is 100, then (nextCollatzWithinLimit 13) is (40, 1), but if limit is
-- 15, then (nextCollatzWithinLimit 13) is (10, 3).

nextCollatzWithinLimit val = (firstInRange, stepsToFirstInRange)
  where
    firstInRange = head rest
    stepsToFirstInRange = 1 + (length biggerThanLimit)
    (biggerThanLimit, rest) = span (>= limit) (tail collatzSeqStartingWithVal)
    collatzSeqStartingWithVal = iterate nextCollatz val

-- A boxed vector holding Collatz length for each index. The collatzFn used
-- to generate the value for each element refers back to other elements of
-- this vector, but since the vector elements are only evaluated as needed and
-- there aren't any loops in the Collatz sequences, the values are calculated
-- only as needed.

collatzVec :: V.Vector Int
collatzVec = V.generate limit collatzFn
  where
    collatzFn :: Int -> Int
    collatzFn index
      | index <= 1 = 1
      | otherwise = (collatzVec V.! nextWithinLimit) + stepsToGetThere
      where
        (nextWithinLimit, stepsToGetThere) = nextCollatzWithinLimit index

main :: IO ()
main = do

  -- Use a fold through the vector to find the longest Collatz sequence under
  -- the limit, and keep track of both the maximum length and the initial
  -- value of the sequence, which is the index.

  let (maxLength, maxIndex) = V.ifoldl' accMaxLen (0, 0) collatzVec
      accMaxLen acc@(accMaxLen, accMaxIndex) index currLen
        | currLen <= accMaxLen = acc
        | otherwise = (currLen, index)
  putStrLn $ "Max Collatz length below " ++ show limit ++ " is "
             ++ show maxLength ++ " at index " ++ show maxIndex

- Truman

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Daniel Wagner · Accepted Answer

我能否解释一下这里发生了什么，以及为什么 ghc -O2 和普通的 ghc 构建之间的性能差异如此之大？

免责声明：这只是一个猜测，并没有通过查看 GHC 核心输出进行验证。一个仔细的答案会这样做来验证下面概述的猜想。您可以尝试自己查看：将 -ddump-simpl 添加到编译行中，您将获得详细的输出，详细说明 GHC 对您的代码所做的操作。

你写道：

remocollatz x = {- ... -} table mutate {- ... -}
  where mutate x = remocollatz =<< mocollatz x

实际上，表达式table mutate并不依赖于x。但是它出现在一个方程的右边，这个方程以x作为参数。因此，如果没有优化，每次调用remocollatz时都会重新计算这个表（甚至可能在计算table mutate中从内部进行调用）。

通过优化，GHC注意到table mutate不依赖于x，并将其浮动到自己的定义中，从而有效地产生了：

fresh_variable_name = table mutate
  where mutate x = remocollatz =<< mocollatz x

remocollatz x = case M.lookup x fresh_variable_name of
    {- ... -}

因此程序运行期间只计算一次。

我不知道为什么性能会如此依赖某些可怜的编译器标志，我认为这样的性能波动应该完全由语言本身决定。

抱歉，但是Haskell并不是这样工作的。语言定义清楚地说明了给定Haskell术语的含义，但没有说明计算该含义所需的运行时或内存性能。