欧拉计划第14题 Haskell解法

Question

欧拉计划第14题 Haskell解法

4

我正在尝试解决欧拉计划的第14个问题（http://projecteuler.net/problem=14），在使用Haskell时遇到了瓶颈。

虽然我知道数字可能足够小，我可以采用蛮力方法，但这不是我的目的。我试图在类型为Map Integer (Bool, Integer)的Map中记忆中间结果，其含义是：

- the first Integer (the key) holds the number
- the Tuple (Bool, Interger) holds either (True, Length) or (False, Number) 
                                           where Length = length of the chain
                                                 Number = the number before him

示例：

  for 13: the chain is 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
  My map should contain : 
  13 - (True, 10)
  40 - (False, 13)
  20 - (False, 40)
  10 - (False, 20)
  5  - (False, 10)
  16 - (False, 5)
  8  - (False, 16)
  4  - (False, 8)
  2  - (False, 4)
  1  - (False, 2)

现在，当我搜索另一个数字如40时，我知道链的长度为(10-1) length等等。现在，如果我搜索10，不仅要告诉我长度为(10-3) length并更新地图，还要在它们仍然是(False, _)的情况下更新20、40。

我的代码：

import Data.Map as Map

solve :: [Integer] -> Map Integer (Bool, Integer)
solve xs    = solve' xs Map.empty
    where
        solve' :: [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
        solve' []     table = table
        solve' (x:xs) table =
            case Map.lookup x table of
                Nothing     -> countF x 1 (x:xs) table
                Just     (b, _) ->
                    case b of
                        True    -> solve' xs table
                        False   -> {-WRONG-} solve' xs table

        f :: Integer -> Integer
        f x
            | x `mod` 2 == 0    = x `quot` 2
            | otherwise     = 3 * x + 1

        countF :: Integer -> Integer -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
        countF n cnt (x:xs) table
            | n == 1    = solve' xs (Map.insert x (True, cnt) table)
            | otherwise = countF (f n) (cnt + 1) (x:xs) $ checkMap (f n) n table

        checkMap :: Integer -> Integer -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
        checkMap n rez table    =
            case Map.lookup n table of
                Nothing -> Map.insert n (False, rez) table
                Just _  -> table

在 {-WRONG-} 部分，我们应该更新所有值，就像以下示例一样：

--We are looking for 10:
  10 - (False, 20)
     |
     V                                   {-finally-} update 10 => (True, 10 - 1 - 1 - 1)
  20 - (False, 40)                                      ^
     |                                                  |
     V                                  update 20 => 20 - (True, 10 - 1 - 1)
  40 - (False, 13)                          ^
     |                                      |
     V                      update 40 => 40 - (True, 10 - 1)
  13 - (True, 10)              ^
     |                         |
     ---------------------------

问题是我不知道在一个函数中是否可以做两个事情，比如更新一个数字和继续递归。在类似 C 的语言中，我可能会这样做（伪代码）：

void f(int n, tuple(b,nr), int &length, table)
{
      if(b == False) f (nr, (table lookup nr), 0, table);
      // the bool is true so we got a length
      else
      {
            length = nr;
            return;
      }
      // Since this is a recurence it would work as a stack, producing the right output
      table update(n, --cnt);
}

最后一条指令将起作用，因为我们通过引用发送了cnt。此外，我们始终知道它会在某个点结束，且cnt不应小于1。

- Thanatos

1

你的映射值类型最好使用 Either Integer Integer 更符合惯用语，但我认为如果未缓存条目没有映射，那么不设置映射会更方便。 - David Eisenstat

如果程序知道需要更新什么，为什么不将更新后的表传递给循环呢？ - גלעד ברקן

@groovy 因为在调用时（当我处于递归中）我不知道应该更新什么值。我必须先进行多次递归，然后从底部向上更新它们，才能得到答案。以10为例，我必须先访问20、40、13，然后才能知道要将40、20、10更新为什么值。 - Thanatos

7个回答

3

你正在以一种困难的方式进行记忆化，试图在Haskell中编写命令式程序。借鉴David Eisenstat的解决方案，我们将按照j_random_hacker的建议来解决：

collatzLength :: Integer -> Integer
collatzLength n
    | n == 1 = 1
    | even n = 1 + collatzLength (n `div` 2)
    | otherwise = 1 + collatzLength (3*n + 1)

这个问题的动态规划解决方案是用查找表替换递归。让我们创建一个函数，可以替换递归调用：

collatzLengthDef :: (Integer -> Integer) -> Integer -> Integer
collatzLengthDef r n
    | n == 1 = 1
    | even n = 1 + r (n `div` 2)
    | otherwise = 1 + r (3*n + 1)

现在我们可以定义递归算法如下：

collatzLength :: Integer -> Integer
collatzLength = collatzLengthDef collatzLength

现在我们还可以制作一个表格版本（它需要一个数字作为表格大小，并返回使用该大小的表格计算出的collatzLength函数）：

-- A utility function that makes memoizing things easier
buildTable :: (Ix i) => (i, i) -> (i -> e) -> Array i e
buildTable bounds f = array $ map (\x -> (x, f x)) $ range bounds

collatzLengthTabled :: Integer -> Integer -> Integer
collatzLengthTabled n = collatzLengthTableLookup
    where
        bounds = (1, n)
        table = buildTable bounds (collatzLengthDef collatzLengthTableLookup)
        collatzLengthTableLookup =
            \x -> Case inRange bounds x of
                True -> table ! x
                _ -> (collatzLengthDef collatzLengthTableLookup) x

这是通过定义collatzLength为表格查找实现的，该表格是函数的定义，但递归调用被表格查找替换。表格查找函数检查函数参数是否在表格范围内，并返回函数的定义。我们甚至可以将其应用于类似于此的任何函数：

tableRange :: (Ix a) => (a, a) -> ((a -> b) -> a -> b) -> a -> b
tableRange bounds definition = tableLookup
    where
        table = buildTable bounds (definition tableLookup)
        tableLookup =
            \x -> Case inRange bounds x of
                True -> table ! x
                _ -> (definition tableLookup) x

collatzLengthTabled n = tableRange (1, n) collatzLengthDef

你只需要确保你

let memoized = collatzLengthTabled 10000000
    ... memoized ...

这样只有一个表格会在内存中被创建。

- Cirdec

2

抱歉，我手头没有Haskell编译器，因此对于任何错误的代码，我深表歉意。

如果没有记忆化，那么就有一个函数

collatzLength :: Integer -> Integer
collatzLength n
    | n == 1 = 1
    | even n = 1 + collatzLength (n `div` 2)
    | otherwise = 1 + collatzLength (3*n + 1)

使用记忆化技术，类型签名为

memoCL :: Map Integer Integer -> Integer -> (Map Integer Integer, Integer)

因为memoCL接收表格作为输入并将更新后的表格作为输出。memoCL需要做的是通过let形式拦截递归调用的返回，并插入新结果。

-- table must have an initial entry for 1

memoCL table n = case Map.lookup n table of
    Just m -> (table, m)
    Nothing -> let (table', m) = memoCL table (collatzStep n) in (Map.insert n (1 + m) table', 1 + m)

collatzStep :: Integer -> Integer
collatzStep n = if even n then n `div` 2 else 3*n + 1

某个时刻，您会厌倦上面的成语。那么，就是使用单子（monads）的时候了。

- David Eisenstat

2

我记得在Haskell中找到动态规划算法的备忘录非常不直观，而且已经过了一段时间，但希望以下的技巧能够对您有所帮助。但首先，我不太理解您当前的DP方案，尽管我怀疑它可能效率很低，因为似乎每个答案都需要更新许多条目。(a)我不知道如何在Haskell中实现这一点，(b)您不需要这样做才能高效地解决问题；-)

我建议使用以下方法：首先构建一个普通的递归函数，计算输入数字的正确答案。(提示：它将具有类似于collatzLength :: Int -> Int的签名。)当您有了这个工作的函数时，只需用定义了按需定义元素的数组的定义替换其定义，并将所有递归调用该函数的位置替换为数组查找(例如collatzLength 42将变为collatzLength！42)。这将自动以必要的顺序填充数组！因此，您的“顶级”collatzLength对象现在实际上是一个数组，而不是一个函数。

正如我上面建议的那样，我会使用数组而不是映射数据类型来保存DP表，因为您需要存储从1到1,000,000的所有整数索引的值。

- j_random_hacker

1

最终，我将{-WRONG-}部分修改为通过调用mark x (b, n) [] xs table来完成其应有的功能。

        mark :: Integer -> (Bool, Integer) -> [Integer] -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
        mark crtElem (b, n) list xs table
            | b == False    = mark n (findElem n table) (crtElem:list) xs table
            | otherwise = continueWith n list xs table

        continueWith :: Integer -> [Integer] -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
        continueWith _   []     xs table    = solve' xs table
        continueWith cnt (y:ys) xs table    = continueWith (cnt - 1) ys xs (Map.insert y (True, cnt - 1) table)

        findElem :: Integer -> Map Integer (Bool, Integer) -> (Bool, Integer)
        findElem n table = 
            case Map.lookup n table of
                Nothing     -> (False, 0)
                Just (b, nr)    -> (b, nr)

但似乎有比这个答案更好（而且更简洁）的答案。

- Thanatos

0

既然我们正在学习递归方案，这里有一个给你。

让我们考虑函子N(A,B,X)=A+B*X，它是一个带有最后一个元素为A的B流。

{-# LANGUAGE DeriveFunctor
           , TypeFamilies
           , TupleSections #-}

import Data.Functor.Foldable
import qualified Data.Map as M
import Data.List
import Data.Function
import Data.Int

data N a b x = Z a | S b x deriving (Functor)

这个流对于多种迭代非常方便。例如，我们可以使用它来表示Collatz序列中的一系列整数：

type instance Base Int64 = N Int Int64

instance Foldable Int64 where
  project 1 = Z 1
  project x | odd x = S x $ 3*x+1
  project x = S x $ x `div` 2

这只是一个代数式，不是初始的，因为转换不是同构（相同的Ints链是2*x和(x-1)/3的一部分），但这足以表示fixpoint Base Int64 Int64。

有了这个定义，cata将把链馈送给给定的代数式，并且您可以使用它来构建整数到链长度的备忘录映射。最后，anamorphism可以使用它生成不同大小问题的解流：

problems = ana (uncurry $ cata . phi) (M.empty, 1) where
    phi :: M.Map Int64 Int -> 
           Base Int64 (Prim [(Int64, Int)] (M.Map Int64 Int, Int64)) ->
           Prim [(Int64, Int)] (M.Map Int64 Int, Int64)
    phi m (Z v) = found m 1 v
    phi m (S x ~(Cons (_, v') (m', _))) = maybe (notFound m' x v') (found m x) $
                                          M.lookup x m

在 (Cons ...) 前面的 ~ 表示惰性模式匹配。我们不会触及到模式，直到需要值为止。如果没有惰性模式匹配，它将始终构造整个链，并使用 map 将是无用的。有了惰性模式匹配，只有在 x 的链长度不在 map 中时，我们才会构造值 v' 和 m'。

辅助函数构造 (Int, 链长度) 对的流：

    found m x v = Cons (x, v) (m, x+1)
    notFound m x v = Cons (x, 1+v) (M.insert x (1+v) m, x+1)

现在只需要解决前999999个问题，找出其中拥有最长链的那一个：

main = print $ maximumBy (compare `on` snd) $ take 999999 problems

这种方法比基于数组的解决方案慢，因为 Map 查找是映射大小的对数，但这种解决方案不是固定大小的。尽管如此，它仍然可以在大约5秒内完成。

- Sassa NF

0

也许你会对我如何解决这个问题感兴趣。虽然它可能不是地球上最有效的东西，但它非常实用 :)

你可以在这里找到代码：https://github.com/fmancinelli/project-euler/blob/master/haskell/project-euler/Problem014.hs

P.S.：免责声明：我正在做Project Euler练习，以学习Haskell，因此解决方案的质量可能存在争议。

- user1231794

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- David Miani · Accepted Answer

最简单的优化方法（如您所识别的）是记忆化。您已尝试创建自己的记忆化系统，但遇到了存储记忆化值的问题。有一些可行的解决方案，例如使用 State monad 或 STArray。然而，您问题的一个更简单的解决方法是 - 使用Haskell现有的记忆化。Haskell默认记住常量值，因此如果您创建一个存储Collatz值的值，它将自动进行记忆化！

以下是一个简单的斐波那契定义示例：

fib :: Int -> Integer
fib n = fibValues !! n where
  fibValues = 1 : 1 : zipWith (+) fibValues (tail fibValues)

fibValues 是一个 [Integer] 数组，由于它只是一个常量值，因此它被记忆化了。但是，这并不意味着它一次性全部记忆化，因为它是一个无限列表，这将永远不会完成。相反，只有在需要时才计算值，因为 Haskell 是惰性的。

因此，如果您使用类似于上面的方法解决问题，您将获得记忆化而不需要太多的工作。但是，在您的解决方案中使用类似上述的列表将不起作用。这是因为 Collatz 算法使用许多不同的值来获取给定数字的结果，因此所使用的容器需要随机访问才能有效。显然的选择是数组。

collatzMemoized :: Array Integer Int

接下来，我们需要用正确的值填充数组。我会写一个函数假装有一个名为collatz的函数，它可以计算任何n的collatz值。同时，请注意数组的大小是固定的，因此需要使用一个值来确定要记忆化的最大数字。我将使用一百万，但可以使用任何值（这是内存/速度权衡）。

collatzMemoized = listArray (1, maxNumberToMemoize) $ map collatz [1..maxNumberToMemoize] where
  maxNumberToMemroize = 1000000

这非常简单，listArray 给出了范围，其中包含所有在该范围内的collatz值列表。请记住，这不会立即计算所有的collatz值，因为这些值是惰性的。

现在可以编写collatz函数。最重要的部分是仅在被检查的数字在其边界内时才检查 collatzMemoized 数组：

collatz :: Integer -> Int
collatz 1 = 1
collatz n
  | inRange (bounds collatzMemoized) nextValue = 1 + collatzMemoized ! nextValue
  | otherwise = 1 + collatz nextValue
  where
    nextValue = case n of
      1 -> 1
      n | even n -> n `div` 2
        | otherwise -> 3 * n + 1

在ghci中，您现在可以看到记忆化的有效性。尝试使用collatz 200000命令。它将花费约2秒钟才能完成。但是，如果您再次运行它，它将立即完成。

最后，解决方案已被找到：

maxCollatzUpTo :: Integer -> (Integer, Int)
maxCollatzUpTo n = maximumBy (compare `on` snd) $ zip [1..n] (map collatz [1..n]) where

然后打印：

main = print $ maxCollatzUpTo 1000000

如果你运行main函数，结果将在大约10秒钟后打印出来。

现在，这种方法的一个小问题是它使用了大量的堆栈空间。在ghci中，它可以正常工作（似乎在堆栈空间方面更加灵活）。然而，如果你编译它并尝试运行可执行文件，它将崩溃（由于堆栈空间溢出）。因此，要运行程序，必须在编译时指定更多内容。可以通过添加“-with-rtsopts='K64m'”来完成。这将增加堆栈到64mb。

现在，该程序可以被编译和运行：

> ghc -O3 --make -with-rtsopts='-K6m' problem.hs

运行 ./problem 将在不到一秒钟内给出结果。