使用Haskell的并行策略时会出现减速问题

Question

使用Haskell的并行策略时会出现减速问题

14

我正在完成 Andre Loh 的确定性并行编程练习，并尝试使用策略将 N-Queens 串行代码转换为并行代码。但我注意到并行代码运行速度比串行代码慢得多，并且在执行时出现了栈空间不足的错误。

以下是并行 N-Queens 的代码：

import Control.Monad
import System.Environment
import GHC.Conc
import Control.Parallel.Strategies
import Data.List
import Data.Function

type PartialSolution = [Int] -- per column, list the row the queen is in
type Solution = PartialSolution

type BoardSize = Int

chunk :: Int -> [a] -> [[a]]
chunk n [] = []
chunk n xs = case splitAt n xs of
         (ys, zs) -> ys : chunk n zs

-- Generate all solutions for a given board size.
queens :: BoardSize -> [Solution]
--queens n = iterate (concatMap (addQueen n)) [[]] !! n
queens n = iterate (\l -> concat (map (addQueen n) l `using` parListChunk (n `div`            numCapabilities) rdeepseq)) [[]] !! n


-- Given the size of the problem and a partial solution for the
-- first few columns, find all possible assignments for the next
-- column and extend the partial solution.
addQueen :: BoardSize -> PartialSolution -> [PartialSolution]
addQueen n s = [ x : s | x <- [1..n], safe x s 1 ]

-- Given a row number, a partial solution and an offset, check
-- that a queen placed at that row threatens no queen in the
-- partial solution.
safe :: Int -> PartialSolution -> Int -> Bool
safe x []    n = True
safe x (c:y) n = x /= c && x /= c + n && x /= c - n && safe x y (n + 1)

main = do
        [n] <- getArgs
        print $ length $ queens (read n)

这一行代码 (\l -> concat (map (addQueen n) l using parListChunk (n div numCapabilities) rdeepseq)) 是我从原始代码中更改的。我已经看过Simon Marlow的解决方案，但我想知道我的代码为什么会变慢并出错。

提前感谢。

- prasannak

3

你是否在编译时使用了 -O2 选项，而在运行时使用了 -threaded -Nn 选项（其中 n 是你的CPU核心数量）？ - Don Stewart

请注意，-threaded 是编译时选项，而不是运行时选项。此外，唐，你什么时候回到贝利酒吧？啤酒龙头想念你。 - Thomas M. DuBuisson

不要忘记加上 -rtsopts 和 +RTS。 - Louis Wasserman

1

即使使用了“-threaded”等选项，他的考虑仍然成立。我对并行策略一无所知，但在我的机器上，线程版本运行速度大约慢了3倍。 - Riccardo T.

为什么只是添加了“-threaded”选项，代码运行速度就会变慢？我有一段正在做同样事情的代码。这是由于不正确的并行设计/使用还是其他原因引起的？ - crockeea

显示剩余2条评论

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Thomas M. DuBuisson · Accepted Answer

你的工作单元划分过于细碎，div n numCapabilities 的 parListChunk 参数在你的系统上可能为7（2个内核，n值约为14）。这个列表很快就会变得非常大，因此没有必要激发如此小的工作单元（我不明白为什么将其与n值相关联是有意义的）。

如果我增加十倍（使激发单元在这种情况下为70），那么与单线程相比，我会获得明显的性能提升。此外，我没有你所提到的堆栈问题——如果更改parListChunk值可以解决该问题，那么我会将其报告为错误。

如果我将分块间隔设为800，则时间最多达到5.375秒，而不是7.9秒。超过800，性能开始再次变差，结果会因人而异。

编辑：

[tommd@mavlo Test]$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.0.4
[tommd@mavlo Test]$ ghc -O2 so.hs -rtsopts -threaded -fforce-recomp ; time ./so 13 +RTS -N2
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
73712
real    0m5.404s

[tommd@mavlo Test]$ ghc -O2 so.hs -rtsopts -fforce-recomp ; time ./so 13
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
73712
real    0m8.134s