Efficient parallel strategies

I am trying to understand parallel strategies. I think I understand what each of the combinators does, but every time I try using them with more than 1 core, the program slows down considerably.

For example, a while ago I tried to calculate histograms (and from them the unique words) from about 700 documents. I thought that using file-level granularity would be fine. With -N4 I get a work balance of 1.70. However, with -N1 it runs in half the time it takes with -N4. I'm not sure what the question really is, but I'd like to know how to decide where/when/how to parallelize, and to gain some understanding of it. How would this be parallelized so that the speed increases with the number of cores instead of decreasing?

import Data.Map (Map)
import qualified Data.Map as M
import System.Directory
import Control.Applicative
import Data.Vector (Vector)
import qualified Data.Vector as V
import qualified Data.Text as T
import qualified Data.Text.IO as TI
import Data.Text (Text)
import System.FilePath ((</>))
import Control.Parallel.Strategies
import qualified Data.Set as S
import Data.Set (Set)
import GHC.Conc (pseq, numCapabilities)
import Data.List (foldl')

-- Map m over xs in parallel (one spark per element, evaluated with stratm),
-- then apply the reducer r, evaluated according to stratr; pseq forces the
-- mapped list (to WHNF) before the reduction is demanded.
mapReduce stratm m stratr r xs = let
  mapped = parMap stratm m xs
  reduced = r mapped `using` stratr
  in mapped `pseq` reduced

type Histogram = Map Text Int

rootDir = "/home/masse/Documents/text_conversion/"

finnishStop = ["minä", "sinä", "hän", "kuitenkin", "jälkeen", "mukaanlukien", "koska", "mutta", "jos", "kuitenkin", "kun", "kunnes", "sanoo", "sanoi", "sanoa", "miksi", "vielä", "sinun"]
englishStop = ["a","able","about","across","after","all","almost","also","am","among","an","and","any","are","as","at","be","because","been","but","by","can","cannot","could","dear","did","do","does","either","else","ever","every","for","from","get","got","had","has","have","he","her","hers","him","his","how","however","i","if","in","into","is","it","its","just","least","let","like","likely","may","me","might","most","must","my","neither","no","nor","not","of","off","often","on","only","or","other","our","own","rather","said","say","says","she","should","since","so","some","than","that","the","their","them","then","there","these","they","this","tis","to","too","twas","us","wants","was","we","were","what","when","where","which","while","who","whom","why","will","with","would","yet","you","your"]
isStopWord :: Text -> Bool
isStopWord x = x `elem` (finnishStop ++ englishStop)

textFiles :: IO [FilePath]
textFiles = map (rootDir </>) . filter (not . meta) <$> getDirectoryContents rootDir
  where meta "." = True
        meta ".." = True
        meta _ = False

histogram :: Text -> Histogram
histogram = foldr (\k -> M.insertWith' (+) k 1) M.empty . filter (not . isStopWord) . T.words

wordList = do
  files <- mapM TI.readFile =<< textFiles
  return $ mapReduce rseq histogram rseq reduce files
  where
    reduce = M.unions

main = do
  list <- wordList
  print $ M.size list

For the text files, I am using PDFs converted to text files, so I can't provide them, but for this purpose almost any book from Project Gutenberg should do.

Edit: Added imports to the script.


histogram should use foldl' rather than foldr. foldr builds a thunk as deep as the list is long before construction of the Map can begin. - Daniel Fischer
It would be much easier to answer a question like this if you provided a small but complete example. Without looking too deeply: are you certain that rseq as the first argument to mapReduce is enough to force each chunk of work to actually be performed in parallel? Is the amount of work that has to be done per list element in parMap large enough to ensure good granularity of the parallel tasks? Have you tried running threadscope on your program to see what each core is doing? Have you tried running the program with +RTS -s to see how much time is spent on garbage collection? - kosmikus
kosmikus, what do you mean by a "complete example"? Apart from the imports, the script is fully runnable. For rseq/rdeepseq I tried other combinations without success. As for parMap, I also tried map with parListChunk and parListN. As for threadscope, it seemed steady both in activity and in GC. "-s" said it was doing productive work 60% of the time, which is better than in the "-N1" case. - Masse
Figuring out the imports still takes work! It's not complete if it doesn't compile. Also, if additional text files are needed for a successful run, a link to typical input would be helpful. - kosmikus
kosmikus, you're right. I added the imports and a note about the text files. Any prose of roughly 500 kB to 2 MB will do. - Masse
@AndrewC It reads the files sequentially, as strict Text values, so the number of threads should not affect that. - sabauma
2 Answers

In practice, getting the parallel combinators to scale well can be difficult. Others have mentioned making your code more strict to ensure you are actually doing the work in parallel, which is definitely important. Two things that can really kill performance are lots of memory traversal and garbage collection. Even if you are not producing a lot of garbage, lots of memory traversals put more pressure on the CPU cache, and eventually your memory bus becomes the bottleneck. Your isStopWord function performs a lot of string comparisons and has to traverse a rather long linked list to do so. You can save a lot of work by using the builtin Set type or, even better, the HashSet type from the unordered-containers package (since repeated string comparisons can be expensive, especially if they share common prefixes).
import           Data.HashSet                (HashSet)
import qualified Data.HashSet                as S

...

finnishStop :: [Text]
finnishStop = ["minä", "sinä", "hän", "kuitenkin", "jälkeen", "mukaanlukien", "koska", "mutta", "jos", "kuitenkin", "kun", "kunnes", "sanoo", "sanoi", "sanoa", "miksi", "vielä", "sinun"]
englishStop :: [Text]
englishStop = ["a","able","about","across","after","all","almost","also","am","among","an","and","any","are","as","at","be","because","been","but","by","can","cannot","could","dear","did","do","does","either","else","ever","every","for","from","get","got","had","has","have","he","her","hers","him","his","how","however","i","if","in","into","is","it","its","just","least","let","like","likely","may","me","might","most","must","my","neither","no","nor","not","of","off","often","on","only","or","other","our","own","rather","said","say","says","she","should","since","so","some","than","that","the","their","them","then","there","these","they","this","tis","to","too","twas","us","wants","was","we","were","what","when","where","which","while","who","whom","why","will","with","would","yet","you","your"]

stopWord :: HashSet Text
stopWord = S.fromList (finnishStop ++ englishStop)

isStopWord :: Text -> Bool
isStopWord x = x `S.member` stopWord

Replacing your isStopWord function with this version performs much better and scales much better (though definitely not 1-1). You could also consider using a HashMap (from the same package) rather than a Map for the same reason, but I did not see a noticeable change from doing so.
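
For reference, a minimal sketch of what that swap could look like, assuming Data.HashMap.Strict from unordered-containers and the question's other imports (this is not the answerer's measured code):

import           Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as HM

type Histogram = HashMap Text Int

-- A strict left fold into a strict HashMap: no ordered-map balancing and
-- no chain of pending (+1) thunks per key.
histogram :: Text -> Histogram
histogram = foldl' (\m k -> HM.insertWith (+) k 1 m) HM.empty
          . filter (not . isStopWord) . T.words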

Another option is to increase the default heap size to take some pressure off the GC and give it more room to move things around. Giving the compiled code a default heap size of 1 GB (the -H1G flag), I get a GC balance of about 50% on 4 cores, whereas without it I only get ~25% (and it also runs ~30% faster).
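
For concreteness, this is roughly how the program would be built and run with those options (the file name wordhist.hs is just a placeholder):

ghc -O2 -threaded -rtsopts wordhist.hs
./wordhist +RTS -N4 -H1G -s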

With these two changes in place, the average runtime on four cores (on my machine) drops from about 10.5 s to about 3.5 s. There is arguably room for improvement based on the GC statistics (it still only spends 58% of the time doing productive work), but doing significantly better might require a much more drastic change to the algorithm.


I'm open to drastic changes. After all, this is for me to learn :) - Masse


I think Daniel got it right -- Data.Map and lists are lazy data structures; you should use both foldl' and insertWith' to make sure the work for each chunk is done eagerly, otherwise all of the work is delayed until the sequential part (reduce).
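
A minimal sketch of that change to the question's histogram (same types and imports as in the question):

histogram :: Text -> Histogram
-- foldl' together with the strict insertWith' forces every map update as it
-- happens, so the per-chunk work is actually done inside the spark instead
-- of being deferred as a chain of thunks to the sequential reduce.
histogram = foldl' (\m k -> M.insertWith' (+) k 1 m) M.empty
          . filter (not . isStopWord) . T.words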

Also, it is not obvious that creating one spark per file is the right granularity, particularly if the files differ substantially in size. If that can be the case, it would be preferable to concatenate the word lists and split them into even-sized chunks (see the parListChunk combinator).
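
A rough sketch of that chunking idea under the question's imports; chunksOf and the chunk size of 10000 words are my own illustrative choices (parListChunk achieves a similar fixed-size chunking directly over a flat list of small work items):

-- Split a list into pieces of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

histogramAll :: [Text] -> Histogram
histogramAll docs =
  let ws    = filter (not . isStopWord) (concatMap T.words docs)
      -- even-sized pieces of work instead of one spark per (tiny or huge) file
      hists = map (foldl' (\m k -> M.insertWith' (+) k 1 m) M.empty)
                  (chunksOf 10000 ws)
                `using` parList rdeepseq
  in foldl' (M.unionWith (+)) M.empty hists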

While you are at it, also watch out for some of the pitfalls of opening many files with lazy IO (readFile): the runtime system holds on to the file handles for a long time, and you can run out of them.
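
As a side note, the question's code already uses the strict Data.Text.IO.readFile, which reads each file in full and closes the handle promptly; the pitfall applies to the lazy Prelude.readFile, where you would have to force the contents yourself, e.g. in a sketch like this (readAll is a hypothetical helper, not code from the answer):

readAll :: [FilePath] -> IO [String]
readAll = mapM $ \p -> do
  s <- readFile p            -- lazy: the handle stays open until s is consumed
  length s `seq` return s    -- forcing the whole string reaches EOF and closes it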


As can be seen from my comments, I have already tried parMap, parListN and parListChunk. Their performance characteristics are all similar. Changing foldr to foldl' brought the balance up to >2, but the total runtime nearly doubled. - Masse
I modified the code to change foldr -> foldl' and to move the mapreduce from wordList into histogram. I split the files into lines and used parListChunk stratm 100 xs in the mapreduce. I managed to quadruple the runtime (from about 70 s to 300 s). - Masse
