为什么这个Haskell程序执行效率如此之低?

3

我是一个Haskell新手,对于这个程序的性能表现感到困惑。我尝试在不同的地方强制使用严格变量,但似乎没有什么区别。

下面是我的代码(这个程序的目的是从标准输入中产生输入字节的频率):

{-# LANGUAGE BangPatterns #-}

import Control.Concurrent (forkIO, killThread)
import Control.Concurrent.MVar
import qualified Data.IntMap as IntMap
import Data.IntMap.Strict (IntMap)
import Control.Monad.Fix
import Control.Monad (when)
import qualified Data.Char as Char
import qualified System.IO as IO
import System.IO (hSetBinaryMode, hFlush)
import Data.List as List
import Text.PrettyPrint.Boxes as Boxes
import Text.Printf (printf)
import Data.Function

data BFreq = BFreq Integer (IntMap Integer)

main :: IO ()
main = do
  putStrLn "analyze data from stdin"
  hSetBinaryMode IO.stdin True
  mv <- newEmptyMVar
  tid <- forkIO $ statusUpdater mv
  bf <- run mv
  killThread tid
  displayResults bf

resultTable :: [[String]] -> Box
resultTable rows =
  Boxes.hsep 4 Boxes.left boxed_cols
  where
    cols       = transpose rows
    boxed_cols = map (Boxes.vcat Boxes.left . map text) cols

displayResults :: BFreq -> IO ()
displayResults (BFreq n counts) = do
  putStrLn $ "read " ++ (show n) ++ " bytes"
  when (n > 0) (displayFreqs n counts)

displayFreqs :: Integer -> IntMap Integer -> IO ()
displayFreqs n counts =
  do
    putStrLn "frequencies:"
    Boxes.printBox $ resultTable rows
  where
    cmp x y       = compare (snd y) (snd x)
    sorted_counts = List.sortBy cmp $ IntMap.assocs counts

    intdiv :: Integer -> Integer -> Float
    intdiv a b = (fromIntegral a) / (fromIntegral b)

    percent y    = printf "%.2f" (100*intdiv y n)
    show_byte x  = (show $ Char.chr x) ++ " (" ++ (show x) ++ "):"
    show_count y = (percent y) ++ "% (" ++ (show y) ++ ")"

    rows = map (\(x,y) -> [show_byte x, show_count y]) sorted_counts


run :: MVar Integer -> IO BFreq
run mv = 
  fn mv 0 IntMap.empty 
  where
    fn mv !n !mp =
      do
        tryPutMVar mv n
        eof <- IO.isEOF
        if eof
          then return $ BFreq n mp
          else do
            b <- getChar
            fn mv (1+n) (new_map b)
      where
        k x       = Char.ord x
        old_val x = IntMap.findWithDefault 0 (k x) mp
        new_map x = IntMap.insert (k x) ((old_val x)+1) mp

statusUpdater :: MVar Integer -> IO ()
statusUpdater mv = 
  do
    takeMVar mv >>= print_progress
    statusUpdater mv
  where
    print_progress n = 
      do
        putStr $ "\rbytes: "
        when (gbs > 0) $ putStr $ (show gbs) ++ " GBs "
        when (mbs > 0) $ putStr $ (show mbs) ++ " MBs "
        when (kbs > 0) $ putStr $ (show kbs) ++ " KBs "
        when (gbs < 1 && mbs < 1 && kbs < 1) $ putStr $ (show bs) ++ " Bs "
        hFlush IO.stdout
      where
        (gbs, gbr)   = quotRem n 0x40000000
        (mbs, mbr)   = quotRem gbr 0x100000
        (kbs, bs)    = quotRem mbr 0x400

以下是我运行它时的情况(注意:我使用-O2编译):

$> cabal build -v                                                                                             
creating dist/build                                                                                                                       
creating dist/build/autogen                                                                                                                 
Building bfreq-0.1.0.0...                                                                                                                   
Preprocessing executable 'bfreq' for bfreq-0.1.0.0...                                                                                       
Building executable bfreq...                                                                                                                  
creating dist/build/bfreq                                                                                                                     
creating dist/build/bfreq/bfreq-tmp                                                                                                           
/usr/bin/ghc --make -o dist/build/bfreq/bfreq -hide-all-packages -fbuilding-cabal-package -package-conf dist/package.conf.inplace -i -idist/build/bfreq/bfreq-tmp -i. -idist/build/autogen -Idist/build/autogen -Idist/build/bfreq/bfreq-tmp -optP-include -optPdist/build/autogen/cabal_macros.h -odir dist/build/bfreq/bfreq-tmp -hidir dist/build/bfreq/bfreq-tmp -stubdir dist/build/bfreq/bfreq-tmp -package-id base-4.5.0.0-40b99d05fae6a4eea95ea69e6e0c9702 -package-id boxes-0.1.3-e03668bca38fe3e879f9d695618ddef3 -package-id containers-0.5.3.1-80819105034e34d03d22b1c20d6fd868 -O -O2 -rtsopts -XHaskell98 ./bfreq.hs
[1 of 1] Compiling Main             ( bfreq.hs, dist/build/bfreq/bfreq-tmp/Main.o )
Linking dist/build/bfreq/bfreq ...
$> cat /dev/urandom | head -c 9999999 > test_data
$> cat ./test_data | ./dist/build/bfreq/bfreq +RTS -sstderr
analyze data from stdin
bytes: 9 MBs 521 KBs read 9999999 bytes
frequencies:
'\137' (137):    0.40% (39642)
'H' (72):        0.40% (39608)
<...>
'L' (76):        0.39% (38617)
'\246' (246):    0.39% (38609)
'I' (73):        0.38% (38462)
'q' (113):       0.38% (38437)
   9,857,106,520 bytes allocated in the heap
  14,492,245,840 bytes copied during GC
   3,406,696,360 bytes maximum residency (13 sample(s))
      14,691,672 bytes maximum slop
            6629 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     18348 colls,     0 par   10.90s   10.90s     0.0006s    0.0180s
  Gen  1        13 colls,     0 par   15.20s   19.65s     1.5119s    12.6403s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   14.45s  ( 14.79s elapsed)
  GC      time   26.10s  ( 30.56s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   40.55s  ( 45.35s elapsed)

  %GC     time      64.4%  (67.4% elapsed)

  Alloc rate    682,148,818 bytes per MUT second

  Productivity  35.6% of total user, 31.9% of total elapsed

除非我误解了上面的调试输出,否则我的程序使用了6 GB的内存?测试数据不到10 MB,这是怎么回事?

对于如何处理这种Haskell问题的一般建议也很好。换句话说,我应该避免在这种I/O中心的情况下使用Haskell吗?我应该使用pipes库来处理这种事情吗?

编辑: 感谢您的帮助,正确导入IntMap的严格版本可以解决内存问题。

我无法使性能分析(-fprof-auto)起作用,因为似乎我的所有包都没有为性能分析编译。我通过安装适用于我的操作系统(Ubuntu:ghc-prof)的GHC性能分析包来解决缺少基本库的性能分析问题,但根据此处所述,我需要手动重新安装所有Haskell库以进行性能分析。我现在没有时间做这件事,所以我只是把这个链接放在这里,以便任何有类似问题的人受益。


1
你尝试过进行分析以找出哪些函数占用了大量内存吗?很可能只有一个函数在占用内存,一旦你知道是哪个函数,就会更容易找到内存泄漏的问题。 - bheklilr
1
一个通用的指导原则:如果标准库函数能够满足你的需求,那么就使用它——它可能被更好地实现了。在你的 statusUpdater 中,你有一个无限循环,这可以使用 Control.Monad.forever 实现。 - YellPika
2
另外,在您的运行函数中,首先通过查找值(findWithDefault),然后修改它(insert)来处理IntMapIntMap 提供了一个 alter 函数,可以一次完成您想要做的操作。 - YellPika
1个回答

13

如果按照GHC指南中的剖析章节,使用-fprof-auto进行编译,你会看到run.fn.new_maprun.fn发生了大量的分配。

相关代码:

new_map x = IntMap.insert (k x) ((old_val x)+1) mp

怀疑:((old_val x)+1) 正在创建一系列未求值的thunks(thunk是一个可以被延迟计算的表达式)。建议修改:

new_map x = let ov  = old_val x + 1 in
            ov `seq` IntMap.insert (k x) ov mp

看哪!分配,GC和内存使用都大幅降低了。

编辑:您可能打算import qualified Data.IntMap.Strict as IntMap,这样更改就不必要了。


真是太厉害了,你赢了 ;)。结果发现将导入改为“Strict”版本就足够了。内存使用量降至2mb。 - YellPika

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接