Haskell中哈希表插入的空间泄漏问题

6
我一直在编写一个直方图,这里的大家提供了很好的帮助。我使用哈希表来存储键和频率值,因为键的分布是未知的;所以它们可能没有被排序或连续地放在一起。
我的代码问题在于它花费了太多时间在GC上,看起来像是空间泄漏,因为GC的时间占了60.3% - 所以我的生产力只有39.7%。
出了什么问题?我尝试在直方图函数中使事情变得更加严格,并将其内联(GC时间从69.1%降至59.4%)。
请注意,我通过不更新HT中的频率来简化了此代码。
{-# LANGUAGE BangPatterns #-}
import qualified Data.HashTable.IO as H
import qualified Data.Vector as V

type HashTable k v = H.BasicHashTable k v

n :: Int 
n = 5000000

kv :: V.Vector (Int,Int)
kv = V.zip k v 
 where
    k = V.generate n (\i -> i `mod` 10)
    v = V.generate n (\i -> 1)

histogram :: V.Vector (Int,Int) -> Int -> IO (H.CuckooHashTable Int Int)
histogram vec !n = do
    ht <- H.newSized n 
    go ht (n-1)
        where
            go ht = go'
                where 
                    go' (-1) = return ht
                    go' !i = do
                        let (k,v) = vec V.! i
                        H.insert ht k v
                        go' (i-1)
{-# INLINE histogram #-}

main :: IO ()
main = do
    ht <- histogram kv n
    putStrLn "done"

以下是编译的步骤:

ghc --make -O3 -fllvm -rtsopts histogram.hs

诊断:

jap@devbox:~/dev$ ./histogram +RTS -sstderr
done
     863,187,472 bytes allocated in the heap
     708,960,048 bytes copied during GC
     410,476,592 bytes maximum residency (5 sample(s))
       4,791,736 bytes maximum slop
             613 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1284 colls,     0 par    0.46s    0.46s     0.0004s    0.0322s
  Gen  1         5 colls,     0 par    0.36s    0.36s     0.0730s    0.2053s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.51s  (  0.50s elapsed)
  GC      time    0.82s  (  0.82s elapsed)
  EXIT    time    0.03s  (  0.04s elapsed)
  Total   time    1.36s  (  1.36s elapsed)

  %GC     time      60.3%  (60.4% elapsed)

  Alloc rate    1,708,131,822 bytes per MUT second

  Productivity  39.7% of total user, 39.7% of total elapsed

快速查看源代码显示,H.insert 有时会增加哈希表的大小,这将调用 newSizedReal,它创建了两个新的 MutableArray,可能会让旧的 MutableArray 被垃圾回收。堆分析显示,newSizedReal 确实分配了大量内存。我不知道解决方案是什么,但我怀疑不使用哈希表可能是一个解决办法。 - user2407038
我的代码改编自哈希表的fromList实现,当我运行这个代码时,https://gist.github.com/algolicious/11067565,它在GC中仅花费了1.3%的时间,并且运行非常快。 - jap
创建了哈希表后,您如何使用它?也许在这里使用更好的数据结构。 - cdk
@jap дҪ зҡ„fromListзӨәдҫӢз»ҷдәҶжҲ‘дёҖдёӘжғіжі•гҖӮжҲ‘е°қиҜ•дҪҝз”Ёkv = V.fromList (zip (take n $ cycle [1..10]) (repeat 1))жқҘз”ҹжҲҗеҗ‘йҮҸпјҢиҖҢдёҚжҳҜдҪҝз”ЁgenerateгҖӮиҝҷеңЁжҲ‘зҡ„жңәеҷЁдёҠе°ҶGCж—¶й—ҙеҮҸе°‘дәҶ30%гҖӮжүҖд»ҘжҲ‘зҢңй—®йўҳеҮәеңЁеҗ‘йҮҸдёҠпјҢиҖҢдёҚжҳҜе“ҲеёҢиЎЁдёҠгҖӮ - user2407038
我尝试使用数组构建直方图:histogram = accumArray (+) 0 (1, 10) (zip (take n $ cycle [1..10]) (take n $ repeat 1))。比哈希表快得多,垃圾回收时间也更短。它需要预先知道数据范围,但这应该是一个简单的计算。 - user2407038
1个回答

10

为了比较起见,这是我按照您发布的代码运行后得到的结果:

     863,187,472 bytes allocated in the heap
     708,960,048 bytes copied during GC
     410,476,592 bytes maximum residency (5 sample(s))
       4,791,736 bytes maximum slop
             613 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1284 colls,     0 par    1.01s    1.01s     0.0008s    0.0766s
  Gen  1         5 colls,     0 par    0.81s    0.81s     0.1626s    0.4783s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    1.04s  (  1.04s elapsed)
  GC      time    1.82s  (  1.82s elapsed)
  EXIT    time    0.04s  (  0.04s elapsed)
  Total   time    2.91s  (  2.91s elapsed)

  %GC     time      62.6%  (62.6% elapsed)

  Alloc rate    827,493,210 bytes per MUT second

  Productivity  37.4% of total user, 37.4% of total elapsed

考虑到您的向量元素只是(Int, Int)元组,我们没有理由不使用Data.Vector.Unboxed而不是普通的Data.Vector。这已经导致了显着的改进:

     743,148,592 bytes allocated in the heap
          38,440 bytes copied during GC
     231,096,768 bytes maximum residency (4 sample(s))
       4,759,104 bytes maximum slop
             226 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       977 colls,     0 par    0.23s    0.23s     0.0002s    0.0479s
  Gen  1         4 colls,     0 par    0.22s    0.22s     0.0543s    0.1080s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    1.04s  (  1.04s elapsed)
  GC      time    0.45s  (  0.45s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    1.49s  (  1.49s elapsed)

  %GC     time      30.2%  (30.2% elapsed)

  Alloc rate    715,050,070 bytes per MUT second

  Productivity  69.8% of total user, 69.9% of total elapsed

接下来,我们可以使用vector库提供的优化函数来代替手动实现对向量的递归处理。 代码...

import qualified Data.HashTable.IO as H
import qualified Data.Vector.Unboxed as V

n :: Int
n = 5000000

kv :: V.Vector (Int,Int)
kv = V.zip k v
 where
    k = V.generate n (\i -> i `mod` 10)
    v = V.generate n (\i -> 1)

histogram :: V.Vector (Int,Int) -> Int -> IO (H.CuckooHashTable Int Int)
histogram vec n = do
    ht <- H.newSized n
    V.mapM_ (\(k, v) -> H.insert ht k v) vec
    return ht
{-# INLINE histogram #-}

main :: IO ()
main = do
    ht <- histogram kv n
    putStrLn "done"

...结果如下:

     583,151,048 bytes allocated in the heap
          35,632 bytes copied during GC
     151,096,672 bytes maximum residency (3 sample(s))
       3,003,040 bytes maximum slop
             148 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       826 colls,     0 par    0.20s    0.20s     0.0002s    0.0423s
  Gen  1         3 colls,     0 par    0.12s    0.12s     0.0411s    0.1222s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.92s  (  0.92s elapsed)
  GC      time    0.32s  (  0.33s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    1.25s  (  1.25s elapsed)

  %GC     time      25.9%  (26.0% elapsed)

  Alloc rate    631,677,209 bytes per MUT second

  Productivity  74.1% of total user, 74.0% of total elapsed

节省了81MB,还不错。我们能做得更好吗?

堆剖析(当你遇到内存消耗问题时,应该首先考虑的东西 - 没有堆剖析是盲目调试)将显示,即使使用原始代码,峰值内存消耗也会在非常早的时候发生。严格来说,我们没有泄漏;我们只是从一开始就花费了很多内存。现在,请注意哈希表是用ht <- H.newSized n创建的,其中n = 5000000。除非您期望有如此多的不同键(而不是元素),否则这是极其浪费的。将初始大小更改为10(您测试中实际拥有的键数)可以显著改善情况:

     432,059,960 bytes allocated in the heap
          50,200 bytes copied during GC
          44,416 bytes maximum residency (2 sample(s))
          25,216 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       825 colls,     0 par    0.01s    0.01s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0002s    0.0003s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.90s  (  0.90s elapsed)
  GC      time    0.01s  (  0.01s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.91s  (  0.90s elapsed)

  %GC     time       0.6%  (0.6% elapsed)

  Alloc rate    481,061,802 bytes per MUT second

  Productivity  99.4% of total user, 99.4% of total elapsed

最后,我们不妨简化生活并尝试使用纯粹而高效的哈希映射,来自unordered-containers。 代码...

import qualified Data.HashMap.Strict as M
import qualified Data.Vector.Unboxed as V

n :: Int
n = 5000000

kv :: V.Vector (Int,Int)
kv = V.zip k v
 where
    k = V.generate n (\i -> i `mod` 10)
    v = V.generate n (\i -> 1)

histogram :: V.Vector (Int,Int) -> M.HashMap Int Int
histogram vec =
    V.foldl' (\ht (k, v) -> M.insert k v ht) M.empty vec

main :: IO ()
main = do
    print $ M.size $ histogram kv
    putStrLn "done"

...和结果。

          55,760 bytes allocated in the heap
           3,512 bytes copied during GC
          44,416 bytes maximum residency (1 sample(s))
          17,024 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0002s    0.0002s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.34s  (  0.34s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.34s  (  0.34s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    162,667 bytes per MUT second

  Productivity  99.9% of total user, 100.0% of total elapsed

速度提升了约60%。目前还不清楚在处理更大数量的键时会如何扩展,但是根据您的测试数据,unordered-containers不仅更加方便(纯函数;实际更新直方图值只需要将M.insert更改为M.insertWith),而且速度更快。


哇,我无话可说!我的直方图创建代码已经从1360毫秒降到了125毫秒,快了10倍以上!而且代码还被精简为同样的一行。非常感谢你花时间调查,我也学到了很多。Haskell 令我印象深刻。 - jap
1
如果您可以使用fromList来创建HashMap,那么它应该会更快。 - tibbe
谢谢tibbe,明天我会试一下。为什么它会更快呢?我很好奇,因为我认为列表不是连续的。 - jap

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接