Data.Map是二叉搜索树最好的数据类型吗？

Question

Data.Map是二叉搜索树最好的数据类型吗？

3

我正在编写一些对性能要求较高的代码，需要在二叉搜索树中进行查找。树本身会在每次运行开始时生成，然后永远不会改变。查找将经常进行，数量可能会很大，甚至达到数百万次，因此我只关注查找的性能。

树的键是Char，可以便宜地转换为Int，因此我认为Data.Map和其衍生物应该是最佳选择：我使用了严格的IntMap。但是，我的性能显著低于我从二叉搜索树预期的性能。我在这里用一个相当牵强附会的例子进行演示。

import qualified Data.IntMap.Strict as IM


divideByTen :: Int -> Int
divideByTen n = maybe 0 snd $ IM.lookupLE n divideByTenMap

divideByTenMap :: IM.IntMap Int
divideByTenMap = IM.fromList $ zip [0,10..1000] [0..]

我的数据不会被均匀分布，因此某些子集的性能比整体更重要。因此，引入一些常用输入的快捷方式是有意义的。

divideByTenFaster :: Int -> Int
divideByTenFaster n
  | n < 10  = 0
  | n < 20  = 1
  | n < 30  = 2
  | n < 40  = 3
  | n < 50  = 4
  | n < 60  = 5
  | n < 70  = 6
  | n < 80  = 7
  | n < 90  = 8
  | n < 100 = 9
  | otherwise = divideByTen n

这个 IntMap 有100个元素，所以二分查找最多只需要7次比较。因此我期望对于 n < 70，divideByTenFaster 要比 divideByTen 更快，之后反而要更慢。但是实际情况是，即使对于 n < 100，divideByTenFaster 的速度仍然是 IntMap 查找的两倍。运行以下基准测试代码：

import Criterion.Main

main :: IO ()
main = do
    defaultMainWith defaultConfig $
      [ bench "divideByTen"       $ nf divideByTen 99
      , bench "divideByTenFaster" $ nf divideByTenFaster 99
      ]

我们获取

benchmarking divideByTen
time                 18.50 ns   (18.47 ns .. 18.54 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 18.49 ns   (18.47 ns .. 18.51 ns)
std dev              79.42 ps   (61.51 ps .. 110.3 ps)

benchmarking divideByTenFaster
time                 7.164 ns   (7.154 ns .. 7.176 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 7.161 ns   (7.145 ns .. 7.176 ns)
std dev              49.60 ps   (36.84 ps .. 68.32 ps)

在我的真实问题中，我看到了更显著的差异：IntMap查找速度慢了一个数量级。我发现自己编写了长且难以维护的快捷方式，基本上是手动构建了一个不平衡二叉搜索树，在常见情况下显着提高了性能，并稍微降低了一般情况下的性能。

我可以切换到基于数组的解决方案，但我认为这是使用Map的理想用例。是否有一些我错过的优化方法，或者其他数据类型更好一些？

完整背景是，我正在查找Unicode字符宽度，该宽度由Unicode规范确定。查找树大约有700到1100个元素，具体取决于上下文。 ASCII子集的性能非常重要，而广泛使用的脚本（如拉丁文、汉字、阿拉伯文、天城文等）的性能应该非常快，而较少使用的脚本的性能可以牺牲一些。

编辑：

鉴于评论中的许多建议，我对几种不同的方法进行了基准测试来解决这个问题。对于纯查找性能，没有什么比一个巨大的数组更好的了，虽然这需要非微不足道的内存和构建成本，但其他方法的性能都差不多，除了直接实现Data.Map.Strict的结构（感谢@dfeuer在下面的评论中提供的建议）。

{-# LANGUAGE BangPatterns     #-}
{-# LANGUAGE FlexibleContexts #-}


import qualified Data.IntMap.Strict as IM
import qualified Data.Map.Strict as M
import qualified Data.Map.Internal as MInt
import qualified Data.Vector as V
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as VU

bstSize :: Int
bstSize = 800

domainSize :: Int
domainSize = 1100000

divideByTenShortcut :: Int -> Int
divideByTenShortcut n
  | n < 10 = 0
  | n < 20 = 1
  | n < 30 = 2
  | n < 40 = 3
  | n < 50 = 4
  | n < 60 = 5
  | n < 70 = 6
  | n < 80 = 7
  | n < 90 = 8
  | n < 100 = 9
  | otherwise = divideByTenIntMapLookup n

divideByTenIntMapLookup :: Int -> Int
divideByTenIntMapLookup n = maybe 0 snd $ IM.lookupLE n divideByTenIntMap

divideByTenIntMap :: IM.IntMap Int
divideByTenIntMap = IM.fromList $ zip [0,10..bstSize*10] [0..]

divideByTenMapLookup :: Int -> Int
divideByTenMapLookup n = maybe 0 snd $ M.lookupLE n divideByTenMap

divideByTenMap :: M.Map Int Int
divideByTenMap = M.fromList $ zip [0,10..bstSize*10] [0..]

arrayBinarySearch :: G.Vector v Int => v Int -> Int -> Int
arrayBinarySearch vs search = loop 0 (G.length vs)
  where
    loop !low !high
       | high <= low = low
       | otherwise = case compare midval search of
            LT -> loop midpoint high
            GT -> loop low (midpoint - 1)
            EQ -> midpoint
      where
        midpoint = (low + high + 1) `quot` 2
        midval = G.unsafeIndex vs midpoint

divideByTenArrayLookup :: Int -> Int
divideByTenArrayLookup n = let i = arrayBinarySearch divideByTenArray n in V.unsafeIndex divideByTenArraySols i

divideByTenArray :: V.Vector Int
divideByTenArray = V.fromList [0,10..bstSize*10]

divideByTenArraySols :: V.Vector Int
divideByTenArraySols = V.fromList [0..bstSize]

divideByTenArrayUnboxedLookup :: Int -> Int
divideByTenArrayUnboxedLookup n = let i = arrayBinarySearch divideByTenArrayUnboxed n in VU.unsafeIndex divideByTenArrayUnboxedSols i

divideByTenArrayUnboxed :: VU.Vector Int
divideByTenArrayUnboxed = VU.fromList [0,10..bstSize*10]

divideByTenArrayUnboxedSols :: VU.Vector Int
divideByTenArrayUnboxedSols = VU.fromList [0..bstSize]

divideByTenFullArrayLookup :: Int -> Int
divideByTenFullArrayLookup = V.unsafeIndex divideByTenFullArray

divideByTenFullArray :: V.Vector Int
divideByTenFullArray = V.fromList $ map (`quot` 10) [0..domainSize]

divideByTenFullArrayUnboxedLookup :: Int -> Int
divideByTenFullArrayUnboxedLookup = VU.unsafeIndex divideByTenFullArrayUnboxed

divideByTenFullArrayUnboxed :: VU.Vector Int
divideByTenFullArrayUnboxed = VU.fromList $ map (`quot` 10) [0..domainSize]

divideByTenCharMapLookup :: Int -> Int
divideByTenCharMapLookup n = lookupLE n divideByTenCharMap

divideByTenCharMap :: CharMap
divideByTenCharMap = fromList $ zip [0,10..bstSize*10] [0..]

data CharMap = Bin {-# UNPACK #-} !Size !Int !Int !CharMap !CharMap
             | Tip
type Size     = Int

lookupLE :: Int -> CharMap -> Int
lookupLE = goNothing
  where
    goNothing !_ Tip = 0
    goNothing k (Bin _ kx x l r) = case compare k kx of LT -> goNothing k l
                                                        EQ -> x
                                                        GT -> goJust k kx x r

    goJust !_ !_ x' Tip = x'
    goJust k kx' x' (Bin _ kx x l r) = case compare k kx of LT -> goJust k kx' x' l
                                                            EQ -> x
                                                            GT -> goJust k kx x r
{-# INLINABLE lookupLE #-}

fromList :: [(Int, Int)] -> CharMap
fromList = repack . M.fromList
  where
    repack MInt.Tip = Tip
    repack (MInt.Bin s k v l r) = Bin s k v (repack l) (repack r)



import Control.Exception (evaluate)
import Criterion.Main

main :: IO ()
main = do
    evaluate divideByTenIntMap
    evaluate divideByTenMap
    evaluate divideByTenCharMap
    evaluate divideByTenArray
    evaluate divideByTenArrayUnboxed
    evaluate divideByTenFullArray
    evaluate divideByTenFullArrayUnboxed

    defaultMainWith defaultConfig
      [ bench "divideByTenIntMap"           $ nf divideByTenIntMapLookup 99
      , bench "divideByTenShortcut"         $ nf divideByTenShortcut 99
      , bench "divideByTenMap"              $ nf divideByTenMapLookup 99
      , bench "divideByTenCharMap"          $ nf divideByTenCharMapLookup 99

      , bench "divideByTenArray"            $ nf divideByTenArrayLookup 99
      , bench "divideByTenArrayUnboxed"     $ nf divideByTenArrayUnboxedLookup 99
      , bench "divideByTenFullArray"        $ nf divideByTenFullArrayLookup 99
      , bench "divideByTenFullArrayUnboxed" $ nf divideByTenFullArrayUnboxedLookup 99
      ]

Build profile: -w ghc-8.10.7 -O1

benchmarking divideByTenIntMap
time                 21.31 ns   (21.26 ns .. 21.36 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 21.31 ns   (21.27 ns .. 21.36 ns)
std dev              137.3 ps   (104.7 ps .. 190.5 ps)

benchmarking divideByTenShortcut
time                 6.881 ns   (6.868 ns .. 6.893 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.896 ns   (6.873 ns .. 6.947 ns)
std dev              112.5 ps   (40.52 ps .. 184.2 ps)
variance introduced by outliers: 23% (moderately inflated)

benchmarking divideByTenMap
time                 23.31 ns   (22.94 ns .. 23.68 ns)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 23.02 ns   (22.91 ns .. 23.26 ns)
std dev              500.1 ps   (245.4 ps .. 865.6 ps)
variance introduced by outliers: 33% (moderately inflated)

benchmarking divideByTenCharMap
time                 15.89 ns   (15.88 ns .. 15.92 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 15.90 ns   (15.88 ns .. 15.92 ns)
std dev              68.89 ps   (44.90 ps .. 113.1 ps)

benchmarking divideByTenArray
time                 27.07 ns   (27.01 ns .. 27.13 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 27.03 ns   (26.92 ns .. 27.13 ns)
std dev              334.5 ps   (248.0 ps .. 474.7 ps)
variance introduced by outliers: 14% (moderately inflated)

benchmarking divideByTenArrayUnboxed
time                 22.89 ns   (22.82 ns .. 22.99 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 22.94 ns   (22.84 ns .. 23.10 ns)
std dev              418.6 ps   (301.6 ps .. 555.1 ps)
variance introduced by outliers: 26% (moderately inflated)

benchmarking divideByTenFullArray
time                 6.639 ns   (6.626 ns .. 6.651 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.635 ns   (6.626 ns .. 6.644 ns)
std dev              28.87 ps   (24.14 ps .. 34.83 ps)

benchmarking divideByTenFullArrayUnboxed
time                 6.029 ns   (6.017 ns .. 6.041 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.021 ns   (6.013 ns .. 6.029 ns)
std dev              26.09 ps   (21.68 ps .. 31.51 ps)

Benchmark doclayout-bench: FINISH

因此，问题似乎并不是与IntMap本身有关。检查-ddump-simpl的输出还显示值似乎被适当地取消装箱。

- Stephen Morgan

2

这里可能发生很多事情。例如，divideByTenFaster 由于分支预测和缓存上的数据需求较少，因此具有更好的性能特征，对吧？无论如何，我建议您分享完整的基准代码，这样其他人就不必重复制作，而且更倾向于尝试一下。 - Thomas M. DuBuisson

4

除非你正在学习二叉搜索树，否则“二叉搜索树”不应该成为要求。一个快速的映射（"fast map"）可能是一个要求。使用已排序数组实现的映射将始终比使用BST实现的映射具有更好的搜索性能。它会给你较差的更新性能，但既然你不需要更新，那就没什么可说的了。 - n. m.

1

一个简单的哈希映射（使用哈希= id）仍然具有更好的性能。考虑到Unicode并不是非常大，只有2^21个代码点，因此保持一个2^21长的数组并不是不合理的，即使只有其中的几个百分比会被使用。 - n. m.

2

你可以进行一些实验，亲自验证。我建议尝试使用未装箱元素的数组，其中每个元素都是未装箱的键加上可能装箱的值。解引用比比较更昂贵。我预计至少会有2倍到3倍的差异。 - n. m.

1

IntMap不是二叉搜索树，而是PATRICIA树（一种Trie树）。在类似于Int的情况下，它通常比Map表现更好，但这并不完全保证。如果你想尝试使用二叉搜索树，可以从Data.Map.Strict和Data.Map.Internal中复制/粘贴所需的代码，但将k字段替换为Char字段，将v字段替换为严格的Int字段。这些将自动取消装箱，消除间接引用并使结构更加紧凑。 - dfeuer

显示剩余6条评论

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Kiko Fernandez · Accepted Answer

你可能想要查看这个包：hashtables。它实现了多个可变哈希表，并且从描述中可以看出，你可以使用一些cabal标志来尝试加速，例如sse42以加速用于cuckoo哈希的高速缓存线搜索。