使用REPA优化Haskell中的平均图像颜色程序

Question

使用REPA优化Haskell中的平均图像颜色程序

4

问题

我编写了一个Haskell程序，它可以遍历文件夹并找到文件夹中每个图像的平均颜色。它使用来自Hackage的repa-devil包将图像加载到repa数组中。我通过将所有红色、蓝色和绿色值相加，然后除以像素数来计算平均值：

-- compiled with -O2
import qualified Data.Array.Repa as R
import Data.Array.Repa.IO.DevIL
import Control.Monad.Trans (liftIO)
import System.Directory (getDirectoryContents)

size :: (R.Source r e) => R.Array r R.DIM3 e -> (Int, Int)
size img = (w, h)
    where (R.Z R.:. h R.:. w R.:. 3) = R.extent img

averageColour :: (R.Source r e, Num e, Integral e) => R.Array r R.DIM3 e -> (Int, Int, Int)
averageColour img = (r `div` n, g `div` n, b `div` n)
    where (w, h)  = size img
          n       = w * h
          (r,g,b) = f 0 0 0 0 0
          f row col r g b
            | row >= w  = f 0 (col + 1) r g b
            | col >= h  = (r, g, b)
            | otherwise = f (row + 1) col (addCol 0 r) (addCol 1 g) (addCol 2 b)
            where addCol x v = v + fromIntegral (img R.! (R.Z R.:. col R.:. row R.:. x))

main :: IO ()
main = do
    files <- fmap (map ("images/olympics_backup/" ++) . filter (`notElem` ["..", "."])) $ getDirectoryContents "images/olympics_backup"
    runIL $ do
        images <- mapM readImage files
        let average = zip (map (\(RGB img) -> averageColour img) images) files
        liftIO . print $ average

我还使用Python编写了这个程序，使用Python图像库。它以相同的方式找到图像的平均值：

import Image

def get_images(folder):
    images = []
    for filename in os.listdir(folder):
        images.append(folder + filename)
    return images

def get_average(filename):
    image = Image.open(filename)
    pixels = image.load()
    r = g = b = 0
    for x in xrange(0, image.size[0]):
        for y in xrange(0, image.size[1]):
            colour = pixels[x, y]
            r += colour[0]
            g += colour[1]
            b += colour[2]
    area = image.size[0] * image.size[1]
    r /= area
    g /= area
    b /= area
    return [(r, g, b), filename, image]

def get_colours(images):
    colours = []
    for image in images:
        try:
            colours.append(get_average(image))
        except:
            continue
    return colours

imgs = get_images('images/olympics_backup/')
print get_colours(imgs)

当在一个包含301张图片的文件夹上运行两者时，Haskell版本的速度比Python版本慢0.2秒（0.87对0.64）。这似乎很奇怪，因为Haskell是一种编译语言（通常比解释语言更快），而且我听说repa数组具有良好的性能（尽管这可能只是与其他Haskell数据类型（如列表）相比较）。

我尝试过的方法是注意到我正在使用显式递归，所以我决定使用fold来替换它，这也意味着我不再需要检查是否超出了数组的边界:

(r,g,b) = foldl' f (0,0,0) [(x, y) | x <- [0..w-1], y <- [0..h-1]]
f (r,g,b) (row,col) = (addCol 0 r, addCol 1 g, addCol 2 b)
        where addCol x v = v + fromIntegral (img R.! (R.Z R.:. col R.:. row R.:. x))

这使得代码变慢了（需要1.2秒），因此我决定对代码进行性能分析，以确定大部分时间都花费在哪里（是否存在明显的瓶颈或者repa-devil包只是慢）。性能分析告诉我，约58%的时间花费在f函数中，约35%的时间花费在addCol中。

不幸的是，我无法想到任何方法使之运行更快。这个函数只是一个数组索引和一个加法——与Python代码相同。有没有方法可以提高这段代码的性能，或者Python图像库只是提供了更好的性能？

- matio2matio

1

在 (r,g,b) = foldl' f (0,0,0) [(x, y) | x <- [0..w-1], y <- [0..h-1]] 中使用 foldl' 没有任何好处。foldl' 将中间结果计算到弱头正常形式，即这里的 (,,) 构造函数。组件仍然是延迟计算的。我不想安装一个包来检查，所以只能猜测。是否将 f 在 r、g 和 b 上变为严格求值，f row col !r !g !b（当然需要使用 BangPatterns），是否会有帮助？ - Daniel Fischer

2

使用自定义的fold与repa似乎不是一个好主意，转换为列表可能无法解决问题。我敢打赌，如果有人想要检查，使用backpermute将颜色通道作为三个单独的延迟数组，然后调用sumS会获得更好的性能。 - Thomas M. DuBuisson

1

不要将Repa转换为列表，这会严重影响性能，并抵消了从融合或并行化中获得的任何好处。 - Don Stewart

1

@matio2matio 如果涉及使用融合技术的库（如bytestring、text、vector、repa等），请永远不要使用编译为分析而编译的代码进行性能测量。编译为分析会抑制很多优化，对于依赖这些优化的库来说，这是致命的。 - Daniel Fischer

@leventov 75x75（5625像素）。将r、g和b设为严格的值，使得比Python更快。 - matio2matio

显示剩余5条评论

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- leventov · Accepted Answer

虽然以下代码有些hack，但速度相当快。

75x75像素的图像处理时间为0.03毫秒（每像素16个时钟周期），处理300张图像大约需要10-20毫秒。
512x512像素的Lenna图像处理时间为1毫秒（每像素13.5个时钟周期）。
2560x1600像素的图像处理时间为12毫秒（每像素9.2个时钟周期）。

yarr是专门设计用来解决此类任务的，不幸的是，存在一些问题（在代码注释中指出），这些问题不允许同时使代码简洁和快速。

一个像素的例程需要3次内存读取+3个add操作，因此我大致期望此任务的极限为每像素3个时钟周期。

您还可以使用parallel-io包中的parallel轻松并行计算。

{-# LANGUAGE FlexibleContexts, TypeFamilies #-}

import System.Environment

import Data.Yarr
import Data.Yarr.IO.Image
import Data.Yarr.Walk
import Data.Yarr.Utils.FixedVector as V
import Data.Yarr.Shape as S

main :: IO ()
main = do
    [file] <- getArgs
    print =<< getAverage file

getAverage :: FilePath -> IO (Int, Int, Int)
getAverage file = do
    -- Meaningful choice, for homogenious images,
    -- in preference to readRGB(Vectors).
    -- readRGB make the case of representation -> polymorfic access ->
    -- poor performance 
    (RGB imageArr) <- readImage file
    -- let imageArr = readRGBVectors file
    let ext = extent imageArr
    avs <- averageColour imageArr
    return $ V.inspect avs (Fun (,,))


averageColour
    :: (Vector v Int, Dim v ~ N3, Integral e,
        UVecSource r slr l Dim2 v e, PreferredWorkIndex l Dim2 i)
    => UArray r l Dim2 (v e) -> IO (VecList N3 Int)
{-# INLINE averageColour #-}
averageColour image = fmap (V.map (`div` (w * h))) compSums
  where -- `walk (reduce ... (V.zipWith (+))) (return V.zero) image`
        -- would be more idiomatic and theoretically faster,
        -- but had problems with perf too :(
        compSums = walkSlicesSeparate sum (return 0) image
        -- would better to `mapElems fromIntegral imageArr` before counting,
        -- but faced some performance problems and I have no time to dig them
        {-# INLINE sum #-}
        sum = reduceL sumFold (\x y -> x + (fromIntegral y))
        sumFold = S.unrolledFoldl n8 noTouch
        (w, h) = extent image

编译

ghc-7.6.1 --make -Odph -rtsopts -threaded -fno-liberate-case -funbox-strict-fields -funfolding-keeness-factor1000 -fllvm -optlo-O3 -fexpose-all-unfoldings -fsimpl-tick-factor=500 -o avc average-color.hs