简化 Haskell 函数

Question

简化 Haskell 函数

8

我目前在学习Haskell时很困难。

我花了近6个小时来编写一个符合要求的函数。但不幸的是，我对它的外观并不满意。

请问有没有人能给我一些提示如何重写它？

get_connected_area :: Eq generic_type => [[generic_type]] -> (Int, Int) -> [(Int,Int)] -> generic_type -> [(Int,Int)]
get_connected_area habitat point area nullValue
  | elem point area = area
  | not ((fst point) >= 0) = area
  | not ((snd point) >= 0) = area
  | not ((fst point) < (length habitat)) = area
  | not ((snd point) < (length (habitat!!0))) = area
  | (((habitat!!(fst point))!!(snd point))) == nullValue = area
  | otherwise = 
    let new_area = point : area
    in 
    get_connected_area habitat (fst point+1, snd point) (
        get_connected_area habitat (fst point-1, snd point) (
            get_connected_area habitat (fst point, snd point+1) (
                get_connected_area habitat (fst point, snd point-1) new_area nullValue
                ) nullValue
            ) nullValue
    ) nullValue

该函数接收一个通用类型的参数（表示一个地形图），并搜索与给定nullValue不相等的点周围的完全连接区域。

例如：

如果以以下方式调用该函数： get_connected_area [[0,1,0],[1,1,1],[0,1,0],[1,0,0]] (1,1) [] 0 这实际上意味着 0 1 0
1 1 1
0 1 0
1 0 0
表示一个地图（类似于谷歌地图）。从坐标(1,1)开始，我想获取所有与给定点形成连接区域的元素的坐标。

因此，结果应该是:

0 1 0
1 1 1
0 1 0
1 0 0
以及相应的返回值（粗体1的坐标列表）: [(2,1),(0,1),(1,2),(1,0),(1,1)]

- infotoni91

3

请开始用文字解释这个函数的作用。 - Code-Apprentice

@Code-Apprentice，我希望现在我的意图更加明显了。 - infotoni91

5

由于您已经拥有可运行的代码，您的问题更适合在[codereview.se]上发表。那里可能会提供更多提示，因为这正是他们期望的问题类型。 - Code-Apprentice

@Code-Apprentice 不知道这个。谢谢 ;) - infotoni91

我刚刚看到另一个简化。请看我最近回答中的最后一句话。 - Code-Apprentice

4个回答

5

这将是我第一次快速通过函数，并且是可以通过代码审查的最低标准（只考虑样式方面）：

getConnectedArea :: Eq a => [[a]] -> a -> (Int, Int) -> [(Int,Int)] -> [(Int,Int)]
getConnectedArea habitat nullValue = go where
  go point@(x,y) area
      | elem point area = area
      | x < 0 = area
      | y < 0 = area
      | x >= length habitat = area
      | y >= length (habitat!!0) = area
      | ((habitat!!x)!!y) == nullValue = area
      | otherwise = 
          foldr go (point : area) 
            [ (x+1, y), (x-1, y), (x, y+1), (x, y-1) ]

我们在顶级绑定habitat和nullValue（澄清递归工作的内容），消除谓词中的间接性，使用驼峰命名法（下划线会让函数应用位置不太明确），将generic_type替换为a（这里使用一个嘈杂的变量实际上与您预期的相反效果;我最终会尝试弄清楚您想要调用的特殊语义是什么，而有趣的是类型并不重要（只要它可以进行等式比较））。

此时我们可以做很多事情：

- 假装我们正在编写真正的代码，并担心将列表视为数组（!!和length）和集合（elem）的渐近性，并使用适当的数组和设置数据结构。 - 将您的边界检查（以及可能的空值检查）移动到新的查找函数中（目标是尽可能只有一个... = area子句） - 考虑算法改进：我们是否可以通过算法避免递归地检查我们刚刚来自的单元格？我们是否可以完全避免传递区域（使我们的搜索非常懒惰/“高产”）？

- jberryman

2

这是我的看法：

import qualified Data.Set as Set

type Point = (Int, Int)

getConnectedArea :: (Point -> Bool) -> Point -> Set.Set Point
getConnectedArea habitat = \p -> worker p Set.empty  
          -- \p is to the right of = to keep it out of the scope of the where clause
    where
    worker p seen
      | p `Set.member` seen = seen
      | habitat p = foldr worker (Set.insert p seen) (neighbors p)
      | otherwise = seen

    neighbors (x,y) = [(x-1,y), (x+1,y), (x,y-1), (x,y+1)]

我已经完成的工作

按照一些评论者的建议，对邻居进行了foldr操作。
由于点的顺序并不重要，我使用了一个Set而不是列表，因此在语义上更加合适且速度更快。
命名了一些有用的中间抽象，例如Point和neighbors。
对于栖息地来说，更好的数据结构也很重要，因为列表是线性时间访问的，也许可以使用2D的Data.Array——但就这个函数而言，你只需要一个索引函数Point -> Bool（越界和空值被视为相同），因此我用索引函数本身替换了数据结构参数（这是FP中常见的转换）。

我们可以看到，还可以将neighbors函数抽象出来，然后我们就会得到一个非常通用的图遍历方法。

traverseGraph :: (Ord a) => (a -> [a]) -> a -> Set.Set a

就这个函数，你可以编写getConnectedArea。我建议出于教学目的这样做-留作练习。

编辑

以下是一个调用该函数（几乎）与旧函数相同的示例：

import Control.Monad ((<=<))

-- A couple helpers for indexing lists.

index :: Int -> [a] -> Maybe a
index _ [] = Nothing
index 0 (x:_) = x
index n (_:xs) = index (n-1) xs

index2 :: (Int,Int) -> [[a]] -> Maybe a
index2 (x,y) = index x <=< index y
-- index2 uses Maybe's monadic structure, and I think it's quite pretty. 
-- But if you're not ready for that, you might prefer
index2' (x,y) xss
  | Just xs <- index y xss = index x xs
  | otherwise = Nothing

getConnectedArea' :: (Eq a) => [[a]] -> Point -> a -> [a]
getConnectedArea' habitat point nullValue = Set.toList $ getConnectedArea nonnull point
    where
    nonnull :: Point -> Bool
    nonnull p = case index2 p habitat of
                  Nothing -> False
                  Just x -> x /= nullValue

- luqui

谢谢，虽然我甚至不知道如何称呼你的实现。有关于worker函数的提示吗？我感觉像10年前开始编程时那样，Haskell和FP对我来说是全新的...抱歉。 - infotoni91

@infotoni91 好的，我给你举了一个例子。没错，就是这样，我也经历过同样的事情。你会在适当的时间掌握它的。 - luqui

1

我总是对 Set 的性能有些谨慎。对于 Set (Int, Int) 和 IntMap IntSet 进行基准测试可能会很有趣，但这当然需要一个大而真实的测试用例。 - dfeuer

1

还有一件事：对于有限的 xs，foldr c n xs = foldl (flip c) n (reverse xs)。这表明您应该使用 foldl'，交换 worker 的参数顺序，并在必要时手动反转 neighbors 的定义中的列表。 - dfeuer

1

好的，我会尝试简化你的代码。然而已经有了很好的答案，所以我将用略微更概念化的方法来解决这个问题。

我认为你可以选择更好的数据类型。例如，Data.Matrix 似乎提供了一个理想的数据类型，可以代替你的 [[generic_type]] 类型。对于坐标，我不会选择元组类型，因为元组类型是用来打包不同类型的。当它被选择为坐标系时，其函子和单子实例并不是非常有帮助。但由于看起来 Data.Matrix 只是使用元组作为坐标，我将保留它们。

好的，你重新表述后的代码如下：

import Data.Matrix

gca :: Matrix Int -> (Int, Int) -> Int -> [(Int,Int)]
gca fld crd nil = let nbs = [id, subtract 1, (+1)] >>= \f -> [id, subtract 1, (+1)]
                                                    >>= \g -> return (f,g)
                                                     >>= \(f,g) -> return ((f . fst) crd, (g . snd) crd)
                  in filter (\(x,y) -> fld ! (x,y) /= nil) nbs

*Main> gca (fromLists [[0,1,0],[1,1,1],[0,1,0],[1,0,0]]) (2,2) 0
[(2,2),(2,1),(2,3),(1,2),(3,2)]

首先要注意的是，Matrix数据类型是基于索引1的。所以我们的中心点是在(2,2)。

其次...我们定义了一个由三个元素函数列表[id, subtract 1, (+1)]。这些包含的函数都是Num a => a -> a类型，我需要它们来定义给定坐标周围的像素，包括给定坐标。所以我们有一条线就像我们做的一样； [1,2,3] >>= \x -> [1,2,3] >>= \y -> return [x,y]会得到[[1,1],[1,2],[1,3],[2,1],[2,2],[2,3],[3,1],[3,2],[3,3]]，在我们的情况下，将产生2种所有函数组合的组合，取代数字1、2和3的位置。

然后我们按照级联指令逐个应用于给定的坐标。

>>= \[f,g] -> return ((f . fst) crd, (g . snd) crd)

这将产生所有相邻的坐标。

然后只需通过检查邻居过滤器是否不等于矩阵中的nil值来过滤邻居过滤器。

- Redu

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Code-Apprentice · Accepted Answer

一个小的变化是，你可以在变量 point 中使用模式匹配。这意味着你可以在函数声明中使用 (x, y) 而不是 point：

get_connected_area habitat (x, y) area nullValue = ...

现在，无论您在哪里有fst point，只需放置x，并且无论您在哪里有snd point，都放置y。

另一个修改方法是为子表达式使用更多变量。这可以帮助嵌套递归调用。例如，为最内层的嵌套调用创建一个变量：

....
where foo = get_connected_area habitat (x, y-1) new_area nullValue

现在只需将调用替换为foo。这种技术现在可以用于“新”的最内部调用。(请注意，您应该选择比foo更具描述性的名称。也许是down?)

请注意，not (x >= y)与x < y相同。使用它来简化所有条件。由于这些条件测试点是否在边界矩形内，因此大多数逻辑都可以转移到函数isIn :: (Int, Int) -> (Int, Int) -> (Int, Int) -> Bool中，这将使get_connected_area更易读。