Haskell ByteStrings: how to pattern match?

Question

haskell pattern-matching bytestring pattern-synonyms

I'm a Haskell newbie, and I'm having a bit of trouble figuring out how to pattern match a ByteString. The [Char] version of my function looks like:

dropAB :: String -> String
dropAB []       = []
dropAB (x:[])   = x:[]
dropAB (x:y:xs) = if x=='a' && y=='b'
                  then dropAB xs
                  else x:(dropAB $ y:xs)

As expected, this filters out all occurrences of "ab" from a string. However, I run into problems when I try to apply it to a ByteString.

The naive version

dropR :: BS.ByteString -> BS.ByteString
dropR []         = []
dropR (x:[])     = [x]
<...>

yields

Couldn't match expected type `BS.ByteString'
       against inferred type `[a]'
In the pattern: []
In the definition of `dropR': dropR [] = []

[] is obviously the culprit, since it is for regular Strings, not ByteStrings. Substituting BS.empty seems like the right thing to do, but it gives the error "Qualified name in binding position: BS.empty." So we try

dropR :: BS.ByteString -> BS.ByteString
dropR empty              = empty        
dropR (x cons empty)     = x cons empty
<...>

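This gives "parse error in pattern" for (x cons empty). I don't really know what else to do. As a side note, what I'm trying to do with this function is filter a specific UTF-16 character out of some text. If there's a clean way to accomplish that, I'd love to hear it, but this pattern-matching error seems like something a newbie should really understand.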

- LOS

I'm not sure, but maybe you should use guards instead of pattern matching? - li.davidm
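li.davidm's suggestion does work. A minimal sketch, assuming the bytestring package: the list patterns are replaced by guards that inspect the bytes with BS.length and Data.ByteString.Char8.index, so no constructors are needed. The name dropR is reused from the question; load the block into GHCi to try it.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC

-- Guards instead of constructor patterns: ByteString exports no
-- constructors, so we ask questions about the bytes with functions.
dropR :: BS.ByteString -> BS.ByteString
dropR bs
  | BS.length bs < 2                             = bs   -- nothing left to match
  | BC.index bs 0 == 'a' && BC.index bs 1 == 'b' = dropR (BS.drop 2 bs)
  | otherwise = BS.cons (BS.head bs) (dropR (BS.tail bs))  -- safe: length >= 2
```

Because BS.cons copies its whole argument, this recursion is quadratic in the input length; it mirrors the list version for clarity rather than speed.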

You can't filter out a UTF-16 character. Maybe you mean "filter out a character from a text encoded in UTF-16". - gawi

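For the record, since the OP brought it up: consider whether you really need to treat your ByteString like a linked list. In this case, decodeUtf16LE and Text.filter (/= '消') are cleaner, higher-level tools for what LOS was trying to do. Whatever situation brought you to this question may have a similar solution! - Lynn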

5 Answers

GHC 7.8, the latest version at the time of writing, added a feature called pattern synonyms, which can be combined with gawi's example:

{-# LANGUAGE ViewPatterns, PatternSynonyms #-}

import Data.ByteString (ByteString, cons, uncons, singleton, empty)
import Data.ByteString.Internal (c2w)

infixr 5 :<

pattern b :< bs <- (uncons -> Just (b, bs))
pattern Empty   <- (uncons -> Nothing)

dropR :: ByteString -> ByteString
dropR Empty          = empty
dropR (x :< Empty)   = singleton x
dropR (x :< y :< xs)
  | x == c2w 'a' && y == c2w 'b' = dropR xs
  | otherwise                    = cons x (dropR (cons y xs))
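The synonyms above are match-only, which is why the right-hand sides still call cons and singleton. Newer GHCs also accept an explicit builder in a where clause, so the same name works on both sides of an equation. A sketch under that assumption (a reasonably recent GHC, 8.2 or later for the COMPLETE pragma); the names (:<) and Empty follow the answer above.

```haskell
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}

import qualified Data.ByteString as BS
import Data.ByteString (ByteString)
import Data.ByteString.Internal (c2w)
import Data.Word (Word8)

infixr 5 :<

-- Matching goes through BS.uncons; the where clause makes (:<)
-- usable as an expression too, where it means BS.cons.
pattern (:<) :: Word8 -> ByteString -> ByteString
pattern x :< xs <- (BS.uncons -> Just (x, xs))
  where x :< xs = BS.cons x xs

-- Matches any empty ByteString; as an expression it is BS.empty.
pattern Empty :: ByteString
pattern Empty <- (BS.null -> True)
  where Empty = BS.empty

-- Tell the exhaustiveness checker these two patterns cover every ByteString.
{-# COMPLETE (:<), Empty #-}

dropR :: ByteString -> ByteString
dropR Empty          = Empty
dropR (x :< Empty)   = x :< Empty
dropR (x :< y :< xs)
  | x == c2w 'a' && y == c2w 'b' = dropR xs
  | otherwise                    = x :< dropR (y :< xs)
```

With the COMPLETE pragma, GHC should no longer report dropR as non-exhaustive, which it otherwise tends to do for view-pattern-based synonyms.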

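Going further, you can abstract this to work over any type that is an instance of a type class (this would look nicer once associated pattern synonyms are implemented). The pattern definitions stay the same: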

{-# LANGUAGE ViewPatterns, PatternSynonyms, TypeFamilies #-}

import qualified Data.ByteString as BS
import Data.ByteString (ByteString, singleton)
import Data.ByteString.Internal (c2w)
import Data.Word

class ListLike l where
  type Elem l

  empty  :: l
  uncons :: l -> Maybe (Elem l, l)
  cons   :: Elem l -> l -> l

instance ListLike ByteString where
  type Elem ByteString = Word8

  empty  = BS.empty
  uncons = BS.uncons
  cons   = BS.cons

instance ListLike [a] where
  type Elem [a] = a

  empty         = []
  uncons []     = Nothing
  uncons (x:xs) = Just (x, xs)
  cons          = (:)

With this, dropR works on both [Word8] and ByteString:

-- dropR :: [Word8]    -> [Word8]
-- dropR :: ByteString -> ByteString
dropR :: (ListLike l, Elem l ~ Word8) => l -> l
dropR Empty          = empty
dropR (x :< Empty)   = cons x empty
dropR (x :< y :< xs)
  | x == c2w 'a' && y == c2w 'b' = dropR xs
  | otherwise                    = cons x (dropR (cons y xs))
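One payoff of the class is that nothing really ties the algorithm to Word8. A hypothetical generalization, dropPair, takes the two elements to drop as arguments and only needs Eq on the element type, so it also runs on a plain String. The class and pattern are repeated so the block stands alone, and FlexibleContexts is assumed for the Eq (Elem l) constraint.

```haskell
{-# LANGUAGE PatternSynonyms, ViewPatterns, TypeFamilies, FlexibleContexts #-}

import qualified Data.ByteString as BS
import Data.ByteString (ByteString)
import Data.Word (Word8)

-- Same ListLike class as above, repeated so this block is self-contained.
class ListLike l where
  type Elem l
  empty  :: l
  uncons :: l -> Maybe (Elem l, l)
  cons   :: Elem l -> l -> l

instance ListLike ByteString where
  type Elem ByteString = Word8
  empty  = BS.empty
  uncons = BS.uncons
  cons   = BS.cons

instance ListLike [a] where
  type Elem [a] = a
  empty         = []
  uncons []     = Nothing
  uncons (x:xs) = Just (x, xs)
  cons          = (:)

infixr 5 :<
pattern (:<) :: ListLike l => Elem l -> l -> l
pattern x :< xs <- (uncons -> Just (x, xs))

-- Hypothetical generalization: the pair to drop is a parameter,
-- so only an Eq constraint on the element type is required.
dropPair :: (ListLike l, Eq (Elem l)) => Elem l -> Elem l -> l -> l
dropPair a b = go
  where
    go (x :< y :< rest) | x == a && y == b = go rest
    go (x :< rest)                         = cons x (go rest)
    go _                                   = empty   -- uncons gave Nothing
```

For example, dropPair 'a' 'b' works on a String, and dropPair 97 98 on a ByteString (97 and 98 being the bytes for 'a' and 'b').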

Just for fun:

import Data.ByteString.Internal (w2c)

infixr 5 :•    
pattern b :• bs <- (w2c -> b) :< bs

dropR :: (ListLike l, Elem l ~ Word8) => l -> l
dropR Empty              = empty
dropR (x   :< Empty)     = cons x empty
dropR ('a' :• 'b' :• xs) = dropR xs
dropR (x   :< y   :< xs) = cons x (dropR (cons y xs))

You can read more in my post on pattern synonyms.

- Iceland_jack

Patterns use data constructors. http://book.realworldhaskell.org/read/defining-types-streamlining-functions.html

Your empty is just a binding for the first argument; it could have been x and it would not change anything.
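To see this concretely, here is a small hypothetical function, firstClauseOnly, whose first equation uses empty as a pattern. A lowercase name in a pattern always introduces a fresh variable, so that equation matches every input; if Data.ByteString's empty were imported unqualified, the pattern would merely shadow it.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC

-- Hypothetical demo: 'empty' here is a new variable, not BS.empty,
-- so the first equation accepts any ByteString and returns it unchanged.
firstClauseOnly :: BS.ByteString -> BS.ByteString
firstClauseOnly empty = empty
firstClauseOnly _     = BC.pack "never reached"  -- GHC should flag this as redundant
```

To test for emptiness you need a function call such as BS.null, typically in a guard.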

You can't refer to an ordinary function in a pattern, so (x cons empty) is not legal. Note: I guess what you really wanted was (cons x empty), but that is illegal too.

ByteString is very different from String. String is an alias for [Char], so it is a real list and the : operator can be used in patterns.

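A ByteString is Data.ByteString.Internal.PS !(GHC.ForeignPtr.ForeignPtr GHC.Word.Word8) !Int !Int, i.e. a pointer to a native char* plus an offset and a length (the representation at the time of writing). Since the ByteString data constructor is hidden, you have to use functions to access the data, not patterns.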
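Working through functions instead of constructors also makes whole-string operations available. As a sketch, assuming bytestring's breakSubstring, a hypothetical dropAll removes every non-overlapping occurrence of a pattern, scanning left to right as dropAB does, and glues the surviving slices together once with BS.concat instead of calling BS.cons (which copies its entire argument) once per byte.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC

-- Hypothetical helper: remove every non-overlapping occurrence of pat.
dropAll :: BS.ByteString -> BS.ByteString -> BS.ByteString
dropAll pat
  | BS.null pat = id               -- an empty pattern would never make progress
  | otherwise   = BS.concat . go
  where
    -- breakSubstring splits at the first match; if there is a match,
    -- it is the prefix of the second component.
    go bs = case BS.breakSubstring pat bs of
      (before, rest)
        | BS.null rest -> [before]
        | otherwise    -> before : go (BS.drop (BS.length pat) rest)
```

For the question's case, dropAll (BC.pack "ab") gives the same results as dropAB on ASCII input.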

Here is a solution (certainly not the best one) to your UTF-16 filtering problem using the text package:

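module Test where

import qualified Data.ByteString as BS
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Encoding (decodeUtf16LE)

-- Drop every occurrence of the character c from the text.
removeAll :: Char -> Text -> Text
removeAll c t = T.filter (/= c) t

-- Read raw bytes, decode them as UTF-16LE, filter, and print.
main :: IO ()
main = do
  bytes <- BS.readFile "test.txt"
  TIO.putStr $ removeAll 'c' (decodeUtf16LE bytes)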
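A pure variant of removeAll is easier to test and can return UTF-16LE bytes instead of printing. This sketch assumes well-formed little-endian input (decodeUtf16LE throws an exception on invalid data) and does nothing special about a byte-order mark; removeAllUtf16LE is a hypothetical name.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf16LE, encodeUtf16LE)

-- Decode, filter at the Char level, then re-encode as UTF-16LE.
-- Assumes valid UTF-16LE input; decodeUtf16LE throws otherwise.
removeAllUtf16LE :: Char -> BS.ByteString -> BS.ByteString
removeAllUtf16LE c = encodeUtf16LE . T.filter (/= c) . decodeUtf16LE
```

Filtering on Char after decoding is what makes this correct for any code point, including '消' from Lynn's comment; filtering individual bytes of UTF-16 data would not be.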

- gawi

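I didn't know that about patterns and data constructors. Since ByteString doesn't export its constructors, as explained below, this makes sense now. Thanks to everyone who answered. - LOS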

For this I would pattern match on the result of uncons :: ByteString -> Maybe (Word8, ByteString).

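Pattern matching in Haskell only works on constructors declared with 'data' or 'newtype'. The ByteString type doesn't export its constructors, so you cannot pattern match on it.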
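Matching on the result of uncons needs no language extensions. A sketch using a plain case expression, with a Haskell 2010 pattern guard (the <- inside the guard) to look at the second byte:

```haskell
import qualified Data.ByteString as BS
import Data.ByteString.Internal (c2w)

-- Case on the Maybe returned by BS.uncons. The pattern guard
-- `Just (y, rest') <- BS.uncons rest` only succeeds when a second byte exists.
dropR :: BS.ByteString -> BS.ByteString
dropR bs = case BS.uncons bs of
  Nothing -> BS.empty
  Just (x, rest)
    | x == c2w 'a', Just (y, rest') <- BS.uncons rest, y == c2w 'b'
        -> dropR rest'
    | otherwise
        -> BS.cons x (dropR rest)
```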

- Antoine Latter

To address the error message you got and what it means:

Couldn't match expected type `BS.ByteString'
       against inferred type `[a]'
In the pattern: []
In the definition of `dropR': dropR [] = []

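So the compiler expected your function to be of type BS.ByteString -> BS.ByteString, because that is the type you gave it in your signature. Yet it inferred, by looking at the body of your function, that the function is actually of type [a] -> [a]. Since there is a mismatch, the compiler complains.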

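The trouble is that you're thinking of (:) and [] as syntactic sugar, when they are actually just the constructors of the list type (which is VERY different from ByteString).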
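The point can be seen with a hand-rolled list type whose constructors Nil and Cons behave exactly like [] and (:) in patterns. The type List and the helpers fromPlain and toPlain are hypothetical, for illustration only.

```haskell
-- A hand-rolled list: Nil plays the role of [] and Cons the role of (:).
data List a = Nil | Cons a (List a)
  deriving (Eq, Show)

fromPlain :: [a] -> List a
fromPlain = foldr Cons Nil

toPlain :: List a -> [a]
toPlain Nil         = []
toPlain (Cons x xs) = x : toPlain xs

-- Nil and Cons are real data constructors, so they can appear in
-- patterns just like [] and (:) in the question's dropAB.
dropAB :: List Char -> List Char
dropAB (Cons 'a' (Cons 'b' xs)) = dropAB xs
dropAB (Cons x xs)              = Cons x (dropAB xs)
dropAB Nil                      = Nil
```

ByteString offers no such constructors through its public module, which is exactly why the same definition cannot be written for it.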

- jberryman

- Ed'ka · Accepted Answer

You can use view patterns for this sort of thing.

{-# LANGUAGE ViewPatterns #-}    
import Data.ByteString (ByteString, cons, uncons, singleton, empty)
import Data.ByteString.Internal (c2w) 

dropR :: ByteString -> ByteString
dropR (uncons -> Nothing) = empty
dropR (uncons -> Just (x,uncons -> Nothing)) = singleton x
dropR (uncons -> Just (x,uncons -> Just(y,xs))) =
    if x == c2w 'a' && y == c2w 'b'
    then dropR xs
    else cons x (dropR $ cons y xs)
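A variation on the view-pattern approach, assuming Data.ByteString.Char8 is acceptable for the input: its uncons returns a Char, so the literals 'a' and 'b' can appear directly inside the view pattern and c2w from the Internal module is no longer needed. Char8 treats each byte as one Char, so this only suits byte-oriented data such as ASCII, not the UTF-16 text mentioned in the question.

```haskell
{-# LANGUAGE ViewPatterns #-}

import qualified Data.ByteString.Char8 as BC

-- If the first byte is 'a' and the second is 'b', skip both.
-- A failed inner view pattern just falls through to the next equation.
dropR :: BC.ByteString -> BC.ByteString
dropR (BC.uncons -> Just ('a', BC.uncons -> Just ('b', xs))) = dropR xs
dropR (BC.uncons -> Just (x, xs))                            = BC.cons x (dropR xs)
dropR _                                                      = BC.empty
```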