Haskell: how to convert Char to Word8

Question

24

I would like to split a ByteString into words, like this:

import qualified Data.ByteString as BS

main = do
    input <- BS.getLine
    let xs = BS.split ' ' input

But it seems GHC cannot convert a character literal to Word8 by itself, so I get the following:

Couldn't match expected type `GHC.Word.Word8'
            with actual type `Char'
In the first argument of `BS.split', namely ' '
In the expression: BS.split ' ' input

Hoogle does not find anything with the type signature Char -> Word8, and Word.Word8 ' ' is not a valid data constructor. Any ideas on how to fix this?

- Andrew

5

Don't use ByteString for text! Use Text instead. - Daniel Wagner

@DanielWagner Why not? Is it faster than ByteString? - Andrew

6

Text supports Unicode, so your strings will be strings in every country. ByteString is meant for binary parsing and raw memory access, and it cannot handle anything beyond ASCII or Latin-1. - Don Stewart

Interesting, thanks. This was a programming contest problem, so the possible encodings are limited to ASCII. - Andrew

1

You probably want to use import qualified Data.ByteString.Char8 as B instead. - George Co

5 Answers

17

If you really need Data.ByteString (rather than Data.ByteString.Char8), you can do what Data.ByteString itself does to convert between Char and Word8:

import qualified Data.ByteString as BS
import qualified Data.ByteString.Internal as BS (c2w, w2c)

main = do
    input <- BS.getLine
    let xs = BS.split (BS.c2w ' ') input 
    return ()

- Grwlf

4

If you are looking for a simple Char -> Word8 function that uses only the base library:

import Data.Word

charToWord8 :: Char -> Word8
charToWord8 = toEnum . fromEnum
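
One thing worth knowing about this definition: toEnum for Word8 is range-checked, so a Char above code point 255 raises a runtime error instead of wrapping around. The sketch below contrasts it with a total variant; charToWord8Maybe is a hypothetical helper name introduced only for this illustration, not something from base.

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.Word (Word8)

-- The definition from the answer: fine for code points 0..255,
-- but toEnum raises an error for anything larger.
charToWord8 :: Char -> Word8
charToWord8 = toEnum . fromEnum

-- Hypothetical total variant (an assumption for illustration):
-- returns Nothing when the Char does not fit in 8 bits.
charToWord8Maybe :: Char -> Maybe Word8
charToWord8Maybe c
  | n <= 255  = Just (fromIntegral n)
  | otherwise = Nothing
  where
    n = fromEnum c

main :: IO ()
main = do
  print (charToWord8 ' ')            -- 32
  print (charToWord8Maybe 'a')       -- Just 97
  print (charToWord8Maybe '\x3bb')   -- Nothing (Greek lambda, code point 955)
  -- Forcing the partial version on an out-of-range Char throws.
  r <- try (evaluate (charToWord8 '\x3bb')) :: IO (Either SomeException Word8)
  putStrLn (either (const "out of range") show r)  -- out of range
```

Returning Maybe pushes the range decision to the caller, which is usually preferable when the input is not guaranteed to be ASCII or Latin-1.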

- Hussein AIT LAHCEN

I have no idea, mate. - Hussein AIT LAHCEN

2

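I would like to directly answer the question in the title, which is what brought me here in the first place.

You can convert a single Char to a single Word8 with fromIntegral.ord: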

λ> import qualified Data.ByteString as BS
λ> import Data.Char(ord)

λ> BS.split (fromIntegral.ord $ 'd') $ BS.pack . map (fromIntegral.ord) $ "abcdef"

["abc","ef"]

Keep in mind that this conversion is prone to overflow, as shown below. If you do not want that to happen, you must make sure your Char fits in 8 bits.

λ> 260 :: Word8

4
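
The same wrap-around can be reproduced with an actual Char. This is only a minimal sketch: truncatingWord8 and fitsInWord8 are hypothetical names chosen for the example, and the guard is one possible way to check the range before converting.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Truncating conversion as used in this answer: fromIntegral keeps
-- only the low 8 bits of the code point.
truncatingWord8 :: Char -> Word8
truncatingWord8 = fromIntegral . ord

-- Hypothetical guard (an assumption for illustration): True when the
-- code point fits in a single byte.
fitsInWord8 :: Char -> Bool
fitsInWord8 c = ord c <= 0xFF

main :: IO ()
main = do
  print (truncatingWord8 'd')      -- 100
  print (truncatingWord8 '\x104')  -- 4, because 0x104 = 260 and 260 mod 256 = 4
  print (fitsInWord8 'd')          -- True
  print (fitsInWord8 '\x104')      -- False
```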

Of course, for your specific problem it is preferable to use the Data.ByteString.Char8 module, as already suggested in the accepted answer.

- oo_miguel

0

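Another possible solution is the following:

import Data.Char (ord)
import Data.Word (Word8)

charToWord8 :: Char -> Word8
charToWord8 = fromIntegral . ord
{-# INLINE charToWord8 #-}

where ord :: Char -> Int, and the rest can be inferred.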

- Jonathan Prieto-Cubides


- Don Stewart · Accepted Answer

With the Data.ByteString.Char8 module, you can treat the Word8 values in a bytestring as Char. Just

import qualified Data.ByteString.Char8 as C

and then refer to, for example, C.split. Under the hood it is the same bytestring, but Char-oriented functions are provided for convenient byte/ASCII parsing.
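
Applied to the question's program, this could look roughly like the sketch below. The input is a fixed string rather than C.getLine so the example is reproducible, and splitWords is a hypothetical wrapper name used only here.

```haskell
import qualified Data.ByteString.Char8 as C

-- Hypothetical wrapper (an assumption for illustration): split on
-- single spaces using the Char-based C.split.
splitWords :: C.ByteString -> [C.ByteString]
splitWords = C.split ' '

main :: IO ()
main = do
  let input = C.pack "split these words"
  print (splitWords input)               -- ["split","these","words"]
  -- C.split yields empty chunks between consecutive separators;
  -- C.words splits on runs of whitespace instead.
  print (splitWords (C.pack "a  b"))     -- ["a","","b"]
  print (C.words (C.pack "a  b"))        -- ["a","b"]
```

Note that C.pack truncates every Char to 8 bits, so this is only appropriate when the input is known to be ASCII or Latin-1, as in the contest setting described in the comments.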