How do I represent a regular expression pattern in Haskell?

I am trying to do a regular expression substitution with the following code:
import Text.RE.Replace
import Text.RE.TDFA.String

onlyLetters :: String -> String
onlyLetters s = replaceAll "" $ s *=~ [re|$([^a-zA-Z])|]

I am finding it hard to locate understandable documentation for this. The code produces a compile error:
    src\Pangram.hs:6:53: error: parse error on input `]'
  |
6 | onlyLetters s = replaceAll "" $ (s *=~ [re|[a-zA-Z]|])
  |                                                     ^

Progress 1/2

--  While building package pangram-2.0.0.12 (scroll up to its section to see the error) using:
      C:\sr\setup-exe-cache\x86_64-windows\Cabal-simple_Z6RU0evB_3.0.1.0_ghc-8.8.4.exe --builddir=.stack-work\dist\29cc6475 build lib:pangram test:test --ghc-options " -fdiagnostics-color=always"
    Process exited with code: ExitFailure 1
PS C:\Users\mcleg\Exercism\haskell\pangram> stack test
pangram> configure (lib + test)
Configuring pangram-2.0.0.12...
pangram> build (lib + test)
Preprocessing library for pangram-2.0.0.12..
Building library for pangram-2.0.0.12..
[1 of 2] Compiling Pangram

src\Pangram.hs:7:56: error: parse error on input `]'
  |
7 | onlyLetters s = replaceAll "" $ s *=~ [re|$([^a-zA-Z])|]
  |                                                        ^

Progress 1/2

--  While building package pangram-2.0.0.12 (scroll up to its section to see the error) using:
      C:\sr\setup-exe-cache\x86_64-windows\Cabal-simple_Z6RU0evB_3.0.1.0_ghc-8.8.4.exe --builddir=.stack-work\dist\29cc6475 build lib:pangram test:test --ghc-options " -fdiagnostics-color=always"
    Process exited with code: ExitFailure 1

What is wrong with that bracket, and how do I do this correctly? Thanks. -Skye


Have you enabled the QuasiQuotes extension? - Willem Van Onsem
See the question on doing replacements with the Haskell regex libraries. - Wiktor Stribiżew
2 Answers

The […|…|] notation is quasi-quotation syntax [haskell-wiki]. It is an extension to Haskell's syntax and is not enabled by default.
You can enable it with a LANGUAGE pragma:
{-# LANGUAGE QuasiQuotes #-}

import Text.RE.Replace
import Text.RE.TDFA.String

onlyLetters :: String -> String
onlyLetters s = replaceAll "" $ s *=~ [re|$([^a-zA-Z])|]

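A quasi-quoter generates Haskell code, which is then used in the Haskell program. This means the regular expression can be validated at compile time, and it may even be slightly more efficient than a regex compiled at run time.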

For the given onlyLetters function, we then obtain:

*Main> onlyLetters "fo0b4r"
"fobr"

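To make the contrast with run-time compilation concrete, here is a minimal sketch that builds the same kind of pattern from a plain String instead of a quasi-quote, so no language extension is needed. It assumes the regex package's `compileRegex` from `Text.RE.TDFA.String` can be used in the `Maybe` monad; the name `onlyLettersRuntime` is hypothetical and not part of the answer above.

```haskell
import Text.RE.Replace
import Text.RE.TDFA.String

-- Hypothetical variant of onlyLetters: the pattern is an ordinary String,
-- so it is only checked when this function runs, not when the module compiles.
onlyLettersRuntime :: String -> Maybe String
onlyLettersRuntime s = do
  -- Assumption: a malformed pattern is reported through the monad's
  -- failure, which is Nothing in the Maybe monad.
  rex <- compileRegex "[^a-zA-Z]"
  -- Replace every match (every non-letter character) with the empty string.
  pure (replaceAll "" (s *=~ rex))
```

The trade-off is the one described above: a typo in the pattern surfaces only when the function is called, whereas the quasi-quoted version rejects it during compilation.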

Willem Van Onsem's answer is the better answer to the question, but I am going to suggest a "try this instead" answer.

Here is how to do text replacement in pure Haskell without the complexity of quasi-quoted regular expressions.

Using https://hackage.haskell.org/package/replace-megaparsec/docs/Replace-Megaparsec.html#v:streamEdit

{-# LANGUAGE TypeFamilies #-}

import Text.Megaparsec
import Text.Megaparsec.Char
import Replace.Megaparsec
import Data.Void

-- | Invert a single-token parser “character class”.
-- For example, match any single token except a letter or whitespace: `anySingleExcept (letterChar <|> spaceChar)`
anySingleExcept :: (MonadParsec e s m, Token s ~ Char) => m (Token s) -> m (Token s)
anySingleExcept p = notFollowedBy p *> anySingle

-- | A parser monad pattern which matches anything except letters.
nonLetters :: Parsec Void String String
nonLetters = many (anySingleExcept letterChar) 

onlyLetters :: String -> String
onlyLetters = streamEdit nonLetters (const "")

onlyLetters "fo0b4r"

"fobr"

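If a parser library is more than the task needs, the same filtering can be sketched with only the base library. This assumes "letters" means the ASCII range `[a-zA-Z]` from the question (megaparsec's `letterChar` also accepts non-ASCII letters), and the name `onlyLettersPlain` is hypothetical.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper)

-- Hypothetical base-only variant: keep only the characters in [a-zA-Z].
onlyLettersPlain :: String -> String
onlyLettersPlain = filter isAsciiLetter
  where
    -- True for 'a'..'z' and 'A'..'Z', the same set as the regex class.
    isAsciiLetter c = isAsciiLower c || isAsciiUpper c
```

For example, `onlyLettersPlain "fo0b4r"` evaluates to `"fobr"`, matching the results shown above.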