Case-insensitive regular expressions

Question

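What is the best way to use regular expression options (flags) in Haskell?

I am using Text.Regex.PCRE.

The documentation lists some interesting options such as compCaseless, compUTF8, and so on, but I don't know how to use them with (=~).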

- Gaetan Dubar

3 Answers

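I don't know anything about Haskell, but if you are using a regex library based on PCRE, you can use mode modifiers inside the regular expression itself. To match "caseless" case-insensitively, you can use this regex in PCRE: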

(?i)caseless

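The (?i) mode modifier overrides any case-sensitivity or case-insensitivity option set outside the regular expression. It also works with operators that do not let you set any options.

Similarly, (?s) turns on "single-line mode", which makes the dot match line breaks; (?m) turns on "multi-line mode", which makes ^ and $ match at line breaks; and (?x) turns on free-spacing mode (unescaped spaces and line breaks outside character classes are insignificant). You can combine the letters: (?ismx) turns on all of these options. A hyphen turns options off: (?-i) makes the regex case sensitive, and (?x-i) starts a free-spacing, case-sensitive regex.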

- Jan Goyvaerts
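
As a minimal sketch of the inline-modifier approach in Haskell, assuming the regex-pcre package is installed, the pattern can stay an ordinary String passed to (=~). The function name containsCaseless is hypothetical and chosen only for illustration.

```haskell
import Text.Regex.PCRE ((=~))

-- Hypothetical helper: True when the text contains "caseless" in any
-- letter case. The (?i) prefix switches on case-insensitive matching
-- inside the pattern, so no CompOption has to be passed.
containsCaseless :: String -> Bool
containsCaseless text = text =~ "(?i)caseless"
```

With this definition, containsCaseless "A CaseLess match" should evaluate to True, while containsCaseless "no match here" should evaluate to False.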

It works too! It is much simpler than the accepted solution, but also less general. - Gaetan Dubar

+1 This lets us keep the idiomatic =~ operator and define the regex as a String. Much more basic! - recursion.ninja

I don't think you can use (=~) if you want a compOpt other than defaultCompOpt.

Something like this works:

match (makeRegexOpts compCaseless defaultExecOpt "(Foo)" :: Regex) "foo" :: Bool

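The following two articles should help you:

Real World Haskell, Chapter 8. Efficient file processing, regular expressions, and file name matching

A Haskell regular expression tutorial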

- davetapley

- ephemient · Accepted Answer

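All the Text.Regex.* modules make heavy use of typeclasses. These are there for extensibility and "overloading"-like behavior, but they make usage less obvious from the types alone.

Now, you have probably started out with the basic =~ matcher.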

(=~) ::
  ( RegexMaker Regex CompOption ExecOption source
  , RegexContext Regex source1 target )
  => source1 -> source -> target
(=~~) ::
  ( RegexMaker Regex CompOption ExecOption source
  , RegexContext Regex source1 target, Monad m )
  => source1 -> source -> m target

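To use =~, there must be a RegexMaker ... instance for the right-hand side (the pattern, source) and a RegexContext ... instance for the left-hand side (the text being matched, source1) and the result.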

class RegexOptions regex compOpt execOpt
      | regex -> compOpt execOpt
      , compOpt -> regex execOpt
      , execOpt -> regex compOpt
class RegexOptions regex compOpt execOpt
      => RegexMaker regex compOpt execOpt source
         | regex -> compOpt execOpt
         , compOpt -> regex execOpt
         , execOpt -> regex compOpt
  where
    makeRegex :: source -> regex
    makeRegexOpts :: compOpt -> execOpt -> source -> regex

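A valid instance of all these classes (for example, regex=Regex, compOpt=CompOption, execOpt=ExecOption, and source=String) means it is possible to compile a regex with compOpt,execOpt options from some form of source. (Also, for a given regex type there is exactly one compOpt,execOpt pair that goes with it. Many different source types are fine, though.)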

class Extract source
class Extract source
      => RegexLike regex source
class RegexLike regex source
      => RegexContext regex source target
  where
    match :: regex -> source -> target
    matchM :: Monad m => regex -> source -> m target

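A valid instance of all these classes (for example, regex=Regex, source=String, target=Bool) means it is possible to match a source against a regex to yield a target. (Other valid targets for this particular regex and source include Int, MatchResult String, MatchArray, and so on.)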
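
To illustrate how the target type selects the kind of result, here is a small sketch that assumes the regex-pcre package and plain String inputs. The three function names are hypothetical; only the result annotation differs between them.

```haskell
import Text.Regex.PCRE ((=~))

-- target = Bool: does the text contain a run of digits at all?
hasDigits :: String -> Bool
hasDigits s = s =~ "[0-9]+"

-- target = Int: how many non-overlapping runs of digits are there?
countDigitRuns :: String -> Int
countDigitRuns s = s =~ "[0-9]+"

-- target = String: the first run of digits, or "" when none is found.
firstDigitRun :: String -> String
firstDigitRun s = s =~ "[0-9]+"
```

The same pattern and the same operator are used each time; the RegexContext instance chosen by the declared result type decides what comes back.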

Putting these together, it is fairly clear that =~ and =~~ are simply convenience functions

source1 =~ source
  = match (makeRegex source) source1
source1 =~~ source
  = matchM (makeRegex source) source1

and also that =~ and =~~ leave no room to pass different options to makeRegexOpts.

You could make your own

(=~+) ::
   ( RegexMaker regex compOpt execOpt source
   , RegexContext regex source1 target )
   => source1 -> (source, compOpt, execOpt) -> target
source1 =~+ (source, compOpt, execOpt)
  = match (makeRegexOpts compOpt execOpt source) source1
(=~~+) ::
   ( RegexMaker regex compOpt execOpt source
   , RegexContext regex source1 target, Monad m )
   => source1 -> (source, compOpt, execOpt) -> m target
source1 =~~+ (source, compOpt, execOpt)
  = matchM (makeRegexOpts compOpt execOpt source) source1

which could be used like this

"string" =~+ ("regex", compCaseless + compUTF8, execBlank) :: Bool
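
For something that can be pasted into a file, the following is a simplified sketch of the same idea, assuming the regex-pcre package. Unlike the general version above, it fixes the regex type to Regex and the pattern type to String to keep type inference simple. The function greets is hypothetical, and the flags are combined with + as in the usage line above.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Text.Regex.PCRE

-- Like (=~), but the right-hand side also carries the compile-time and
-- run-time options that should be passed to makeRegexOpts.
(=~+) :: RegexContext Regex source1 target
      => source1 -> (String, CompOption, ExecOption) -> target
text =~+ (pat, compOpt, execOpt) =
  match (makeRegexOpts compOpt execOpt pat :: Regex) text

-- Hypothetical example: case-insensitive, and '.' may match a newline.
greets :: String -> Bool
greets text = text =~+ ("hello.world", compCaseless + compDotAll, execBlank)
```

Here greets "HELLO\nWORLD" should evaluate to True, because compCaseless ignores letter case and compDotAll lets the dot match the line break.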

or overwrite =~ and =~~ with versions that can accept options

import Text.Regex.PCRE hiding ((=~), (=~~))

class RegexSourceLike regex source
  where
    makeRegexWith :: source -> regex
instance RegexMaker regex compOpt execOpt source
         => RegexSourceLike regex source
  where
    makeRegexWith = makeRegex
instance RegexMaker regex compOpt execOpt source
         => RegexSourceLike regex (source, compOpt, execOpt)
  where
    makeRegexWith (source, compOpt, execOpt)
      = makeRegexOpts compOpt execOpt source

source1 =~ source
  = match (makeRegexWith source) source1
source1 =~~ source
  = matchM (makeRegexWith source) source1

Or you could just use match, makeRegexOpts, and so on directly where needed.
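
A minimal sketch of that direct style, assuming the regex-pcre package: the options are supplied once when the regex is built, and the compiled value can then be reused with any supported result type. The names matchesIgnoringCase, fooRegex, and countFoo are hypothetical.

```haskell
import Text.Regex.PCRE

-- Hypothetical helper: case-insensitive test of a pattern against a text.
matchesIgnoringCase :: String -> String -> Bool
matchesIgnoringCase pat text =
  match (makeRegexOpts compCaseless defaultExecOpt pat :: Regex) text

-- A regex built once with compCaseless and reused by the function below.
fooRegex :: Regex
fooRegex = makeRegexOpts compCaseless defaultExecOpt "foo"

-- Number of case-insensitive occurrences of "foo" in the text.
countFoo :: String -> Int
countFoo = match fooRegex
```

With these definitions, matchesIgnoringCase "(Foo)" "foo" should be True and countFoo "Foo fOO bar" should be 2.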