Bash regex with quotes?

The following code
number=1
if [[ $number =~ [0-9] ]]
then
  echo matched
fi

works. If I try to use quotes in the regex, however, it stops working:

number=1
if [[ $number =~ "[0-9]" ]]
then
  echo matched
fi

I also tried "\[0-9\]", but that didn't work either. What am I doing wrong?

Interestingly, the Advanced Bash-Scripting Guide suggests this should work.

Bash version 3.2.39.


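The ABS is widely considered an inaccurate (or, in the better cases, merely misleading) source of guidance; think of it as the W3Schools of shell scripting. Consider the bash-hackers.org or Wooledge wikis as alternatives that focus on accuracy. - Charles Duffy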
5 Answers


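This is a terse description of the new features added to bash-3.2 since the release of bash-3.1. As always, the manual page (doc/bash.1) is the place to look for complete descriptions.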

  1. New Features in Bash

snip

f. Quoting the string argument to the [[ command's =~ operator now forces string matching, as with the other pattern-matching operators.

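Unfortunately, this will break existing scripts that quote their regexes, unless you had the foresight to store patterns in variables and use those instead of writing the regexes directly. Example below.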

The behavior changed between 3.1 and 3.2. I guess the advanced guide needs an update.

$ bash --version
GNU bash, version 3.2.39(1)-release (i486-pc-linux-gnu)
Copyright (C) 2007 Free Software Foundation, Inc.
$ number=2
$ if [[ $number =~ "[0-9]" ]]; then echo match; fi
$ if [[ $number =~ [0-9] ]]; then echo match; fi
match
$ re="[0-9]"
$ if [[ $number =~ $re ]]; then echo MATCH; fi
MATCH

$ bash --version
GNU bash, version 3.00.0(1)-release (i586-suse-linux)
Copyright (C) 2004 Free Software Foundation, Inc.
$ number=2
$ if [[ $number =~ "[0-9]" ]]; then echo match; fi
match
$ if [[ "$number" =~ [0-9] ]]; then echo match; fi
match
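A minimal sketch of the 3.2 behavior shown in the transcripts, assuming bash 3.2 or later with compat31 off (the default). The helper names quoted_match and variable_match are hypothetical: the first uses a quoted pattern, which bash searches for as the literal text [0-9], and the second stores the pattern in a variable, which bash treats as a regex.

```bash
#!/usr/bin/env bash
# Sketch assuming bash >= 3.2 with compat31 off (the default).
# quoted_match and variable_match are hypothetical helper names.

re='[0-9]'   # pattern stored in a variable, as the answer suggests

quoted_match() {
  # Quoted RHS: bash looks for the literal text "[0-9]" inside $1.
  [[ $1 =~ "[0-9]" ]]
}

variable_match() {
  # Unquoted variable on the RHS: its value is used as an extended regex.
  [[ $1 =~ $re ]]
}

for s in 2 'a[0-9]b' abc; do
  if quoted_match "$s"; then q='match'; else q='no match'; fi
  if variable_match "$s"; then v='match'; else v='no match'; fi
  printf '%-8s quoted: %-9s variable: %s\n' "$s" "$q" "$v"
done
```

Only 'a[0-9]b' satisfies the quoted form, because it literally contains the characters [0-9]; the variable form matches any string that contains a digit.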

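Interesting. Quoted regexes no longer work, and unquoted regexes containing spaces don't work either. Regexes stored in variables work even when they contain spaces. Quite a mess. - Pavel Šimerda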
Interestingly, this works: if [[ $number =~ ["0-9"] ]]; then echo match; fi - ingyhere
It's disappointing that we have to rely on the "echo" or "compat31" workarounds... - siulkilulki


Bash 3.2 introduced a compatibility option, compat31, which reverts bash's regex quoting behavior to that of 3.1.

Without compat31:

$ shopt -u compat31
$ shopt compat31
compat31        off
$ set -x
$ if [[ "9" =~ "[0-9]" ]]; then echo match; else echo no match; fi
+ [[ 9 =~ \[0-9] ]]
+ echo no match
no match

With compat31:

$ shopt -s compat31
+ shopt -s compat31
$ if [[ "9" =~ "[0-9]" ]]; then echo match; else echo no match; fi
+ [[ 9 =~ [0-9] ]]
+ echo match
match

Link to patch: http://ftp.gnu.org/gnu/bash/bash-3.2-patches/bash32-039
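A sketch for checking the same thing from a script instead of interactively, assuming a bash build that provides the compat31 shopt option. The function names are hypothetical; each check runs in a subshell so the shopt change does not leak into the rest of the script.

```bash
#!/usr/bin/env bash
# Sketch assuming a bash that provides the compat31 shopt option.
# quoted_regex_test, without_compat31 and with_compat31 are hypothetical names.

quoted_regex_test() {
  # Succeeds if the quoted pattern "[0-9]" matches the string "9".
  [[ 9 =~ "[0-9]" ]]
}

# Parentheses make each function body run in a subshell, so the
# shopt change stays local to that call.
without_compat31() ( shopt -u compat31; quoted_regex_test )
with_compat31() ( shopt -s compat31; quoted_regex_test )

if without_compat31; then echo 'compat31 off: match'; else echo 'compat31 off: no match'; fi
if with_compat31; then echo 'compat31 on: match'; else echo 'compat31 on: no match'; fi
```

This mirrors the set -x session above: with the option off the quoted pattern is a literal string and does not match "9"; with it on, the quotes are simply removed and [0-9] is used as a regex.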



GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)

Some examples of string matching and regex matching:

    $ if [[ 234 =~ "[0-9]" ]]; then echo matches;  fi # string match
    $ 

    $ if [[ 234 =~ [0-9] ]]; then echo matches;  fi # regex match
    matches


    $ var="[0-9]"

    $ if [[ 234 =~ $var ]]; then echo matches;  fi # regex match
    matches


    $ if [[ 234 =~ "$var" ]]; then echo matches;  fi # string match after substituting $var as [0-9]

    $ if [[ 'rss$var919' =~ "$var" ]]; then echo matches;  fi   # string match after substituting $var as [0-9]

    $ if [[ 'rss$var919' =~ $var ]]; then echo matches;  fi # regex match after substituting $var as [0-9]
    matches


    $ if [[ "rss\$var919" =~ "$var" ]]; then echo matches;  fi # string match won't work

    $ if [[ "rss\\$var919" =~ "$var" ]]; then echo matches;  fi # string match won't work


    $ if [[ "rss'$var'""919" =~ "$var" ]]; then echo matches;  fi # $var is substituted on LHS & RHS and then string match happens 
    matches

    $ if [[ 'rss$var919' =~ "\$var" ]]; then echo matches;  fi # string match !
    matches



    $ if [[ 'rss$var919' =~ "$var" ]]; then echo matches;  fi # string match failed
    $ 

    $ if [[ 'rss$var919' =~ '$var' ]]; then echo matches;  fi # string match
    matches



    $ echo $var
    [0-9]

    $ 

    $ if [[ abc123def =~ "[0-9]" ]]; then echo matches;  fi

    $ if [[ abc123def =~ [0-9] ]]; then echo matches;  fi
    matches

    $ if [[ 'rss$var919' =~ '$var' ]]; then echo matches;  fi # string match due to single quotes on RHS $var matches $var
    matches


    $ if [[ 'rss$var919' =~ $var ]]; then echo matches;  fi # Regex match 
    matches
    $ if [[ 'rss$var' =~ $var ]]; then echo matches;  fi # The example above really is a regex match, not a string match
    $


    $ if [[ 'rss$var919[0-9]' =~ "$var" ]]; then echo matches;  fi # string match RHS substituted and then matched
    matches

    $ if [[ 'rss$var919' =~ "'$var'" ]]; then echo matches;  fi # trying to string match '$var' fails


    $ if [[ '$var' =~ "'$var'" ]]; then echo matches;  fi # string match still fails: the single-quoted LHS is the literal text $var, which does not contain '[0-9]'

    $ if [[ \'$var\' =~ "'$var'" ]]; then echo matches;  fi # this string match works: the LHS now expands to '[0-9]' with the single quotes included, the same text as the RHS
    matches
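The examples above follow one rule in bash 3.2 and later with compat31 off: characters quoted on the right-hand side of =~ are matched literally, unquoted characters keep their regex meaning, and the two can be mixed within a single word. A small sketch with hypothetical function names, showing the difference on a dot:

```bash
#!/usr/bin/env bash
# Sketch assuming bash >= 3.2 with compat31 off (the default).
# dot_unquoted and dot_quoted are hypothetical helper names.

dot_unquoted() {
  # '.' is unquoted, so it is a regex wildcard matching any character.
  [[ $1 =~ ^a.b$ ]]
}

dot_quoted() {
  # Only the '.' is quoted: it must be a literal dot, while ^ and $
  # are still unquoted and therefore still act as anchors.
  [[ $1 =~ ^a"."b$ ]]
}

for s in a.b axb; do
  if dot_unquoted "$s"; then u='match'; else u='no match'; fi
  if dot_quoted "$s"; then q='match'; else q='no match'; fi
  printf '%s  unquoted dot: %-9s quoted dot: %s\n' "$s" "$u" "$q"
done
```

Both strings match the unquoted dot, but only a.b matches the quoted one, while the anchors keep working in both versions.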


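As mentioned in other answers, putting the regex in a variable is the general way to stay compatible across versions. You can also use this workaround, which achieves the same thing while keeping the regex inside the conditional expression: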

$ number=1
$ if [[ $number =~ $(echo "[0-9]") ]]; then echo matched; fi
matched
$ 

Using command substitution incurs a performance penalty, which can be significant in some cases (e.g., doing a large number of checks in a loop). - Near Privman
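A rough sketch for measuring that overhead yourself. The helper names and the loop count are arbitrary choices, and no particular timings are claimed, since they depend on the machine and bash version. Both functions count the same matches; one builds the regex with a command substitution on every iteration, the other reads it from a local variable.

```bash
#!/usr/bin/env bash
# Rough benchmarking sketch; count_with_subst, count_with_var and the
# loop count are arbitrary, hypothetical choices.

count_with_subst() {
  # The regex comes from a command substitution, which runs in a subshell
  # on every iteration.
  local i n=0
  for ((i = 0; i < $1; i++)); do
    if [[ $i =~ $(echo '^[0-9]+$') ]]; then n=$((n + 1)); fi
  done
  echo "$n"
}

count_with_var() {
  # The regex is read from a local variable; no subshell is involved.
  local i n=0 re='^[0-9]+$'
  for ((i = 0; i < $1; i++)); do
    if [[ $i =~ $re ]]; then n=$((n + 1)); fi
  done
  echo "$n"
}

# bash's time keyword reports the elapsed time of each call on stderr.
time count_with_subst 500
time count_with_var 500
```

Both calls print 500; the difference shows up only in the reported times.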


Using a local variable performs slightly better than using command substitution.

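For larger scripts or collections of scripts, a small utility function can keep unwanted local variables out of the calling code and reduce verbosity. This seems to work well: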

# Bash's built-in regular expression matching requires the regular expression
# to be unquoted (see https://dev59.com/QnVC5IYBdhLWcg3wqzLV), which makes it harder
# to use some special characters, e.g., the dollar sign.
# This wrapper works around the issue by using a local variable, which means the
# quotes are not passed on to the regex engine.
regex_match() {
  local string regex
  string="${1?}"
  regex="${2?}"
  # shellcheck disable=SC2046 `regex` is deliberately unquoted, see above.
  [[ "${string}" =~ ${regex} ]]
}

Usage example:

if regex_match "${number}" '[0-9]'; then
  echo matched
fi
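A few more calls to the wrapper, with the function copied here so the block runs on its own. The test strings are invented; they show that a regex containing spaces or an escaped dollar sign needs no extra shell quoting once it arrives through a variable, and that BASH_REMATCH is still set after the function returns, since it is a global shell variable.

```bash
#!/usr/bin/env bash
# regex_match is copied from the answer above; the calls below use
# made-up example strings.

regex_match() {
  local string regex
  string="${1?}"
  regex="${2?}"
  [[ "${string}" =~ ${regex} ]]
}

# A regex containing spaces: no escaping needed, as it comes from a variable.
regex_match 'hello world' '^hello world$' && echo "spaces: matched"

# \$ is a literal dollar sign for the regex engine; the final $ is an anchor.
regex_match 'cost: 5$' '[0-9]\$$' && echo "dollar: matched"

# Capture groups are available through BASH_REMATCH after the call.
if regex_match 'v3.2.39' '^v([0-9]+)\.([0-9]+)'; then
  echo "major=${BASH_REMATCH[1]} minor=${BASH_REMATCH[2]}"
fi
```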
