Getting grep context with a multiline regular expression

Question

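Can grep's -Pz and -C options be used together? I want to match a phrase that spans adjacent lines and print its context. The extended regex and the context options each work on their own, but used together like this they do not work properly (the whole file is printed):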

grep -C 2 -Pz ".*word.*\n.*phrase.*" file.txt

Contents of file.txt:

line 1
line 2
line 3
line 4
line 5
word ...other text
phrase ...yet another text
line 6
line 7
line 8
line 9
line 10

Expected output:

line 4
line 5
word ...other text
phrase ...yet another text
line 6
line 7

- Ondrej Sotolar

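Please provide sample input from file.txt. - Alireza

Sample text file provided. - Ondrej Sotolar

It looks like it works; what result do you expect? - Alireza

Are your lines NUL-terminated? You have -z, so your "lines" must end with \0. - dawg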

"The extended regex and the context options work separately." - anubhava

2 Answers

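Most likely your problem is the -z flag in GNU grep, which changes the definition of a line to one terminated by \0 instead of \n.

This is easy to demonstrate. Given: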

$ echo "$txt"
line 1
line 2
line 3
line 4
line 5
word ...other text
phrase ...yet another text
line 6
line 7
line 8
line 9
line 10

With -z, the context option has no visible effect and every line is printed:

$ echo "$txt"  | ggrep --context=2  -Pz "word|phrase"
# prints all the lines

Without -z, the context is printed as expected:

$ echo "$txt"  | ggrep --context=2  -P "word|phrase"
line 4
line 5
word ...other text
phrase ...yet another text
line 6
line 7

You can prove that context does work with -z by actually terminating the lines with NUL:

$ echo "$txt" | tr '\n' '\0' | ggrep --context=2  -Pz "word|phrase" | tr '\0' '\n'
line 4
line 5
word ...other text
phrase ...yet another text
line 6
line 7
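
The record-splitting effect described above can also be seen by counting records directly. The following is a minimal sketch, assuming GNU grep; the three-line sample input is made up for illustration and is not from the question.

```shell
# Assumption: GNU grep. The sample text is hypothetical.
sample=$(printf 'line 1\nword\nline 3')

# Default mode: records end with a newline, so the empty pattern matches 3 records.
newline_records=$(printf '%s\n' "$sample" | grep -c '')

# With -z: records end with NUL; the input has no NUL, so it is a single record.
nul_records=$(printf '%s\n' "$sample" | grep -zc '')

echo "newline-terminated: $newline_records, NUL-terminated: $nul_records"
```

With only one record in the input, -C has no neighboring records to show, which is why the whole file comes back.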

For Perl regular expressions combined with before, after, and multiline logic, you are better off using Perl itself!

Given:

$ cat file
line 1
line 2
line 3
line 4
line 5
word ...other text
betweener 1, line 7
betweener 2, line 8
phrase ...yet another text
line 10
line 11
line 12
line 13
line 14

You can do:

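# $b=2 is equivalent to grep -B 2, or lines before
# $a=2 is equivalent to grep -A 2, or lines after
$ perl -lne 'BEGIN{$b=2; $a=2;}
             print join("\n", @buf) if (/word/ && @buf);  # buffered lines before the match
             print if (/word/../phrase/) || ($c && $c--); # the match, then trailing context
             $c=$a if (/phrase/);                         # count down $a lines after
             shift @buf if push(@buf, $_)>$b;' file       # keep only the last $b lines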

Or, you can use POSIX or GNU awk:

$ awk 'BEGIN{b=2; a=2}
   /word/ { for (i=FNR-b;i<=FNR-1;i++)
                 if (i in arr) print arr[i]   # print the buffered lines before the match
            f=1}                # flag we are in the match
    f || (c && c--)             # print either if in the match or tail context
    /phrase/ {f=0; c=a}          # end match, start tail
    {delete arr[FNR-b]          # rolling line buffer: drop the line that fell out of range
    arr[FNR]=$0}                # save current line
' file

Either prints:

line 4
line 5
word ...other text
betweener 1, line 7
betweener 2, line 8
phrase ...yet another text
line 10
line 11

This works even when there are no "betweener" lines.

- dawg

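I'm not familiar with ggrep, but thanks, I'll look into it. - Ondrej Sotolar

ggrep is just GNU grep on my system (Mac OS / BSD). If you are on Linux, it is simply your grep! - dawg

Then I think the regex needs to change. If I remember correctly, the pipe in a regex means "or", which is not what I need because it matches either keyword (I need consecutive lines). I'm having trouble putting \0 into the regex: none of (\0, \x0, \x00) work. - Ondrej Sotolar

I only used "word|phrase" to demonstrate that -z does not work with \n-terminated lines. As for a regex that actually does what you want, that may be a new question. - dawg

@OndrejSotolar: I've updated the answer with a Perl version that should work for you. - dawg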

- Lenna · Accepted Answer

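No, -Pz and -C do not work together here. With -z, a file that contains no NUL bytes is a single record, so -C has no surrounding records to print. Don't worry, there is still a way to do what you want: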

grep -Pzo ".*\n.*\n.*.*word.*\n.*phrase.*\n.*\n.*" file.txt

Or you can parameterize it:

BEFORE=2
AFTER=2
grep -Pzo "(.*\n){$BEFORE}.*word.*\n.*phrase.*(\n.*){$AFTER}" file.txt

Use -Pzo so that only the text matching the pattern is printed.
Pad your pattern with some .*\n on either side to pull in the context lines.
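
Because -z also makes grep terminate each output record with a NUL byte rather than a newline, it can help to translate that byte before displaying or post-processing the match. The sketch below assumes GNU grep built with PCRE support (-P); `sample.txt` and the temporary directory are scratch names created only for this illustration and mirror the question's file.txt.

```shell
# Assumption: GNU grep with -P support. sample.txt is a hypothetical scratch file.
tmpdir=$(mktemp -d)
printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' 'line 5' \
    'word ...other text' 'phrase ...yet another text' \
    'line 6' 'line 7' 'line 8' 'line 9' 'line 10' > "$tmpdir/sample.txt"

BEFORE=2
AFTER=2
# Match the two adjacent lines plus padding, then turn the trailing NUL into a newline.
result=$(grep -Pzo "(.*\n){$BEFORE}.*word.*\n.*phrase.*(\n.*){$AFTER}" "$tmpdir/sample.txt" | tr '\0' '\n')
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

For the sample file this should print the six lines from "line 4" through "line 7", matching the expected output in the question.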

You may find this Bash function useful:

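function pad_grep()(
        # $0 would print the shell's name here, so the function name is spelled out
        usage() { echo "Usage: pad_grep [-A NUM] [-B NUM] [-C NUM] EXPR [FILE]" 1>&2; exit 1; }

        A=0
        B=0
        OPTIND=1        # restart option parsing on every call
        while getopts "A:B:C:" flag; do
                case "$flag" in
                        A) A=$OPTARG;;
                        B) B=$OPTARG;;
                        C) A=$OPTARG;B=$OPTARG;;
                        *) usage;;
                esac
        done
        EXPR=${@:$OPTIND:1}
        FILE=${@:$OPTIND+1:1}

        # Error checking
        [ ${#EXPR} -eq 0 ] && usage
        [[ ${#FILE} -ne 0 && ! -f ${FILE} ]] && usage

        # Pad EXPR with B leading and A trailing lines; read stdin when FILE is empty
        grep -Pzo "(.*\n){$B}${EXPR}(\n.*){$A}" ${FILE:+"$FILE"}
)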

# Do it yourself
grep -Pzo ".*\n.*\n.*\n.*.*word.*\n.*phrase.*\n.*\n.*" file.txt

# Use the function
pad_grep -B 3 -A 2 '.*word.*\n.*phrase.*' file.txt
pad_grep -C 2 '.*word.*\n.*phrase.*' file.txt