grep regex for files that contain A, B, C... but do not contain Z

Question


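I've spent several hours trying to answer this myself by piecing together partial answers from other questions. I apologize if this has already been answered, but combining the partial solutions I could find into a search that works correctly seems to be beyond me.

What I'm trying to do: search a directory for files that contain several unique strings (in any order, anywhere in the file) but do not contain another specific string.

Here is my search so far: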

pcregrep -riM '^(?=.*uniquestringA)(?=.*uniquestringB)(?=.*uniquestringC)(?=.*uniquestringD)(?=.*uniquestringE).*$' . \
  | xargs grep -Li 'uniquestringZ'

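I realize this is very, very wrong, since I can't even get the multiline search to work while ignoring the order in which the strings appear.

Any help is greatly appreciated.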

- Asterdahl

2 Answers


It takes a lot of grep invocations, but you can write it out in a simple, POSIX-compliant way with find and grep:

find . -type f \
  -exec grep -q "stringA" {} \; \
  -exec grep -q "stringB" {} \; \
  -exec grep -q "stringC" {} \; \
  -exec grep -q "stringD" {} \; \
  ! -exec grep -q "stringZ" {} \; \
  -print  # or whatever to do with matches

- that other guy

A bit slow, but I wasn't expecting a super-fast solution, and it works very well. - Asterdahl
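Regarding the speed remark above: one possible way to cut down the number of grep processes is to chain `grep -l` stages, so each stage only reads the files that survived the previous one. This is only a sketch and gives up POSIX portability: it assumes GNU grep (`-r`, `-Z`, `-L`) and GNU xargs (`-0`, `-r`). The function name `search_dir` and the demo file names are made up for illustration.

```shell
# Sketch, assuming GNU grep and GNU xargs (not POSIX).
# -r recurse, -l list matching files, -Z end each name with NUL,
# -F treat the pattern as a fixed string, -L list files WITHOUT a match.
# xargs -0 reads NUL-separated names; -r skips grep when the list is empty.
search_dir() {
  grep -rlZF -- "stringA" "$1" \
    | xargs -0 -r grep -lZF -- "stringB" \
    | xargs -0 -r grep -lZF -- "stringC" \
    | xargs -0 -r grep -lZF -- "stringD" \
    | xargs -0 -r grep -LF -- "stringZ"
}

# Tiny demo tree (hypothetical file names).
demo=$(mktemp -d)
printf 'stringA\nstringB\nstringC\nstringD\n' > "$demo/keep.txt"
printf 'stringA\nstringB\nstringC\nstringD\nstringZ\n' > "$demo/has_z.txt"
printf 'stringA\nstringC\nstringD\n' > "$demo/no_b.txt"

result=$(search_dir "$demo")
echo "$result"
```

Only `keep.txt` should be listed: `has_z.txt` is dropped by the final `-L` stage and `no_b.txt` never passes the `stringB` stage.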


- dawg · Accepted Answer

If your grep supports lookaheads, you should be able to do:

^(?!.*Z)(?=.*A)(?=.*B)(?=.*C)(.*)$


Given this file:

$ cat /tmp/grep_tgt.txt
A,B,C      # should match
A,B,C,D    # should match
A,C,D      # no match, lacking upper b
A,B,C,Z    # no match, has upper z

You can use a Perl one-liner:

$ perl -ne 'print if /^(?!.*Z)(?=.*A)(?=.*B)(?=.*C)(.*)$/' /tmp/grep_tgt.txt
A,B,C      # should match
A,B,C,D    # should match
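As for "if your grep supports lookaheads": GNU grep exposes Perl-compatible regexes through `-P` when it was built with PCRE support. A minimal sketch, assuming such a build, on the same sample lines (the temporary file path is illustrative):

```shell
# Recreate the sample file from the answer in a temporary location.
sample=$(mktemp)
cat > "$sample" <<'EOF'
A,B,C      # should match
A,B,C,D    # should match
A,C,D      # no match, lacking upper b
A,B,C,Z    # no match, has upper z
EOF

# -P selects Perl-compatible regexes (GNU grep built with PCRE only).
# The negative lookahead rejects lines with Z; each positive lookahead
# requires one of A, B, C somewhere on the line, in any order.
matches=$(grep -P '^(?!.*Z)(?=.*A)(?=.*B)(?=.*C)(.*)$' "$sample")
echo "$matches"
```

Like the Perl one-liner, this tests one line at a time, so all the strings have to appear on the same line.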

With file names:

$ find . -type f
./.DS_Store
./A-B-C
./A-B-C-Z
./A-C-D
./sub/A-B-C-D

You can use perl to filter the file names:

$ find . -type f | perl -ne 'print if /^(?!.*Z)(?=.*A)(?=.*B)(?=.*C)(.*)$/'
./A-B-C
./sub/A-B-C-D

If you want to read the file contents and test the pattern against them (like grep does), you can do:

$ find . -type f | xargs perl -ne 'print "$ARGV: $&\n" if /^(?!.*Z)(?=.*A)(?=.*B)(?=.*C)(.*)$/'
./1.txt: A B C     # should match
./2.txt: A,B,C,D    # should match

I put four files in a directory (1.txt..4.txt), of which 1.txt and 2.txt contain matching text.