Regex to delete lines with a matching first string?

Question

3

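I have a long list of lines in which many lines share the same first word (the first string before a space) but differ in the rest. I need to keep only one line for each unique first string.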

john jane
john 123
john jim jane
jane john
jane 123
jane 456
jim
jim 1

To get this result:

john jane
jane john
jim

So, whenever lines share the same first word, delete all of them except one.

I can remove lines that are exact duplicates, but that leaves lines like the ones in the example above untouched.

^(.*)(\r?\n\1)+$

This regex removes fully duplicated lines, but not lines like those in the example. Is there a regex or a Notepad macro that solves this?
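As a small illustration of why that pattern leaves the sample alone, here is a minimal Python sketch. The backreference `\1` has to be followed by the end of the line, so only lines that repeat in full collapse. The function name `drop_exact_duplicates` and the use of Python's `re` module are assumptions made for this example, not part of the original question.

```python
import re

# The pattern from the question: a whole line followed by one or more
# exact copies of that line.
EXACT_DUPES = re.compile(r"^(.*)(\r?\n\1)+$", re.MULTILINE)

def drop_exact_duplicates(text: str) -> str:
    """Collapse runs of identical consecutive lines into a single line."""
    return EXACT_DUPES.sub(r"\1", text)

sample = (
    "john jane\njohn 123\njohn jim jane\n"
    "jane john\njane 123\njane 456\n"
    "jim\njim 1\n"
)

# Identical consecutive lines collapse into one.
print(drop_exact_duplicates("a\na\nb\n"), end="")
# The sample has no fully identical lines, so it comes back unchanged.
print(drop_exact_duplicates(sample) == sample)
```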

- Jim8645

Not the best solution, but for Notepad++: ^((\w+\b).*)\r?\n\2.* -> $1 and click Replace All several times. - Wiktor Stribiżew
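Clicking Replace All several times amounts to repeating the substitution until a pass changes nothing, because each pass only merges a line with the one directly after it. A minimal Python sketch of that loop follows; the name `replace_all_until_stable` is hypothetical, and Python's `re` stands in for the Notepad++ engine. Note that, as written, the pattern also swallows a following line that merely starts with the same characters (for example `johnny` after `john`).

```python
import re

# Pattern from the comment: a line, then the next line if it begins with
# the same leading word characters (captured in group 2).
PAIR = re.compile(r"^((\w+\b).*)\r?\n\2.*", re.MULTILINE)

def replace_all_until_stable(text: str) -> str:
    """Apply the substitution repeatedly until a pass makes no replacement."""
    while True:
        # subn returns the new text and the number of replacements made.
        text, count = PAIR.subn(r"\1", text)
        if count == 0:
            return text

sample = (
    "john jane\njohn 123\njohn jim jane\n"
    "jane john\njane 123\njane 456\n"
    "jim\njim 1\n"
)
print(replace_all_until_stable(sample), end="")
```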

Are lines with the same first "word" always consecutive? If you want a relevant answer, please answer anubhava's question. - Casimir et Hippolyte

3 Answers

2

With Notepad++ (assuming that lines with the same first word are consecutive):

Search: ^(\S++).*\K(?:\R\1(?:\h.*|$))+
Replace with: nothing

Demo

Pattern details:

^             # start of the line
(\S++)        # the first "word" (all that isn't a whitespace) captured in group 1
.*            # all characters until the end of the line
\K            # remove characters matched before from the match result
(?:
    \R        # a newline
    \1        # reference to the capture group 1 (same first word)
    (?:
        \h.*  # a horizontal whitespace 
      |       # OR
        $     # the end of the line
    )
)+            # repeat one or more times
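For readers without Notepad++, the same idea can be approximated in Python. This is only a sketch under stated assumptions: Python's `re` has no `\K`, `\R` or `\h`, so the kept line is captured and put back, `\n` stands in for `\R`, and `[ \t]` for `\h`. The lookahead `(?!\S)` replaces the possessive `\S++` so that the first word cannot backtrack to a shorter prefix. The name `keep_first_of_each_run` is hypothetical.

```python
import re

# Group 1: the line to keep. Group 2: its first "word".
# (?!\S) prevents \S+ from backtracking to a prefix of the first word.
SAME_FIRST_WORD = re.compile(
    r"^((\S+)(?!\S).*)(?:\n\2(?:[ \t].*|$))+",
    re.MULTILINE,
)

def keep_first_of_each_run(text: str) -> str:
    """Keep the first line of each run of consecutive lines sharing a first word."""
    return SAME_FIRST_WORD.sub(r"\1", text)

sample = (
    "john jane\njohn 123\njohn jim jane\n"
    "jane john\njane 123\njane 456\n"
    "jim\njim 1\n"
)
print(keep_first_of_each_run(sample), end="")
```

Like the original pattern, this only merges consecutive lines, and it assumes `\n` line endings.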

- Casimir et Hippolyte

Confirmed, it works on my file. It also works in UltraEdit, which is useful because Notepad++ cannot handle very large files. - Jim8645

1

@Jim8645: Note that if you are on Unix/Linux, the awk-based approach is very interesting for large files, because it does not need to load the whole file into memory. - Casimir et Hippolyte

0

In Perl:

s/^((\w+).*)\n(?:(?:\2.*\n)*)/$1\n/gm

You can try this:

#!/usr/bin/perl

use warnings;
use strict;

my $file = "john jane
john 123
john jim jane
jane john
jane 123
jane 456
jim
jim 1
";

$file =~ s/^((\w+).*)\n(?:(?:\2.*\n)*)/$1\n/gm;

print $file;

- José Castro


- Sundeep · Accepted Answer

If you have awk:

awk '!seen[$1]++' infile.txt

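Taken from this thread, which discusses how to remove duplicate lines without sorting them first: Unix: removing duplicate lines without sorting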
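For anyone without awk, the logic of `!seen[$1]++` can be sketched in Python: remember each first field the first time it appears and emit only those lines. The function name `first_per_key` is hypothetical. Unlike the regex answers, this does not require lines with the same first word to be consecutive, and it only keeps the set of distinct first words in memory.

```python
from typing import Iterable, Iterator

def first_per_key(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line whose first whitespace-separated field is new."""
    seen = set()
    for line in lines:
        fields = line.split(None, 1)
        # A blank line has no fields; use an empty key, as awk's $1 would be.
        key = fields[0] if fields else ""
        if key not in seen:
            seen.add(key)
            yield line

# Works on any iterable of lines, including an open file object.
demo = ["john jane\n", "jim\n", "john 123\n", "jim 1\n"]
print("".join(first_per_key(demo)), end="")
```

To process a file lazily, pass the open file itself, for example `first_per_key(open("infile.txt"))`.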