Delete lines containing a repeated string from a file

Question

I need to delete every line of a file that contains a certain string more than once. For example, if my file looks like this:

This is a test toRemove first line

This is a test toRemove second line toRemove

the result should be a file that contains only the first line:

This is a test toRemove first line

I'm trying to do this on the Linux command line and have tried grep and sed, like this:

grep -d "toRemove.*toRemove" myFile > myOtherFile

sed '/\toRemove.*toRemove/!d' myFile > myOtherFile

But nothing seems to work. Does anyone know how to do this?

- Tiz

It should be sed '/toRemove.*toRemove/d' myFile > myOtherFile. \t matches a tab character, and !d deletes the lines that do not match the pattern. - Wiktor Stribiżew

grep -v "toRemove.*toRemove" myFile > myOtherFile should work just fine. - anubhava

2 Answers

This might work for you (GNU sed):

sed '/\<\(toRemove\)\>.*\<\1\>/d' file

This deletes every line that contains two or more occurrences of the whole word toRemove.
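A quick check of the word-boundary behaviour, assuming GNU sed (\< and \> are GNU extensions) and a made-up three-line sample. The third line is kept because toRemoveX is not the whole word toRemove:

```shell
# Made-up sample input; GNU sed is assumed for the \< and \> word boundaries.
input='one toRemove here
two toRemove and toRemove again
three toRemove and toRemoveX'

# \(toRemove\) captures the word; \<\1\> requires the same whole word again later.
out=$(printf '%s\n' "$input" | sed '/\<\(toRemove\)\>.*\<\1\>/d')
printf '%s\n' "$out"
# prints:
# one toRemove here
# three toRemove and toRemoveX
```

By contrast, the plain pattern toRemove.*toRemove would also delete the third line, since it does not require whole words.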

To delete every line that contains the word toRemove except the first such line:

sed '/\<toRemove\>/{x;/./{x;d};x;h}' file
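A small trace of the hold-space trick, assuming GNU sed and a made-up four-line sample. Only the first line with toRemove survives; lines without the word pass through untouched:

```shell
# Made-up sample input; GNU sed is assumed (\< \> word boundaries, "d}" syntax).
input='a toRemove
b keep
c toRemove
d toRemove toRemove'

# On a matching line: x swaps the line with the hold space. If the hold space
# was non-empty (/./), an earlier match was already stored: swap back and delete.
# Otherwise swap back, copy the line into the hold space (h) and print it.
out=$(printf '%s\n' "$input" | sed '/\<toRemove\>/{x;/./{x;d};x;h}')
printf '%s\n' "$out"
# prints:
# a toRemove
# b keep
```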

- potong

Page content provided by Stack Overflow.

- Wiktor Stribiżew · Accepted Answer

You can use:

sed '/toRemove.*toRemove/d' myFile > myOtherFile
grep -v "toRemove.*toRemove" myFile > myOtherFile

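sed: note that \t matches a tab character, and !d deletes the lines that do not match the pattern. So you need to remove the \ before t and the ! before d.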
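To see the effect of those two characters, here is the question's attempt next to the fixed command on the question's sample, assuming GNU sed (where \t in a regex means TAB). Because the sample contains no tabs, the original attempt matches nothing, and !d then deletes every line:

```shell
# The question's sample input.
s='This is a test toRemove first line
This is a test toRemove second line toRemove'

# Original attempt: the regex looks for "<TAB>oRemove...toRemove",
# and !d deletes every line that does NOT match it.
broken=$(printf '%s\n' "$s" | sed '/\toRemove.*toRemove/!d')

# Fixed: no backslash and no "!", so only lines WITH two occurrences are deleted.
fixed=$(printf '%s\n' "$s" | sed '/toRemove.*toRemove/d')

printf 'broken: [%s]\n' "$broken"   # prints: broken: []
printf 'fixed:  [%s]\n' "$fixed"    # prints: fixed:  [This is a test toRemove first line]
```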

grep: you should use the -v option to invert the sense of the regex match (it then outputs all lines that do not match the pattern).
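If the threshold might change, one alternative (not from either answer) is to count occurrences with awk, whose gsub() returns the number of substitutions it made. This sketch keeps lines with at most max occurrences; word and max are illustrative variable names, and word is treated as a regular expression:

```shell
# The question's sample input.
s='This is a test toRemove first line
This is a test toRemove second line toRemove'

out=$(printf '%s\n' "$s" | awk -v word='toRemove' -v max=1 '
{
    copy = $0                    # work on a copy so $0 is printed unchanged
    n = gsub(word, "", copy)     # n = number of occurrences of word in the line
    if (n <= max) print
}')
printf '%s\n' "$out"   # prints: This is a test toRemove first line
```

Setting max=2 would keep lines with up to two occurrences without rewriting the regex.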

See the online demo:

s='This is a test toRemove first line
This is a test toRemove second line toRemove'
sed '/toRemove.*toRemove/d' <<< "$s"
# => This is a test toRemove first line
grep -v 'toRemove.*toRemove' <<< "$s"
# => This is a test toRemove first line