How do I compare parts of lines in two text files?

Question

3

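I have been given two txt files, each containing several tab-separated columns. I want to find the lines in the two files that have a matching column: not the whole line, only the first column has to be the same. How can I do this in a bash script?

I tried using grep -Fwf.

This is what the files look like: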

aaaa   bbbb
cccc   dddd

and

aaaa   eeee
ffff   gggg

The output I would like to get looks like this:

bbbb and eeee match

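I haven't found a command that compares by line and by word at the same time. Sorry for not posting my own code; I'm new to this and haven't come up with a sensible approach yet. Thanks!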

- dukeduck

4 Answers

1

There are different kinds of comparison, and different tools to perform them:

diff
cmp
comm
...

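All of these commands have options that change how the comparison is done.

For each command you can also filter the input first. For example: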

# remove comments before comparison
diff <(grep -v '^#' file1) <(grep -v '^#' file2)

Without a concrete example, it isn't possible to be more precise.

- Wiimm

If you can't give a precise answer, you should choose not to answer. - hek2mgl

1

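Why not point in the right direction and mention the options? Now he can look up the commands and try them himself, which is the best way to learn! - Wiimm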

You can only guide someone properly when you know what the problem is. Otherwise you are just pointing them somewhere. - hek2mgl

1

Assuming your tab-separated files are well-formed, the following should work:

diff <(awk '{print $2}' f1) <(awk '{print $2}' f2) 
# File names: f1, f2
# Column: 2nd column.

When the columns differ, the output looks like this:

2c2
< dx
---
> ldx

When the columns are identical, there is no output.

I tried @Wiimm's answer, but it didn't work for me.
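
The question asks about the first column rather than the second, so here is a minimal sketch of the same diff idea applied to column 1. The sample files are recreated from the question's example (the tab separators and the temporary directory are assumptions for illustration), and temporary files are used in place of process substitution so the snippet also runs under a plain POSIX shell:

```bash
# Recreate the question's sample files, assuming tab-separated columns.
workdir=$(mktemp -d)
printf 'aaaa\tbbbb\ncccc\tdddd\n' > "$workdir/f1"
printf 'aaaa\teeee\nffff\tgggg\n' > "$workdir/f2"

# Extract only the first column of each file (cut splits on tabs by default).
cut -f1 "$workdir/f1" > "$workdir/col1_f1"
cut -f1 "$workdir/f2" > "$workdir/col1_f2"

# diff exits with status 1 when the inputs differ, so "|| true" keeps
# the script going even under "set -e".
first_col_diff=$(diff "$workdir/col1_f1" "$workdir/col1_f2" || true)
echo "$first_col_diff"

rm -r "$workdir"
```

With this sample data the first columns agree on line 1 and differ on line 2, so diff reports only line 2. Note that diff compares position by position, so it only reveals matches between lines that sit at the same line number in both files.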

- Rocky Li

comm -12 ... prints only the lines that are common to both inputs. - Wiimm
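
To make the comm -12 suggestion concrete, here is a small sketch that lists the first-column values present in both files. It is only one possible reading of the comment: the sample data (tab-separated, recreated from the question) and the intermediate file names keys1 and keys2 are assumptions for illustration.

```bash
# Recreate the question's sample files, assuming tab-separated columns.
workdir=$(mktemp -d)
printf 'aaaa\tbbbb\ncccc\tdddd\n' > "$workdir/file1"
printf 'aaaa\teeee\nffff\tgggg\n' > "$workdir/file2"

# comm requires sorted input, so extract column 1 and sort it.
cut -f1 "$workdir/file1" | sort > "$workdir/keys1"
cut -f1 "$workdir/file2" | sort > "$workdir/keys2"

# -1 hides lines unique to the first input, -2 hides lines unique to the
# second, leaving only the keys that occur in both files.
common_keys=$(comm -12 "$workdir/keys1" "$workdir/keys2")
echo "$common_keys"

rm -r "$workdir"
```

For the sample data this prints aaaa. It yields only the shared keys, not the other columns, so a further step would still be needed to produce the "bbbb and eeee match" sentence from the question.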

-1

You can use awk, like this:

awk 'NR==FNR{a[NR]=$1;b[NR]=$2;next}
     a[FNR]==$1{printf "%s and %s match\n", b[FNR], $2}' file1 file2

Output:

bbbb and eeee match

Explanation (the same code split across multiple lines):

# As long as we are reading file1, the overall record
# number NR is the same as the record number in the
# current input file FNR
NR==FNR{
    # Store column 1 and 2 in arrays called a and b
    # indexed by the record number
    a[NR]=$1
    b[NR]=$2
    next # Do not process more actions for file1
}

# The following code gets only executed when we read
# file2 because of the above _next_ statement

# Check if column 1 in file1 is the same as in file2
# for this line
a[FNR]==$1{
    printf "%s and %s match\n", b[FNR], $2
}
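
The script above compares line N of file1 with line N of file2. If matching keys may appear on different line numbers, a variant that indexes the array by the column 1 value instead of the record number might be more suitable. The following is a sketch under that assumption, not part of the original answer; the array name second and the reordered sample data are made up for illustration.

```bash
# Sample data where the shared key "aaaa" sits on different line numbers.
workdir=$(mktemp -d)
printf 'cccc\tdddd\naaaa\tbbbb\n' > "$workdir/file1"
printf 'aaaa\teeee\nffff\tgggg\n' > "$workdir/file2"

matches=$(awk -F'\t' '
    # While reading file1, remember column 2 under its column 1 key.
    # If a key repeats in file1, the last occurrence wins.
    NR == FNR { second[$1] = $2; next }

    # While reading file2, report every line whose key was seen in file1.
    $1 in second { printf "%s and %s match\n", second[$1], $2 }
' "$workdir/file1" "$workdir/file2")
echo "$matches"

rm -r "$workdir"
```

This prints "bbbb and eeee match" even though the two aaaa lines are at different positions. As with the original, the NR==FNR test assumes file1 is not empty.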

- hek2mgl

- Oliver Gaida · Accepted Answer

Have you looked at the join command? Combined with sort, it may be what you need. This link, https://shapeshed.com/unix-join/, can help you understand it better.

For example:

$ cat a
aaaa   bbbb
cccc   dddd
$ cat b
aaaa   eeee
ffff   gggg
$ join a b
aaaa bbbb eeee

If the values in the first column are not sorted, you have to sort them first, otherwise join will not work:

join <(sort a) <(sort b)

Cheers,

Oliver
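
The join approach can be taken one step further to produce the sentence requested in the question. This is a rough sketch, assuming tab-separated input; the sample files are recreated from the example above, and the intermediate names a.sorted and b.sorted are hypothetical. Temporary files replace process substitution so it also runs under a plain POSIX shell.

```bash
# Recreate the sample files a and b, assuming tab-separated columns.
workdir=$(mktemp -d)
printf 'aaaa\tbbbb\ncccc\tdddd\n' > "$workdir/a"
printf 'aaaa\teeee\nffff\tgggg\n' > "$workdir/b"

tab=$(printf '\t')

# Sort both files on the first field only, which is the field join uses.
sort -t "$tab" -k1,1 "$workdir/a" > "$workdir/a.sorted"
sort -t "$tab" -k1,1 "$workdir/b" > "$workdir/b.sorted"

# -t sets the field separator; -o 1.2,2.2 prints column 2 of each file
# for every pair of lines whose first columns are equal.
joined=$(join -t "$tab" -o 1.2,2.2 "$workdir/a.sorted" "$workdir/b.sorted" |
    awk -F'\t' '{ printf "%s and %s match\n", $1, $2 }')
echo "$joined"

rm -r "$workdir"
```

For the sample data this prints "bbbb and eeee match". Passing -t explicitly matters when fields themselves may contain spaces, since join otherwise splits on any blank.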