How to select two columns with awk and print the ones that do not match

Question

3

I need to pick out two MSISDN values from an OMO account migration log and print the ones that do not match.

[2019-03-11 04:15:08 INFO-SUBAPP ESBRestClient:117] ## IP-103.228.158.85##TOKEN-201903110416276787774(**923419606907**)RESPONSE-BODY: {"callStatus":"false","responseCode":"18","description":"OMO account migration – **923481057772**"}
[2019-03-11 04:24:02 INFO-SUBAPP ESBRestClient:117] ## IP-119.153.134.128##TOKEN-1552260212780839(923214748517)RESPONSE-BODY: {"callStatus":"false","responseCode":"18","description":"OMO account migration – 953214748517"}

923481057772 is the old MSISDN.

923419606907 is the new MSISDN, which I need to save in a new file. I have been using the following command to select only the new MSISDN:

cat migration.txt | egrep "OMO account migration" | egrep "responseCode\":\"1700" | awk -F"(" '{gsub(/\).*/,"",$2);print $2}' >>newmsisdn.txt

I use the saved MSISDN values to look up token numbers, and then use those tokens to fetch several parameters. The final output looks like this:

Date            Time          Old MSISDN        New MSISDN     Old Profile New Profile  CNIC      Acc Status Acc Status Migration Channel
                                                                                                   (Before)   (After)
2019-03-11  |  00:00:14  |  923135260528  |  923029403541  |  OMO BVS MA  |  0  |  1620221953175  |  ACTIVE  |     |  subapp

2019-03-11  |  00:00:14  |  923135260528  |  923003026654  |  OMO BVS MA  |  0  |  1620221953175  |  ACTIVE  |     |  subapp

2019-03-11  |  00:00:14  |  923135260528  |  923003026654  |  OMO BVS MA  |  0  |  1620221953175  |  ACTIVE  |     |  subapp

2019-03-11  |  00:00:14  |  923135260528  |  923038048244  |  OMO BVS MA  |  0  |  1620221953175  |  ACTIVE  |     |  subapp

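In the second log entry the two values are the same. I need to filter those out, i.e. I only want to use the values that do not match. How can I compare the two values and print the new MSISDN when they differ?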

- SANA SIDDIQUI

1

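When you say "filter those out", do you mean "print those lines", "print all lines except those", or something else? [edit] your question to show multiple lines of input, not just one, some of which match your criteria and some of which do not, along with the expected output for that input. - Ed Morton

I need to print all lines except the ones where the two MSISDNs are the same. For the new MSISDN I then need to get the token number, and I will use that token number to extract several parameters. I will post the output as well. - SANA SIDDIQUI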

1

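Sana, if you want the best and most helpful answers, you should follow @EdMorton's advice and show "multiple lines of input, not just one, some of which match your criteria and some of which do not." Only you know what your actual input looks like. Without clear and precise data we can only guess, and guessing may waste our time and yours. - John1024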

3 Answers

0

Assuming your actual input file looks the same as the sample shown and you need the new value from each line, try the following.

awk '
/OMO account migration/ && /responseCode":"18"/{
  val_new=val_old=""
  match($0,/\*\*[0-9]+\*\*/)
  val_new=substr($0,RSTART,RLENGTH)
  $0=substr($0,RSTART+RLENGTH)
  match($0,/\*\*[0-9]+\*\*/)
  val_old=substr($0,RSTART,RLENGTH)
  if(val_new!=val_old){
    gsub(/\*/,"",val_new)
    print val_new
  }
}
'   Input_file

Explanation: the same code with a detailed comment on each line.

awk '                                                     ##Start of the awk program.
/OMO account migration/ && /responseCode":"18"/{          ##Run this block only for lines containing both OMO account migration and responseCode":"18".
  val_new=val_old=""                                      ##Reset val_new and val_old for each matching line.
  match($0,/\*\*[0-9]+\*\*/)                              ##match() looks for the first **digits**; on success it sets the awk variables RSTART and RLENGTH.
  val_new=substr($0,RSTART,RLENGTH)                       ##val_new is the first matched text (the new MSISDN in parentheses), starting at RSTART with length RLENGTH.
  $0=substr($0,RSTART+RLENGTH)                            ##Redefine the current line as the text after the first match, so the next match() finds the second value.
  match($0,/\*\*[0-9]+\*\*/)                              ##Second match(): RSTART and RLENGTH now describe the second **digits**.
  val_old=substr($0,RSTART,RLENGTH)                       ##val_old is the second matched text (the old MSISDN in the description).
  if(val_new!=val_old){                                   ##Continue only if the two values differ.
    gsub(/\*/,"",val_new)                                 ##Remove every * from val_new to get the bare number.
    print val_new                                         ##Print the new MSISDN.
  }
}                                                         ##End of the block for matching lines.
'  Input_file                                             ##Name of the input file.
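
To see what match() actually sets, here is a minimal sketch on a made-up, shortened line (the sample text is hypothetical, not taken from the real log). It prints RSTART, RLENGTH, and the digits with the two asterisks on each side trimmed off.

```shell
# Hypothetical shortened sample; only the starred number matters here.
sample='TOKEN-1(**923419606907**)RESPONSE'

# match() sets RSTART (start position) and RLENGTH (match length) as side effects.
match_info=$(printf '%s\n' "$sample" | awk '{
  if (match($0, /\*\*[0-9]+\*\*/))
    print RSTART, RLENGTH, substr($0, RSTART + 2, RLENGTH - 4)
}')
echo "$match_info"   # prints: 9 16 923419606907
```

Skipping two characters at the start and shortening the length by four is an alternative to calling gsub() on the result afterwards.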

- RavinderSingh13

@SANA SIDDIQUI, could you please check this and let me know whether it helped you? - RavinderSingh13

0

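I would take the following approach: I see that every MSISDN number consists of 12 digits ([0-9]) between two double asterisks.
You can find them with the following regular expression: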

grep -o "\*\*[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\*\*"

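If your system supports it, you can shorten this with an extended regular expression (the -E option):

grep -Eo "\*\*[0-9]{12}\*\*"

Once you have these, you can use awk to show the ones that differ, something like:

awk '{if ($1 != $2) print $1, $2}' (not tested; it assumes both numbers from a log line end up on the same input line, whereas grep -o prints each match on its own line).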
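
One way the pieces might fit together is sketched below. It assumes every relevant log line contains exactly two starred 12-digit numbers, new first and old second. If that does not hold, the pairs fall out of step. The sample lines are shortened, made-up stand-ins for the real log.

```shell
# Two hypothetical, shortened log lines for illustration.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
TOKEN-1(**923419606907**)RESPONSE-BODY: "OMO account migration - **923481057772**"
TOKEN-2(**923419606123**)RESPONSE-BODY: "OMO account migration - **923419606123**"
EOF

# grep -o prints one match per line, tr strips the asterisks, and
# paste - - joins each pair of lines so awk sees both numbers together.
mismatched=$(grep -Eo '\*\*[0-9]{12}\*\*' "$sample_log" | tr -d '*' | paste - - | awk '$1 != $2 { print $1 }')
echo "$mismatched"   # prints: 923419606907
rm -f "$sample_log"
```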

- Dominique


- John1024 · Accepted Answer

Answer to the first question

Try:

awk -F'[*][*]' '/OMO account migration/ && /responseCode":"18"/ && $2 != $4 { print $2}' migration.txt

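This approach avoids spawning several processes and connecting them with pipes, so it is relatively efficient.

How it works:

- -F'[*][*]': sets the field separator to two asterisks. This way the new MSISDN is the second field and the old MSISDN is the fourth.
- /OMO account migration/ && /responseCode":"18"/ && $2 != $4 { print $2}: selects lines that match the regex OMO account migration and the regex responseCode":"18", and whose second field differs from the fourth. For any such line, the second field is printed.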
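
To check the field numbering, the following sketch splits one shortened, made-up line (not the real log) on the same separator and prints the field count together with fields 2 and 4.

```shell
# Hypothetical shortened log line with the same ** markers as the sample.
line='TOKEN-1(**923419606907**)RESPONSE-BODY: "OMO account migration - **923481057772**"'

# Each pair of asterisks is one separator, so the two numbers land in $2 and $4.
fields=$(printf '%s\n' "$line" | awk -F'[*][*]' '{ print NF; print "new=" $2; print "old=" $4 }')
echo "$fields"
# prints:
# 5
# new=923419606907
# old=923481057772
```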

Example:

Consider the following three-line test file:

$ cat migration.txt 
[2019-03-11 04:15:08 INFO-SUBAPP ESBRestClient:117] ## IP-103.228.158.85##TOKEN-201903110416276787774(**923419606907**)RESPONSE-BODY: {"callStatus":"false","responseCode":"18","description":"OMO account migration – **923481057772**"}
[2019-03-11 04:15:08 INFO-SUBAPP ESBRestClient:117] ## IP-103.228.158.85##TOKEN-201903110416276787774(**923419606888**)RESPONSE-BODY: {"callStatus":"false","responseCode":"19","description":"OMO account migration – **923481057999**"}
[2019-03-11 04:15:08 INFO-SUBAPP ESBRestClient:117] ## IP-103.228.158.85##TOKEN-201903110416276787774(**923419606123**)RESPONSE-BODY: {"callStatus":"false","responseCode":"18","description":"OMO account migration – **923419606123**"}

Let's run the command:

$ awk -F'[*][*]' '/OMO account migration/ && /responseCode":"18"/ && $2 != $4 {print $2}' migration.txt >>newmsisdn.txt

The output file now contains the one new MSISDN we want:

$ cat newmsisdn.txt 
923419606907
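
Note that the second log line in the question has no asterisks around the numbers. If the real log lacks them (an assumption; the question does not say), splitting on `**` will not work. A rough sketch that instead takes the digits after the first "(" as the new MSISDN and the last run of digits on the line as the old one could look like this. The sample lines are shortened and hypothetical.

```shell
# Hypothetical sample without asterisks: first line differs, second is identical.
plain_log=$(mktemp)
cat > "$plain_log" <<'EOF'
TOKEN-1(923419606907)RESPONSE-BODY: {"responseCode":"18","description":"OMO account migration - 923481057772"}
TOKEN-2(923419606123)RESPONSE-BODY: {"responseCode":"18","description":"OMO account migration - 923419606123"}
EOF

plain_new=$(awk '/OMO account migration/ && /responseCode":"18"/ {
  new = old = $0
  sub(/^[^(]*[(][*]*/, "", new)   # drop everything up to "(" plus any asterisks
  sub(/[^0-9].*/, "", new)        # keep only the leading run of digits
  sub(/[^0-9]*$/, "", old)        # drop trailing non-digits
  sub(/.*[^0-9]/, "", old)        # keep only the trailing run of digits
  if (new != "" && new != old) print new
}' "$plain_log")
echo "$plain_new"   # prints: 923419606907
rm -f "$plain_log"
```

This relies on the token being the first parenthesised value on the line and the old MSISDN being the last number on it, so check both against the real log before using it.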