Sorting the lines of a file by a pattern using Bash

Question

4

I have a file containing the following lines:

This test took 1201ms to execute
The IO operation cost 113ms
Main thread have been executing for 16347ms

How can I sort them by the number next to ms?

I tried the following sed command, but it did not work.

sed -r 's/[[:digit]]\+ms//g' file.txt | sort -r | > tmp

- X625

There is no consistency. What is your plan? - heemayl

@heemayl I need to sort the lines by their millisecond values. - X625

3

The sed command (which should be s/[[:digit:]]\+ms//g) actually removes the duration values from the lines, so it does the opposite of what you want. - Kevin Hoerr

1

Google "Schwartzian transform". - tripleee

For a Perl solution, see https://stackoverflow.com/questions/5753436/sort-strings-based-on-a-character-contained-in-the-string and https://perldoc.perl.org/functions/sort.html. - Sundeep

4 Answers

2

GNU awk:

awk 'BEGIN {PROCINFO["sorted_in"]="@ind_num_asc"} \
        {idx=gensub(".*\\s+([0-9]+).*", "\\1", "g"); arr[idx]=$0} \
          END{for (i in arr) print arr[i]}' file.txt

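PROCINFO["sorted_in"]="@ind_num_asc" sets the order in which the (associative) array is traversed: ascending by numeric index.
{idx=gensub(".*\\s+([0-9]+).*", "\\1", "g"); arr[idx]=$0} extracts the number and uses it as the index of the associative array arr, with the whole record as the value.
END{for (i in arr) print arr[i]} prints the values of the array.

To reverse the sort order to descending, use: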

PROCINFO["sorted_in"]="@ind_num_desc"

Example:

% cat file.txt
This test took 1201ms to execute
The IO operation cost 113ms
Main thread have been executing for 16347ms

% awk 'BEGIN {PROCINFO["sorted_in"]="@ind_num_asc"} {idx=gensub(".*\\s+([0-9]+).*", "\\1", "g"); arr[idx]=$0} END{for (i in arr) print arr[i]}' file.txt
The IO operation cost 113ms
This test took 1201ms to execute
Main thread have been executing for 16347ms

- heemayl

1

Using GNU awk (gawk):

$ awk 'BEGIN{PROCINFO["sorted_in"]="@val_num_asc"} {for (i=1;i<=NF;i++) if ($i~/ms$/){a[$0]=$i+0; break}} END{for (line in a)print line}' file.txt
The IO operation cost 113ms
This test took 1201ms to execute
Main thread have been executing for 16347ms

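How it works

BEGIN{PROCINFO["sorted_in"]="@val_num_asc"}

This tells awk to traverse arrays in ascending numeric order of their values. This is a GNU feature.

for (i=1;i<=NF;i++) if ($i~/ms$/){a[$0]=$i+0; break}

For each field on the line, we check whether it ends with ms. If it does, we store that field's numeric value in the associative array a, using the whole line as the key.

END{for (line in a)print line}

After the whole file has been read, we print the keys of array a. Because a is traversed in ascending order of value, the lines are printed in ascending order of time.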
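
The `$i+0` in the second step is what turns a field such as `113ms` into the number 113: when a string is used in arithmetic, awk converts its leading numeric prefix. A minimal sketch of just that coercion, using a helper named `ms_value` that is hypothetical and not part of the answer above:

```shell
# Hypothetical helper: for each input line, print the numeric prefix of the
# first whitespace-separated field that ends in "ms". Adding 0 forces awk's
# string-to-number conversion, which stops at the first non-numeric character.
ms_value() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /ms$/) { print $i + 0; break } }'
}

printf '%s\n' 'The IO operation cost 113ms' 'This test took 1201ms to execute' | ms_value
```

This part needs no GNU extension; only the `PROCINFO["sorted_in"]` traversal order is gawk-specific.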

- John1024

1

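You can use sed to extract the number and place it at the beginning of the line, followed by a delimiter, then sort on that first field, and finally use cut to remove the added field: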

sed -E 's/^(.*) ([[:digit:]]+)ms(.*)$/\2|\1 \2ms\3/' file | # extract ms and place it at the beginning
  sort -t '|' -k1,1n |                                      # sort it by the field added above
  cut -f2- -d '|'                                           # remove the field

Output:

The IO operation cost 113ms
This test took 1201ms to execute
Main thread have been executing for 16347ms
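
The pipeline above follows a decorate-sort-undecorate pattern (the Schwartzian transform mentioned in the comments). A minimal sketch that wraps it in a function reading standard input; the name `sort_by_ms` is hypothetical, and the sketch assumes that no line contains a `|` character and that each line has a ` <digits>ms` token:

```shell
# Hypothetical wrapper around the sed | sort | cut pipeline.
# Assumes '|' never appears in the input, since it is the temporary delimiter.
sort_by_ms() {
  sed -E 's/^(.*) ([[:digit:]]+)ms(.*)$/\2|&/' |  # decorate: prepend "<number>|" to the line
    sort -t '|' -k1,1n |                           # sort numerically on the prepended key
    cut -d '|' -f2-                                # undecorate: drop the key again
}

printf '%s\n' \
  'This test took 1201ms to execute' \
  'The IO operation cost 113ms' \
  'Main thread have been executing for 16347ms' | sort_by_ms
```

Because the pattern is anchored at both ends, `&` stands for the whole original line, so the replacement does not need to rebuild it from the capture groups.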

- codeforester

- iamauser · Accepted Answer

$ awk '{match($0,/[[:digit:]]+ms/,a)}{print substr(a[0], 1, length(a[0])-2),$0}' inputFile | sort -nk1 | cut -f2- -d ' '
The IO operation cost 113ms
This test took 1201ms to execute
Main thread have been executing for 16347ms

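awk matches [[:digit:]]+ms and prints it (minus the last two characters, ms) at the start of each line; sort then sorts numerically on that first field. Finally, cut removes the first field, leaving the original line.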
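
The three-argument form of `match()` used above is a gawk extension. A minimal sketch of the same idea using the two-argument `match()` with the `RSTART` and `RLENGTH` variables, which should also work in other awk implementations. The function name `sort_by_ms_awk` is hypothetical, a tab is used as the temporary separator, and lines without a `<digits>ms` token are dropped:

```shell
# Hypothetical portable variant: decorate with awk, sort, then strip the key.
sort_by_ms_awk() {
  awk 'match($0, /[0-9]+ms/) {
         # RLENGTH - 2 drops the trailing "ms" from the matched text
         print substr($0, RSTART, RLENGTH - 2) "\t" $0
       }' |
    sort -n -k1,1 |   # numeric sort on the prepended key
    cut -f2-          # cut splits on tab by default; drop the key field
}

printf '%s\n' \
  'This test took 1201ms to execute' \
  'The IO operation cost 113ms' \
  'Main thread have been executing for 16347ms' | sort_by_ms_awk
```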