How to sort by line length, then in reverse alphabetical order

Question


4

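I have a list of 600 search-and-replace terms that I need to run over some files as a sed script. The problem is that the search terms aren't orthogonal (some terms contain others)... but I think I can get around this by sorting by line length (i.e. matching the longest terms first, then sorting in reverse alphabetical order within each length). So, given an unsorted list: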

aaba
aa
ab
abba
bab
aba

What I want is a sorted set such as:

abba
aaba
bab
aba
ab
aa

Is there a way to do this, for example by prepending the line length and then sorting by field?

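For bonus points :-) !!! The search-and-replace actually just replaces term with _term_, and the sed code I'm going to use is s/term/_term_/g. How do I write the regex so that it doesn't replace terms that are already inside a _ pair?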

- Dycey

6 answers

2

$ awk '{print length($1),$1}' file |sort -rn
4 abba
4 aaba
3 bab
3 aba
2 ab
2 aa

I'll leave stripping the first column as an exercise for you.

- ghostdog74

2

You can condense everything into a single regular expression:

$ sed -e 's/\(aaba\|aa\|abba\)/_\1_/g'
testing words aa, aaba, abba.
testing words _aa_, _aaba_, _abba_.

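If I understand your question correctly, this solves all of your problems: there is no "double replacement", and the longest word is always matched.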
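For comparison, here is a minimal Python sketch of the same single-pass idea, assuming the terms are plain strings rather than regex patterns (the `wrap_all` name is hypothetical). One detail differs from the sed answer: Python's `re` tries alternatives from left to right and takes the first one that matches, so this sketch sorts the terms longest first before joining them.

```python
import re

def wrap_all(text, terms):
    """Wrap every occurrence of any term in underscores, in a single pass."""
    if not terms:
        return text
    # Python's re takes the first alternative that matches, so longer terms
    # go first; re.escape makes regex metacharacters in a term literal.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    # One substitution pass: replaced text is never rescanned,
    # so a term cannot be wrapped twice.
    return pattern.sub(lambda m: "_" + m.group(0) + "_", text)

print(wrap_all("testing words aa, aaba, abba.", ["aaba", "aa", "abba"]))
# testing words _aa_, _aaba_, _abba_.
```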

- Johannes Hoff

Shouldn't you sort the items by length? Or is there some kind of greedy matching that always matches the longest string? - mob

Also, that's a very long line for 600 items ;-) but maybe I can split it across several lines... - Dycey


No need: the regex will always find the longest match. - Johannes Hoff

@Dycey: Yes, it would be long. In that case you can put the script in a file and run sed -f regexpfile. - Johannes Hoff
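A small Python sketch for generating that script file from the term list, assuming GNU sed (the `\|` alternation inside a basic regular expression is a GNU extension). It emits one `s` command and backslash-escapes the characters that are special in a basic regular expression, plus the `/` delimiter. The helper names are hypothetical, and the terms are sorted longest first only as a harmless precaution.

```python
import re

# Characters that are special in a sed basic regular expression, plus "/"
_BRE_SPECIAL = re.compile(r"([\\/.*\[\]^$])")

def sed_escape(term):
    """Backslash-escape characters that sed would not match literally."""
    return _BRE_SPECIAL.sub(r"\\\1", term)

def sed_script(terms):
    """Build one GNU sed s command that wraps any of the terms in underscores."""
    if not terms:
        raise ValueError("need at least one term")
    ordered = sorted(terms, key=len, reverse=True)
    alternation = r"\|".join(sed_escape(t) for t in ordered)
    return "s/\\(" + alternation + "\\)/_\\1_/g\n"

print(sed_script(["aaba", "aa", "abba"]), end="")
# s/\(aaba\|abba\|aa\)/_\1_/g
```

Writing the returned string to a file named `regexpfile` would then allow `sed -f regexpfile` as suggested in the comment.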

1

Just pipe your stream through a script like this:

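#!/usr/bin/env python3
import sys

# Group the input lines by their length
by_length = {}
for line in sys.stdin:
    line = line.rstrip("\n")
    by_length.setdefault(len(line), []).append(line)

# Longest lengths first; within each length, reverse alphabetical order
for length in sorted(by_length, reverse=True):
    print("\n".join(sorted(by_length[length], reverse=True)))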

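For the bonus question: likewise, do it in Python (unless you really have a reason that forces you to use another language, in which case I'd be curious to know what it is).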
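A minimal Python sketch of the bonus part, assuming "already inside a `_` pair" means an underscore sits immediately on both sides of the term (`_term_`); the `wrap_unwrapped` name is hypothetical. Each match optionally captures one underscore on either side, and a replacement function leaves the match untouched when both are present, so text that was already processed is normally left alone on a second run.

```python
import re

def wrap_unwrapped(text, terms):
    """Wrap each term as _term_, skipping occurrences already written as _term_."""
    if not terms:
        return text
    # Longest terms first, because Python's re takes the first alternative that matches
    alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    # Optionally capture one underscore before and one after the term
    pattern = re.compile("(_?)(" + alternation + ")(_?)")

    def replace(m):
        before, term, after = m.groups()
        if before and after:
            # Already inside a _..._ pair: keep the match unchanged
            return m.group(0)
        return before + "_" + term + "_" + after

    return pattern.sub(replace, text)

s = wrap_unwrapped("here is _aa_ and aa and aaba.", ["aa", "aaba"])
print(s)  # here is _aa_ and _aa_ and _aaba_.
```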

- Gyom

Is this the shortest or clearest way to sort in Python? - Brad Gilbert

Maybe not; it was just my first instinct. - Gyom

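Personally, this problem is quick-and-dirty enough that I'd rather use a Perl one-liner than write a whole Python script. If you insist on Python, though, the best approach is probably to read the file into memory, sort it, and print it back out. That would be cleaner (though possibly less efficient). - Chris Lutz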
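A minimal sketch of that read-sort-print approach in Python 3: a tuple key `(len(line), line)` sorted with `reverse=True` puts longer lines first and, within one length, orders lines in reverse alphabetical order. The `sort_lines` name is hypothetical.

```python
def sort_lines(lines):
    """Longest lines first; lines of equal length in reverse alphabetical order."""
    # reverse=True applies to the whole tuple: length first, then the text
    return sorted(lines, key=lambda line: (len(line), line), reverse=True)

words = ["aaba", "aa", "ab", "abba", "bab", "aba"]
print("\n".join(sort_lines(words)))
# abba
# aaba
# bab
# aba
# ab
# aa
```

To use it on a file, pass it `sys.stdin.read().splitlines()` (after `import sys`) and print the result the same way.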

0

This sorts the file by line length, longest lines first:

while IFS= read -r LINE; do printf '%s\t%s\n' "${#LINE}" "$LINE"; done < file.txt | sort -rn | cut -f 2-

This replaces term with _term_, but won't turn _term_ into __term__:

sed -r 's/(^|[^_])term([^_]|$)/\1_term_\2/g'
sed -r -e 's/(^|[^_])term/\1_term_/g' -e 's/term([^_]|$)/_term_\1/g'

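The first one works reasonably well, but it also skips _term and term_, wrongly leaving them unchanged. If that matters, use the second one. Here are my silly test cases: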

# echo here is _term_ and then a term you terminator haha _terminator and then _term_inator term_inator | sed -re 's/(^|[^_])term([^_]|$)/\1_term_\2/g'
here is _term_ and then a _term_ you _term_inator haha _terminator and then _term_inator term_inator
# echo here is _term_ and then a term you terminator haha _terminator and then _term_inator term_inator | sed -r -e 's/(^|[^_])term/\1_term_/g' -e 's/term([^_]|$)/_term_\1/g'
here is _term_ and then a _term_ you _term_inator haha __term_inator and then _term_inator _term__inator
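A Python sketch of the first substitution using lookarounds instead of capture groups, with the same rule: a term touching an underscore on either side is left alone. The lookbehind `(?<!_)` and lookahead `(?!_)` inspect the neighbouring characters without consuming them. That also matters when two terms are separated by a single character: in the first sed command, `([^_]|$)` consumes the character after one term, so in `a term term b` the second term can no longer match `(^|[^_])` and is skipped. GNU sed's regular expressions have no lookarounds, so this uses Python's `re`; the `wrap_term` name is hypothetical.

```python
import re

def wrap_term(text, term):
    """Replace term with _term_ unless an underscore touches it on either side."""
    # Lookarounds check the neighbours without consuming them,
    # so occurrences separated by one character are all found
    pattern = re.compile(r"(?<!_)" + re.escape(term) + r"(?!_)")
    return pattern.sub(lambda m: "_" + m.group(0) + "_", text)

line = ("here is _term_ and then a term you terminator haha _terminator "
        "and then _term_inator term_inator")
print(wrap_term(line, "term"))
print(wrap_term("a term term b", "term"))  # a _term_ _term_ b
```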

- John Kugelman

0

This sorts by length first, then in reverse alphabetical order:

for mask in $(tr -c "\n" "." < "$FILE" | sort -ur)
do
    grep "^$mask\$" "$FILE" | sort -r
done

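The tr call replaces every character in $FILE except newlines with a period, which in grep matches any single character. Each resulting mask therefore selects exactly the lines of one length, and sort -ur lists the longest mask first.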
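The same mask idea translated to Python, as a sketch: each distinct length gets a mask of that many dots, and `re.fullmatch` with the mask plays the role of `grep "^$mask$"`. Unlike the shell loop, which rereads `$FILE` once per distinct length, this keeps the lines in a list. The `sort_by_masks` name is hypothetical.

```python
import re

def sort_by_masks(lines):
    """Group lines by length using dot masks, longest group first."""
    result = []
    # One mask per distinct line length, longest first (like tr ... | sort -ur)
    masks = sorted({"." * len(line) for line in lines}, key=len, reverse=True)
    for mask in masks:
        # Each "." matches one non-newline character, so a full match selects
        # exactly the lines of that length (like grep "^$mask$")
        group = [line for line in lines if re.fullmatch(mask, line)]
        # Reverse alphabetical order within the group (like sort -r)
        result.extend(sorted(group, reverse=True))
    return result

print(sort_by_masks(["aaba", "aa", "ab", "abba", "bab", "aba"]))
# ['abba', 'aaba', 'bab', 'aba', 'ab', 'aa']
```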

- martin clayton

Content provided by Stack Overflow.

- mob · Accepted Answer

10

You can do this with a Perl one-liner:

perl -e 'print sort { length $b<=>length $a || $b cmp $a } <>' input
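For readers comparing with the Python answers, Python can express this two-level comparator almost literally with `functools.cmp_to_key`; this is a sketch, not part of the original answer. In Perl, `<=>` and `cmp` return -1, 0 or 1, and `||` falls through to the second comparison only when the first returns 0. The hypothetical `three_way` helper reproduces that return convention.

```python
from functools import cmp_to_key

def three_way(x, y):
    """Return -1, 0 or 1, like Perl's <=> and cmp."""
    return (x > y) - (x < y)

def by_length_then_reverse(a, b):
    # Mirrors: length $b <=> length $a || $b cmp $a
    return three_way(len(b), len(a)) or three_way(b, a)

lines = ["aaba", "aa", "ab", "abba", "bab", "aba"]
print(sorted(lines, key=cmp_to_key(by_length_then_reverse)))
# ['abba', 'aaba', 'bab', 'aba', 'ab', 'aa']
```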

- mob

You should change $a cmp $b to $b cmp $a, since he wants it in reverse order. - Brad Gilbert


Any task you might write with a lot of shell scripting can be done more easily, more briefly, and more clearly in Perl. - Chris Lutz

I think this is clearer than the Python solution. https://dev59.com/7UrSa4cB1Zd3GeqPTh7X#1670454 - Brad Gilbert

I would probably write it like this: perl -E'say for sort { length $b <=> length $a || $b cmp $a } grep chomp, <>' input - Brad Gilbert