Linux bash - Splitting a file into two-word terms

Question

3

I have written a command that prints every word in a file on its own line: sed -e 's/[^a-zA-Z]/\n/g' test_input | grep -v "^$"

If test_input contains "My bike is fast and clean", the output of this command is:

My
bike
is
fast
and
clean

Now I need a variant of the command that prints every pair of adjacent words in the text, like this (still using Bash):

My bike
bike is
is fast
fast and
and clean

Do you know how to do this?

- Daniele

Would this command line work? sed 's/([a-zA-Z]+[^a-zA-Z]+[a-zA-Z]+)[^a-zA-Z]+/$1\n/g' I don't have a Linux machine at hand... - chiccodoro

@chiccodoro: If you change your command to sed -r ... and replace $1 with \1, it prints two words per line, but it does not repeat the words. - Dennis Williamson
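
To illustrate the comment above, here is a minimal sketch of the corrected command, assuming GNU sed (which accepts -E as well as -r and interprets \n in the replacement). The function name non_overlapping_pairs is hypothetical and only used for this illustration. It shows why this approach is not enough for the question: each word is consumed by one match, so the pairs do not overlap.

```bash
#!/bin/bash
# Hypothetical helper: the comment's sed command with -E and \1 applied.
# Assumes GNU sed; other sed implementations may not expand \n in the replacement.
non_overlapping_pairs() {
  sed -E 's/([a-zA-Z]+[^a-zA-Z]+[a-zA-Z]+)[^a-zA-Z]+/\1\n/g'
}

printf 'My bike is fast and clean\n' | non_overlapping_pairs
# Prints:
# My bike
# is fast
# and clean
```

The pairs "bike is" and "fast and" are missing, which is what the answers below address.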

5 Answers

1

awk alone can do this; nothing else is needed.

$ echo "My bike is fast and clean" | awk '{for(i=1;i<NF;i++){printf "%s %s\n",$i,$(i+1) } }'
My bike
bike is
is fast
fast and
and clean

- ghostdog74

1

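This also works, provided test.dat already contains one word per line. Note that head -n -1 requires GNU head, and that paste separates the two words with a tab unless you pass -d ' ':

paste  <(head -n -1 test.dat) <(tail -n +2 test.dat)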

- Fritz G. Mehner
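
The same paste idea can be sketched without process substitution or the GNU-only head -n -1. This is a variant, not the answer's exact command: it feeds the shifted word list to paste on standard input and deletes the final, incomplete line with sed. The function name word_pairs is hypothetical, and the input file is assumed to hold one word per line.

```bash
#!/bin/bash
# Hypothetical helper: pair each line of a one-word-per-line file with the next line.
# "$1" is the word file; "-" makes paste read the shifted copy from standard input.
word_pairs() {
  tail -n +2 "$1" | paste -d ' ' "$1" - | sed '$d'
}

# Demo with a temporary word list.
words=$(mktemp)
printf '%s\n' My bike is fast and clean > "$words"
word_pairs "$words"
rm -f "$words"
# Prints:
# My bike
# bike is
# is fast
# fast and
# and clean
```

The trailing sed '$d' is needed because paste pads the shorter input, so the last line would otherwise be the final word followed by a lone space.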

0

This probably requires GNU sed, and there may be a simpler way:

sed 's/[[:blank:]]*\<\(\w\+\)\>/\1 \1\n/g; s/[^ ]* \([^\n]*\)\n\([^ ]*\)/\1 \2\n/g; s/ \n//; s/\n[^ ]\+$//' inputfile

- Dennis Williamson

0

Append this to your command:

| awk '(PREV!="") {printf "%s %s\n", PREV, $1} {PREV=$1}'

- user80168
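
For reference, the complete pipeline might look like the sketch below: the command from the question followed by the awk filter above. The function name bigrams is hypothetical; the sed step assumes GNU sed, since it relies on \n in the replacement.

```bash
#!/bin/bash
# Hypothetical wrapper: split input into one word per line, then print adjacent pairs.
# Reads the files given as arguments, or standard input when there are none.
bigrams() {
  sed -e 's/[^a-zA-Z]/\n/g' "$@" |
    grep -v '^$' |
    awk '(PREV != "") { printf "%s %s\n", PREV, $1 } { PREV = $1 }'
}

printf 'My bike is fast and clean\n' | bigrams
# Prints:
# My bike
# bike is
# is fast
# fast and
# and clean
```

Because the words are split before awk sees them, pairs also span line breaks in the input, for example bigrams test_input on a multi-line file.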

- mob · Accepted Answer

Pipe your file of words (one word per line) into the standard input of this script.

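#!/bin/bash
# Print each pair of adjacent words read from standard input (one word per line).
last_word=""
while read -r word
do
  # Quote the variables: an empty, unquoted $last_word makes [ fail with a syntax error.
  if [ "$last_word" != "" ] ; then
      echo "$last_word $word"
  fi
  last_word=$word
done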
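
As a usage sketch, the loop above can be wrapped in a function and fed directly from a text file. The function name pairs is hypothetical, and the word-splitting step here uses tr instead of the sed and grep command from the question; that is a swapped-in choice made only because tr -cs behaves the same across implementations.

```bash
#!/bin/bash
# Hypothetical wrapper around the read loop: prints adjacent word pairs from stdin.
pairs() {
  last_word=""
  while read -r word
  do
    if [ -n "$last_word" ] ; then
      printf '%s %s\n' "$last_word" "$word"
    fi
    last_word=$word
  done
}

# tr -cs turns every run of non-letters into a single newline (one word per line).
printf 'My bike is fast and clean\n' | tr -cs 'a-zA-Z' '\n' | pairs
# Prints:
# My bike
# bike is
# is fast
# fast and
# and clean
```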