Repeatedly extract text between two strings using sed or awk?

Question

7

I have a file named "plainlinks" that looks like this:

13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz
13081. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94094-2012.gz
13082. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94096-2012.gz
13083. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94097-2012.gz
13084. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94098-2012.gz
13085. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94644-2012.gz
13086. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94645-2012.gz
13087. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94995-2012.gz
13088. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94996-2012.gz
13089. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-96404-2012.gz

I need to produce output in the following form:

999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404

- Mike Furlender

5 Answers

7

Just for fun.

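awk -F/ '{print substr($7,1,12)}' plainlinks

(awk strings are 1-indexed, so the start position is 1; with a start of 0, some awk implementations return only 11 characters.)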

Or with grep:

grep -Eo '[0-9]{6}-[0-9]{5}' plainlinks

- matchew

2

+1 for the simple grep solution. - Chris Seymour

@sudo_o, thanks, and +1 for your solution since you were the first to answer. - matchew

Agreed, +1 for the elegant grep solution. - sampson-chen

@sampson-chen, OK, +1 for you too. - matchew

4

Assuming the format stays consistent, you can do this with awk:

awk 'BEGIN{FS="[/-]"; OFS="-"} {print $7, $8}' plainlinks > output_file

Output:

999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404

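Explanation:

awk reads the input file one line at a time and splits each line into "fields".
'BEGIN{FS="[/-]"; OFS="-"} sets the input field separator to either / or -, and sets the output field separator to -.
{print $7, $8}' tells awk to print the 7th and 8th fields of each line, which here are 999999 and 9xxxx.
plainlinks is the name of the input file.
> output_file redirects the output to a file named output_file.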
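To see why the fields of interest are numbers 7 and 8, it can help to print every field awk produces for one sample line. The following is only a sketch; show_fields is a hypothetical helper name, not part of the original answer.

```shell
# show_fields (hypothetical helper): print each field awk sees
# when a line is split on "/" or "-".
show_fields() {
    printf '%s\n' "$1" |
        awk -F'[/-]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields '13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'
```

The empty field $2 comes from the two consecutive slashes in "ftp://", which is why the station identifier lands in fields 7 and 8 rather than 6 and 7.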

- sampson-chen

4

Just use the shell's parameter expansion:

while IFS= read -r line; do
    tmp=${line##*noaa/}
    printf '%s\n' "${tmp%-????.gz}"
done < plainlinks
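The two expansions can be tried on a single line to see what each one strips. This is a minimal sketch; extract_id is a hypothetical function name introduced only for illustration.

```shell
# extract_id (hypothetical helper): apply the two expansions to one line.
extract_id() {
    # ${var##pattern} removes the longest prefix matching the pattern,
    # here everything up to and including "noaa/".
    tmp=${1##*noaa/}
    # ${var%pattern} removes the shortest matching suffix,
    # here a hyphen, any four characters, and ".gz".
    printf '%s\n' "${tmp%-????.gz}"
}

extract_id '13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'
```

The "noaa" in the host name is followed by "." rather than "/", so the prefix pattern only matches at the "/noaa/" directory.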

- glenn jackman

1

If the format stays the same, you don't need sed or awk:

cut -d "/" -f 7- your_file | cut -d "-" -f 1,2
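The intermediate result of each cut stage can be inspected on one sample line. This is only a sketch; the variable names line, stage1 and stage2 are illustrative.

```shell
line='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# Stage 1: split on "/" and keep field 7 onward (the file name).
stage1=$(printf '%s\n' "$line" | cut -d/ -f7-)

# Stage 2: split on "-" and keep the first two fields.
stage2=$(printf '%s\n' "$stage1" | cut -d- -f1,2)

printf '%s\n%s\n' "$stage1" "$stage2"
```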

- jfg956

If the format doesn't stay the same, the sed and awk solutions will break as they stand, too. :) - Kaz


- Chris Seymour · Accepted Answer

Using sed:

sed -E 's/.*\/(.*)-.*/\1/' plainlinks

Output:

999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404

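Use the -i option to save the changes back to the file (this is the GNU sed form; BSD/macOS sed expects a backup-suffix argument after -i):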

sed -Ei 's/.*\/(.*)-.*/\1/' plainlinks

Or redirect the output to a new file:

sed -E 's/.*\/(.*)-.*/\1/' plainlinks > newfile.txt

Explanation:

s/    # substitution
.*    # match anything (greedy)
\/    # up to the last forward slash (escaped so sed does not read it as the delimiter)
(.*)  # anything after the last forward slash (captured in parentheses)
-     # up to the last hyphen
.*    # anything else left on the line
/     # end match; start replacement
\1    # the value captured in the first (only) set of parentheses
/     # end
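Because every .* above is greedy, the pattern accepts almost any line containing a slash and a hyphen. If the input might contain lines in other formats, a stricter variant is one option. The sketch below is an assumption-laden alternative, not the answer's original command: extract_ids_strict is a hypothetical name, and the pattern assumes file names of the form digits-digits-YYYY.gz. It uses | as the s delimiter so the slash needs no escaping, and -n with the p flag so non-matching lines are dropped rather than passed through.

```shell
# extract_ids_strict (hypothetical helper): print the "digits-digits" part
# only for lines ending in /<digits>-<digits>-<4 digits>.gz.
# Reads the named files, or standard input when no file is given.
extract_ids_strict() {
    sed -nE 's|.*/([0-9]+-[0-9]+)-[0-9]{4}\.gz$|\1|p' "$@"
}

printf '%s\n' \
    '13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz' \
    'a line in some other format' |
    extract_ids_strict
```

Here only the first input line produces output; the second is silently skipped.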