Using Awk or Cut to print columns?

Question

3

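I am writing a script that takes a filename as an argument, looks for a specific word at the beginning of each line (in this case the word ATOM), and prints the values from specific columns.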

$FILE=*.pdb *

if test $# -lt 1
then
 echo "usage: $0 Enter a .PDB filename"
 exit
fi
if test -r $FILE
then
 grep ^ATOM $FILE | awk '{ print $18 }' | awk '{ print NR $4, "\t" $38,}'
else
 echo "usage: $FILE must be readable"
 exit
fi

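I have three problems to solve:

How do I use awk to print only the lines whose first word is ATOM?
How do I use awk to print only certain columns from the lines that meet the above criterion, specifically columns 2-20 and 38-40?
How can I indicate that this must be a pdb file? *.pdb *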

- Koala

3 Answers

1

Contrary to the other answers, your task can be accomplished with a single awk command. There is no need for grep or cut or ...

if [ $# -lt 1 ]; then
 echo "usage: $0 Enter a .PDB filename" >&2
 exit 1
fi
FILE="$1"
case "$FILE" in
*.pdb )
 if test -r "$FILE"
 then
  # print fields 2-20, assuming whitespace as the column separator
  awk '$1=="ATOM" && NF>=20 {
    for(i=2;i<=19;i++){
      printf "%s ",$i
    }
    printf "%s\n",$20   # end each record with a newline
  }' "$FILE"
 else
  echo "usage: $FILE must be readable" >&2
  exit 1
 fi
 ;;
*) exit 1;;
esac
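
The loop above hard-codes fields 2-20. As a possible generalization (a sketch, not part of the answer itself), the ranges can be passed in as a string such as `2-20,38-40`. The function name `print_atom_fields` and its argument order are assumptions made for illustration, and the input is still assumed to be whitespace-separated.

```shell
#!/bin/sh
# print_atom_fields FILE RANGES   (hypothetical helper, for illustration only)
# Prints the whitespace-separated fields named in RANGES (e.g. "2-20,38-40")
# for every line whose first field is exactly ATOM.
print_atom_fields() {
    awk -v ranges="$2" '
        BEGIN {
            # Parse "a-b,c-d,e" into parallel lo[]/hi[] arrays.
            n = split(ranges, spec, ",")
            for (r = 1; r <= n; r++) {
                split(spec[r], b, "-")
                lo[r] = b[1] + 0
                hi[r] = (b[2] == "" ? b[1] : b[2]) + 0
            }
        }
        $1 == "ATOM" {
            line = ""
            for (r = 1; r <= n; r++)
                for (i = lo[r]; i <= hi[r] && i <= NF; i++)   # skip fields past NF
                    line = line (line == "" ? "" : " ") $i
            print line
        }
    ' "$1"
}
```

Called as `print_atom_fields model.pdb "2-20,38-40"`, it prints only the fields that exist on each matching line instead of requiring a minimum field count.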

- ghostdog74

0

You can do everything you need in native bash, without spawning any subprocesses:

#!/bin/bash

declare    key="ATOM"
declare -a print_columns=( 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 38 39 40 )

[ ! -f "${1}" ] && echo "File not found." && exit 1
[ "${1%.pdb}" == "${1}" ] && echo "File is wrong type." && exit 1

# -r keeps backslashes literal; -a splits each line into an array
while read -r -a columns; do
  if [ "${columns[0]}" == "${key}" ]; then
    printf "%s " "${key}"
    for print_column in "${print_columns[@]}"; do
      # bash arrays are zero-based, so column N is element N-1
      printf "%s " "${columns[print_column - 1]}"
    done
    printf "\n"
  fi
done < "${1}"

- Andrew Vickers

Page content provided by Stack Overflow.

- David Z · Accepted Answer

4

To print only the lines whose first word is ATOM, that would be
```
awk '$1 == "ATOM"' "$FILE"
```
Printing specific columns (here, character positions 2-20 and 38-40) is probably better accomplished with cut:
```
grep '^ATOM' "$FILE" | cut -c 2-20,38-40
```
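
Note that `cut -c` selects character positions, whereas awk's `$2`, `$3`, ... are whitespace-separated fields, so the two are not interchangeable. A minimal sketch comparing the pipeline with an awk-only equivalent built on `substr`, assuming fixed-width lines; the function names are hypothetical:

```shell
#!/bin/sh
# Both helpers print characters 2-20 followed by characters 38-40 of every
# line that starts with ATOM. The names are made up for this example.

# grep + cut, as in the command above
atom_chars_cut() {
    grep '^ATOM' "$1" | cut -c 2-20,38-40
}

# awk only: substr(s, start, length) uses 1-based character positions,
# so 2-20 is 19 characters from position 2 and 38-40 is 3 from position 38
atom_chars_awk() {
    awk '/^ATOM/ { print substr($0, 2, 19) substr($0, 38, 3) }' "$1"
}
```

The awk version uses the pattern `/^ATOM/` to mirror `grep '^ATOM'`; switching it to `$1 == "ATOM"` would additionally reject lines whose first word merely begins with ATOM.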
If you want to ensure that the filename passed as the first argument to your script ends with .pdb: first, please don't (file extensions don't really matter in UNIX), and secondly, if you must, here's one way:
```
[ "${1%%.pdb}" == "$1" ] && echo "usage:..." && exit 1
```
This takes the first command-line argument ($1), strips the suffix .pdb if it exists, and then compares it to the original command-line argument. If they match, it didn't have the suffix, so the program prints a usage message and exits with status code 1.
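
The same comparison can be wrapped in a small predicate, which makes the behaviour easy to try on a few names. This is only a sketch; `has_pdb_suffix` is a hypothetical name, and it uses `%` rather than `%%`, which behaves identically when the pattern contains no wildcard.

```shell
#!/bin/sh
# has_pdb_suffix NAME   (hypothetical helper)
# Succeeds (exit status 0) when NAME ends in ".pdb".
has_pdb_suffix() {
    # ${1%.pdb} removes a trailing ".pdb" if present; when nothing was
    # removed the result equals the original, so the suffix is absent.
    [ "${1%.pdb}" != "$1" ]
}

for name in 1abc.pdb notes.txt archive.pdb.bak; do
    if has_pdb_suffix "$name"; then
        echo "$name: ends in .pdb"
    else
        echo "$name: does not end in .pdb"
    fi
done
```

Only `1abc.pdb` passes; `archive.pdb.bak` fails because the expansion strips the suffix only at the very end of the string.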

- David Z

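Thanks David! May I ask why you say "please don't" restrict the argument to .pdb files? Also, if the columns I need to print apply only to lines that have entries in columns 18-30, should I pipe them separately? grep ^ATOM $1 | cut -f 18-30 | cut -f 2-20, 38-40 - Koala

@Koala: about the filename thing, what if you want to use your program on a file ending in ".txt"? Or ".csv"? Or ".bak"? Or a file with no extension at all? It seems a bit silly to make the program fail just because the filename doesn't follow some arbitrary convention. Of course it's your program, so you can have it check the filename, but if my experience is any guide, the day will come when you want to get rid of that check. Other UNIX utilities (e.g. grep and awk) don't check filenames, and there is a reason for that. - David Z

As for the second part of your question, about the columns, I don't quite understand what you're asking. - David Z

Clarification of the second part: if there is content in columns 18-30, the output should show the contents of columns 2-20 and 38-40. How do I filter for that? With a pipe, or with an if-then statement? I'm not sure how to set it up. - Koala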

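Hmm, that would be more complicated. I can't think of a way to do it using only "cut", but you could try something like "awk '$1 == "ATOM" && substr($0, 18, 13) ~ /[^[:space:]]/' "$FILE" | cut -c 2-20,38-40" (or you could do the whole thing in "awk", but the program would be slightly longer). Of course, it depends on how you define "content" (in this example I'm assuming it means any string that is not entirely whitespace). - David Z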
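
For the awk-only variant mentioned in the last comment, a minimal sketch might look like the following. It assumes fixed-width lines and takes "content" to mean at least one character other than a space or tab in character columns 18-30; the function name `atoms_with_content` is hypothetical.

```shell
#!/bin/sh
# atoms_with_content FILE   (hypothetical helper)
# For lines whose first word is ATOM and whose character columns 18-30
# contain at least one non-blank character, print characters 2-20 and 38-40.
atoms_with_content() {
    awk '$1 == "ATOM" && substr($0, 18, 13) ~ /[^ \t]/ {
        # 2-20 is 19 characters from position 2; 38-40 is 3 from position 38
        print substr($0, 2, 19) substr($0, 38, 3)
    }' "$1"
}
```

The bracket expression `[^ \t]` is used here instead of `[^[:space:]]` only because some older awk implementations lack POSIX character classes; where they are supported, either form expresses the same test for these inputs.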