如何从文件中提取单个字节块？

Question

如何从文件中提取单个字节块？

103

我想在Linux桌面（RHEL4）中从一个大于1 GB的文件中提取一定范围的字节（通常少于1000个），我知道在文件中的偏移量和块的大小。

我可以编写代码来实现这个功能，但是否有命令行解决方案呢？

理想情况下，类似这样的命令：

magicprogram --offset 102567 --size 253 < input.binary > output.binary

- DanM

7个回答

84

这是一个老问题，但我想增加另一种版本的 dd 命令，它更适合处理大块字节：

dd if=input.binary of=output.binary skip=$offset count=$bytes iflag=skip_bytes,count_bytes

$offset和$bytes是以字节为单位的数字。

与Thomas的答案不同之处在于这里没有出现bs=1。bs=1将输入和输出块大小设置为1个字节，当要提取的字节数很大时，这会使速度变得非常慢。

这意味着我们将块大小(bs)保留为默认值512个字节。使用iflag=skip_bytes，count_bytes，我们告诉dd将skip和count后面的值视为字节数量，而不是块数量。

- ChronoTrigger

6

这确实比我的答案快得多。 - Thomas Padron-McCarthy

1

在Mac上无法运行 - iflag 是一个未知的操作数，如果没有它，你将得到整个代码块。 - Timmmm

3

GNU dd 可以用于支持 iflag（安装 coreutils 后使用 brew install）。注意：默认情况下，这些实用程序会带有“g”前缀（例如 gdd 而不是 dd）。 - Shakil

16

head -c + tail -c

不确定它在效率上如何比较dd，但它很有趣：

printf "123456789" | tail -c+2 | head -c3

选择从第二个字节开始的3个字节：

另请参阅：

- Ciro Santilli OurBigBook.com

@elvis.dukaj 是的，应该没有区别。只需尝试使用 printf '\x01\x02' > f 和 hd 即可。 - Ciro Santilli OurBigBook.com

3

比使用 bs=1 的 dd 命令快得多，谢谢！请注意，tail 命令从第 1 个字节开始计数，而不是从 0 开始。此外，当 tail 命令的输出被 head 命令提前关闭时，它会以错误代码 1 退出。在使用 "set -e" 命令时，请确保忽略该错误。 - proski

2

我曾经遇到同样的问题，试图剪切一个RAW磁盘映像的部分。使用bs=1的dd是无法使用的，因此我编写了一个简单的C程序来完成这个任务。

// usage:
//  ./cutfile srcfile destfile offset length
//  ./cutfile my.image movie.avi 4524 20412452
// compile, presuming it is saved as cutfile.cc:
//  gcc cutfile.cc -o cutfile -std=c11 -pedantic -W -Wall -Werror 
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(int argc, char *argv[])
{
  if(argc != 5) {
      printf("error, need 4 arguments!\n");
      return 1;
  }


  const unsigned blocksize = 16*512;  // can adjust
  unsigned char buffer[blocksize];

  FILE *f = fopen(argv[1], "rb");
  FILE *fout = fopen(argv[2], "wb");
  long offset = atol(argv[3]);
  long length = atol(argv[4]);
  if(f==NULL || fout==NULL) {
      perror("cannot open file");
      return 1;
  }
  fseek(f, offset, SEEK_SET);

  while(length > blocksize) {
      fread(buffer, 1, blocksize, f);
      fwrite(buffer, 1, blocksize, fout);
      length -= blocksize;
  }
  if(length>0) { // copy rest
      fread(buffer, 1, length, f);
      fwrite(buffer, 1, length, fout);
  }    

  fclose(fout);
  fclose(f);
  return 0;
}

- karna7

请注意，在C++中使用std::ifstream和std::ofstream是很正常的。 - Alexis Wilke

是的，但实际上如果我像把 cstdio 更改为 stdio.h 这样的包含文件，它就是纯 C 了。我不知道为什么我选择从 C++ 标头开始，也许我一开始认为我会使用更多的 C++ 材料？！ - karna7

1

是的，你可能想要编辑你的答案并将其改为C语言，因为OP问的是关于C语言的。 - Alexis Wilke

2

最初的回答

更快了

dd bs=<req len> count=1 skip=<req offset> if=input.binary of=output.binary

- Albert Burbea

3

问题在于，“skip”是以“bs”为单位的。 - Arkku

这是执行者的细节，比上面更好，确实需要重新计算，例如：req_offset=$(bc <<< "$offset/$bs") 并确保它是一个整数。 - Tchakabam

1

dd 命令可以完成所有这些操作。请查看调用的 seek 和/或 skip 参数。

- Joe

但是当你想要块错位访问时，dd可能会非常慢。而且使用bs=1会非常缓慢。 - karna7

0

我经常遇到这种情况，所以我创建了一对bash函数来处理这个问题，它们被称为“crunch”和“munch”。 “crunch”接受起始和结束偏移量，提取特定范围的内容，而“munch”接受一个偏移量，并从文件的头部或尾部提取字节。它们在内部使用“head”和“tail”，并允许您以十六进制或十进制格式指定偏移量。

使用示例：

- `crunch input.binary 102567 $((102567+253))` - `crunch myfile.bin 0x1000 0x2000` - `munch head myfile.bin 0x100` - `munch tail myfile.bin 224892`

crunch() {
  if [ "$#" -ne 3 ]; then
    echo "usage: crunch <file.bin> <start address> <end address>"
    echo ""
    echo "example: crunch myfile.bin 0x1000 0x2000"
  else
    local file="$1"
    local start_addr="$2"
    local end_addr="$3"
    local difference="$(($end_addr - $start_addr))"
    local out_name="${file%%.*}_crunch.${file#*.}"
    echorun_conf "tail -c +$(($start_addr+1)) $file | head -c $difference > $out_name"
  fi
}

munch() {
  if [ "$#" -ne 3 ]; then
    echo "usage: munch <(head|tail)> <file.bin> <offset>"
    echo ""
    echo "example: munch head myfile.bin 0x100"
  else
    local op="$1"
    local file="$2"
    local offset="$3"
    local out_name="${file%%.*}_munch.${file#*.}"
    case "$op" in
      "head") echorun_conf "head -c $((offset)) $file > $out_name" ;;
      "tail") echorun_conf "tail -c +$((offset+1)) $file > $out_name" ;;
      *)
      echo "usage: munch <(head|tail)> <file.bin> <offset>"
      echo ""
      echo "example: munch head myfile.bin 0x100" ;;
    esac
  fi
}

# echo and run command with y/n confirmation
echorun_conf() {
  echo "$@"
  echo -n "Execute [y/n]? "
  read reply
  echo
  if [ "$reply" != "${reply#[Yy]}" ]; then
    eval "$@"
  fi
}

- nandkeypull

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Thomas Padron-McCarthy · Accepted Answer

152

试试dd命令:

dd skip=102567 count=253 if=input.binary of=output.binary bs=1

选项bs=1设置块大小，使得dd一次只读/写一个字节。默认块大小为512个字节。

bs的值还会影响skip和count的行为，因为在skip和count中的数字是dd将要跳过和读/写的块数。

- Thomas Padron-McCarthy

3

可以选择添加 status=none 以抑制输出到标准错误流。 - kenorb

24

这里是使用十六进制偏移量的示例：dd if=in.bin bs=1 status=none skip=$((0x88)) count=$((0x80)) of=out.bin。 - kenorb

2

你使用bs=1和count=253而不是相反的方式，有特定的原因吗？更大的块大小会使命令更有效吗？ - rexford

1

@rexford：跳过的块数也是给定的，而且不是253的倍数。鉴于操作系统在从文件系统上的普通文件读取时会进行自己的缓冲，因此在这种情况下效率不会像从设备读取时那样低。 - Thomas Padron-McCarthy

2

bs=1比起bs=1000来说慢得多。在一个短暂的测试中，我实际上看到了500倍的差距。 - Stefan Reich

显示剩余3条评论