Can a JPEG file with a bogus Huffman table be recovered?

12

I have a JPEG image that will not open in any program.

Opening it in the Ubuntu image viewer gives:

could not load image. bogus huffman table definition

Running the photo through convert gives a similar result:

$ convert corrupt.jpg out.jpg
convert.im6: Bogus Huffman table definition `corrupt.jpg' @ error/jpeg.c/JPEGErrorHandler/316.
convert.im6: no images defined `out.jpg' @ error/convert.c/ConvertImageCommand/3044.

Running exiftool on the photo gives:

ExifTool Version Number         : 9.46
File Name                       : corrupt.jpg
Directory                       : .
File Size                       : 47 kB
File Modification Date/Time     : 2015:04:11 01:31:14-07:00
File Access Date/Time           : 2018:05:04 10:26:04-07:00
File Inode Change Date/Time     : 2018:05:04 10:26:03-07:00
File Permissions                : r--------
File Type                       : JPEG
MIME Type                       : image/jpeg
Comment                         : Y�.�.�..2..Q.Q.
Image Width                     : 640
Image Height                    : 480
Encoding Process                : Baseline DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 3
Y Cb Cr Sub Sampling            : YCbCr4:2:2 (2 1)
Image Size                      : 640x480

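Uncorrupted photos with similar image content average 45-48k in size, so I believe the photo data itself is present in this JPEG file. I have hosted the photo on S3; you can download it with wget: wget https://s3.amazonaws.com/jordanarseno.com/corrupt.jpg. I opened the file in hexedit and found the following:

  • Beyond the first few hundred bytes, the content looks random enough to suggest that it holds an image; that is, I see no long runs of 0s or Fs.

  • It does begin with the FF D8 file marker, as a JPEG should.

  • The next two bytes are not FF E0 or FF E1, which the list of file signatures says should follow for a JPEG or JFIF file. Instead they are FF FE. That value does appear in the list, but it is described as:

    Byte-order mark for a text file encoded in little-endian 16-bit Unicode Transfer Format

  • Shortly after the FF FE, the ASCII representation of the bytes I see is: &'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz. That seems rather odd for a JPEG. What is it?

  • Similarly, about 100 bytes later the ASCII string &'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz appears.

  • FF D9 (the JPEG terminator) does appear in the file, but more bytes follow it: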

    FF D9 5C 72 78 E0 7C 94 CD B2 9C FF 00 C4 BF 53 C0 E7 FE 41 D3 9C FF 00 E3 95 7C F1 B6 92 5F 7A 2B EB 54 AF BF E6 30 FD A0 7F CC 3B 53 E9 FF 00 40 F9 FF 00 F8 8A 4D F7 08 30

Switching to Windows and using JPEGsnoop:

JPEGsnoop 1.8.0 by Calvin Hass
  http://www.impulseadventure.com/photo/
  -------------------------------------

  Filename: [C:\corrupt.jpg]
  Filesize: [47760] Bytes

Start Offset: 0x00000000
*** Marker: SOI (xFFD8) ***
  OFFSET: 0x00000000

*** Marker: COM (Comment) (xFFFE) ***
  OFFSET: 0x00000002
  Comment length = 36
    Comment=Y.Ò................à.....2..Q.Q...

*** Marker: DQT (xFFDB) ***
  Define a Quantization Table.
  OFFSET: 0x00000028
  Table length = 132
  ----
  Precision=8 bits
  Destination ID=0 (Luminance)
    DQT, Row #0:   3   2   2   3   4   7   9  10 
    DQT, Row #1:   2   2   2   3   4  10  10   9 
    DQT, Row #2:   2   2   3   4   7  10  12  10 
    DQT, Row #3:   2   3   4   5   9  15  14  11 
    DQT, Row #4:   3   4   6  10  12  19  18  13 
    DQT, Row #5:   4   6   9  11  14  18  19  16 
    DQT, Row #6:   8  11  13  15  18  21  21  17 
    DQT, Row #7:  12  16  16  17  19  17  18  17 
    Approx quality factor = 91.45 (scaling=17.09 variance=0.95)
  ----
  Precision=8 bits
  Destination ID=1 (Chrominance)
    DQT, Row #0:   3   3   4   8  17  17  17  17 
    DQT, Row #1:   3   4   4  11  17  17  17  17 
    DQT, Row #2:   4   4  10  17  17  17  17  17 
    DQT, Row #3:   8  11  17  17  17  17  17  17 
    DQT, Row #4:  17  17  17  17  17  17  17  17 
    DQT, Row #5:  17  17  17  17  17  17  17  17 
    DQT, Row #6:  17  17  17  17  17  17  17  17 
    DQT, Row #7:  17  17  17  17  17  17  17  17 
    Approx quality factor = 91.44 (scaling=17.11 variance=0.19)

*** Marker: COM (Comment) (xFFFE) ***
  OFFSET: 0x000000AE
  Comment length = 5
    Comment=...

*** Marker: SOF0 (Baseline DCT) (xFFC0) ***
  OFFSET: 0x000000B5
  Frame header length = 17
  Precision = 8
  Number of Lines = 480
  Samples per Line = 640
  Image Size = 640 x 480
  Raw Image Orientation = Landscape
  Number of Img components = 3
    Component[1]: ID=0x01, Samp Fac=0x21 (Subsamp 1 x 1), Quant Tbl Sel=0x00 (Lum: Y)
    Component[2]: ID=0x02, Samp Fac=0x11 (Subsamp 2 x 1), Quant Tbl Sel=0x01 (Chrom: Cb)
    Component[3]: ID=0x03, Samp Fac=0x11 (Subsamp 2 x 1), Quant Tbl Sel=0x01 (Chrom: Cr)

*** Marker: DHT (Define Huffman Table) (xFFC4) ***
  OFFSET: 0x000000C8
  Huffman table length = 418
  ----
  Destination ID = 0
  Class = 0 (DC / Lossless Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (001 total): 00 
    Codes of length 03 bits (005 total): 01 02 03 04 05 
    Codes of length 04 bits (001 total): 06 
    Codes of length 05 bits (001 total): 07 
    Codes of length 06 bits (001 total): 08 
    Codes of length 07 bits (001 total): 09 
    Codes of length 08 bits (001 total): 0A 
    Codes of length 09 bits (001 total): 0B 
    Codes of length 10 bits (000 total): 
    Codes of length 11 bits (000 total): 
    Codes of length 12 bits (000 total): 
    Codes of length 13 bits (000 total): 
    Codes of length 14 bits (000 total): 
    Codes of length 15 bits (000 total): 
    Codes of length 16 bits (000 total): 
    Total number of codes: 012

  ----
  Destination ID = 1
  Class = 0 (DC / Lossless Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (003 total): 13 0E 0F 
    Codes of length 03 bits (001 total): 10 
    Codes of length 04 bits (001 total): 11 
    Codes of length 05 bits (001 total): 12 
    Codes of length 06 bits (001 total): 12 
    Codes of length 07 bits (012 total): 12 0B 0D 13 15 13 11 15 10 11 12 11 
    Codes of length 08 bits (016 total): 01 03 03 03 04 04 04 08 04 04 08 11 0B 0A 0B 11 

    Codes of length 09 bits (013 total): 11 11 11 11 11 11 11 11 11 11 11 11 11 
    Codes of length 10 bits (011 total): 11 11 11 11 11 11 11 11 11 11 11 
    Codes of length 11 bits (012 total): 11 11 11 11 11 11 11 11 11 11 11 01 
    Codes of length 12 bits (015 total): 01 01 01 01 00 00 00 00 00 00 01 02 03 04 05 
    Codes of length 13 bits (012 total): 06 07 08 09 0A 0B 10 00 02 01 03 03 
    Codes of length 14 bits (009 total): 02 04 03 05 05 04 04 00 00 
    Codes of length 15 bits (010 total): 01 7D 01 02 03 00 04 11 05 12 
    Codes of length 16 bits (014 total): 21 31 41 06 13 51 61 07 22 71 14 32 81 91 
    Total number of codes: 131

  ----
  Destination ID = 1
  Class = 10 (AC Table)
ERROR: Invalid DHT Class (10). Aborting DHT Load.

ERROR: Expected marker 0xFF, got 0x73 @ offset 0x0000026C. Consider using [Tools->Img Search Fwd/Rev].

*** Searching Compression Signatures ***

  Signature:           01FF5BA518B453CC8F224A4C85505196
  Signature (Rotated): 01D13AFD01FF0B6EC46EA4081D25BB4D
  File Offset:         0 bytes
  Chroma subsampling:  2x1
  EXIF Make/Model:     NONE
  EXIF Makernotes:     NONE
  EXIF Software:       NONE

  Searching Compression Signatures: (3347 built-in, 0 user(*) )

          EXIF.Make / Software        EXIF.Model                            Quality           Subsamp Match?
          -------------------------   -----------------------------------   ----------------  --------------
     CAM:[NIKON                    ] [NIKON D40                          ] [FINE            ] Yes              

  Based on the analysis of compression characteristics and EXIF metadata:

  ASSESSMENT: Class 1 - Image is processed/edited

  This may be a new software editor for the database.
  If this file is processed, and editor doesn't appear in list above,
  PLEASE ADD TO DATABASE with [Tools->Add Camera to DB]


*** Additional Info ***
NOTE: Data exists after EOF, range: 0x00000000-0x0000BA90 (47760 bytes)

As a final note, the EXIF.Model that JPEGsnoop identified is wrong. This photo should have been taken with a VC0706 UART Model: LCF - 23T 0V528.

To summarize: can this JPEG file be recovered?


1
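It's recoverable if you can recover it. I'd investigate how it got corrupted. Bit-flip errors? Truncation? Missing bytes? Once you have identified the problem you can try to fix it. - tadman
I have tried a professional piece of software and it was not able to recover the file. So when software dedicated to this task can't do it, I doubt you can recover it easily. - Tarun Lalwani
1 Answer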

18

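Getting this data back was more luck than judgement. I think I can explain it, though it needs a hex editor...

The Wikipedia page on JPEG file syntax explains that a file is made up of a series of segments, each beginning with a two-byte marker: 0xFF followed by another byte indicating the segment type. The hope was that the only thing wrong in the file was the Huffman table segment, as the error message suggests.

Without needing to understand what a Huffman table is, the same Wikipedia page shows that the marker for a Huffman table segment is 0xFF 0xC4. The page also mentions at the bottom:

    The JPEG standard provides general-purpose Huffman tables; encoders may also choose to generate Huffman tables...

Opening a few other JPEG files turned up a standard set of four consecutive Huffman table segments, each starting with a 0xFF 0xC4 marker. The sample corrupt.jpg, however, has only one Huffman table segment, spanning 0x00c8 to 0x02bc (dumped below).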

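The Huffman tables in both files contain the &'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz sequence you mentioned. In the corrupt file it appears twice within the single Huffman table segment; in a "more regular" JPEG file it appears in the second and fourth Huffman tables. That overlap suggests the corrupt file was originally written with the same standard tables.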
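To make the "twice in one segment" observation concrete: a single DHT segment may hold several tables back to back, each made of one class/id byte, 16 code-length counts, and then the symbols. The following is a minimal sketch of splitting a payload that way; the function name and the specific checks are assumptions chosen for illustration, not the exact logic of libjpeg or JPEGsnoop.

```python
def parse_dht_payload(payload: bytes):
    """Split a DHT payload (the bytes after the 2-byte length) into tables.

    Returns a list of (table_class, table_id, counts, symbols) tuples and
    raises ValueError on inconsistencies of the kind decoders call "bogus".
    """
    tables = []
    pos = 0
    while pos < len(payload):
        if pos + 17 > len(payload):
            raise ValueError(f"truncated table header at payload offset {pos}")
        # High nibble: 0 = DC, 1 = AC. Low nibble: destination id 0..3.
        table_class, table_id = payload[pos] >> 4, payload[pos] & 0x0F
        if table_class > 1 or table_id > 3:
            raise ValueError(
                f"invalid class/id {table_class}/{table_id} at payload offset {pos}"
            )
        counts = list(payload[pos + 1:pos + 17])  # number of codes of length 1..16
        total = sum(counts)
        if total > 256 or pos + 17 + total > len(payload):
            raise ValueError(f"bad symbol count {total} at payload offset {pos}")
        symbols = payload[pos + 17:pos + 17 + total]
        tables.append((table_class, table_id, counts, symbols))
        pos += 17 + total
    return tables


# Standard DC luminance table, copied from the repaired dump in this answer.
DC_LUMA = bytes.fromhex(
    "00"                                # class 0 (DC), id 0
    "00010501010101010100000000000000"  # 16 code-length counts
    "000102030405060708090a0b"          # 12 symbols
)
tables = parse_dht_payload(DC_LUMA)
print(len(tables), tables[0][:2], len(tables[0][3]))
```

Fed the payload of the corrupt segment, a parser like this would be expected to fail partway through, much as JPEGsnoop does when it reports an invalid DHT class of 10.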

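From there, the repaired image is a copy and paste of the standard four Huffman tables in place of that byte range in corrupt.jpg; the tables now occupy 0x00c8 to 0x0278.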

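Because the JPEG format is based on scanning through segments delimited by those 0xff markers, you can simply swap the Huffman segments; there are no other pointers in the file to worry about. As you said, the rest of the file looks like a reasonable JPEG file.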
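The marker-by-marker scanning can be sketched in a few lines of Python. This is a simplified walker written for illustration (the name iter_segments is an assumption): it handles only ordinary header segments with a two-byte big-endian length, ignores fill bytes and standalone markers, and stops at SOS, where the entropy-coded data begins.

```python
import struct


def iter_segments(data: bytes):
    """Yield (offset, marker, length) for each header segment up to SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker (FF D8)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(
                f"expected 0xFF at offset {pos:#x}, got {data[pos]:#04x}"
            )
        marker = data[pos + 1]
        # The length counts its own two bytes but not the two marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield pos, marker, length
        if marker == 0xDA:  # SOS: compressed image data follows
            return
        pos += 2 + length


# Usage (file name taken from the question):
#   with open("corrupt.jpg", "rb") as f:
#       for offset, marker, length in iter_segments(f.read()):
#           print(f"{offset:#06x}  FF {marker:02X}  length={length}")
```

On corrupt.jpg a walker like this would presumably stop right after the DHT segment: the segment starts at 0xC8 and declares a length of 418, so the next marker is expected at 0xC8 + 2 + 418 = 0x26C, which is the offset where JPEGsnoop reports finding 0x73 instead of 0xFF.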


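Summary of the steps performed:

  • Hex-search corrupt.jpg for FF C4 and note the offset
  • Continue the hex search until the next FF is found. If it is another FF C4 (that is, a second Huffman table), keep searching
  • Delete everything from the first FF C4 (inclusive) up to that next FF (exclusive)
  • Replace it with the "standard four Huffman tables". These bytes are in the last example below, or can be copied from the fixed file at 0x00c8 to 0x0278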
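The same steps can be scripted instead of done by hand in a hex editor. The sketch below mirrors the manual byte search rather than parsing segment lengths, so it shares the same assumptions: the first FF C4 in the file really is the Huffman marker, and no stray FF byte occurs inside the tables. The function names and the donor file (any working JPEG that carries the standard tables) are hypothetical.

```python
def find_dht_run(data: bytes):
    """Return (start, end) of the run of Huffman segments, found by byte search."""
    start = data.find(b"\xff\xc4")
    if start < 0:
        raise ValueError("no FF C4 marker found")
    pos = start
    while True:
        nxt = data.find(b"\xff", pos + 2)
        if nxt < 0:
            raise ValueError("no marker found after the Huffman tables")
        if data[nxt + 1:nxt + 2] != b"\xc4":
            return start, nxt  # end is exclusive: the next non-DHT marker
        pos = nxt  # another FF C4, so keep searching


def replace_huffman_tables(corrupt: bytes, donor: bytes) -> bytes:
    """Swap the Huffman segments of `corrupt` for those found in `donor`."""
    c_start, c_end = find_dht_run(corrupt)
    d_start, d_end = find_dht_run(donor)
    return corrupt[:c_start] + donor[d_start:d_end] + corrupt[c_end:]


# Usage ("good.jpg" is a hypothetical donor file):
#   with open("corrupt.jpg", "rb") as f, open("good.jpg", "rb") as g:
#       fixed = replace_huffman_tables(f.read(), g.read())
#   with open("fixed.jpg", "wb") as out:
#       out.write(fixed)
```

This only helps when the damaged file was encoded with the same tables as the donor; with encoder-specific (optimized) tables, the image data would decode to garbage.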

The corrupt Huffman table:

0000-00d0:  xx xx xx xx xx xx xx xx-ff c4 01 a2-00 00 01 05  !....... ........
0000-00e0:  01 01 01 01-01 01 00 00-00 00 00 00-00 00 01 02  ........ ........
0000-00f0:  03 04 05 06-07 08 09 0a-0b 01 00 03-01 01 01 01  ........ ........
0000-0100:  0c 10 0d 0b-0c 0f 0c 09-0a 0e 13 0e-0f 10 11 12  ........ ........
0000-0110:  12 12 0b 0d-13 15 13 11-15 10 11 12-11 01 03 03  ........ ........
0000-0120:  03 04 04 04-08 04 04 08-11 0b 0a 0b-11 11 11 11  ........ ........
0000-0130:  11 11 11 11-11 11 11 11-11 11 11 11-11 11 11 11  ........ ........
0000-0140:  11 11 11 11-11 11 11 11-11 11 11 11-11 11 11 11  ........ ........
0000-0150:  01 01 01 01-01 00 00 00-00 00 00 01-02 03 04 05  ........ ........
0000-0160:  06 07 08 09-0a 0b 10 00-02 01 03 03-02 04 03 05  ........ ........
0000-0170:  05 04 04 00-00 01 7d 01-02 03 00 04-11 05 12 21  ......}. .......!
0000-0180:  31 41 06 13-51 61 07 22-71 14 32 81-91 a1 08 23  1A..Qa." q.2....#
0000-0190:  42 b1 c1 15-52 d1 f0 24-33 62 72 82-09 0a 16 17  B...R..$ 3br.....
0000-01a0:  18 19 1a 25-26 27 28 29-2a 34 35 36-37 38 39 3a  ...%&'() *456789:
0000-01b0:  43 44 45 46-47 48 49 4a-53 54 55 56-57 58 59 5a  CDEFGHIJ STUVWXYZ
0000-01c0:  63 64 65 66-67 68 69 6a-73 74 75 76-77 78 79 7a  cdefghij stuvwxyz
0000-01d0:  83 84 85 86-87 88 89 8a-92 93 94 95-96 97 98 99  ........ ........
0000-01e0:  9a a2 a3 a4-a5 a6 a7 a8-a9 aa b2 b3-b4 b5 b6 b7  ........ ........
0000-01f0:  b8 b9 ba c2-c3 c4 c5 c6-c7 c8 c9 ca-d2 d3 d4 d5  ........ ........
0000-0200:  d6 d7 d8 d9-da e1 e2 e3-e4 e5 e6 e7-e8 e9 ea f1  ........ ........
0000-0210:  f2 f3 f4 f5-f6 f7 f8 f9-fa 11 00 02-01 02 04 04  ........ ........
0000-0220:  03 04 07 05-04 04 00 01-02 77 00 01-02 03 11 04  ........ .w......
0000-0230:  05 21 31 06-12 41 51 07-61 71 13 22-32 81 08 14  .!1..AQ. aq."2...
0000-0240:  42 91 a1 b1-c1 09 23 33-52 f0 15 62-72 d1 0a 16  B.....#3 R..br...
0000-0250:  24 34 e1 25-f1 17 18 19-1a 26 27 28-29 2a 35 36  $4.%.... .&'()*56
0000-0260:  37 38 39 3a-43 44 45 46-47 48 49 4a-53 54 55 56  789:CDEF GHIJSTUV
0000-0270:  57 58 59 5a-63 64 65 66-67 68 69 6a-73 74 75 76  WXYZcdef ghijstuv
0000-0280:  77 78 79 7a-82 83 84 85-86 87 88 89-8a 92 93 94  wxyz.... ........
0000-0290:  95 96 97 98-99 9a a2 a3-a4 a5 a6 a7-a8 a9 aa b2  ........ ........
0000-02a0:  b3 b4 b5 b6-b7 b8 b9 ba-c2 c3 c4 c5-c6 c7 c8 c9  ........ ........
0000-02b0:  ca d2 d3 d4-d5 d6 d7 d8-d9 da e2 e3-e4 e5 e6 e7  ........ ........
0000-02c0:  e8 e9 ea f2-f3 f4 f5 f6-f7 f8 f9 fa-xx xx xx xx  ........ ........

The next two bytes are ff dd (the DRI, Define Restart Interval, marker), which signals the start of the next segment:

0000-02c0:  xx xx xx xx-xx xx xx xx-xx xx xx xx-ff dd 00 04  ........ ........

This was replaced with the standard four general-purpose Huffman tables; look for the ff c4 markers:

0000-00d0:  xx xx xx xx xx xx xx xx-ff c4 00 1f-00 00 01 05  !....... ........
0000-00e0:  01 01 01 01-01 01 00 00-00 00 00 00-00 00 01 02  ........ ........
0000-00f0:  03 04 05 06-07 08 09 0a-0b ff c4 00-b5 10 00 02  ........ ........
0000-0100:  01 03 03 02-04 03 05 05-04 04 00 00-01 7d 01 02  ........ .....}..
0000-0110:  03 00 04 11-05 12 21 31-41 06 13 51-61 07 22 71  ......!1 A..Qa."q
0000-0120:  14 32 81 91-a1 08 23 42-b1 c1 15 52-d1 f0 24 33  .2....#B ...R..$3
0000-0130:  62 72 82 09-0a 16 17 18-19 1a 25 26-27 28 29 2a  br...... ..%&'()*
0000-0140:  34 35 36 37-38 39 3a 43-44 45 46 47-48 49 4a 53  456789:C DEFGHIJS
0000-0150:  54 55 56 57-58 59 5a 63-64 65 66 67-68 69 6a 73  TUVWXYZc defghijs
0000-0160:  74 75 76 77-78 79 7a 83-84 85 86 87-88 89 8a 92  tuvwxyz. ........
0000-0170:  93 94 95 96-97 98 99 9a-a2 a3 a4 a5-a6 a7 a8 a9  ........ ........
0000-0180:  aa b2 b3 b4-b5 b6 b7 b8-b9 ba c2 c3-c4 c5 c6 c7  ........ ........
0000-0190:  c8 c9 ca d2-d3 d4 d5 d6-d7 d8 d9 da-e1 e2 e3 e4  ........ ........
0000-01a0:  e5 e6 e7 e8-e9 ea f1 f2-f3 f4 f5 f6-f7 f8 f9 fa  ........ ........
0000-01b0:  ff c4 00 1f-01 00 03 01-01 01 01 01-01 01 01 01  ........ ........
0000-01c0:  00 00 00 00-00 00 01 02-03 04 05 06-07 08 09 0a  ........ ........
0000-01d0:  0b ff c4 00-b5 11 00 02-01 02 04 04-03 04 07 05  ........ ........
0000-01e0:  04 04 00 01-02 77 00 01-02 03 11 04-05 21 31 06  .....w.. .....!1.
0000-01f0:  12 41 51 07-61 71 13 22-32 81 08 14-42 91 a1 b1  .AQ.aq." 2...B...
0000-0200:  c1 09 23 33-52 f0 15 62-72 d1 0a 16-24 34 e1 25  ..#3R..b r...$4.%
0000-0210:  f1 17 18 19-1a 26 27 28-29 2a 35 36-37 38 39 3a  .....&'( )*56789:
0000-0220:  43 44 45 46-47 48 49 4a-53 54 55 56-57 58 59 5a  CDEFGHIJ STUVWXYZ
0000-0230:  63 64 65 66-67 68 69 6a-73 74 75 76-77 78 79 7a  cdefghij stuvwxyz
0000-0240:  82 83 84 85-86 87 88 89-8a 92 93 94-95 96 97 98  ........ ........
0000-0250:  99 9a a2 a3-a4 a5 a6 a7-a8 a9 aa b2-b3 b4 b5 b6  ........ ........
0000-0260:  b7 b8 b9 ba-c2 c3 c4 c5-c6 c7 c8 c9-ca d2 d3 d4  ........ ........
0000-0270:  d5 d6 d7 d8-d9 da e2 e3-e4 e5 e6 e7-e8 e9 ea f2  ........ ........
0000-0280:  f3 f4 f5 f6-f7 f8 f9 fa-xx xx xx xx xx xx xx xx  ........ .....(..

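That is very good news, thank you. As far as I understand, the Huffman tables are the key to decompressing the image, so it is surprising that replacement tables turned out to be compatible. Hopefully we will hear more on this topic about how that can be. - df778899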
