从JPEG二进制数据中获取图像大小

13

我有很多大小不同的jpeg文件。例如,这里是一个大小为256 * 384(像素)的图像的hexdump给出的前640个字节:

0000000: ffd8 ffe0 0010 4a46 4946 0001 0101 0048  ......JFIF.....H
0000010: 0048 0000 ffdb 0043 0003 0202 0302 0203  .H.....C........
0000020: 0303 0304 0303 0405 0805 0504 0405 0a07  ................
0000030: 0706 080c 0a0c 0c0b 0a0b 0b0d 0e12 100d  ................

我猜测大小信息必须在这些行中。但是我无法看到哪些字节正确地提供了大小。有谁能帮助我找到包含大小信息的字段吗?

8个回答

13
根据维基百科的JPEG页面中的语法和结构部分,图像的宽度和高度似乎并没有存储在图像本身中,或者至少不是很容易找到的方式。
然而,引用自JPEG图像压缩FAQ,第1/2部分
主题:[22]我的程序如何从JPEG文件中提取图像尺寸?
JPEG文件的头部由一系列块组成,称为“标记”。图像高度和宽度存储在类型为SOFn(帧开始,类型N)的标记中。要找到SOFn,您必须跳过前面的标记;您不需要知道其他类型的标记中有什么,只需使用它们的长度字来跳过它们。最少所需逻辑可能是一页C代码。 (一些人建议只搜索表示SOFn的字节对,而不关注标记块结构。这是不安全的,因为先前的标记可能包含SOFn模式,无论是偶然还是因为它包含一个JPEG压缩的缩略图像。如果您不遵循标记结构,您将检索缩略图像的大小而不是主图像大小。)在IJG发行版的rdjpgcom.c中可以找到用大量注释的C示例(请参见第2部分,项目15)。Perl代码可以在http://www.tardis.ed.ac.uk/~ark/wwwis/中找到。
(呃,那个链接似乎失效了...)


这里有一部分C代码可能对你有所帮助:解码JPEG(JFIF)文件的宽度和高度


如果是这种情况,Nautilus或其他图像查看器如何决定图像的分辨率?它们似乎都同意该图像的值为256 * 384。 - rajeshsr
1
非常感谢!我现在已经明白了。似乎 greping 0xFFC0 可以解决问题,但我也理解其中存在的危险性!再次感谢!顺便说一句,这是我在 stackoverflow 的第一篇帖子!对回复的快速和准确感到非常惊讶。谢谢大家! - rajeshsr
1
最后两个链接已经失效。 - Millie Smith
1
最后两个链接仍然无法访问。 - Bovaz
2
以下是最后一个链接的存档:https://web.archive.org/web/20131016210645/http://www.64lines.com/jpeg-width-height - Louis Hong

9

这个函数将读取JPEG属性。

function jpegProps(data) {          // data is an array of bytes
    var off = 0;
    while(off<data.length) {
        while(data[off]==0xff) off++;
        var mrkr = data[off];  off++;
        
        if(mrkr==0xd8) continue;    // SOI
        if(mrkr==0xd9) break;       // EOI
        if(0xd0<=mrkr && mrkr<=0xd7) continue;
        if(mrkr==0x01) continue;    // TEM
        
        var len = (data[off]<<8) | data[off+1];  off+=2;  
        
        if(mrkr==0xc0) return {
            bpc : data[off],     // precission (bits per channel)
            h   : (data[off+1]<<8) | data[off+2],
            w   : (data[off+3]<<8) | data[off+4],
            cps : data[off+5]    // number of color components
        }
        off+=len-2;
    }
}

 


2
简短而完美的解决方案。 - Farshad Mohajeri

2

我已经将顶部答案中的 CPP 代码转换为 Python 脚本。

"""
Source: https://dev59.com/y3E85IYBdhLWcg3w_Itr:~:text=The%20header%20of%20a%20JPEG,Of%20Frame%2C%20type%20N).
"""
def get_jpeg_size(data):
   """
   Gets the JPEG size from the array of data passed to the function, file reference: http:#www.obrador.com/essentialjpeg/headerinfo.htm
   """
   data_size=len(data)
   #Check for valid JPEG image
   i=0   # Keeps track of the position within the file
   if(data[i] == 0xFF and data[i+1] == 0xD8 and data[i+2] == 0xFF and data[i+3] == 0xE0): 
   # Check for valid JPEG header (null terminated JFIF)
      i += 4
      if(data[i+2] == ord('J') and data[i+3] == ord('F') and data[i+4] == ord('I') and data[i+5] == ord('F') and data[i+6] == 0x00):
         #Retrieve the block length of the first block since the first block will not contain the size of file
         block_length = data[i] * 256 + data[i+1]
         while (i<data_size):
            i+=block_length               #Increase the file index to get to the next block
            if(i >= data_size): return False;   #Check to protect against segmentation faults
            if(data[i] != 0xFF): return False;   #Check that we are truly at the start of another block
            if(data[i+1] == 0xC0):          #0xFFC0 is the "Start of frame" marker which contains the file size
               #The structure of the 0xFFC0 block is quite simple [0xFFC0][ushort length][uchar precision][ushort x][ushort y]
               height = data[i+5]*256 + data[i+6];
               width = data[i+7]*256 + data[i+8];
               return height, width
            else:
               i+=2;                              #Skip the block marker
               block_length = data[i] * 256 + data[i+1]   #Go to the next block
         return False                   #If this point is reached then no size was found
      else:
         return False                  #Not a valid JFIF string
   else:
      return False                     #Not a valid SOI header




with open('path/to/file.jpg','rb') as handle:
   data = handle.read()

h, w = get_jpeg_size(data)
print(s)

1

以下是我使用js实现的方法。您要查找的标记是Sofn标记,伪代码基本上是:

  • 从第一个字节开始
  • 段的开头总是以FF开头,后跟另一个字节表示标记类型(这两个字节称为标记)
  • 如果该字节为01D1D9,则该段中没有数据,请继续下一段
  • 如果该标记为C0C2(或任何其他Cn,有关详细信息,请参见代码注释),那么就是您要查找的Sofn标记
    • 标记后面的字节将依次为P(1字节)、L(2字节)、高度(2字节)、宽度(2字节)
  • 否则,紧随其后的两个字节将是长度属性(整个段的长度,不包括标记,2个字节),请使用它跳过到下一个段
  • 重复直到找到Sofn标记
function getJpgSize(hexArr) {
  let i = 0;
  let marker = '';

  while (i < hexArr.length) {
    //ff always start a marker,
    //something's really wrong if the first btye isn't ff
    if (hexArr[i] !== 'ff') {
      console.log(i);
      throw new Error('aaaaaaa');
    }

    //get the second byte of the marker, which indicates the marker type
    marker = hexArr[++i];

    //these are segments that don't have any data stored in it, thus only 2 bytes
    //01 and D1 through D9
    if (marker === '01' || (!isNaN(parseInt(marker[1])) && marker[0] === 'd')) {
      i++;
      continue;
    }

    /*
    sofn marker: https://www.w3.org/Graphics/JPEG/itu-t81.pdf pg 36
      INFORMATION TECHNOLOGY –
      DIGITAL COMPRESSION AND CODING
      OF CONTINUOUS-TONE STILL IMAGES –
      REQUIREMENTS AND GUIDELINES

    basically, sofn (start of frame, type n) segment contains information
    about the characteristics of the jpg

    the marker is followed by:
      - Lf [frame header length], two bytes
      - P [sample precision], one byte
      - Y [number of lines in the src img], two bytes, which is essentially the height
      - X [number of samples per line], two bytes, which is essentially the width 
      ... [other parameters]

    sofn marker codes: https://www.digicamsoft.com/itu/itu-t81-36.html
    apparently there are other sofn markers but these two the most common ones
    */
    if (marker === 'c0' || marker === 'c2') {
      break;
    }
    //2 bytes specifying length of the segment (length excludes marker)
    //jumps to the next seg
    i += parseInt(hexArr.slice(i + 1, i + 3).join(''), 16) + 1;
  }
  const size = {
    height: parseInt(hexArr.slice(i + 4, i + 6).join(''), 16),
    width: parseInt(hexArr.slice(i + 6, i + 8).join(''), 16),
  };
  return size;
}

0

从论坛中的一个解决方案移植到Dart/Flutter。

class JpegProps {
  final int precision;

  final int height;

  final int width;

  final int compression;

  JpegProps._(this.precision, this.height, this.width, this.compression,);

  String toString() => 'JpegProps($precision,$height,$width,$compression)';

  static JpegProps readImage(Uint8List imageData) {
    // data is an array of bytes
    int offset = 0;
    while (offset < imageData.length) {
      while (imageData[offset] == 0xff) offset++;
      var mrkr = imageData[offset];
      offset++;

      if (mrkr == 0xd8) continue; // SOI
      if (mrkr == 0xd9) break; // EOI
      if (0xd0 <= mrkr && mrkr <= 0xd7) continue;
      if (mrkr == 0x01) continue; // TEM

      var length = (imageData[offset] << 8) | imageData[offset + 1];
      offset += 2;

      if (mrkr == 0xc0) {
        return JpegProps._(imageData[offset],
          (imageData[offset + 1] << 8) | imageData[offset + 2],
          (imageData[offset + 3] << 8) | imageData[offset + 4],
          imageData[offset + 5],
        );
      }
      offset += length - 2;
    }
    throw '';
  }
}

你的回答可以通过提供更多支持信息来改进。请编辑以添加进一步的细节,例如引用或文档,以便他人可以确认你的答案是正确的。您可以在帮助中心中找到有关如何编写良好答案的更多信息。 - Community

0

从 .jpg 图片中获取宽度和高度的简单方法。删除文件中的 EXIF 和 ITP 信息。使用一个查看图片程序的 "另存为" 功能(我使用的是 IrfanView 或 Pain Shop Pro)。在 "另存为" 中去掉 EXIF,然后保存文件。jpg 文件没有 EXIF 时,高度位于字节 000000a3 和 000000a4,宽度位于 000000a5 和 000000a6。

我使用 PHP。

function storrelse_jpg($billedfil)  //billedfil danish for picturefile
{
    //Adresse  for jpg fil without EXIF info !!!!!
    // width is in byte 165 til 166, heigh is in byte 163 og 164
    // jpg dimensions are with 2 bytes ( in png are the dimensions with 4 bytes

    $billedfil="../diashow/billeder/christiansdal_teltplads_1_x.jpg"; // the picturefil 

    $tekst=file_get_contents($billedfil,0,NULL,165,2); //Read from 165  2 bytes  - width
    $tekst1=file_get_contents($billedfil,0,NULL,163,2);//Read from  163  2 bytes - heigh
    $n=strlen($tekst); // længden af strengen
     
    echo "St&oslash;rrelse på billed : ".$billedfil. "<br>"; // Headline 

    $bredde=0; // width  
    $langde=0; // heigh
    for ($i=0;$i<$n;$i++)
    {
        $by=bin2hex($tekst[$i]); //width-byte from binær to hex 
        $bz=hexdec($by);// then from hex to decimal
        
        $ly=bin2hex($tekst1[$i]); // the same for length byte
        $lz=hexdec($ly);
        
        
        $bredde=$bredde+$bz*256**(1-$i);
        $langde=$langde+$lz*256**(1-$i);
    }
    // $x is a array $x[0] er width and $x[1] er heigh
    $x[0]=$bredde; $x[1]=$langde;
    
    return $x;
}

0
一个基于“原始”CPP转换的Python解决方案 - https://dev59.com/y3E85IYBdhLWcg3w_Itr#62245035
def get_jpeg_resolution(image_bytes: bytes,
                        size: int = None) -> Optional[Tuple[int, int]]:
    """
    function for getting resolution from binary
    :param image_bytes: image binary
    :param size: image_bytes len if value is None it'll calc inside
    :return: (width, height) or None if not found
    """
    size = len(image_bytes) if size is None else size

    header_bytes = (0xff, 0xD8, 0xff, 0xe0)

    if not (size > 11
            and header_bytes == struct.unpack_from('>4B', image_bytes)):
        # Incorrect header or minimal length
        return None

    jfif_bytes = tuple(ord(s) for s in 'JFIF') + (0x0, )

    if not (jfif_bytes == struct.unpack_from('5B', image_bytes, 6)):
        # Not a valid JFIF string
        return None

    index = len(header_bytes)
    block_length, = struct.unpack_from(">H", image_bytes, index)

    index += block_length

    while index < size:
        if image_bytes[index] != 0xFF:
            break
            # Check that we are truly at the start
            # of another block
        if image_bytes[index + 1] == 0xC0:
            # 0xFFC0 is the "Start of frame" marker
            # which contains the file size
            # The structure of the 0xFFC0 block is
            # quite simple
            # [0xFFC0][ushort length][uchar precision]
            #   [ushort x][ushort y]

            height, width = struct.unpack_from(">HH", image_bytes, index + 5)
            return width, height
        else:
            index += 2
            # Skip the block marker
            # Go to the next block
            block_length, = struct.unpack(">H",
                                          image_bytes[slice(index, index + 2)])
        # Increase the file index to get to the next block
        index += block_length

    # If this point is reached then no size was found
    return None

0
如果您正在使用Linux系统并且有PHP,那么这个PHP脚本的变体可能会产生您想要的结果:
#! /usr/bin/php -q
<?php

if (file_exists($argv[1]) ) {

    $targetfile = $argv[1];

    // get info on uploaded file residing in the /var/tmp directory:
    $safefile       = escapeshellcmd($targetfile);
    $getinfo        = `/usr/bin/identify $safefile`;
    $imginfo        = preg_split("/\s+/",$getinfo);
    $ftype          = strtolower($imginfo[1]);
    $fsize          = $imginfo[2];

    switch($fsize) {
        case 0:
            print "FAILED\n";
            break;
        default:
            print $safefile.'|'.$ftype.'|'.$fsize."|\n";
    }
}

// eof

主机> imageinfo 009140_DJI_0007.JPG

009140_DJI_0007.JPG|jpeg|4000x3000|

(以管道分隔的格式输出文件名、文件类型和文件尺寸)

来自 man 手册:

要了解有关“identify”命令的更多信息,请将您的浏览器指向[...] http://www.imagemagick.org/script/identify.php


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接