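PHP has built-in support for reading EXIF and IPTC metadata, but I can't find any way to read XMP.
XMP data is embedded directly in the image file, so you can extract it from the file itself with PHP's string functions.
The following demonstrates the procedure (I'm using SimpleXML, but any other XML API, or even simple and clever string parsing, would give the same results):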
$content = file_get_contents($image);
$xmp_data_start = strpos($content, '<x:xmpmeta');
$xmp_data_end = strpos($content, '</x:xmpmeta>');
$xmp_length = $xmp_data_end - $xmp_data_start;
$xmp_data = substr($content, $xmp_data_start, $xmp_length + 12);
$xmp = simplexml_load_string($xmp_data);
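XMP packets rely on XML namespaces, so plain property access on the SimpleXML object often returns nothing. As a minimal sketch, assuming a hand-written sample packet (the title text and the variable names below are illustrative, not taken from a real image), the namespaces can be registered for XPath like this:

```php
<?php
// Hand-written sample packet; real packets are usually much larger.
$xmp_data = <<<XML
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Sunset over the bay</rdf:li>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
XML;

$xmp = simplexml_load_string($xmp_data);
if ($xmp === false) {
    throw new RuntimeException('Could not parse XMP packet');
}

// Register the prefixes used in the XPath expression below.
$xmp->registerXPathNamespace('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#');
$xmp->registerXPathNamespace('dc', 'http://purl.org/dc/elements/1.1/');

// dc:title is a language alternative: take the first rdf:li entry if there is one.
$titles = $xmp->xpath('//dc:title/rdf:Alt/rdf:li');
$title = $titles ? (string) $titles[0] : null;

echo $title, PHP_EOL;
```

Which properties are present depends entirely on the software that wrote the file, so every lookup should tolerate a missing element.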
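A remark on the code above:
file_get_contents() may be a poor fit here, because it loads the entire image into memory. Using fopen() to open a file stream resource and checking chunks of data for the key sequences <x:xmpmeta and </x:xmpmeta> will significantly reduce the memory footprint.

I'm only replying to this after so long because this seems to be the top result when searching Google for how to parse XMP data. I've seen this nearly identical snippet used several times, and it wastes a lot of memory. Here is an example of the fopen() method Stefan mentions after his example.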
<?php
function getXmpData($filename, $chunkSize)
{
    if (!is_int($chunkSize)) {
        throw new RuntimeException('Expected integer value for argument #2 (chunkSize)');
    }
    if ($chunkSize < 12) {
        throw new RuntimeException('Chunk size cannot be less than 12 argument #2 (chunkSize)');
    }
    if (($file_pointer = fopen($filename, 'rb')) === FALSE) {
        throw new RuntimeException('Could not open file for reading');
    }
    $startTag = '<x:xmpmeta';
    $endTag = '</x:xmpmeta>';
    $buffer = '';
    $hasXmp = FALSE;
    while (($chunk = fread($file_pointer, $chunkSize)) !== FALSE) {
        if ($chunk === "") {
            break;
        }
        $buffer .= $chunk;
        $startPosition = strpos($buffer, $startTag);
        $endPosition = strpos($buffer, $endTag);
        if ($startPosition !== FALSE && $endPosition !== FALSE) {
            // Both tags are in the buffer: cut out the complete packet.
            $buffer = substr($buffer, $startPosition, $endPosition - $startPosition + strlen($endTag));
            $hasXmp = TRUE;
            break;
        } elseif ($startPosition !== FALSE) {
            // Start tag found: keep everything from it onwards and keep reading.
            $buffer = substr($buffer, $startPosition);
            $hasXmp = TRUE;
        } elseif (strlen($buffer) >= strlen($startTag)) {
            // No start tag yet: keep only the tail that could hold a split start tag,
            // so the buffer does not grow with the file.
            $buffer = substr($buffer, 1 - strlen($startTag));
        }
    }
    fclose($file_pointer);
    return ($hasXmp) ? $buffer : NULL;
}
$ exiv2 -e X extract image.jpg
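This produces an image.xmp file containing the embedded XMP, which can then be parsed.
I know... this thread is a bit old, but it helped me when I was looking for a solution, so I figured it might help others too.
I took this basic solution and modified it so that it handles the case where a tag is split across chunks. This lets you set the chunk size as large or as small as you like.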
<?php
function getXmpData($filename, $chunk_size = 1024)
{
    if (!is_int($chunk_size)) {
        throw new RuntimeException('Expected integer value for argument #2 (chunk_size)');
    }
    if ($chunk_size < 12) {
        throw new RuntimeException('Chunk size cannot be less than 12 argument #2 (chunk_size)');
    }
    if (($file_pointer = fopen($filename, 'rb')) === FALSE) {
        throw new RuntimeException('Could not open file for reading');
    }
    $tag = '<x:xmpmeta';
    $buffer = false;
    // find open tag
    while ($buffer === false && ($chunk = fread($file_pointer, $chunk_size)) !== false) {
        if (strlen($chunk) <= 10) {
            break;
        }
        if (($position = strpos($chunk, $tag)) === false) {
            // if open tag not found, back up just in case the open tag is on the split.
            fseek($file_pointer, -10, SEEK_CUR);
        } else {
            $buffer = substr($chunk, $position);
        }
    }
    if ($buffer === false) {
        fclose($file_pointer);
        return false;
    }
    $tag = '</x:xmpmeta>';
    $offset = 0;
    while (($position = strpos($buffer, $tag, $offset)) === false && ($chunk = fread($file_pointer, $chunk_size)) !== FALSE && $chunk !== '') {
        // subtract the tag size just in case it's split between chunks;
        // max() avoids a negative offset when the buffer is shorter than the tag.
        $offset = max(0, strlen($buffer) - 12);
        $buffer .= $chunk;
    }
    fclose($file_pointer);
    if ($position === false) {
        // this would mean the open tag was found, but the close tag was not. Maybe file corruption?
        throw new RuntimeException('No close tag found. Possibly corrupted file.');
    } else {
        $buffer = substr($buffer, 0, $position + 12);
    }
    return $buffer;
}
?>
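Bryan's solution is the best so far, but it has a few problems, so I modified it, simplified it, and removed some functionality.
I found three problems with his solution:
A) If a chunk boundary falls right in the middle of one of the strings we are searching for, that string won't be found. Smaller chunk sizes make this more likely.
B) If a chunk contains both the start and the end, the end won't be found. This could be fixed with an extra if statement that rechecks the chunk in which the start was found to see whether the end is in it as well.
C) The statement added to the else branch to break out of the while loop when no XMP data is found has a side effect: if the start element isn't found on the first pass, no further chunks are checked. This is easy to fix, but given the first problem it isn't worth it.
My solution below is less powerful, but more robust. It checks only a single chunk and extracts the data from it. It works only if both the start and the end are inside that chunk, so the chunk size needs to be large enough to always capture that data. In my experience with files exported from Adobe Photoshop/Lightroom, the XMP data typically starts at around 20 kB and ends at around 45 kB. My chunk size of 50k seems to work well for my images; much less would do if you strip some of the data on export, for example the CRS block, which holds a lot of develop settings.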
function getXmpData($filename)
{
    $chunk_size = 50000;
    $buffer = NULL;
    if (($file_pointer = fopen($filename, 'rb')) === FALSE) {
        throw new RuntimeException('Could not open file for reading');
    }
    $chunk = fread($file_pointer, $chunk_size);
    if (($posStart = strpos($chunk, '<x:xmpmeta')) !== FALSE) {
        $buffer = substr($chunk, $posStart);
        $posEnd = strpos($buffer, '</x:xmpmeta>');
        // only return data when the closing tag is inside the chunk as well
        $buffer = ($posEnd !== FALSE) ? substr($buffer, 0, $posEnd + 12) : NULL;
    }
    fclose($file_pointer);
    return $buffer;
}
function getXmpData($filename, $chunk_size = 50000){
    if (($file_pointer = fopen($filename, 'rb')) === FALSE) {
        throw new RuntimeException('Could not open file for reading');
    }
    $chunk = fread($file_pointer, $chunk_size);
    fclose($file_pointer);
    if (($posStart = strpos($chunk, '<x:xmpmeta')) !== FALSE) {
        $posEnd = strpos($chunk, '</x:xmpmeta>', $posStart);
        if ($posEnd !== FALSE) {
            return substr($chunk, $posStart, $posEnd - $posStart + 12);
        }
    }
    // the whole file was read and no complete packet was found: stop here,
    // otherwise the recursion below would never end for files without XMP
    if (strlen($chunk) < $chunk_size) {
        return NULL;
    }
    // recursion here: retry with a chunk twice as large
    return getXmpData($filename, $chunk_size * 2);
}
With ExifTool you can extract the XMP data (-xmp:all) and output it in JSON format (-json), which you can then easily convert to a PHP object:
$command = 'exiftool -g -json -struct -xmp:all ' . escapeshellarg($image_path);
exec($command, $output, $return_var);
$metadata = implode('', $output);
$metadata = json_decode($metadata);
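As a rough sketch of working with the decoded result: the JSON string below is a hypothetical sample that only mimics the general shape of ExifTool's JSON output (a list with one entry per file, with tags nested under their group when -g is used). The tag names and values are assumptions; real output depends on the image.

```php
<?php
// Hypothetical sample; in practice this string comes from exec() as shown above.
$json = '[{"SourceFile":"image.jpg","XMP":{"Title":"Sunset","Creator":["Jane Doe"]}}]';

// Decode to associative arrays and fail loudly on malformed or empty output.
$files = json_decode($json, true);
if (!is_array($files) || !isset($files[0])) {
    throw new RuntimeException('Unexpected JSON: ' . json_last_error_msg());
}

// One entry per file; the XMP group may be absent if the image carries no XMP.
$xmp = $files[0]['XMP'] ?? [];
$title = $xmp['Title'] ?? null;
$creators = $xmp['Creator'] ?? [];

echo $title, ' by ', implode(', ', $creators), PHP_EOL;
```

Checking the exit status returned by exec() before decoding is also worthwhile, since a missing exiftool binary produces no output at all.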
If you are able to install exiv2 in your environment:
sudo apt install exiv2
function image_meta_data($image_path) {
    $meta_data = [];
    // execute exiv2 via the command line; exec() fills $output and $retval by reference
    exec('exiv2 -Pkt ' . escapeshellarg($image_path), $output, $retval);
    // process output into associative array
    foreach ($output as $line) {
        $key = trim(substr($line, 0, 46));
        $value = str_replace('lang="x-default" ', '', trim(substr($line, 46))); // remove in-line language tag
        $meta_data[$key] = $value;
    }
    return $meta_data;
}
Usage:
$meta = image_meta_data($image_path);
print_r($meta);
// Examples:
echo $meta['Xmp.dc.title'] ?? '';
echo $meta['Iptc.Application2.DateCreated'] ?? '';
echo $meta['Exif.Image.ImageDescription'] ?? '';
There is now also a GitHub repository, installable via Composer, that can read XMP data:
https://github.com/jeroendesloovere/xmp-metadata-extractor
composer require jeroendesloovere/xmp-metadata-extractor
(find <x:xmpmeta, find next </x:xmpmeta>) repeat until no more <x:xmpmeta can be found. - Stefan Gehrig
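The loop described in that comment (find <x:xmpmeta, find the next </x:xmpmeta>, repeat) could be sketched roughly as follows. extractAllXmpPackets() is a hypothetical helper name, and the sketch works on a string already held in memory, so it would need to be combined with one of the chunked readers above for large files.

```php
<?php
/**
 * Collect every <x:xmpmeta ...>...</x:xmpmeta> block found in $content.
 * Hypothetical helper for illustration; not part of any library.
 */
function extractAllXmpPackets(string $content): array
{
    $startTag = '<x:xmpmeta';
    $endTag = '</x:xmpmeta>';
    $packets = [];
    $offset = 0;

    // Search for the next opening tag, starting after the previous packet.
    while (($start = strpos($content, $startTag, $offset)) !== false) {
        $end = strpos($content, $endTag, $start);
        if ($end === false) {
            break; // opening tag without a closing tag: stop searching
        }
        $end += strlen($endTag);
        $packets[] = substr($content, $start, $end - $start);
        $offset = $end;
    }

    return $packets;
}

$sample = "junk<x:xmpmeta a='1'>first</x:xmpmeta>more junk<x:xmpmeta>second</x:xmpmeta>tail";
$packets = extractAllXmpPackets($sample);
echo count($packets), PHP_EOL;
```

Each returned string can then be passed to simplexml_load_string() on its own.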