How to parse an XML node with a colon tag using PHP

Question

4

I'm trying to get the values of the following nodes from [this URL (takes a long time to load)][1]. The elements I'm interested in are:

title, g:price and g:gtin

The XML starts like this:

<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <title>PhotoSpecialist.de</title>
    <link>http://www.photospecialist.de</link>
    <description/>
    <item>
      <g:id>BEN107C</g:id>
      <title>Benbo Trekker Mk3 + Kugelkopf + Tasche</title>
      <description>
        Benbo Trekker Mk3 + Kugelkopf + Tasche Das Benbo Trekker Mk3 ist eine leichte Variante des beliebten Benbo 1. Sein geringes Gewicht macht das Trekker Mk3 zum idealen Stativ, wenn Sie viel draußen fotografieren und viel unterwegs sind. Sollten Sie in eine Situation kommen, in der maximale Stabilität zählt, verfügt das Benbo Trekker Mk3 über einen Haken an der Mittelsäule. An diesem können Sie das Stativ mit zusätzlichem Gewicht bei Bedarf beschweren. Dank der zwei besonderen Kamera-Befestigungsschrauben können Sie mit dem Benbo Trekker Mk3 sehr nah am Boden fotografieren. So nah, dass in vielen Fällen die einzige Einschränkung die Größe Ihrer Kamera darstellt. In diesem Set erhalten Sie das Benbo Trekker Mk3 zusammen mit einem Kugelkopf, Socket und einer Tasche für den sicheren und komfortablen Transport.
      </description>
      <link>
        http://www.photospecialist.de/benbo-trekker-mk3-kugelkopf-tasche?dfw_tracker=2469-16
      </link>
      <g:image_link>http://static.fotokonijnenberg.nl/media/catalog/product/b/e/benbo_trekker_mk3_tripod_kit_with_b__s_head__bag_ben107c1.jpg</g:image_link>
      <g:price>199.00 EUR</g:price>
      <g:condition>new</g:condition>
      <g:availability>in stock</g:availability>
      <g:identifier_exists>TRUE</g:identifier_exists>
      <g:brand>Benbo</g:brand>
      <g:gtin>5022361100576</g:gtin>
      <g:item_group_id>0</g:item_group_id>
      <g:product_type>Tripod</g:product_type>
      <g:mpn/>
      <g:google_product_category>Kameras &amp; Optik</g:google_product_category>
    </item>
  ...
  </channel>
</rss>

To do this, I wrote the following code:

$z = new XMLReader;
$z->open('https://my.datafeedwatch.com/static/files/1248/8222ebd3847fbfdc119abc9ba9d562b2cdb95818.xml');

$doc = new DOMDocument;

while ($z->read() && $z->name !== 'item')
    ;

while ($z->name === 'item')
{
    $node = new SimpleXMLElement($z->readOuterXML());
    $a = $node->title;
    $b = $node->price;
    $c = $node->gtin;
    echo $a . $b . $c . "<br />";
    $z->next('item');
}

This only returns the title; price and gtin are not displayed.

- user3305327

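My bad, you're using SimpleXMLElement to access properties that are in a namespace of their own. So the linked duplicate was not entirely correct (you can either use XMLReader::expand() to get the DOMElement directly, convert to DOM via dom_import_simplexml, or access the namespaced properties in SimpleXML directly as in the Q&A linked in this comment). - hakre

@hakre... I can't use SimpleXML because the XML file is huge, so I have to use XMLReader. - user3305327

What? You are actually using SimpleXML in the code in your question. When I mentioned it, I wasn't suggesting you move away from XMLReader. - hakre

@hakre... oops, sorry... I'm very new to XML coding... by the way, could you help me solve this problem? - user3305327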

2 Answers

0

If the main tag is a string containing a colon, you have to use:

$xml->next($xml->localName);

to move to the next item element.
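
As a rough illustration of what localName refers to: XMLReader exposes the qualified name, the local name, the prefix and the namespace URI of the current node as separate properties. The sketch below uses a shortened inline sample (an assumption for illustration; only the element names and the namespace URI are taken from the question) and stops at the first element that carries the "g" prefix.

```php
<?php
// Shortened inline sample, not the real feed from the question.
$xml = <<<'XML'
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <item>
      <title>Benbo Trekker Mk3 + Kugelkopf + Tasche</title>
      <g:price>199.00 EUR</g:price>
    </item>
  </channel>
</rss>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$seen = [];
while ($reader->read()) {
    // Stop at the first element whose name uses the "g" prefix.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->prefix === 'g') {
        $seen = [
            'name'         => $reader->name,         // qualified name, with prefix
            'localName'    => $reader->localName,    // name without the prefix
            'prefix'       => $reader->prefix,
            'namespaceURI' => $reader->namespaceURI,
        ];
        break;
    }
}
$reader->close();

print_r($seen);
```

For the plain item element in the question's feed, name and localName are the same string, since it has no prefix.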

- revoke

- hakre · Accepted Answer

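The elements you are asking about are not in the default namespace but in a different one. You can tell because their names carry a prefix separated by a colon: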

  ...
  <channel>
    <title>PhotoSpecialist.de</title>
    <!-- title is in the default namespace, no colon in the name -->
    ...
    <g:price>199.00 EUR</g:price>
    ...
    <g:gtin>5022361100576</g:gtin>
    <!-- price and gtin are in a different namespace, colon in the name and prefixed by "g" -->
  ...

The namespace is referred to by a prefix, in your case "g". The namespace that this prefix stands for is declared on the document element:

<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
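
If you would rather read the prefix-to-URI mapping from the document than hard-code it, SimpleXMLElement::getDocNamespaces() lists the namespaces declared on the root element. This is only a minimal sketch on a cut-down inline copy of the root element (the inline string is an assumption for illustration); for a feed as large as the one in the question you would not load the whole file into SimpleXML like this.

```php
<?php
// Cut-down inline sample: just the root element with its namespace declaration.
$xml = '<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">'
     . '<channel><title>PhotoSpecialist.de</title></channel>'
     . '</rss>';

$rss = new SimpleXMLElement($xml);

// Maps each prefix declared on the root element to its namespace URI.
$namespaces = $rss->getDocNamespaces();

foreach ($namespaces as $prefix => $uri) {
    printf("%s => %s\n", $prefix, $uri);
}
```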

So the namespace is "http://base.google.com/ns/1.0". With SimpleXMLElement, you access the child elements by name:

$a = $node->title;
$b = $node->price;
$c = $node->gtin;

This looks only in the default namespace. So only the first element actually contains text; the other two are created on the fly and are still empty.

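To access namespaced child elements, you need to tell SimpleXMLElement explicitly by using the children() method. It creates a new SimpleXMLElement containing all the children in that namespace instead of the default one: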

$google = $node->children("http://base.google.com/ns/1.0");

$a = $node->title;
$b = $google->price;
$c = $google->gtin;
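
To see the difference in isolation, here is a small self-contained sketch on a single item element (the inline XML is a shortened copy of the question's data, with the namespace declared on the item itself so the fragment parses on its own). It also shows the prefix-based form of children(), which is an alternative to passing the URI and only works as long as the feed keeps using the "g" prefix.

```php
<?php
// Shortened item from the question, made standalone by declaring the namespace on it.
$xml = <<<'XML'
<item xmlns:g="http://base.google.com/ns/1.0">
  <title>Benbo Trekker Mk3 + Kugelkopf + Tasche</title>
  <g:price>199.00 EUR</g:price>
  <g:gtin>5022361100576</g:gtin>
</item>
XML;

$node = new SimpleXMLElement($xml);

// Default-namespace lookup: there is no un-prefixed <price>, so this is empty.
$plainPrice = (string) $node->price;

// Lookup by namespace URI.
$google = $node->children('http://base.google.com/ns/1.0');
$byUri  = (string) $google->price;

// Lookup by prefix (second argument true means "first argument is a prefix").
$byPrefix = (string) $node->children('g', true)->gtin;

var_dump($plainPrice, $byUri, $byPrefix);
```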

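That's it for the isolated example (yes, that's really all there is to it).

A full example could then look like this (including node expansion on the reader; the code you had was a bit rusty):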

<?php
/**
 * How to parse an XML node with a colon tag using PHP
 *
 * @link https://dev59.com/k4rda4cB1Zd3GeqPIidV
 */
const HTTP_BASE_GOOGLE_COM_NS_1_0 = "http://base.google.com/ns/1.0";

$url = 'https://my.datafeedwatch.com/static/files/1248/8222ebd3847fbfdc119abc9ba9d562b2cdb95818.xml';

$reader = new XMLReader;
$reader->open($url);

$doc = new DOMDocument;

// move to first item element
while (($valid = $reader->read()) && $reader->name !== 'item') ;

while ($valid) {
    $default    = simplexml_import_dom($reader->expand($doc));
    $googleBase = $default->children(HTTP_BASE_GOOGLE_COM_NS_1_0);
    printf(
        "%s - %s - %s<br />\n"
        , htmlspecialchars($default->title)
        , htmlspecialchars($googleBase->price)
        , htmlspecialchars($googleBase->gtin)
    );

    // move to next item element
    $valid = $reader->next('item');
}
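
For trying the same loop without waiting for the remote feed, the following sketch runs XMLReader over a small inline string and, as a variation, stays in DOM after expand() instead of importing into SimpleXML. The inline feed, its second item and the constant name GOOGLE_NS are assumptions for illustration only; the namespace URI and the first item's values come from the question.

```php
<?php
const GOOGLE_NS = 'http://base.google.com/ns/1.0';

// Inline stand-in for the feed; the second item is made up for the example.
$xml = <<<'XML'
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <title>PhotoSpecialist.de</title>
    <item>
      <title>Benbo Trekker Mk3 + Kugelkopf + Tasche</title>
      <g:price>199.00 EUR</g:price>
      <g:gtin>5022361100576</g:gtin>
    </item>
    <item>
      <title>Hypothetical second item</title>
      <g:price>10.00 EUR</g:price>
      <g:gtin>0000000000000</g:gtin>
    </item>
  </channel>
</rss>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$doc  = new DOMDocument();
$rows = [];

// Move to the first item element.
while (($valid = $reader->read()) && $reader->name !== 'item');

while ($valid) {
    // Expand only the current item into a DOM element owned by $doc.
    $item = $reader->expand($doc);

    $rows[] = [
        'title' => $item->getElementsByTagName('title')->item(0)->textContent,
        'price' => $item->getElementsByTagNameNS(GOOGLE_NS, 'price')->item(0)->textContent,
        'gtin'  => $item->getElementsByTagNameNS(GOOGLE_NS, 'gtin')->item(0)->textContent,
    ];

    // Move to the next item element, skipping the current subtree.
    $valid = $reader->next('item');
}
$reader->close();

foreach ($rows as $row) {
    printf("%s - %s - %s\n", $row['title'], $row['price'], $row['gtin']);
}
```

Only one item is expanded into memory at a time, which is the reason for combining XMLReader with DOM or SimpleXML on a large file.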

I hope this both explains things and broadens your view of what XMLReader can do.