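I'm currently working on a sitemap for a website. I use SimpleXML to import the original XML file with simplexml_load_file("small.xml"); and run some checks on it. After that, I convert it to a DOMDocument with dom_import_simplexml() so that I can add and manipulate XML elements more precisely. Here is the test XML sitemap I'm using: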
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:52:32-Orouke.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:23-castle technology.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:38-banana split.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:42-Waveney.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:55:12-pure orange.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:57:54-tau press.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:21-E.f.m.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:31-apple.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:45-townhouse communications.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
</urlset>
Now, here is the test code I'm using to make the changes:
<?php
// Load the sitemap with SimpleXML, then get a DOM view of the same tree
$root = simplexml_load_file("small.xml");
$domRoot = dom_import_simplexml($root);
$dom = $domRoot->ownerDocument;
// Build a new <url> entry with <loc> and <lastmod> children
$urlElement = $dom->createElement("url");
$locElement = $dom->createElement("loc");
$locElement->appendChild($dom->createTextNode("www.google.co.uk"));
$urlElement->appendChild($locElement);
$lastmodElement = $dom->createElement("lastmod");
$lastmodElement->appendChild($dom->createTextNode("2011-08-02"));
$urlElement->appendChild($lastmodElement);
// Append it to <urlset> and print the whole document with formatting enabled
$domRoot->appendChild($urlElement);
$dom->formatOutput = true;
echo $dom->saveXML();
?>
The main problem is that no matter where I put $dom->formatOutput = true;, the existing XML imported from SimpleXML is formatted correctly, but anything new is output all on one line, like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:52:32-Orouke.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:23-castle technology.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:38-banana split.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:42-Waveney.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:55:12-pure orange.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:57:54-tau press.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:21-E.f.m.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:31-apple.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:45-townhouse communications.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url><loc>www.google.co.uk</loc><lastmod>2011-08-02</lastmod></url></urlset>
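A quick diagnostic sketch that may help narrow this down. It assumes the sitemap is inlined as a string and trimmed to a single <url> for brevity (the real code reads small.xml). It loads the document the same way, with SimpleXML and then dom_import_simplexml(), and counts the whitespace-only text nodes sitting directly under <urlset>. As far as I understand libxml's serializer, formatOutput only adds indentation inside an element when none of that element's children is a text node, so a non-zero count here would explain why the appended <url> comes out on one line while the original entries just keep the whitespace they were loaded with.

```php
<?php
// Diagnostic sketch: does the whitespace from the source file survive as
// text nodes under <urlset>? The XML is inlined (assumption) instead of
// being read from small.xml, and trimmed to one <url> entry.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:31-apple.html</loc>
    <lastmod>2011-08-23</lastmod>
  </url>
</urlset>
XML;

// Same loading path as the question: SimpleXML first, then a DOM view.
$root = simplexml_load_string($xml);
$domRoot = dom_import_simplexml($root);

// Count direct children of <urlset> that are text nodes containing only whitespace.
$whitespaceNodes = 0;
foreach ($domRoot->childNodes as $child) {
    if ($child->nodeType === XML_TEXT_NODE && trim($child->nodeValue) === '') {
        $whitespaceNodes++;
    }
}

echo "whitespace-only text nodes under <urlset>: $whitespaceNodes\n";
```

With the trimmed sample this should report 2: the line break and indent before <url>, and the line break before </urlset>.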
If anyone knows why this happens and how to fix it, I'd be very grateful.
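formatOutput and preserveWhiteSpace flags. The problem is that I'm converting a preloaded SimpleXML object into a DOMDocument, so it inherits all the whitespace and so on that the object has preserved. I'm just trying to find out whether it's possible to tell SimpleXML not to format the output or preserve whitespace when it loads the document, so that once I convert it I can hand a "clean" set of XML nodes to the DOMDocument. - Tom Busby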