PHP DOMDocument stripping HTML tags

Question

5

I'm working on a small templating engine, and I'm using DOMDocument to parse the pages. My test page so far looks like this:

<block name="content">

   <?php echo 'this is some rendered PHP! <br />' ?>

   <p>Main column of <span>content</span></p>

</block>

Part of my class looks like this:

private function parse($tag, $attr = 'name')
{
    $strict = 0;
    /*** the array to return ***/
    $out = array();
    if($this->totalBlocks() > 0)
    {
        /*** a new dom object ***/
        $dom = new domDocument;
        /*** discard white space ***/
        $dom->preserveWhiteSpace = false;

        /*** load the html into the object ***/
        if($strict==1)
        {
            $dom->loadXML($this->file_contents);
        }
        else
        {
            $dom->loadHTML($this->file_contents);
        }

        /*** the tag by its tag name ***/
        $content = $dom->getElementsByTagname($tag);

        $i = 0;
        foreach ($content as $item)
        {
            /*** add node value to the out array ***/
            $out[$i]['name'] = $item->getAttribute($attr);
            $out[$i]['value'] = $item->nodeValue;
            $i++;
        }
    }

    return $out;
}

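I have it working the way I want, in that it grabs each <block> on the page and injects its contents into my template. However, it strips the HTML tags inside the <block>, so it returns the following, without the <p> or <span> tags: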

this is some rendered PHP! Main column of content

What am I doing wrong here? :) Thanks

- Brian Litzinger

1 Answer

- dannyp · Accepted Answer

Note: nodeValue is the concatenation of the text values in the node's subtree, so it never contains tags.
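To see the difference, here is a minimal sketch. The sample markup is hypothetical, and it uses a <div> instead of the custom <block> tag so that loadHTML() has no unknown-tag warnings to report. It also assumes PHP 5.3.6 or later, where saveHTML() accepts a node argument.

```php
<?php
// Hypothetical sample markup, not the asker's real template.
$html = '<div id="content"><p>Main column of <span>content</span></p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$div = $dom->getElementsByTagName('div')->item(0);

// nodeValue joins the descendant text nodes; element tags are not part of it.
echo $div->nodeValue, PHP_EOL;

// Serializing the child node instead keeps its markup.
echo $dom->saveHTML($div->firstChild), PHP_EOL;
```

The first echo should print only the text, while the second should print the <p> element with its nested <span>.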

To turn the tree under $node into an HTML fragment, I would do this:

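// Build a scratch document and copy each child of $node into it.
$doc = new DOMDocument();
foreach($node->childNodes as $child) {
    // importNode() with deep = true copies the child together with its subtree.
    $doc->appendChild($doc->importNode($child, true));
}
// Serialize the copied children back to an HTML string.
return $doc->saveHTML();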
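As an alternative sketch, assuming PHP 7 or later (for the type declarations; the node argument to saveHTML() itself needs PHP 5.3.6), each child can be serialized directly without a second document. The helper name innerHtml and the sample markup are hypothetical.

```php
<?php
/**
 * Hypothetical helper: returns the markup inside $node, tags included.
 */
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML($child) serializes only that child and its subtree.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Hypothetical sample markup; a <div> stands in for the custom <block> tag.
$dom = new DOMDocument();
$dom->loadHTML('<div id="content"><p>Main column of <span>content</span></p></div>');
$div = $dom->getElementsByTagName('div')->item(0);

echo innerHtml($div), PHP_EOL;
```

In the parse() loop from the question, the same idea would mean assigning innerHtml($item) to $out[$i]['value'] in place of $item->nodeValue.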

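HTML "fragments" are actually more problematic than you'd first think, because they tend to lack things like doctypes and character sets, which makes it hard to go back and forth deterministically between portions of a DOM tree and HTML fragments.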
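One common way to reduce the character-set ambiguity is to wrap the fragment in a minimal document that declares its encoding before loading it. The sketch below assumes the fragment is UTF-8 and PHP 7 or later (for the \u{...} escape). The wrapper markup and variable names are illustrative, not part of the original answer.

```php
<?php
// Hypothetical fragment containing a non-ASCII character, encoded as UTF-8.
$fragment = "<p>caf\u{e9}</p>";

// Assumption: the fragment is UTF-8, so the wrapper declares that charset
// explicitly instead of leaving the parser to guess.
$wrapped = '<html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    . '</head><body>' . $fragment . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($wrapped);

$p = $dom->getElementsByTagName('p')->item(0);

// The DOM extension exposes strings as UTF-8, so the text should round-trip.
echo $p->textContent, PHP_EOL;
```

This does not make fragments fully deterministic, since the parser still adds the html and body wrappers, but it removes the guesswork about the encoding.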