Removing an HTML element from a string with PHP

Question

4

I'm a bit confused about how to approach this. I have a string that looks like this...

    $text = "<p>This is some example text This is some example text This is some example text</p>
             <p><em>This is some example text This is some example text This is some example text</em></p>
             <p>This is some example text This is some example text This is some example text</p>";

I basically want to use something like preg_replace with a regular expression to remove

<em>This is some example text This is some example text This is some example text</em>

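So I need to write some PHP code that searches for the opening <em> and closing </em> tags and deletes them along with all the text in between.

Hope someone can help. Thanks.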

- lukehillonline

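Will the string always contain only one set of <em> tags? - Nertim

And what about the element that is then left empty? - Gordon

Yes, the em tag is always there. I will end up with an empty <p></p>, but that is not a problem. - lukehillonline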

5 Answers

2

$text = '<p>This is some example text This is some example text This is some example text</p>
<p><em>This is the em text</em></p>
<p>This is some example text This is some example text This is some example text</p>';

preg_match("#<em>(.+?)</em>#", $text, $output);

echo $output[0]; // This will output it with em style
echo '<br /><br />';
echo $output[1]; // This will output only the text between the em

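To make this example work, I changed the content of the <em> tag slightly; otherwise all the text is identical and you cannot really tell whether the script works.

However, if you want to remove the <em> element together with its contents instead of extracting them: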

$text = '<p>This is some example text This is some example text This is some example text</p>
<p><em>This is the em text</em></p>
<p>This is some example text This is some example text This is some example text</p>';

echo preg_replace("/<em>(.+)<\/em>/", "", $text);

- Kalle H. Väravas

Note: this only works on the assumption that there is a single <em> tag in your string. - Kalle H. Väravas
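
A minimal sketch of that limitation, using a made-up input with two <em> elements (the input string and variable names are assumptions for illustration). The greedy `(.+)` runs from the first `<em>` to the last `</em>`, while the lazy `(.+?)` stops at the nearest closing tag:

```php
<?php
// Hypothetical input containing two separate <em> elements
$text = '<p>Keep this</p><p><em>first</em></p><p>Keep this too <em>second</em></p>';

// Greedy: (.+) matches from the first <em> through the LAST </em>,
// so the text between the two elements is removed as well
$greedy = preg_replace('/<em>(.+)<\/em>/', '', $text);

// Lazy: (.+?) stops at the nearest </em>, so each element is removed on its own
$lazy = preg_replace('/<em>(.+?)<\/em>/', '', $text);

echo $greedy, "\n"; // <p>Keep this</p><p></p>
echo $lazy, "\n";   // <p>Keep this</p><p></p><p>Keep this too </p>
```

Neither pattern handles nested <em> elements or tags that carry attributes.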

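I see, so this strips the tags, but what does it leave behind? I don't care about the actual text; I want to remove it from the string and keep only the rest. - lukehillonline

@AdriftUniform, sorry, I misunderstood your question a little. Please see the edit, it should be what you asked for. - Kalle H. Väravas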

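Be careful if you have multi-line HTML or something similar. By default, .+ does not match across newlines. It took me about an hour to finally discover PCRE_DOTALL and the /s modifier. - lkraav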
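
A small sketch of the newline issue, assuming an input where the <em> content spans two lines (the input is made up for illustration):

```php
<?php
// Hypothetical input: the text inside <em> contains a newline
$text = "<p><em>line one\nline two</em></p><p>Keep this</p>";

// Without /s the dot does not match "\n", so the pattern finds nothing
$withoutS = preg_replace('/<em>.+?<\/em>/', '', $text);

// With /s (PCRE_DOTALL) the dot also matches newlines
$withS = preg_replace('/<em>.+?<\/em>/s', '', $text);

var_dump($withoutS === $text); // bool(true): the string is unchanged
echo $withS, "\n";             // <p></p><p>Keep this</p>
```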

Works great... I was going to use the PHP HTML DOM class, but this is simpler, and what I needed was a way to target an element by id... for example: echo preg_replace('/(.+)<\/em>/', "", $text); - greaterKing

2

If you are interested in a non-regex solution, the following also works:

<?php
    $text = "<p>This is some example text This is some example text This is some example text</p>
             <p><em>This is some example text This is some example text This is some example text</em></p>
             <p>This is some example text This is some example text This is some example text</p>";


    $emStartPos = strpos($text,"<em>");
    $emEndPos = strpos($text,"</em>");

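    // strpos() returns 0 for a match at the very start of the string,
    // so compare against false instead of relying on truthiness
    if ($emStartPos !== false && $emEndPos !== false) {
        $emEndPos += 5; // also remove the closing </em> tag (5 characters)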
        $len = $emEndPos - $emStartPos;

        $text = substr_replace($text, '', $emStartPos, $len);
    }

?>

This removes the <em> tags and everything between them.
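
If the string may contain more than one such element, the same idea could be wrapped in a loop. This is only a sketch: `removeElements()` is a hypothetical helper name, not a built-in function, and it assumes the tags are not nested and carry no attributes.

```php
<?php
// removeElements() is a hypothetical helper, not part of PHP itself
function removeElements(string $text, string $open, string $close): string
{
    $offset = 0;
    while (($start = strpos($text, $open, $offset)) !== false) {
        // Look for the closing tag only after the opening tag
        $end = strpos($text, $close, $start + strlen($open));
        if ($end === false) {
            break; // unmatched opening tag: leave the rest untouched
        }
        // Cut from the opening tag through the end of the closing tag
        $text = substr_replace($text, '', $start, $end + strlen($close) - $start);
        $offset = $start; // keep scanning from where the cut was made
    }
    return $text;
}

echo removeElements('<em>a</em><p>b <em>c</em> d</p>', '<em>', '</em>');
// prints: <p>b  d</p>
```

The first call position here is 0, which is exactly the case where a truthiness check on the result of strpos() would go wrong.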

- Nertim

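If I add a little extra to that, like preg_replace("", " ", $text) and preg_replace("", " ", $text), would that get rid of the tags as well? - lukehillonline

If you don't want to keep the tags, change $emStartPos += 4 to $emEndPos += 5 instead ('</em>' is 5 characters long). - Nertim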

2

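I can't believe you picked this answer. It is so much code, it is not tidy, and it is not optimal. - Kalle H. Väravas

@KalleH.Väravas I think AdriftUniform went with this because it is easier to read than a regex, especially for someone unfamiliar with regular expressions. I agree that a regex can solve this in a single line. Behind the scenes, though, the interpreter still has to parse the regex and then run it against the text, so I'm not sure the regex is actually more efficient in this particular case. Perhaps AdriftUniform could time each solution and use the faster one, especially if he/she plans to process many blocks of text. - Nertim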
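
The timing test suggested above could be sketched roughly like this. The iteration count and the shortened input are arbitrary assumptions, and the numbers will vary by machine and PHP version, so treat it as a starting point rather than a benchmark:

```php
<?php
// Shortened stand-in for the string from the question
$text = "<p>This is some example text</p>\n<p><em>This is some example text</em></p>\n<p>This is some example text</p>";
$iterations = 100000; // arbitrary choice

// Regex approach
$t0 = hrtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $viaRegex = preg_replace('/<em>.+?<\/em>/s', '', $text);
}
$regexNs = hrtime(true) - $t0;

// strpos/substr_replace approach
$t0 = hrtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $start = strpos($text, '<em>');
    $end = strpos($text, '</em>');
    $viaStrpos = ($start !== false && $end !== false)
        ? substr_replace($text, '', $start, $end + 5 - $start)
        : $text;
}
$strposNs = hrtime(true) - $t0;

printf("preg_replace:   %.1f ms\n", $regexNs / 1e6);
printf("strpos/substr:  %.1f ms\n", $strposNs / 1e6);
```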

1

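Use strpos to find the opening tag and strrpos to find the last closing tag in the string. Use substr to get that part of the string. Then replace the substring with an empty string in the original string.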
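
One possible reading of these steps, assuming strpos() for the opening tag and strrpos() for the last closing tag (the input string is made up for illustration):

```php
<?php
$text = '<p>Before</p><p><em>Remove me</em></p><p>After</p>';

$start = strpos($text, '<em>');   // position of the first opening tag
$end   = strrpos($text, '</em>'); // position of the last closing tag

if ($start !== false && $end !== false && $end > $start) {
    // Substring from the opening tag through the end of the closing tag
    $element = substr($text, $start, $end + strlen('</em>') - $start);
    // Replace that substring with an empty string in the original
    $text = str_replace($element, '', $text);
}

echo $text; // <p>Before</p><p></p><p>After</p>
```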

- marko

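Why make it so complicated when the whole thing can be done with one function, one line, one match?! - Kalle H. Väravas

@Kalle Regular expressions are complicated too. They can just be written in a very compact way, but the interpreter still has to parse and translate them. You only fail to see the complexity because it happens behind the scenes. - Gordon

HTML cannot be parsed with regular expressions. What about comments or quoted strings that contain those tag characters, or cases like ……… ? - Peter Wooster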
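
For comparison, a parser-based sketch using PHP's DOM extension. This is not taken from any answer in the thread; it assumes the dom extension is available, and the wrapping `<div>` and the `class="note"` attribute are made up for illustration. An attribute like that is one simple case where a literal `<em>` pattern would no longer match:

```php
<?php
$text = '<p>Before</p><p><em class="note">Remove me</em></p><p>After</p>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
// Wrap the fragment in a <div> so it has a single root element; the flags
// ask libxml not to add <html>/<body> wrappers or a doctype
$doc->loadHTML('<div>' . $text . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

// Copy the live node list into an array before removing nodes from the tree
foreach (iterator_to_array($doc->getElementsByTagName('em')) as $em) {
    $em->parentNode->removeChild($em);
}

// Serialize the children of the wrapper <div> back into a string
$result = '';
foreach ($doc->documentElement->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}

echo $result;
```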

-4

$text = str_replace('<em>','',$text);
$text = str_replace('</em>','',$text);

- Vikram Srivastava

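The OP doesn't want to strip all tags, only the <em> tag and its contents. - Gordon

As noted above, this is not what I'm looking for. I just want to remove the <em> element and get rid of the text inside it, which strip_tags will not do. - lukehillonline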

- arychj · Accepted Answer

4

$text = preg_replace('/([\s\S]*)(<em>)([\s\S]*)(</em>)([\s\S]*)/', '$1$5', $text);

- arychj

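This is roughly what I'm after, but I get the following warning: Warning: preg_replace() [function.preg-replace]: Unknown modifier '>'. - lukehillonline

Sorry, I forgot to escape the closing slash in the closing (</em>) group. (</em>) should be: (<\/em>) - arychj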
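
The warning appears because the unescaped / in </em> is read as the closing pattern delimiter, which makes the following > look like a modifier. A short sketch of two ways around it, using a shortened made-up input: escape the slash, or pick a delimiter such as # that does not occur in the pattern.

```php
<?php
$text = "<p>One</p>\n<p><em>Two</em></p>\n<p>Three</p>";

// Option 1: escape the slash so it is not taken as the closing delimiter
$escaped = preg_replace('/([\s\S]*)(<em>)([\s\S]*)(<\/em>)([\s\S]*)/', '$1$5', $text);

// Option 2: use # as the delimiter, so "/" needs no escaping
$hashDelim = preg_replace('#([\s\S]*)(<em>)([\s\S]*)(</em>)([\s\S]*)#', '$1$5', $text);

echo $escaped, "\n";
// <p>One</p>
// <p></p>
// <p>Three</p>
```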

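Tried it, and I'm afraid it did nothing. - lukehillonline

What do you mean it did nothing? My output is: "This is some example text This is some example text This is some example text This is some example text This is some example text This is some example text" - arychj

You do realize there is no capital V in (<\/em>), right? It is a backslash '\' followed by a forward slash '/'... - arychj

I tried, but nothing happened. I get back exactly what I put in - and yes, I did realize that, thanks a lot! - lukehillonline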