PHP: Remove decoded HTML entities from a string

Question


php string replace html-entities

5

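I am trying to sanitize a string and end up with the following:

Characterisation of the arsenic resistance genes in lt i gt Bacillus lt i gt sp UWC isolated from maturing fly ash acid mine drainage

I need to remove the lt, i and gt, since they are leftovers of HTML entities that did not get removed. What is the best way to do this, or is there another solution I should consider?

Here is my current solution: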

/**
 * @return string
 */
public function getFormattedTitle()
{
    $string = preg_replace('/[^A-Za-z0-9\-]/', ' ',  filter_var($this->getTitle(), FILTER_SANITIZE_STRING));
    return $string;
}

Here is an example of an input string:

Assessing <i>Clivia</i> taxonomy using the core DNA barcode regions, <i>matK</i> and <i>rbcLa</i>

Thanks!

- liamjnorman

My guess is that you have already tried http://php.net/manual/en/function.str-replace.php and http://php.net/manual/en/function.strip-tags.php? - Adam

4 Answers

3

Instead of filter_var, try strip_tags: http://php.net/manual/en/function.strip-tags.php

<?php
  //your input string
  $input_string = 'Assessing <i>Clivia</i> taxonomy using the core DNA barcode regions, <i>matK</i> and <i>rbcLa</i>';

  //strip away all html tags but leave whats inside
  $output_string = strip_tags($input_string);

  echo $output_string;
  //echos: Assessing Clivia taxonomy using the core DNA barcode regions, matK and rbcLa 

?>

- The Dog

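I tried that, but I end up with: "Antibiotic resistance profiles of <i>Escherichia coli</i> isolated from different water sources in the Mmabatho locality". Now I need to remove the &lt, &gt, &i etc. characters... - liamjnorman

In that case I think @Thomas David Baker is right: what you see on screen are HTML tags, but your underlying data is filled with HTML entities. When you showed the example input string, did you copy it from a browser or from a text file? - The Dog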

0

Nice, but it is only a good start if the UTF-8 icon characters are not removed as well. I have added that:

preg_replace('/[^(\x20-\x7F)]*/','', $s);
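To make the effect of that pattern concrete, here is a minimal sketch using a slightly tightened variant. The function name stripNonAscii and the sample string are hypothetical. Note that inside a character class the parentheses are literal characters, and \x7F is the DEL control character, so the sketch uses the printable range \x20-\x7E instead. Be aware that this removes every non-ASCII byte, including accented letters, not just icons.

```php
<?php
// Hypothetical helper: drop every byte outside printable ASCII (space to tilde).
function stripNonAscii(string $s): string
{
    // Without the /u flag the pattern works byte by byte, so all bytes
    // of a multi-byte UTF-8 character are removed.
    return preg_replace('/[^\x20-\x7E]/', '', $s) ?? '';
}

echo stripNonAscii('Clivia – café ☕ (matK)'), "\n";
```

If accented characters in titles should survive, this approach is too aggressive and a narrower pattern would be needed.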

- Seb

-1

A better way is strip_tags(). Here is the manual: http://php.net/manual/ru/function.strip-tags.php An example:

   public function getFormattedTitle()
    {
        return strip_tags($this->getTitle(), '<i>');
    }
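One thing worth checking before using this: the second argument to strip_tags is a list of tags to keep, not tags to remove. A small sketch with a made-up sample string shows the difference:

```php
<?php
// Sample string is illustrative only.
$title = 'Assessing <i>Clivia</i> <b>taxonomy</b>';

// '<i>' is an allow-list: <i> survives, every other tag is stripped.
$keepItalics = strip_tags($title, '<i>');

// With no second argument, all tags are stripped.
$stripAll = strip_tags($title);

echo $keepItalics, "\n"; // Assessing <i>Clivia</i> taxonomy
echo $stripAll, "\n";    // Assessing Clivia taxonomy
```

So for the goal in the question (no <i> left in the title), the one-argument form is the one that applies.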

- Nikolai


- Thomas David Baker · Accepted Answer

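The lt and gt in your output string tell me that the input is actually more like:

"Assessing &lt;i&gt;Clivia&lt;/i&gt; taxonomy using the core DNA barcode regions, matK and rbcLa"

That is what the string looks like when viewed as plain text. A browser interprets &lt; and &gt; as "<" and ">" (these are usually called "HTML entities" and provide a way to encode characters that would otherwise be interpreted as HTML).

One option is to process it like this: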

$s = "Assessing &lt;i&gt;Clivia&lt;/i&gt; taxonomy …";
$s = html_entity_decode($s); // $s is now "Assessing <i>Clivia</i> taxonomy …"
$s = strip_tags($s); // $s is now "Assessing Clivia taxonomy …"

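Note, however, that strip_tags is very naive. For example, it will turn '1<5 and 6>2' into '12'! So you need to make sure that all input text is doubly HTML-encoded for this to work perfectly.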
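The two steps above could be wrapped in a small helper. This is only a sketch under assumptions: the function name plainTitle, the UTF-8 encoding, the ENT_QUOTES | ENT_HTML5 flags and the whitespace cleanup are all additions for illustration, not part of the answer.

```php
<?php
// Hypothetical helper: decode entities first, then strip the resulting tags.
function plainTitle(string $title): string
{
    // Turn &lt;i&gt; back into <i>; flags and charset are assumptions.
    $decoded = html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Remove the now-real tags but keep their inner text.
    $text = strip_tags($decoded);

    // Collapse any runs of whitespace and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $text) ?? '');
}

echo plainTitle('Assessing &lt;i&gt;Clivia&lt;/i&gt; taxonomy using the core DNA barcode regions, &lt;i&gt;matK&lt;/i&gt; and &lt;i&gt;rbcLa&lt;/i&gt;'), "\n";
// Assessing Clivia taxonomy using the core DNA barcode regions, matK and rbcLa
```

The caveat about strip_tags still applies: a title that contains an encoded comparison such as 1&lt;5 would be decoded to a literal "<" and could lose text in the second step.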