Converting special characters to HTML in JavaScript

150

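How can I convert special characters to HTML in JavaScript?

Examples:

  • & (ampersand) becomes &amp;
  • " (double quote) becomes &quot; when ENT_NOQUOTES is not set
  • ' (single quote) becomes &#039; only when ENT_QUOTES is set
  • < (less than) becomes &lt;
  • > (greater than) becomes &gt;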

Take a look at the JavaScript htmlentities: http://phpjs.org/functions/htmlentities:425 - kehers
See also: https://dev59.com/fHM_5IYBdhLWcg3wfTS7 - Chris
You can use this library: https://www.npmjs.com/package/utf8 - Camilo Ortegón
25 Answers

226

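I think the best approach in many cases is to use the browser's built-in HTML escaping. To do this, simply create an element in the DOM tree and set the element's innerText to your string. Then retrieve the element's innerHTML. The browser will return an HTML-encoded string.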

function HtmlEncode(s)
{
  var el = document.createElement("div");
  el.innerText = el.textContent = s;
  s = el.innerHTML;
  return s;
}

Test run:

alert(HtmlEncode('&;\'><"'));

Output:

&amp;;'&gt;&lt;"

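This method of escaping HTML is also used by the Prototype JS library, though it differs slightly from the simple example given here.

Note: you still need to escape quotes (double and single) yourself. You can use any of the methods outlined by others here.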
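As a rough illustration of the note above, the quote escaping can be done as a second step on the string the browser returns. This is only a sketch: escapeQuotes is a hypothetical name, and the choice of &#039; for the single quote simply follows the question.

```javascript
// Hypothetical helper: escapes quotes in a string that has already been
// HTML-encoded, so it can be placed inside a quoted attribute value.
function escapeQuotes(encoded) {
  return encoded.replace(/"/g, '&quot;').replace(/'/g, '&#039;');
}

// The argument is the string HtmlEncode returned in the test run above.
console.log(escapeQuotes('&amp;;\'&gt;&lt;"'));
// &amp;;&#039;&gt;&lt;&quot;
```

Because the ampersands were already encoded in the first step, this second step must not touch them again.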


4
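Note that using delete el here is a mistake. See http://perfectionkills.com/understanding-delete/ for details. - gblazex
2
Sorry, I was testing odd characters, and Chrome is sneaky: it doesn't show the real HTML output, but Firebug does (in fact it showed an HTML entity for the copyright symbol even though the generated source did not encode it). This does work fine on <>&, but it is not as comprehensive as Neotropic's or KooiInc's solutions. - Moss
24
With jQuery, store the input text in a div element and read the result back as HTML: output = $('<div>').text(input).html() - dragon
8
Neither method converts the single quote ' to &#039; or the double quote " to &quot;, so the result can still be used for XSS attacks. - Somebody
1
@BeauCielBleu If the string is always treated as text, how is it vulnerable to XSS injection? How would the browser ever treat it as HTML? - lex82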

91

You need a function that does something like this:

return mystring.replace(/&/g, "&amp;").replace(/>/g, "&gt;").replace(/</g, "&lt;").replace(/"/g, "&quot;");

But it should take into account the different handling you want for single and double quotes.
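One way the quote handling from the question might be layered on top of that replace chain is sketched below. The function name escapeHtml and the string flag values are assumptions made for illustration; they borrow the ENT_* names from the question, and 'ENT_COMPAT' is used here as a label for the default (double quotes only).

```javascript
// Hypothetical helper. quoteStyle is one of:
//   'ENT_COMPAT'   (default) escape double quotes only
//   'ENT_QUOTES'   escape both double and single quotes
//   'ENT_NOQUOTES' leave both kinds of quotes alone
function escapeHtml(str, quoteStyle = 'ENT_COMPAT') {
  // The ampersand goes first so entities added later are not escaped twice.
  let out = String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  if (quoteStyle !== 'ENT_NOQUOTES') {
    out = out.replace(/"/g, '&quot;');
  }
  if (quoteStyle === 'ENT_QUOTES') {
    out = out.replace(/'/g, '&#039;');
  }
  return out;
}

const sample = `Tom & "Jerry's" <show>`;
console.log(escapeHtml(sample));                 // Tom &amp; &quot;Jerry's&quot; &lt;show&gt;
console.log(escapeHtml(sample, 'ENT_QUOTES'));   // Tom &amp; &quot;Jerry&#039;s&quot; &lt;show&gt;
console.log(escapeHtml(sample, 'ENT_NOQUOTES')); // Tom &amp; "Jerry's" &lt;show&gt;
```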


1
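What does the slash g mean? - JohnnyBizzle
7
In a regular expression, /g means "global". Simply put, every occurrence in the string is replaced. Without /g, only the first match is replaced. - Kevin G.
A better answer is https://dev59.com/SnI-5IYBdhLWcg3wkJQA#4835406 - MMMahdy-PAPION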
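The effect of the /g flag discussed in these comments can be seen in a short comparison. The variable names are only for illustration.

```javascript
const input = 'a < b < c';

// Without the g flag only the first match is replaced.
const firstOnly = input.replace(/</, '&lt;');

// With the g flag every match is replaced.
const allMatches = input.replace(/</g, '&lt;');

console.log(firstOnly);  // a &lt; b < c
console.log(allMatches); // a &lt; b &lt; c
```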

44

If you want to decode integer character codes such as &#xxx; inside a string, use this function:

function decodeHtmlCharCodes(str) { 
  return str.replace(/(&#(\d+);)/g, function(match, capture, charCode) {
    return String.fromCharCode(charCode);
  });
}

// Will output "The show that gained int’l reputation’!"
console.log(decodeHtmlCharCodes('The show that gained int&#8217;l reputation&#8217;!'));

ES6

const decodeHtmlCharCodes = str => 
  str.replace(/(&#(\d+);)/g, (match, capture, charCode) => 
    String.fromCharCode(charCode));

// Will output "The show that gained int’l reputation’!"
console.log(decodeHtmlCharCodes('The show that gained int&#8217;l reputation&#8217;!'));
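The functions above handle decimal references only, and String.fromCharCode works on 16-bit code units. A possible extension, sketched here under the assumption that String.fromCodePoint is available (ES2015+), also accepts hexadecimal references such as &#x1D306; and characters above U+FFFF. The name decodeNumericEntities is hypothetical, and named entities like &amp; are still left untouched.

```javascript
// Hypothetical helper: decodes decimal (&#8217;) and hexadecimal (&#x2019;)
// numeric character references. Named entities are not handled.
function decodeNumericEntities(str) {
  return str.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // String.fromCodePoint throws above U+10FFFF, so leave such references as-is.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericEntities('int&#8217;l &#x1D306; &#xA9;'));
// int’l 𝌆 ©
```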


4
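This should be the accepted answer, since it decodes everything. - Quesofat
1
Note that this code decodes only integer character codes. It does not decode things like &amp; or &gt;. - Magmatic
@Magmatic The opening, "For those who want to decode an integer char code like &#xxx; inside a string", already makes it clear that these functions decode integer codes; if you want to decode named entities, plenty of other functions here can do that. - Christos Lytras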

35

This generic function encodes every non-alphabetic character to its HTML code (numeric character reference, NCR):

function HTMLEncode(str) {
    var i = str.length,
        aRet = [];

    while (i--) {
        var iC = str[i].charCodeAt();
        if (iC < 65 || iC > 127 || (iC>90 && iC<97)) {
            aRet[i] = '&#'+iC+';';
        } else {
            aRet[i] = str[i];
        }
    }
    return aRet.join('');
}

[Edited 2022] A more modern approach:

const toHtmlEntities = (str, showInHtml = false) => 
  [...str].map( v => `${showInHtml ? `&amp;#` : `&#`}${v.codePointAt(0)};`).join(``);
const str = `&Hellõ Wórld`;

document.body.insertAdjacentHTML(`beforeend`, `<ul>
  <li>Show the entities (<code>toHtmlEntities(str, true)</code>): <b>${
    toHtmlEntities(str, true)}</b></li>
  <li>Let the browser decide (<code>toHtmlEntities(str)</code>): <b>${
    toHtmlEntities(str)}</b></li>
  <li id="textOnly"></li></ul>`);
document.querySelector(`#textOnly`).textContent = `As textContent: ${
  toHtmlEntities(str)}`;
body {
  font: 14px / 18px "normal verdana", arial;
  margin: 1rem;
}

code {
  background-color: #eee;
}


1
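This sounds very clever, but I can only get it to convert the basic characters: <>& - Moss
nvm. It works fine in the console, but when you output it to the browser it looks as though nothing was converted. What's going on? - Moss
@Moss: the browser renders HTML-encoded characters as the characters they represent. The advantage of HTML-encoded characters is that the browser doesn't have to guess how to interpret (for example) diacritical characters and always renders them as intended. - KooiInc
You might consider changing this code to remove the array-like access on the string. IE7 and below don't support it, and you can just call charCodeAt on str with i as the argument: var iC = str.charCodeAt(i) - Chase
This code does not produce the correct HTML entity value for the ± character. It should be &plusmn;, but it returns an unknown character �. - Paul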

27
Create a function that uses string replace.
function convert(str)
{
  str = str.replace(/&/g, "&amp;");
  str = str.replace(/>/g, "&gt;");
  str = str.replace(/</g, "&lt;");
  str = str.replace(/"/g, "&quot;");
  str = str.replace(/'/g, "&#039;");
  return str;
}

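I had trouble with single quotes (') and double quotes (") used together in my input values for display in HTML. If the user entered them, the script would break. - Dharam Mali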

23

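From Mozilla...

Note that charCodeAt will always return a value that is less than 65,536. This is because the higher code points are represented by a pair of (lower valued) "surrogate" pseudo-characters which are used to comprise the real character. Because of this, in order to examine or reproduce the full character for individual characters of value 65,536 and above, it is necessary to retrieve not only charCodeAt(i), but also charCodeAt(i + 1) (as if examining/reproducing a string with two letters).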
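The behaviour described in that passage can be checked directly. The sketch below uses U+1D306 as an arbitrary example of a character above 65,535 and recombines its two code units with the usual surrogate-pair formula; the variable names are only for illustration.

```javascript
const ch = '\u{1D306}'; // a single character outside the Basic Multilingual Plane

const high = ch.charCodeAt(0); // first UTF-16 code unit (high surrogate)
const low = ch.charCodeAt(1);  // second UTF-16 code unit (low surrogate)

// Recombine the surrogate pair into the real code point.
const combined = (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;

console.log(ch.length);                           // 2
console.log(high.toString(16), low.toString(16)); // d834 df06
console.log(combined.toString(16));               // 1d306
console.log(combined === ch.codePointAt(0));      // true
```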

The best solution

/**
 * (c) 2012 Steven Levithan <http://slevithan.com/>
 * MIT license
 */
if (!String.prototype.codePointAt) {
    String.prototype.codePointAt = function (pos) {
        pos = isNaN(pos) ? 0 : pos;
        var str = String(this),
            code = str.charCodeAt(pos),
            next = str.charCodeAt(pos + 1);
        // If a surrogate pair
        if (0xD800 <= code && code <= 0xDBFF && 0xDC00 <= next && next <= 0xDFFF) {
            return ((code - 0xD800) * 0x400) + (next - 0xDC00) + 0x10000;
        }
        return code;
    };
}

/**
 * Encodes special html characters
 * @param string
 * @return {*}
 */
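function html_encode(string) {
    var ret_val = '';
    for (var i = 0; i < string.length; i++) {
        var code = string.codePointAt(i);
        if (code > 127) {
            ret_val += '&#' + code + ';';
            // A code point above 0xFFFF occupies two UTF-16 units,
            // so skip the low surrogate that follows.
            if (code > 0xFFFF) {
                i++;
            }
        } else {
            ret_val += string.charAt(i);
        }
    }
    return ret_val;
}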

Usage example:

html_encode("✈");
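In environments where codePointAt and for...of are built in (ES2015+), the same idea may be written without a polyfill. The following is an alternative sketch, not the answer's own code; encodeNonAscii is a hypothetical name. Like html_encode, it only converts characters above 127 and leaves &, < and > alone.

```javascript
// Hypothetical helper: replaces every non-ASCII character with a decimal
// numeric character reference.
function encodeNonAscii(str) {
  let out = '';
  // for...of walks the string by code point, so a surrogate pair
  // arrives as one two-unit string rather than two separate halves.
  for (const ch of str) {
    const code = ch.codePointAt(0);
    out += code > 127 ? `&#${code};` : ch;
  }
  return out;
}

console.log(encodeNonAscii('\u2708 to \u{1D306}')); // &#9992; to &#119558;
```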

12

As dragon mentioned, the cleanest way is to use jQuery:

function htmlEncode(s) {
    return $('<div>').text(s).html();
}

function htmlDecode(s) {
    return $('<div>').html(s).text();
}

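Interestingly, if your string contains spaces, this method doesn't change them. A better approach is to use encodeURI(yourString). - Mike Gledhill
A space is not a special character. encodeURI is for encoding URLs, not HTML... it's not the right tool. - Serj Sagan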
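To make the difference raised in these comments concrete, the sketch below runs the same string through encodeURI and through a minimal HTML text escape. escapeText is a hypothetical helper written only for this comparison.

```javascript
const sample = 'a b & <c>';

// encodeURI percent-encodes for URLs: spaces and angle brackets change,
// while "&" is left alone because it is a reserved URL character.
const asUri = encodeURI(sample);

// Minimal HTML escaping for text content (hypothetical helper).
const escapeText = s =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
const asHtml = escapeText(sample);

console.log(asUri);  // a%20b%20&%20%3Cc%3E
console.log(asHtml); // a b &amp; &lt;c&gt;
```

The two outputs are not interchangeable: the URL form leaves & unescaped, which is exactly the character HTML needs escaped.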

9
function char_convert() {

    var chars = ["©","Û","®","ž","Ü","Ÿ","Ý","$","Þ","%","¡","ß","¢","à","£","á","À","¤","â","Á","¥","ã","Â","¦","ä","Ã","§","å","Ä","¨","æ","Å","©","ç","Æ","ª","è","Ç","«","é","È","¬","ê","É","­","ë","Ê","®","ì","Ë","¯","í","Ì","°","î","Í","±","ï","Î","²","ð","Ï","³","ñ","Ð","´","ò","Ñ","µ","ó","Õ","¶","ô","Ö","·","õ","Ø","¸","ö","Ù","¹","÷","Ú","º","ø","Û","»","ù","Ü","@","¼","ú","Ý","½","û","Þ","€","¾","ü","ß","¿","ý","à","‚","À","þ","á","ƒ","Á","ÿ","å","„","Â","æ","…","Ã","ç","†","Ä","è","‡","Å","é","ˆ","Æ","ê","‰","Ç","ë","Š","È","ì","‹","É","í","Œ","Ê","î","Ë","ï","Ž","Ì","ð","Í","ñ","Î","ò","‘","Ï","ó","’","Ð","ô","“","Ñ","õ","”","Ò","ö","•","Ó","ø","–","Ô","ù","—","Õ","ú","˜","Ö","û","™","×","ý","š","Ø","þ","›","Ù","ÿ","œ","Ú"]; 
    var codes = ["&copy;","&#219;","&reg;","&#158;","&#220;","&#159;","&#221;","&#36;","&#222;","&#37;","&#161;","&#223;","&#162;","&#224;","&#163;","&#225;","&Agrave;","&#164;","&#226;","&Aacute;","&#165;","&#227;","&Acirc;","&#166;","&#228;","&Atilde;","&#167;","&#229;","&Auml;","&#168;","&#230;","&Aring;","&#169;","&#231;","&AElig;","&#170;","&#232;","&Ccedil;","&#171;","&#233;","&Egrave;","&#172;","&#234;","&Eacute;","&#173;","&#235;","&Ecirc;","&#174;","&#236;","&Euml;","&#175;","&#237;","&Igrave;","&#176;","&#238;","&Iacute;","&#177;","&#239;","&Icirc;","&#178;","&#240;","&Iuml;","&#179;","&#241;","&ETH;","&#180;","&#242;","&Ntilde;","&#181;","&#243;","&Otilde;","&#182;","&#244;","&Ouml;","&#183;","&#245;","&Oslash;","&#184;","&#246;","&Ugrave;","&#185;","&#247;","&Uacute;","&#186;","&#248;","&Ucirc;","&#187;","&#249;","&Uuml;","&#64;","&#188;","&#250;","&Yacute;","&#189;","&#251;","&THORN;","&#128;","&#190;","&#252","&szlig;","&#191;","&#253;","&agrave;","&#130;","&#192;","&#254;","&aacute;","&#131;","&#193;","&#255;","&aring;","&#132;","&#194;","&aelig;","&#133;","&#195;","&ccedil;","&#134;","&#196;","&egrave;","&#135;","&#197;","&eacute;","&#136;","&#198;","&ecirc;","&#137;","&#199;","&euml;","&#138;","&#200;","&igrave;","&#139;","&#201;","&iacute;","&#140;","&#202;","&icirc;","&#203;","&iuml;","&#142;","&#204;","&eth;","&#205;","&ntilde;","&#206;","&ograve;","&#145;","&#207;","&oacute;","&#146;","&#208;","&ocirc;","&#147;","&#209;","&otilde;","&#148;","&#210;","&ouml;","&#149;","&#211;","&oslash;","&#150;","&#212;","&ugrave;","&#151;","&#213;","&uacute;","&#152;","&#214;","&ucirc;","&#153;","&#215;","&yacute;","&#154;","&#216;","&thorn;","&#155;","&#217;","&yuml;","&#156;","&#218;"];

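    // Declare the loop counters locally and replace every occurrence, not just the first
    for (var x = 0; x < chars.length; x++) {
        for (var i = 0; i < arguments.length; i++) {
            arguments[i].value = arguments[i].value.split(chars[x]).join(codes[x]);
        }
    }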
 }

char_convert(this);

1
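This works well. But for some reason it has problems when mixed with some jQuery functionality. Sometimes it fails to convert some characters, or converts only a few. Overall, though, it works well. onBlur="char_convert(this);" - Neotropic
Oops, in Chrome I get an error: "Uncaught TypeError: Cannot call method 'replace' of undefined", and in Firebug it's "arguments[i].value is undefined". - Moss
Putting all of those special characters into an array is completely pointless. See the other answers. - Gavin
The best solution for me, the only one that converts í to &iacute;. - Edhowler
How do you type these characters from the keyboard? I know it's a silly question, but on OS X, for example... - PositiveGuy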

6
function ConvChar(str) {
    var c = {'<':'&lt;', '>':'&gt;', '&':'&amp;',
             '"':'&quot;', "'":'&#039;', '#':'&#035;' };

    return str.replace(/[<&>'"#]/g, function(s) { return c[s]; });
}

alert(ConvChar('<-"-&-"->-<-\'-#-\'->'));

Result:

&lt;-&quot;-&amp;-&quot;-&gt;-&lt;-&#039;-&#035;-&#039;-&gt;

In a textarea tag:

<-"-&-"->-<-'-#-'->

This is for when you only need to convert a few characters instead of using long code...


6
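If you need support for all standardized named character references, Unicode, and ambiguous ampersands, the he library is the only 100% reliable solution I'm aware of!

Example usage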

he.encode('foo © bar ≠ baz 𝌆 qux');
// Output: 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
// Output: 'foo © bar ≠ baz 𝌆 qux'
