我有一些荷兰语字符串。我知道如何使用PHP对它们进行编码。
$str = iconv( "Windows-1252", "UTF-8", $str );
在JavaScript中,等价的代码是什么?
我有一些荷兰语字符串。我知道如何使用PHP对它们进行编码。
$str = iconv( "Windows-1252", "UTF-8", $str );
Windows-1252 是一种单字节编码,非常方便:你可以直接构建一个查找表。
<?php
$s = '';
for ($i = 0; $i < 256; $i++) {
$converted = iconv('Windows-1252', 'UTF-8', chr($i));
if ($converted === false) {
$s .= "\xef\xbf\xbd"; # UTF-8 replacement character
} else {
$s .= $converted;
}
}
echo $s;
var WINDOWS_1252 = '\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n\u000b\f\r\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~€�‚ƒ„…†‡ˆ‰Š‹Œ�Ž��‘’“”•–—˜™š›œ�žŸ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ';
function fromWindows1252(binaryString) {
var text = '';
for (var i = 0; i < binaryString.length; i++) {
text += WINDOWS_1252.charAt(binaryString.charCodeAt(i));
}
return text;
}
iconv
将CSV文件编码为UTF-8呢?这样你只需要在服务器和客户端之间进行一次往返。 - Ry-接受的答案对我没用,让我展示一下什么起作用了。
我的误解字符串来自一个可读流。当我运行await ReadableStreamVar.text()
时,所得到的文本将所有变音符号都替换为奇怪的“�”字符。当我尝试手动在浏览器中打开同一页时,它显示正确,并且在控制台中键入document.characterSet
后,我看到字符集是“windows-1252”,而不是UTF-8。
Ry的答案没有起作用,因为ReadableStream中的text
方法已经使用错误的编码,并用相同的�字符替换了所有不同的变音符号。
查阅ReadableStream API,我没有找到任何传递所需字符集到.text
方法的形式,因此我尝试了不同的方法:将ReadableStream作为数组(ReadableStream.arrayBuffer()
)读取,这样就行了。在数组中,我可以看到带有变音符号的字符具有正确字符的字符编码。因此,根据这个答案,我使用了TextDecoder API,现在我的可工作代码如下:
async function getDocumentTextContent(uri) {
const response = await fetch(uri, { method: "GET" })
if (!response.ok) throw new Error(`Problem retrieving the resource. Error message: ${response.statusText}`)
const dec = new TextDecoder("windows-1252") //Here I can inform the desired charset
const arrBuffer = await response.arrayBuffer()
const ui8array = new Uint8Array(arrBuffer)
const text = dec.decode(ui8array)
console.log(text)
return text
}
我用了 brute force 方法实现,可能不是最优雅的方案,但是它能够工作:
function bruteForceWindows1252toUTF16(s) {
for (var i = 0; i < globalWin1252toUTF16table.length; i++) {
if (s.includes(globalWin1252toUTF16table[i]['win1252'])) {
s = s.replaceAll(globalWin1252toUTF16table[i]['win1252'], globalWin1252toUTF16table[i]['utf16']);
}
}
return s;
}
const globalWin1252toUTF16table = [
{ win1252:'\xe2\x82\xac', utf16:'\u20AC' },
{ win1252:'\xe2\x80\x9a', utf16:'\u201A' },
{ win1252:'\xc6\x92', utf16:'\u0192' },
{ win1252:'\xe2\x80\x9e', utf16:'\u201E' },
{ win1252:'\xe2\x80\xa6', utf16:'\u2026' },
{ win1252:'\xe2\x80\xa0', utf16:'\u2020' },
{ win1252:'\xe2\x80\xa1', utf16:'\u2021' },
{ win1252:'\xcb\x86', utf16:'\u02C6' },
{ win1252:'\xe2\x80\xb0', utf16:'\u2030' },
{ win1252:'\xc5\xa0', utf16:'\u0160' },
{ win1252:'\xe2\x80\xb9', utf16:'\u2039' },
{ win1252:'\xc5\x92', utf16:'\u0152' },
{ win1252:'\xc5\xbd', utf16:'\u017D' },
{ win1252:'\xe2\x80\x98', utf16:'\u2018' },
{ win1252:'\xe2\x80\x99', utf16:'\u2019' },
{ win1252:'\xe2\x80\x9c', utf16:'\u201C' },
{ win1252:'\xe2\x80\x9d', utf16:'\u201D' },
{ win1252:'\xe2\x80\xa2', utf16:'\u2022' },
{ win1252:'\xe2\x80\x93', utf16:'\u2013' },
{ win1252:'\xe2\x80\x94', utf16:'\u2014' },
{ win1252:'\xcb\x9c', utf16:'\u02DC' },
{ win1252:'\xe2\x84\xa2', utf16:'\u2122' },
{ win1252:'\xc5\xa1', utf16:'\u0161' },
{ win1252:'\xe2\x80\xba', utf16:'\u203A' },
{ win1252:'\xc5\x93', utf16:'\u0153' },
{ win1252:'\xc5\xbe', utf16:'\u017E' },
{ win1252:'\xc5\xb8', utf16:'\u0178' },
{ win1252:'\xc2\xa0', utf16:'\u00A0' },
{ win1252:'\xc2\xa1', utf16:'\u00A1' },
{ win1252:'\xc2\xa2', utf16:'\u00A2' },
{ win1252:'\xc2\xa3', utf16:'\u00A3' },
{ win1252:'\xc2\xa4', utf16:'\u00A4' },
{ win1252:'\xc2\xa5', utf16:'\u00A5' },
{ win1252:'\xc2\xa6', utf16:'\u00A6' },
{ win1252:'\xc2\xa7', utf16:'\u00A7' },
{ win1252:'\xc2\xa8', utf16:'\u00A8' },
{ win1252:'\xc2\xa9', utf16:'\u00A9' },
{ win1252:'\xc2\xaa', utf16:'\u00AA' },
{ win1252:'\xc2\xab', utf16:'\u00AB' },
{ win1252:'\xc2\xac', utf16:'\u00AC' },
{ win1252:'\xc2\xad', utf16:'\u00AD' },
{ win1252:'\xc2\xae', utf16:'\u00AE' },
{ win1252:'\xc2\xaf', utf16:'\u00AF' },
{ win1252:'\xc2\xb0', utf16:'\u00B0' },
{ win1252:'\xc2\xb1', utf16:'\u00B1' },
{ win1252:'\xc2\xb2', utf16:'\u00B2' },
{ win1252:'\xc2\xb3', utf16:'\u00B3' },
{ win1252:'\xc2\xb4', utf16:'\u00B4' },
{ win1252:'\xc2\xb5', utf16:'\u00B5' },
{ win1252:'\xc2\xb6', utf16:'\u00B6' },
{ win1252:'\xc2\xb7', utf16:'\u00B7' },
{ win1252:'\xc2\xb8', utf16:'\u00B8' },
{ win1252:'\xc2\xb9', utf16:'\u00B9' },
{ win1252:'\xc2\xba', utf16:'\u00BA' },
{ win1252:'\xc2\xbb', utf16:'\u00BB' },
{ win1252:'\xc2\xbc', utf16:'\u00BC' },
{ win1252:'\xc2\xbd', utf16:'\u00BD' },
{ win1252:'\xc2\xbe', utf16:'\u00BE' },
{ win1252:'\xc2\xbf', utf16:'\u00BF' },
{ win1252:'\xc3\x80', utf16:'\u00C0' },
{ win1252:'\xc3\x81', utf16:'\u00C1' },
{ win1252:'\xc3\x82', utf16:'\u00C2' },
{ win1252:'\xc3\x83', utf16:'\u00C3' },
{ win1252:'\xc3\x84', utf16:'\u00C4' },
{ win1252:'\xc3\x85', utf16:'\u00C5' },
{ win1252:'\xc3\x86', utf16:'\u00C6' },
{ win1252:'\xc3\x87', utf16:'\u00C7' },
{ win1252:'\xc3\x88', utf16:'\u00C8' },
{ win1252:'\xc3\x89', utf16:'\u00C9' },
{ win1252:'\xc3\x8a', utf16:'\u00CA' },
{ win1252:'\xc3\x8b', utf16:'\u00CB' },
{ win1252:'\xc3\x8c', utf16:'\u00CC' },
{ win1252:'\xc3\x8d', utf16:'\u00CD' },
{ win1252:'\xc3\x8e', utf16:'\u00CE' },
{ win1252:'\xc3\x8f', utf16:'\u00CF' },
{ win1252:'\xc3\x90', utf16:'\u00D0' },
{ win1252:'\xc3\x91', utf16:'\u00D1' },
{ win1252:'\xc3\x92', utf16:'\u00D2' },
{ win1252:'\xc3\x93', utf16:'\u00D3' },
{ win1252:'\xc3\x94', utf16:'\u00D4' },
{ win1252:'\xc3\x95', utf16:'\u00D5' },
{ win1252:'\xc3\x96', utf16:'\u00D6' },
{ win1252:'\xc3\x97', utf16:'\u00D7' },
{ win1252:'\xc3\x98', utf16:'\u00D8' },
{ win1252:'\xc3\x99', utf16:'\u00D9' },
{ win1252:'\xc3\x9a', utf16:'\u00DA' },
{ win1252:'\xc3\x9b', utf16:'\u00DB' },
{ win1252:'\xc3\x9c', utf16:'\u00DC' },
{ win1252:'\xc3\x9d', utf16:'\u00DD' },
{ win1252:'\xc3\x9e', utf16:'\u00DE' },
{ win1252:'\xc3\x9f', utf16:'\u00DF' },
{ win1252:'\xc3\xa0', utf16:'\u00E0' },
{ win1252:'\xc3\xa1', utf16:'\u00E1' },
{ win1252:'\xc3\xa2', utf16:'\u00E2' },
{ win1252:'\xc3\xa3', utf16:'\u00E3' },
{ win1252:'\xc3\xa4', utf16:'\u00E4' },
{ win1252:'\xc3\xa5', utf16:'\u00E5' },
{ win1252:'\xc3\xa6', utf16:'\u00E6' },
{ win1252:'\xc3\xa7', utf16:'\u00E7' },
{ win1252:'\xc3\xa8', utf16:'\u00E8' },
{ win1252:'\xc3\xa9', utf16:'\u00E9' },
{ win1252:'\xc3\xaa', utf16:'\u00EA' },
{ win1252:'\xc3\xab', utf16:'\u00EB' },
{ win1252:'\xc3\xac', utf16:'\u00EC' },
{ win1252:'\xc3\xad', utf16:'\u00ED' },
{ win1252:'\xc3\xae', utf16:'\u00EE' },
{ win1252:'\xc3\xaf', utf16:'\u00EF' },
{ win1252:'\xc3\xb0', utf16:'\u00F0' },
{ win1252:'\xc3\xb1', utf16:'\u00F1' },
{ win1252:'\xc3\xb2', utf16:'\u00F2' },
{ win1252:'\xc3\xb3', utf16:'\u00F3' },
{ win1252:'\xc3\xb4', utf16:'\u00F4' },
{ win1252:'\xc3\xb5', utf16:'\u00F5' },
{ win1252:'\xc3\xb6', utf16:'\u00F6' },
{ win1252:'\xc3\xb7', utf16:'\u00F7' },
{ win1252:'\xc3\xb8', utf16:'\u00F8' },
{ win1252:'\xc3\xb9', utf16:'\u00F9' },
{ win1252:'\xc3\xba', utf16:'\u00FA' },
{ win1252:'\xc3\xbb', utf16:'\u00FB' },
{ win1252:'\xc3\xbc', utf16:'\u00FC' },
{ win1252:'\xc3\xbd', utf16:'\u00FD' },
{ win1252:'\xc3\xbe', utf16:'\u00FE' },
{ win1252:'\xc3\xbf', utf16:'\u00FF' }
];