FileReader - which encodings are supported?

21

I want to simply read user-supplied files as text.

I can rely on my users having a modern browser, so I use FileReader for this (which works well).

reader.readAsText(myfile, encoding);

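I know that encoding defaults to UTF-8.

But because my users will upload files from various sources (Windows, Mac, Linux) and from various browsers, I ask the user to provide the encoding through a select box.

For a Western European Windows text file, for example, I would expect the user to choose something like windows-1252.

I could not find a list of the encodings FileReader supports (I assume this depends at least on the browser).

I do not intend to detect the encoding automatically; I just want to populate my select box like this: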

<select id="encoding">
   <option value="windows-1252">Windows (Western Latin)</option>
   <option value="utf-8">UTF-8</option>
   <option value="...">...</option>
</select>

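My questions are:

  1. Where do I get the list of supported encodings to populate the option values?
  2. How do I determine the exact spelling of these values (is it 'utf8' or 'UTF-8', or something else), and does this depend on the OS / browser?
  3. If an encoding is not supported, does readAsText(myfile, unsupportedEncoding) throw an error that I can catch?

I would prefer not to use any major third-party library.

Bonus question:

Is there a simple way to get a meaningful description for a value, e.g. that cp10029 means Mac (Central Europe)?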


2
A quick Google search did not turn up much. Maybe this link helps? http://stackoverflow.com/questions/37884928/cant-fit-file-encoding-when-working-with-chrome-file-system-api/37885580 - Dan Wilson
1
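The supported code pages can be found here. I suggest taking another close look at the link Dan provided, since it is a good approach. That approach can also detect a BOM and lets you guess the encoding up front. - user1693593
2 Answers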

10
The Encoding Standard - https://github.com/whatwg/encoding/ (as JSON: https://github.com/whatwg/encoding/blob/master/encodings.json; use the values in the "labels" field)


  1. The encoding parameter is not case-sensitive.

  2. No, readAsText(myfile, unsupportedEncoding) does not throw an error. The function silently falls back to the default encoding (UTF-8).

    window.onload = function() {
    
        //Check File API support
        if (window.File && window.FileList && window.FileReader) {
            var filesInput = document.getElementById("files");
    
            filesInput.addEventListener("change", function(event) {
    
                var files = event.target.files; //FileList object
                var output = document.getElementById("result");
    
                for (var i = 0; i < files.length; i++) {
                    var file = files[i];
    
                    //Only plain text
                    if (!file.type.match('plain')) continue;
    
                    var picReader = new FileReader();
    
                    picReader.addEventListener("load", function(event) {
    
                        var textFile = event.target;
    
                        var div = document.createElement("div");
    
                        div.innerText = textFile.result;
    
                        output.insertBefore(div, null);
    
                    });
                    //Read the text file; the oddly cased label "cP1251" still
                    //works because labels are matched case-insensitively
                    picReader.readAsText(file, "cP1251");
                }
    
            });
        }
        else {
            console.log("Your browser does not support File API");
        }
    }
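
Because `readAsText` falls back silently, an unsupported label has to be caught before reading. A minimal sketch of one way to do that, assuming the browser also provides `TextDecoder` (whose constructor throws a `RangeError` for labels it does not recognise) and assuming it accepts the same labels as `FileReader`; `resolveEncoding` and `readAsTextStrict` are hypothetical helper names, not part of any API:

```javascript
// Returns the canonical (lowercase) encoding name for a label, or null
// when this runtime's TextDecoder rejects the label.
function resolveEncoding(label) {
    try {
        return new TextDecoder(label).encoding;
    } catch (e) {
        // With only a label argument, the constructor fails for
        // unknown labels (RangeError); treat that as "not supported".
        return null;
    }
}

// Hypothetical wrapper: refuse to read instead of silently falling
// back to UTF-8 when the label is unknown.
function readAsTextStrict(file, label, onLoad) {
    var canonical = resolveEncoding(label);
    if (canonical === null) {
        throw new RangeError("Unsupported encoding label: " + label);
    }
    var reader = new FileReader();
    reader.addEventListener("load", function () {
        onLoad(reader.result, canonical);
    });
    reader.readAsText(file, canonical);
}
```

The same check could be run once over the values of the select box, so that options the current browser rejects are never offered.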
    

Demo

To get descriptions for the values, you can use the JSON file (https://github.com/whatwg/encoding/blob/master/encodings.json), specifically its "heading" and "name" fields.
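
As a rough sketch of how that JSON could feed the select box: the excerpt below is abbreviated and only assumed to mirror the file's structure (an array of groups, each with a "heading" and a list of encodings carrying "name" and "labels"). `toOptions` and `fillSelect` are hypothetical helpers, and using the encoding name as the option value assumes every name is also accepted as a label:

```javascript
// Abbreviated excerpt in the assumed shape of encodings.json.
var sampleGroups = [
    { heading: "The Encoding", encodings: [
        { name: "UTF-8", labels: ["unicode-1-1-utf-8", "utf-8", "utf8"] }
    ]},
    { heading: "Legacy single-byte encodings", encodings: [
        { name: "windows-1251", labels: ["cp1251", "windows-1251", "x-cp1251"] },
        { name: "windows-1252", labels: ["cp1252", "latin1", "windows-1252"] }
    ]}
];

// Flattens the groups into one entry per encoding.
function toOptions(groups) {
    var options = [];
    groups.forEach(function (group) {
        group.encodings.forEach(function (enc) {
            options.push({ value: enc.name, text: enc.name, group: group.heading });
        });
    });
    return options;
}

// Browser-only part: one <optgroup> per heading, one <option> per encoding.
function fillSelect(select, groups) {
    groups.forEach(function (group) {
        var optgroup = document.createElement("optgroup");
        optgroup.label = group.heading;
        toOptions([group]).forEach(function (opt) {
            var option = document.createElement("option");
            option.value = opt.value;
            option.textContent = opt.text;
            optgroup.appendChild(option);
        });
        select.appendChild(optgroup);
    });
}
```

In a page this might be called as `fillSelect(document.getElementById("encoding"), groups)` once the JSON has been loaded and parsed.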


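I am a little worried that WHATWG seems to be the only group trying to keep track of what is apparently the only existing standard, but your answer correctly addresses all of my questions, so I will accept it. I may change that if a better / "official" answer comes along; I hope that sounds reasonable. - LBA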

1
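Names and labels: The table below lists all encodings and their labels that user agents must support. User agents must not support any other encodings or labels.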
# UTF-8
"unicode-1-1-utf-8"
"unicode11utf8"
"unicode20utf8"
"utf-8"
"utf8"
"x-unicode20utf8"

# IBM866
"866"
"cp866"
"csibm866"
"ibm866"

# ISO-8859-2
"csisolatin2"
"iso-8859-2"
"iso-ir-101"
"iso8859-2"
"iso88592"
"iso_8859-2"
"iso_8859-2:1987"
"l2"
"latin2"

# ISO-8859-3
"csisolatin3"
"iso-8859-3"
"iso-ir-109"
"iso8859-3"
"iso88593"
"iso_8859-3"
"iso_8859-3:1988"
"l3"
"latin3"

# ISO-8859-4
"csisolatin4"
"iso-8859-4"
"iso-ir-110"
"iso8859-4"
"iso88594"
"iso_8859-4"
"iso_8859-4:1988"
"l4"
"latin4"

# ISO-8859-5
"csisolatincyrillic"
"cyrillic"
"iso-8859-5"
"iso-ir-144"
"iso8859-5"
"iso88595"
"iso_8859-5"
"iso_8859-5:1988"

# ISO-8859-6
"arabic"
"asmo-708"
"csiso88596e"
"csiso88596i"
"csisolatinarabic"
"ecma-114"
"iso-8859-6"
"iso-8859-6-e"
"iso-8859-6-i"
"iso-ir-127"
"iso8859-6"
"iso88596"
"iso_8859-6"
"iso_8859-6:1987"

# ISO-8859-7
"csisolatingreek"
"ecma-118"
"elot_928"
"greek"
"greek8"
"iso-8859-7"
"iso-ir-126"
"iso8859-7"
"iso88597"
"iso_8859-7"
"iso_8859-7:1987"
"sun_eu_greek"

# ISO-8859-8
"csiso88598e"
"csisolatinhebrew"
"hebrew"
"iso-8859-8"
"iso-8859-8-e"
"iso-ir-138"
"iso8859-8"
"iso88598"
"iso_8859-8"
"iso_8859-8:1988"
"visual"

# ISO-8859-8-I
"csiso88598i"
"iso-8859-8-i"
"logical"

# ISO-8859-10
"csisolatin6"
"iso-8859-10"
"iso-ir-157"
"iso8859-10"
"iso885910"
"l6"
"latin6"

# ISO-8859-13
"iso-8859-13"
"iso8859-13"
"iso885913"

# ISO-8859-14
"iso-8859-14"
"iso8859-14"
"iso885914"

# ISO-8859-15
"csisolatin9"
"iso-8859-15"
"iso8859-15"
"iso885915"
"iso_8859-15"
"l9"

# ISO-8859-16
"iso-8859-16"

# KOI8-R
"cskoi8r"
"koi"
"koi8"
"koi8-r"
"koi8_r"

# KOI8-U
"koi8-ru"
"koi8-u"

# macintosh
"csmacintosh"
"mac"
"macintosh"
"x-mac-roman"

# windows-874
"dos-874"
"iso-8859-11"
"iso8859-11"
"iso885911"
"tis-620"
"windows-874"

# windows-1250
"cp1250"
"windows-1250"
"x-cp1250"

# windows-1251
"cp1251"
"windows-1251"
"x-cp1251"

# windows-1252
"ansi_x3.4-1968"
"ascii"
"cp1252"
"cp819"
"csisolatin1"
"ibm819"
"iso-8859-1"
"iso-ir-100"
"iso8859-1"
"iso88591"
"iso_8859-1"
"iso_8859-1:1987"
"l1"
"latin1"
"us-ascii"
"windows-1252"
"x-cp1252"

# windows-1253
"cp1253"
"windows-1253"
"x-cp1253"

# windows-1254
"cp1254"
"csisolatin5"
"iso-8859-9"
"iso-ir-148"
"iso8859-9"
"iso88599"
"iso_8859-9"
"iso_8859-9:1989"
"l5"
"latin5"
"windows-1254"
"x-cp1254"

# windows-1255
"cp1255"
"windows-1255"
"x-cp1255"

# windows-1256
"cp1256"
"windows-1256"
"x-cp1256"

# windows-1257
"cp1257"
"windows-1257"
"x-cp1257"

# windows-1258
"cp1258"
"windows-1258"
"x-cp1258"

# x-mac-cyrillic
"x-mac-cyrillic"
"x-mac-ukrainian"

For more encodings, see: https://encoding.spec.whatwg.org/#names-and-labels
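
To illustrate how such a label table is meant to be used, here is a minimal sketch of a label lookup, assuming the matching rule of the Encoding Standard (strip leading and trailing ASCII whitespace, then compare ASCII case-insensitively). `labelTable` is a small hand-copied excerpt of the list above and `getEncoding` is a hypothetical helper name:

```javascript
// Excerpt of the label table: encoding name -> accepted labels.
var labelTable = {
    "UTF-8": ["unicode-1-1-utf-8", "unicode11utf8", "unicode20utf8",
              "utf-8", "utf8", "x-unicode20utf8"],
    "windows-1251": ["cp1251", "windows-1251", "x-cp1251"],
    "windows-1252": ["ascii", "cp1252", "iso-8859-1", "latin1",
                     "us-ascii", "windows-1252", "x-cp1252"]
};

// Returns the encoding name for a label, or null if the label is not
// in the (abbreviated) table.
function getEncoding(label) {
    var normalized = String(label)
        // Strip ASCII whitespace: tab, LF, FF, CR, space.
        .replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "")
        // Lowercase A-Z only, so non-ASCII look-alikes never match.
        .replace(/[A-Z]/g, function (c) { return c.toLowerCase(); });
    for (var name in labelTable) {
        if (labelTable[name].indexOf(normalized) !== -1) {
            return name;
        }
    }
    return null;
}
```

With this excerpt, `getEncoding(" Latin1 ")` yields "windows-1252", which reflects that the table maps "latin1", "iso-8859-1" and "ascii" to windows-1252 rather than to separate encodings.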
