Using XMLHttpRequest to fetch non-UTF-8 data

Question

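I want to use XMLHttpRequest to fetch a document from the web. However, the text I need is not UTF-8 encoded (in this case it is windows-1251, but in general I can't know the encoding in advance). If I use responseType="text", the response is treated as a UTF-8 string and the charset in the Content-Type header is ignored, which garbles the result. If I use 'blob' (probably the closest thing to what I want), can I then convert it to a DOMString while taking the encoding into account?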

- Tom Tanner

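Regarding “ignores the charset in the content-type”: are you sure the server reads the file correctly as windows-1251, serves it that way, and includes the correct Content-Type in the response? If any one of those three things goes wrong, you may already have garbage before the browser receives the first byte. - Thomas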

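Regarding “convert it to a DOMString taking the encoding into account”: I don't know of an API or library for this, but in the worst case you could map each byte to the correct character. - Thomas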
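
Thomas's worst-case idea of mapping each byte to the right character can be sketched for windows-1251, where the Cyrillic letters А..я occupy bytes 0xC0..0xFF in Unicode order (U+0410..U+044F). The function below is a hypothetical, deliberately partial illustration: besides ASCII it only knows Ё (0xA8) and ё (0xB8), and it turns every other byte in 0x80..0xBF into U+FFFD. A real decoder needs the full 128-entry upper table, which the built-in TextDecoder already provides.

```javascript
// Hypothetical helper: decode windows-1251 bytes by mapping each byte to a code point.
// PARTIAL table: only ASCII, Ё/ё and the А..я block are covered;
// every other byte in 0x80..0xBF becomes U+FFFD.
function decodeCp1251Partial(bytes) {
  let out = "";
  for (const b of bytes) {
    if (b < 0x80) {
      out += String.fromCharCode(b); // ASCII range is identical in windows-1251
    } else if (b >= 0xc0) {
      out += String.fromCharCode(0x0410 + (b - 0xc0)); // 0xC0..0xFF -> U+0410..U+044F
    } else if (b === 0xa8) {
      out += "\u0401"; // Ё
    } else if (b === 0xb8) {
      out += "\u0451"; // ё
    } else {
      out += "\uFFFD"; // byte not covered by this sketch
    }
  }
  return out;
}

// "Привет" encoded as windows-1251 (hand-made sample bytes)
const sample = new Uint8Array([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);
console.log(decodeCp1251Partial(sample)); // Привет
```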

2 Answers

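“If I use 'blob' (probably the closest thing to what I want), can I then convert it to a DOMString taking the encoding into account?” There is a general approach for this at https://medium.com/programmers-developers/convert-blob-to-string-in-javascript-944c15ad7d52, and it also works for fetching a remote document: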

Get the response as a Blob and create a FileReader to read it
Call FileReader.readAsText() on that Blob with the encoding you want, which gives you correctly decoded text

Like this:

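const reader = new FileReader()
// "loadend" fires when reading finishes; on success reader.result holds the decoded string
reader.addEventListener("loadend", function() {
  console.log(reader.result)
})
fetch("https://people.w3.org/mike/tests/windows-1251/test.txt")
  .then(response => response.blob()) // keep the raw bytes instead of decoding them as UTF-8
  .then(blob => reader.readAsText(blob, "windows-1251")) // decode using the given encoding
  .catch(err => console.error(err))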

Or, if you really want to use XHR:

const reader = new FileReader()
reader.addEventListener("loadend", function() {
  console.log(reader.result)
})
const xhr = new XMLHttpRequest()
xhr.responseType = "blob" // receive the raw bytes as a Blob, not as UTF-8 text
xhr.onload = function() {
  // xhr.response is the Blob; FileReader decodes its bytes as windows-1251
  reader.readAsText(xhr.response, "windows-1251")
}
xhr.open("GET", "https://people.w3.org/mike/tests/windows-1251/test.txt", true)
xhr.send(null)
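
If you prefer promises over the "loadend" callback, the FileReader step can be wrapped as follows. This is a sketch: `readBlobAsText` is a hypothetical helper name, and the fallback branch (Blob.arrayBuffer() plus TextDecoder, a different technique from FileReader) exists only so the snippet also runs where FileReader is missing, such as Node 18+.

```javascript
// Hypothetical helper: read a Blob as text in the given encoding, returning a Promise.
// Uses FileReader.readAsText() when available (browsers); otherwise falls back to
// Blob.arrayBuffer() + TextDecoder (e.g. in Node 18+, which has Blob but no FileReader).
function readBlobAsText(blob, encoding) {
  if (typeof FileReader !== "undefined") {
    return new Promise((resolve, reject) => {
      const reader = new FileReader();
      reader.onload = () => resolve(reader.result);
      reader.onerror = () => reject(reader.error);
      reader.readAsText(blob, encoding);
    });
  }
  return blob.arrayBuffer().then(buf => new TextDecoder(encoding).decode(buf));
}

// "Привет" as hand-made windows-1251 bytes, wrapped in a Blob
const blob = new Blob([new Uint8Array([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2])]);
readBlobAsText(blob, "windows-1251").then(text => console.log(text)); // Привет
```

In a browser you would chain it as `fetch(url).then(r => r.blob()).then(b => readBlobAsText(b, "windows-1251"))`. The two branches differ for an unrecognised label: FileReader quietly falls back (ultimately to UTF-8), whereas the TextDecoder constructor throws a RangeError.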

“However, if I use responseType="text", it treats it as a UTF-8 string, ignoring the charset in the content type.”

Yes. That is what the Fetch spec requires (and the XHR spec relies on it):

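Objects implementing the Body mixin also have an associated package data algorithm, given bytes, type and mimeType, which switches on type and runs the associated steps:
…
↪ text
Return the result of running UTF-8 decode on bytes.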
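
The effect of that rule can be seen without any network request. The sketch below assumes a global TextDecoder that knows the windows-1251 label (modern browsers, or Node 18+ built with full ICU, which is the default), and it uses hand-encoded bytes for the word "Привет"; those bytes are sample data, not the contents of the linked test file.

```javascript
// Hand-encoded windows-1251 bytes for the word "Привет" (sample data).
const bytes = new Uint8Array([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);

// What the "text" path does: UTF-8 decode, whatever charset the header names.
// A default TextDecoder is UTF-8 with replacement, so invalid bytes become U+FFFD.
const asUtf8 = new TextDecoder().decode(bytes);

// The same bytes decoded with the label a Content-Type header would carry.
const asCp1251 = new TextDecoder("windows-1251").decode(bytes);

console.log(asUtf8);   // replacement characters instead of Cyrillic
console.log(asCp1251); // Привет
```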

- sideshowbarker

I missed that note in the Fetch spec. Thanks. The reason for using XMLHttpRequest was to find out what the encoding is. - Tom Tanner

- Tom Tanner · Accepted Answer

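I found an API that does what's needed, described here: https://developers.google.com/web/updates/2014/08/Easier-ArrayBuffer-String-conversion-with-the-Encoding-API. Basically, use responseType="arraybuffer", pick the encoding from the returned headers, and decode the bytes with a DataView and a TextDecoder. It does exactly what I wanted.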

const xhr = new XMLHttpRequest();
xhr.responseType = "arraybuffer";
xhr.onload = function() {
  // e.g. "text/plain; charset=windows-1251"; the header may be absent or lack a charset
  const contenttype = xhr.getResponseHeader("content-type") || "";
  const match = /charset\s*=\s*"?([^";\s]+)/i.exec(contenttype);
  const charset = match ? match[1] : "utf-8"; // fall back to UTF-8 if no charset is given
  const dataView = new DataView(xhr.response);
  const decoder = new TextDecoder(charset);
  console.log(decoder.decode(dataView));
};
xhr.open("GET", "https://people.w3.org/mike/tests/windows-1251/test.txt");
xhr.send(null);

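fetch("https://people.w3.org/mike/tests/windows-1251/test.txt")
  .then(response => {
    const contenttype = response.headers.get("content-type") || "";
    const match = /charset\s*=\s*"?([^";\s]+)/i.exec(contenttype);
    const charset = match ? match[1] : "utf-8"; // fall back to UTF-8 if no charset is given
    // return the inner promise so that errors reach the .catch below
    return response.arrayBuffer()
      .then(ab => {
        const dataView = new DataView(ab);
        const decoder = new TextDecoder(charset);
        console.log(decoder.decode(dataView));
      });
  })
  .catch(err => console.error(err));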
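
A slightly more defensive version of the header handling, as a sketch: both helper names below are hypothetical, the charset parsing is simplified (it does not implement the full MIME-type parsing algorithm, e.g. escaped characters inside quoted strings), and it falls back to UTF-8 both when no charset is given and when TextDecoder rejects the label with a RangeError.

```javascript
// Hypothetical helper: pull the charset parameter out of a Content-Type value.
// Returns null when the header is missing or has no charset parameter.
function charsetFromContentType(contentType) {
  if (!contentType) return null;
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1] : null;
}

// Hypothetical helper: decode bytes with the given label, falling back to UTF-8 when
// the label is missing or unknown (the TextDecoder constructor throws RangeError then).
function decodeWithCharset(buffer, label) {
  let decoder;
  try {
    decoder = new TextDecoder(label || "utf-8");
  } catch (e) {
    if (!(e instanceof RangeError)) throw e;
    decoder = new TextDecoder("utf-8");
  }
  return decoder.decode(buffer); // accepts an ArrayBuffer or a view such as DataView
}

// Hand-encoded windows-1251 bytes for "Привет" (sample data)
const bytes = new Uint8Array([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);
const label = charsetFromContentType("text/plain; charset=windows-1251");
console.log(label);                           // windows-1251
console.log(decodeWithCharset(bytes, label)); // Привет
```

With these helpers, the XHR onload body reduces to `console.log(decodeWithCharset(xhr.response, charsetFromContentType(xhr.getResponseHeader("content-type"))))`, and the fetch version can pass `response.headers.get("content-type")` and the ArrayBuffer the same way.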