I'm having trouble with character encoding. I'm trying to scrape the following URL:
http://www.google.com/movies?near=Montreal&date=0
My code looks like this:
var http = require('http');
var url = require('url');
var Iconv = require('iconv').Iconv;

var location = 'montreal';
var googleMovies = url.parse("http://www.google.com/movies?near=" + location);

var req = http.request(googleMovies, function(response) {
  var str = '';

  // Each chunk arrives as a Buffer and is coerced to a string when appended.
  response.on('data', function(chunk) {
    str += chunk;
  });

  // Once the whole body has arrived, convert it from latin1 to UTF-8 and print it.
  response.on('end', function() {
    var iconv = new Iconv('latin1', 'UTF-8');
    str = iconv.convert(str).toString();
    console.log(str);
  });
});

req.end();
My first attempt did not include these two lines:
var iconv = new Iconv('latin1', 'UTF-8');
str = iconv.convert(str).toString();
but that resulted in � characters in the output.
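A minimal sketch of where the � characters may come from, assuming the server really does send ISO-8859-1 (Latin-1) bytes. The sample bytes below are made up for illustration and are not taken from the actual response. Appending a Buffer to a string decodes it as UTF-8, and a lone Latin-1 accented byte is not valid UTF-8, so it is replaced with U+FFFD:

```javascript
// Assumption: the body is Latin-1. These bytes spell "Montréal" in Latin-1 (0xe9 is "é").
var latin1Bytes = Buffer.from([0x4d, 0x6f, 0x6e, 0x74, 0x72, 0xe9, 0x61, 0x6c]);

// Same coercion as `str += chunk`: the Buffer is decoded as UTF-8.
var viaConcat = '' + latin1Bytes;

// Decoding the same bytes as Latin-1 keeps the accented character.
var viaLatin1 = latin1Bytes.toString('latin1');

console.log(viaConcat);
console.log(viaLatin1);
```

If this is what is happening, the replacement is lossy: once the byte has become U+FFFD, a later conversion step has nothing left to recover the original character from.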
I tested the source above on this page:
http://nlp.fi.muni.cz/projects/chared/
It appears to detect the encoding as Latin1, but something may still be wrong.
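As a cross-check on the detector's result, one option is to read the charset the server declares in its Content-Type header and decode the raw bytes only once, after all chunks have been joined. The following is a rough sketch under that assumption. `charsetFromContentType` and `decodeBody` are hypothetical helper names, the header value and bytes are invented, and only Latin-1 and UTF-8 are handled:

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type header value.
function charsetFromContentType(header) {
  var match = /charset=["']?([^;"'\s]+)/i.exec(header || '');
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: join the raw Buffer chunks first, then decode a single time.
function decodeBody(chunks, contentType) {
  var charset = charsetFromContentType(contentType);
  var body = Buffer.concat(chunks);
  // Only the two cases relevant here are covered; anything else falls back to UTF-8.
  if (charset === 'iso-8859-1' || charset === 'latin1') {
    return body.toString('latin1');
  }
  return body.toString('utf8');
}

// Made-up example: "Montréal" in Latin-1, split across two chunks.
var chunks = [
  Buffer.from([0x4d, 0x6f, 0x6e, 0x74, 0x72]),
  Buffer.from([0xe9, 0x61, 0x6c])
];
console.log(decodeBody(chunks, 'text/html; charset=ISO-8859-1'));
```

In the request callback this would mean pushing each chunk onto an array in the 'data' handler and calling `decodeBody(chunks, response.headers['content-type'])` in the 'end' handler. Note that Node's 'latin1' encoding maps each byte directly to the code point of the same value, so a page that is actually windows-1252 would need a converter such as iconv instead.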