I pass a UTF-8 encoded string to a Perl program on the command line:
> ./test.pl --string='ḷet ūs try ṭhiñgs'
The string appears to be recognized correctly:
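use utf8;
use Getopt::Long;
use Data::Dumper;
use JSON;    # provides encode_json, decode_json, to_json, from_json

my $string;
GetOptions(
    'string=s' => \$string,
) or die;
print Dumper($string);
print Dumper(utf8::is_utf8($string));
print Dumper(utf8::valid($string));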
This prints:
$VAR1 = 'ḷet ūs try ṭhiñgs';
$VAR1 = '';
$VAR1 = 1;
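When I store this string in a hash and call encode_json on it, the string seems to be encoded a second time, whereas to_json seems to work (if I am reading the output correctly):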
my %a = ( 'nāme' => $string ); # Note the Unicode character
print Dumper(\%a);
print Dumper(encode_json(\%a));
print Dumper(to_json(\%a));
This prints:
$VAR1 = {
"n\x{101}me" => 'ḷet ūs try ṭhiñgs'
};
$VAR1 = '{"nāme":"ḷet Å«s try á¹hiñgs"}';
$VAR1 = "{\"n\x{101}me\":\"\x{e1}\x{b8}\x{b7}et \x{c5}\x{ab}s try \x{e1}\x{b9}\x{ad}hi\x{c3}\x{b1}gs\"}";
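However, converting the JSON back into the original hash does not seem to work with either method. In both cases, both the hash and the string come back broken: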
print Dumper(decode_json(encode_json(\%a)));
print Dumper(from_json(to_json(\%a)));
This prints:
$VAR1 = {
"n\x{101}me" => "\x{e1}\x{b8}\x{b7}et \x{c5}\x{ab}s try \x{e1}\x{b9}\x{ad}hi\x{c3}\x{b1}gs"
};
$VAR1 = {
"n\x{101}me" => "\x{e1}\x{b8}\x{b7}et \x{c5}\x{ab}s try \x{e1}\x{b9}\x{ad}hi\x{c3}\x{b1}gs"
};
A hash lookup of $a{'nāme'} now fails.
Question: How do I correctly handle UTF-8 encoding, strings, and JSON encoding/decoding in Perl?
print Dumper(utf8::is_utf8($string)); returns ''. Quite clearly, the string is not recognized as UTF-8. - jcaron
utf8::valid($string) returns true. - Jens
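Following up on the comments: an empty result from utf8::is_utf8 suggests that $string still holds the raw UTF-8 bytes from the shell rather than decoded characters, which would explain why encode_json encodes it a second time. Below is a minimal sketch of one possible approach, not a definitive fix. It assumes the terminal passes UTF-8, it uses the core module JSON::PP in place of JSON (assumed here to offer the same encode_json/decode_json functions), and parse_string_option is a hypothetical helper name introduced only for illustration.

```perl
use strict;
use warnings;
use utf8;                                   # literals in this file, such as 'nāme', are characters
use Encode qw(decode encode);
use Getopt::Long qw(GetOptionsFromArray);
use JSON::PP qw(encode_json decode_json);   # core module, used here instead of JSON

# Hypothetical helper: decode the raw argument bytes into characters first,
# then parse the options. FB_CROAK makes malformed UTF-8 a fatal error;
# LEAVE_SRC keeps decode() from modifying the caller's arguments.
sub parse_string_option {
    my @args = map { decode('UTF-8', $_, Encode::FB_CROAK() | Encode::LEAVE_SRC()) } @_;
    my $string;
    GetOptionsFromArray(\@args, 'string=s' => \$string) or die "bad options\n";
    return $string;
}

# With no arguments, simulate what the shell would pass: UTF-8 encoded bytes.
my @raw = @ARGV ? @ARGV : (encode('UTF-8', '--string=ḷet ūs try ṭhiñgs'));
my $string = parse_string_option(@raw);
defined $string or die "usage: $0 --string=TEXT\n";

my %a    = ('nāme' => $string);
my $json = encode_json(\%a);     # characters in, UTF-8 encoded bytes out
my $back = decode_json($json);   # UTF-8 encoded bytes in, characters out

binmode STDOUT, ':encoding(UTF-8)';   # encode exactly once, at the output boundary
print $back->{'nāme'}, "\n";
```

The idea is to decode at the input boundary, work with characters internally, and encode once on output. With that in place, the key written as 'nāme' under use utf8 and the key returned by decode_json should be the same character string, so the lookup is expected to succeed.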