PHP UTF-8 decoding issue

Question

php utf-8 character-encoding

8

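I have the following address: Praha 5, Staré Město,

Before writing it to a PDF file (using the domPDF library), I need to run utf8_decode() on this string.

However, PHP's utf8_decode function appears to decode the address line above incorrectly (or rather, incompletely).

Here is the code: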

<?php echo utf8_decode('Praha 5, Staré Město,'); ?>

It produces this result:

Praha 5, Staré M?sto,

Any idea why ě is not being decoded?

- Latheesan

utf8_decode just converts a UTF-8 encoded string to a plain string. Is your string already utf8_encoded? - Rajeev Ranjan

4 Answers

1

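I ended up using a home-grown UTF-8 / UTF-16 decoding function (it converts to &#number; representations). I never found a pattern for why UTF-8 isn't detected; I suspect it's because the "encoded as" sequence isn't always at exactly the same position in the returned string. You may want to do some additional checking on that.

Three-byte UTF-8 indicator (the UTF-8 byte order mark): $startutf8 = chr(0xEF).chr(187).chr(191); (if you see this anywhere, not just in the first three characters, the string is already UTF-8 encoded)

Decoding according to the UTF-8 rules; this replaced an earlier version that worked byte by byte: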

function charset_decode_utf_8($string) {
    // Only do the slow conversion if the string contains 8-bit characters.
    // ereg() was removed in PHP 7, so preg_match() is used instead.
    if (!preg_match('/[\x80-\xFF]/', $string)) {
        return $string;
    }

    // Decode three-byte UTF-8 sequences into &#number; entities.
    // preg_replace_callback() replaces the /e modifier, also removed in PHP 7.
    $string = preg_replace_callback(
        '/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
        },
        $string
    );

    // Decode two-byte UTF-8 sequences into &#number; entities.
    $string = preg_replace_callback(
        '/([\xC0-\xDF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
        },
        $string
    );

    return $string;
}
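
As a possible alternative to a hand-written decoder, the mbstring extension can produce the same kind of &#number; output. The sketch below assumes mbstring is enabled and PHP 7 or later (for the \u{...} escapes, which keep the example independent of how the source file is saved). The variable names are illustrative only.

```php
<?php
// The address from the question, written with code point escapes.
$address = "Praha 5, Star\u{E9} M\u{11B}sto,";

// Conversion map: [first code point, last code point, offset, mask].
// Everything from U+0080 upward becomes a decimal numeric entity.
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$entities = mb_encode_numericentity($address, $map, 'UTF-8');

echo $entities, "\n"; // Praha 5, Star&#233; M&#283;sto,
```

Unlike the regex version, this also covers four-byte sequences, because the map extends to U+10FFFF.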

- Peters V

1

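The problem is the encoding of your PHP file. Save the file as UTF-8 and it will work even without utf8_decode. If you are fetching the data 'Praha 5, Staré Město,' from a database, it is best to change the database charset to UTF-8 as well.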
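
One way to check what encoding a string literal actually ended up in is to test it with mb_check_encoding() and look at its raw bytes. This is a minimal sketch, assuming the mbstring extension is available; the two sample strings are made up to imitate a file saved as UTF-8 and one saved as Latin-1.

```php
<?php
// "Staré" as it looks when the source file is saved as UTF-8: é is C3 A9.
$fromUtf8File = "Star\u{E9}";

// "Staré" as it looks when the file is saved as ISO-8859-1: é is the single byte E9.
$fromLatin1File = "Star\xE9";

var_dump(mb_check_encoding($fromUtf8File, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($fromLatin1File, 'UTF-8')); // bool(false)

// bin2hex() shows the underlying bytes when the result is surprising.
echo bin2hex($fromUtf8File), "\n";   // 53746172c3a9
echo bin2hex($fromLatin1File), "\n"; // 53746172e9
```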

- vimal1083

0

You don't need that (@Rajeev: this string is automatically detected as UTF-8 encoded):

echo mb_detect_encoding('Praha 5, Staré Město,');

will always return UTF-8.

You should rather take a look at: https://code.google.com/p/dompdf/wiki/CPDFUnicode

- scraaappy

I removed utf8_decode, set <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>, and set DOMPDF_UNICODE_ENABLED to true in the config. It still doesn't work; ě shows up as ? - Latheesan

I'm using the "Helvetica" font. Could that be the cause? - Latheesan

You may need to install another font. Check the answer here: https://dev59.com/j3NA5IYBdhLWcg3wX8nk - scraaappy
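
The font suggestion above could look roughly like the following sketch. It assumes a current, Composer-installed dompdf (namespace Dompdf, with its bundled DejaVu fonts), which differs from the DOMPDF_UNICODE_ENABLED-era version discussed in this thread. buildAddressHtml and the output file name are hypothetical names used only for illustration.

```php
<?php
// Hypothetical helper: wraps the address in a UTF-8 HTML page that asks for
// a font with Czech glyphs instead of the core Helvetica font.
function buildAddressHtml(string $address): string
{
    return '<html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>'
        . '<style>body { font-family: "DejaVu Sans", sans-serif; }</style>'
        . '</head><body><p>'
        . htmlspecialchars($address, ENT_QUOTES, 'UTF-8')
        . '</p></body></html>';
}

// No utf8_decode() here: the string stays UTF-8 all the way to the PDF.
$html = buildAddressHtml("Praha 5, Star\u{E9} M\u{11B}sto,");

// Assumption: dompdf was installed with Composer next to this script.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require $autoload;
    $dompdf = new \Dompdf\Dompdf();
    $dompdf->loadHtml($html, 'UTF-8');
    $dompdf->render();
    file_put_contents(__DIR__ . '/address.pdf', $dompdf->output());
}
```

The core PDF fonts such as Helvetica are limited to a Latin-1 style character set, which is why switching the font matters as much as removing utf8_decode.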

- deceze · Accepted Answer

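utf8_decode converts a string from UTF-8 to ISO-8859-1, also known as "Latin-1".

"Latin-1" cannot represent the letter "ě". It's as simple as that.

"Decode" is a misnomer; the function performs the same conversion as iconv('UTF-8', 'ISO-8859-1', $string).

See "What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text".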
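
To make the point concrete, here is a small sketch that converts the address to Latin-1 and, for comparison, to ISO-8859-2 (Latin-2), which does contain ě. It uses mb_convert_encoding() rather than utf8_decode(), since utf8_decode() is deprecated as of PHP 8.2; the mbstring extension and its default substitute character ("?") are assumed.

```php
<?php
$address = "Praha 5, Star\u{E9} M\u{11B}sto,";

// Latin-1 has no slot for ě (U+011B), so mbstring substitutes "?".
// é (U+00E9) survives because Latin-1 does contain it.
$latin1 = mb_convert_encoding($address, 'ISO-8859-1', 'UTF-8');

// ISO-8859-2 covers Czech, so nothing is lost and the round trip is exact.
$latin2 = mb_convert_encoding($address, 'ISO-8859-2', 'UTF-8');
$roundTrip = mb_convert_encoding($latin2, 'UTF-8', 'ISO-8859-2');

var_dump($latin1 === "Praha 5, Star\xE9 M?sto,"); // bool(true)
var_dump($roundTrip === $address);                // bool(true)
```

Converting to a legacy charset only helps if the consumer expects that charset; for PDF output, keeping the text in UTF-8 and choosing a font that has the glyph is usually the simpler route.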