Python / Mako: How to get Unicode strings/characters parsed correctly?

Question



I'm trying to get Mako to render some strings that contain Unicode characters:

tempLook=TemplateLookup(..., default_filters=[], input_encoding='utf8',output_encoding='utf-8', encoding_errors='replace')
...
print sys.stdout.encoding
uname=cherrypy.session['userName']
print uname
kwargs['_toshow']=uname
...
return tempLook.get_template(page).render(**kwargs)

The relevant template file:

...${_toshow}...

The output is:

UTF-8
Deşghfkskhü
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1: ordinal not in range(128)

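I don't think there is anything wrong with the string itself, since I can print it just fine.

Although I have experimented (a lot) with the input/output_encoding and default_filters parameters, it always complains that it cannot decode with the ascii codec.

So I decided to try the example from the documentation, and the following worked "best":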

input_encoding='utf-8', output_encoding='utf-8'
#(note : it still raised an error without output_encoding, despite tutorial not implying it)

with

${u"voix m’a réveillé."}

and the result was:

voix mâ�a rÃ©veillÃ©

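I really don't understand why this doesn't work. The "magic encoding comment" doesn't help either. All the files are encoded in UTF-8.

I have spent hours on this without any progress. Am I missing something?

Update:

I have a simpler question now:

Now that all the variables are unicode, how can I get Mako to render unicode strings without applying anything to them? Passing an empty filter list or using render_unicode() doesn't help either.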

- felace

1 Answer


- knitti · Accepted Answer

Yes, UTF-8 != Unicode.

UTF-8 is a specific string encoding, as are ASCII and ISO 8859-1. Try the following:

For any input string, do inputstring.decode('utf-8') (or whatever input encoding you get). For any output string, do outputstring.encode('utf-8') (or whatever output encoding you want). For any internal use, take unicode strings ('this is a normal string'.decode('utf-8') == u'this is a normal string').
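A minimal sketch of that decode-at-input, encode-at-output pattern. It is written for Python 3, where the Python 2 `str` from the question corresponds to `bytes` and `unicode` corresponds to `str`. The helper name `to_text` and the sample value are made up for illustration and are not part of Mako or CherryPy.

```python
def to_text(value, encoding="utf-8"):
    """Return text, decoding only if the value is still raw bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical input: UTF-8 bytes such as a session store might hand back.
raw_name = "Deşghfkskhü".encode("utf-8")

# Decode once at the boundary, then work with text internally.
name = to_text(raw_name)
greeting = "Hello, " + name

# Encode once, when the result leaves the program.
payload = greeting.encode("utf-8")
```

If `cherrypy.session['userName']` holds a byte string, applying the same kind of decode to `uname` before it goes into `kwargs` should hand the template text rather than bytes.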

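'foo' is a byte string; u'foo' is a Unicode string, which doesn't "have" an encoding (it can't be decoded). So any time Python wants to change the encoding of a normal string, it first tries to "decode" it and then to "encode" it. The default codec is "ascii", which fails more often than not :-)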
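The failure can be reproduced by doing explicitly what Python 2 does implicitly, namely decoding UTF-8 bytes with the ascii codec. This is a sketch in Python 3, where that implicit conversion no longer exists, and the sample strings are chosen for illustration. The last lines show one way mojibake like `rÃ©veillÃ©` can arise: decoding UTF-8 bytes as Latin-1. That is an assumption about what happened in the question, not something the thread confirms.

```python
raw = "Deşghfkskhü".encode("utf-8")  # UTF-8 bytes; 'ş' becomes 0xc5 0x9f

# Decoding with the ascii codec fails on the first byte above 0x7f.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding with the encoding the bytes were actually written in works.
text = raw.decode("utf-8")

# Decoding UTF-8 bytes with the wrong single-byte codec does not fail;
# it silently produces mojibake instead.
garbled = "réveillé".encode("utf-8").decode("latin-1")
```

Both symptoms in the question fit the same root cause: bytes and text being mixed, with the wrong codec applied somewhere along the way.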