Using utf-8 characters in a Jinja2 template

Question

python, python-2.7, utf-8, character-encoding, jinja2

33

I'm trying to use utf-8 characters when rendering a template with Jinja2. Here is what my template looks like:

<!DOCTYPE HTML>
<html manifest="" lang="en-US">
<head>
    <meta charset="UTF-8">
    <title>{{title}}</title>
...

The title variable is set like this:

index_variables = {'title':''}
index_variables['title'] = myvar.encode("utf8")

template = env.get_template('index.html')
index_file = open(preview_root + "/" + "index.html", "w")

index_file.write(
    template.render(index_variables)
)
index_file.close()

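Now, the problem is that myvar is a message read from a message queue and can contain those special utf-8 characters (e.g. "Séptimo Cine").

The rendered template looks like this: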

...
    <title>S\u00e9ptimo Cine</title>
...

I want it to be:

...
    <title>Séptimo Cine</title>
...

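I have run several tests, but I can't get it to work.

I tried setting the title variable without .encode("utf8"), but that throws an exception (ValueError: Expected a bytes object, not a unicode object), so my guess is that the initial message is unicode.
I used chardet.detect to get the encoding of the message (it is "ascii"), then did the following: myvar.decode("ascii").encode("cp852"), but the title is still not rendered correctly.
I also made sure that my template is a UTF-8 file, but it made no difference.

Any ideas?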

- alex.ac

3 Answers

5

Try changing your render command to this...

template.render(index_variables).encode( "utf-8" )

The Jinja2 documentation says: "This will return the rendered template as unicode string."

http://jinja.pocoo.org/docs/api/?highlight=render#jinja2.Template.render

Hope this helps!

- Andrew Kloos

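I tried adding encode("utf-8") after rendering the template, but that was not enough. - alex.ac

If you're writing to a file, then you need to encode the result back to utf-8. This answer definitely helps. - lorem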

1

This helped me when I was trying to render an email template containing Cyrillic characters. - The Welsh Dragon

-6

Add the following lines to the beginning of your script and it will work without any further changes:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

- asmaier

2

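Please don't do this. Please don't spread this cargo cult. This setting in the sys module is disabled for a reason; it is a global setting, and any code that relies on implicit encoding or decoding raising an exception for non-ASCII text will break after this change. That includes code in third-party libraries. - Martijn Pieters

I am very well aware of that post, and I strongly disagree with it. Did you see that I posted an answer there too? - Martijn Pieters

This is still a cargo cult, trotted out whenever a UnicodeEncoding exception is raised. It is not the solution here. - Martijn Pieters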

- Lukas Graf · Accepted Answer

TL;DR:

Pass Unicode to template.render()
Encode the rendered unicode result to a bytestring before writing it to a file

This had me puzzled for a while, because you do...

index_file.write(
    template.render(index_variables)
)

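As far as Python is concerned, that is basically a single statement, so the traceback you get is misleading: the exception I got when recreating your test case did not happen in template.render(index_variables), but in index_file.write(). So split the code up like this: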

output = template.render(index_variables)
index_file.write(output)

That is a first step towards diagnosing exactly where the UnicodeEncodeError happens.
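
For illustration, the same error can be reproduced without Jinja2 at all. The sketch below assumes that, on Python 2, file.write() implicitly encodes a unicode object with the default ASCII codec; it performs that encoding step explicitly so the failure shows up in isolation. The names `rendered` and `try_ascii` are made up for this example and are not part of the original code.

```python
# Stand-in for what template.render() returns: a unicode (text) string.
rendered = u"<title>S\u00e9ptimo Cine</title>"


def try_ascii(text):
    """Return None if `text` encodes cleanly as ASCII, else the error raised."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


# The explicit ASCII encode fails on the non-ASCII character, just like the
# implicit one that Python 2 performs when unicode is written to a file.
error = try_ascii(rendered)
print(type(error).__name__)
```

Because the encode step is now on its own line, a traceback would point at it rather than at the render call.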

Jinja returns unicode when rendering a template. So you need to encode the result to a bytestring before writing it to a file:

index_file.write(output.encode('utf-8'))

The second error is passing a utf-8 encoded bytestring to template.render(), whereas Jinja wants unicode. So, assuming your myvar contains UTF-8, you need to decode it to unicode first:

index_variables['title'] = myvar.decode('utf-8')
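
If it is not certain whether a value is still bytes or already unicode by the time it reaches the template, a small guard can normalise it. This is only a sketch: `to_text` is a hypothetical helper, not part of Jinja2, and it assumes that incoming bytes are UTF-8.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as a unicode string, decoding it only if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# UTF-8 bytes, e.g. as they might arrive from a message queue
raw = b"S\xc3\xa9ptimo Cine"

title = to_text(raw)
print(repr(title))
```

A value that is already unicode passes through unchanged, so the helper can be applied to every template variable without decoding anything twice.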

So, to put it all together, this works for me:

# -*- coding: utf-8 -*-

from jinja2 import Environment, PackageLoader
env = Environment(loader=PackageLoader('myproject', 'templates'))


# Make sure we start with a UTF-8 encoded bytestring
myvar = 'Séptimo Cine'

index_variables = {'title':''}

# Decode the UTF-8 string to get unicode
index_variables['title'] = myvar.decode('utf-8')

template = env.get_template('index.html')

with open("index_file.html", "wb") as index_file:
    output = template.render(index_variables)

    # jinja returns unicode - so `output` needs to be encoded to a bytestring
    # before writing it to a file
    index_file.write(output.encode('utf-8'))
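
As an alternative to calling .encode() by hand, io.open accepts an encoding argument and does the encoding on write; it behaves the same way on Python 2.6+ and Python 3. The sketch below is a variation on the code above, not part of the original answer: `write_text` is a hypothetical helper, the temporary path is arbitrary, and `output` stands in for the string returned by template.render().

```python
import io
import os
import tempfile


def write_text(path, text, encoding="utf-8"):
    """Write a unicode string to `path`, letting io.open do the encoding."""
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(text)


# Stand-in for the unicode string returned by template.render()
output = u"<title>S\u00e9ptimo Cine</title>"

path = os.path.join(tempfile.mkdtemp(), "index.html")
write_text(path, output)

# Read the raw bytes back to check that the file really contains UTF-8
with open(path, "rb") as handle:
    data = handle.read()
```

With this approach the file object only ever receives unicode, so the encoding is chosen in exactly one place.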