UnicodeEncodeError: 'ascii' codec can't encode characters

Question

I have a dictionary containing URL responses. For example:

>>> d
{
0: {'data': u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'},
1: {'data': u'<p>some other data</p>'},
...
}

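When I pass these data values (d[0]['data']) to an xml.etree.ElementTree function, I get the most common error message of all: UnicodeEncodeError: 'ascii' codec can't encode characters...

What should I do to make this Unicode string suitable for the ElementTree parser?

PS. Please don't send me links explaining Unicode and Python. Unfortunately, I have already read all of them and could not make use of them; hopefully others can.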

- theta

1 Answer

- Martijn Pieters · Accepted Answer

You need to encode it to UTF-8 manually:

ElementTree.fromstring(d[0]['data'].encode('utf-8'))

This is necessary because the API only accepts encoded bytes as input. UTF-8 is a good default choice for data like this.

The parser will then be able to decode the bytes back to Unicode:

>>> from xml.etree import ElementTree
>>> p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8'))
>>> p.text
u'found "\u62c9\u67cf \u591a\u516c \u56ed"'
>>> print p.text
found "拉柏 多公 园"
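To apply the same fix to every entry in the dictionary, the encode step can be wrapped in a small helper. The following is only a minimal sketch: `parse_fragment` and `texts` are hypothetical names that do not appear in the question, and the sample dictionary contains just the two entries shown there. It assumes each value is a well-formed XML fragment.

```python
from xml.etree import ElementTree


def parse_fragment(markup):
    """Parse an XML fragment given as text or as UTF-8 bytes.

    Hypothetical helper for illustration; not part of ElementTree.
    """
    # Text (unicode on Python 2, str on Python 3) is encoded explicitly,
    # so no implicit ASCII encoding is ever attempted.
    if not isinstance(markup, bytes):
        markup = markup.encode('utf-8')
    return ElementTree.fromstring(markup)


# Sample data shaped like the dictionary in the question.
d = {
    0: {'data': u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'},
    1: {'data': u'<p>some other data</p>'},
}

# Map each key to the text of its parsed root element.
texts = {key: parse_fragment(value['data']).text for key, value in d.items()}
```

Values that are already byte strings are passed through unchanged, so the helper can be called on mixed input as long as those bytes are UTF-8.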