Decoding an encoded Unicode string in Python

Question

I need to decode a "UNICODE"-encoded string:

>>> id = u'abcdß'
>>> encoded_id = id.encode('utf-8')
>>> encoded_id
'abcd\xc3\x9f'

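The problem I have is this: with Pylons routes, the encoded_id variable arrives as a Unicode string, u'abcd\xc3\x9f', instead of a plain byte string, 'abcd\xc3\x9f'.

In Python, how can I decode my encoded_id variable, which is a Unicode string?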

>>> encoded_id = u'abcd\xc3\x9f'
>>> encoded_id.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/test/vng/lib64/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 4-5: ordinal not in range(128)

- alloyoussef

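If possible, you should find out why the string you get from Pylons was incorrectly decoded as "latin-1" (or its close relative "windows-1252") instead of "utf-8" in the first place. - Mark Tolonen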
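
To illustrate the comment above, here is a minimal sketch of how such a string can arise. It is written for Python 3, which is an assumption (the question uses Python 2.6), and it only reproduces the symptom; it does not show what Pylons actually does internally.

```python
# Python 3 sketch: reproduce the mis-decoded string from the question.
original = 'abcd\u00df'                 # 'abcdß'
utf8_bytes = original.encode('utf-8')   # b'abcd\xc3\x9f' (ß becomes two bytes)

# Decoding UTF-8 bytes as Latin-1 never fails: each byte becomes one character.
misdecoded = utf8_bytes.decode('latin-1')

print(ascii(misdecoded))  # 'abcd\xc3\x9f'
print(len(original), len(misdecoded))  # 5 6
```

The result has six characters instead of five, which is exactly the u'abcd\xc3\x9f' value shown in the question.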

1 Answer

- Martijn Pieters · Accepted Answer

You have UTF-8 encoded data (there is no such thing as UNICODE encoded data).

Encode the Unicode value to Latin-1, then decode the result as UTF-8:

encoded_id.encode('latin1').decode('utf8')

Latin-1 maps the first 256 Unicode code points (U+0000 to U+00FF) one-to-one to bytes.
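
The one-to-one mapping can be checked directly. The following is a small sketch for Python 3 (an assumption; the answer itself targets Python 2), and the variable names are illustrative only.

```python
# Python 3 sketch: Latin-1 maps code points U+0000..U+00FF to bytes 0x00..0xFF.
all_chars = ''.join(chr(i) for i in range(256))
as_bytes = all_chars.encode('latin-1')

assert as_bytes == bytes(range(256))            # code point n -> byte n
assert as_bytes.decode('latin-1') == all_chars  # and back again, losslessly

# Code points above U+00FF have no Latin-1 byte, so encoding them fails.
try:
    '\u0100'.encode('latin-1')
    out_of_range_failed = False
except UnicodeEncodeError:
    out_of_range_failed = True

print(out_of_range_failed)  # True
```

Because the round trip is lossless for these 256 code points, encoding the mis-decoded string as Latin-1 recovers the original UTF-8 bytes exactly.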

Example:

>>> encoded_id = u'abcd\xc3\x9f'
>>> encoded_id.encode('latin1').decode('utf8')
u'abcd\xdf'
>>> print encoded_id.encode('latin1').decode('utf8')
abcdß
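
For readers on Python 3, the same round trip can be wrapped in a small helper. This is a sketch under stated assumptions: fix_mojibake is a hypothetical name, not part of Pylons or the standard library, and returning the input unchanged on failure is one possible policy, not the only reasonable one.

```python
def fix_mojibake(text):
    """Undo UTF-8 data that was mis-decoded as Latin-1 (hypothetical helper)."""
    try:
        # Latin-1 turns each character back into the byte it came from;
        # those bytes are then decoded with the encoding they really use.
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters above U+00FF, or the bytes are not
        # valid UTF-8. The string was probably not mis-decoded this way,
        # so return it unchanged.
        return text


print(ascii(fix_mojibake('abcd\xc3\x9f')))  # 'abcd\xdf'
print(ascii(fix_mojibake('abcd\xdf')))      # 'abcd\xdf' (already correct, left alone)
```

Note that in Python 3 a str has no decode method, so the failing call from the question raises AttributeError there rather than UnicodeEncodeError.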