UnicodeDecodeError: 'ascii' codec can't decode

Question

python file encoding decoding representation

8

I'm reading a file that contains Romanian words in Python with file.readline(). Because of the encoding, I run into problems with many characters.

Example:

>>> import sys
>>> a = "aberație"  # type 'str'
>>> a
'abera\xc8\x9bie'
>>> print sys.stdin.encoding
UTF-8

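I tried encoding with utf-8, cp500, and others, but none of them works.

I can't work out which character encoding is the right one to use.

Thanks in advance for your help.

Edit: The goal is to store the words from the file in a dictionary and, when printing, to get "aberație" rather than 'abera\xc8\x9bie'.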

- lilawood

1 Answer

- Claudiu · Accepted Answer

What are you trying to do?

This is a sequence of bytes:

BYTES = 'abera\xc8\x9bie'

These bytes are the UTF-8 encoding of the string "aberație". You need to decode them to get a Unicode string:

>>> BYTES 
'abera\xc8\x9bie'
>>> print BYTES 
aberaÈ›ie
>>> abberation = BYTES.decode('utf-8')
>>> abberation 
u'abera\u021bie'
>>> print abberation 
aberație
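
Applied to the file from the question, the same decoding step can happen while the file is being read. The following is only a minimal sketch, assuming the file is UTF-8 encoded with one word per line; the helper name load_words and the file name words.txt are hypothetical, and io.open is used because it accepts an encoding argument in both Python 2 and Python 3.

```python
import io
import os
import tempfile


def load_words(path, encoding="utf-8"):
    """Return a dict mapping each word in the file to its line number."""
    words = {}
    # io.open decodes while reading, so every line is already Unicode text.
    with io.open(path, encoding=encoding) as handle:
        for number, line in enumerate(handle, 1):
            word = line.strip()
            if word:
                words[word] = number
    return words


# Build a small UTF-8 sample file so the sketch is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "words.txt")
with io.open(sample_path, "wb") as handle:
    handle.write(b"abera\xc8\x9bie\nfile\n")

words = load_words(sample_path)
for word in words:
    # Printing a single string shows the text itself, not its escaped repr.
    print(word)
```

Note that in Python 2, printing the dictionary as a whole still shows the escaped repr of each key (for example u'abera\u021bie'); printing the individual strings, as in the loop above, shows the readable text.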

If you want to store the Unicode string in a file, you need to encode it, choosing a specific byte format:

>>> abberation.encode('utf-8')
'abera\xc8\x9bie'
>>> abberation.encode('utf-16')
'\xff\xfea\x00b\x00e\x00r\x00a\x00\x1b\x02i\x00e\x00'
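
When writing to a file, the encode step can also be left to io.open by passing it an encoding. This is a sketch under the assumption that UTF-8 output is wanted; save_words and out.txt are hypothetical names used only for illustration.

```python
import io
import os
import tempfile


def save_words(path, words, encoding="utf-8"):
    """Write one Unicode word per line, encoding on the way out."""
    with io.open(path, "w", encoding=encoding) as handle:
        for word in words:
            handle.write(word + u"\n")


word = u"abera\u021bie"
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
save_words(out_path, [word])

# Reading the file back in binary mode exposes the encoded bytes on disk.
with io.open(out_path, "rb") as handle:
    raw = handle.read()
```

Here raw holds the encoded bytes, including the two-byte sequence \xc8\x9b for "ț", and decoding it with 'utf-8' gives back the original Unicode string.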