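I need to analyze a Tamil text file (UTF-8 encoded). I am using the nltk package in Python's IDLE shell. When I try to read the text file in the shell, I get the error below. How can I avoid it?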
corpus = open('C:\\Users\\Customer\\Desktop\\DISSERTATION\\ettuthokai.txt').read()
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
corpus = open('C:\\Users\\Customer\\Desktop\\DISSERTATION\\ettuthokai.txt').read()
File "C:\Users\Customer\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 33: character maps to <undefined>
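The traceback shows that open() fell back to the platform default codec (cp1252 here) because no encoding was given. Byte 0x8d is undefined in cp1252. It is plausibly part of a Tamil character's UTF-8 encoding; for example, the virama ் (U+0BCD) encodes as E0 AF 8D. A minimal sketch of the usual fix is to pass encoding="utf-8" to open(). The helper name read_utf8_text and the sample text are illustrative assumptions. The demo writes a temporary file instead of using the original path:

```python
import os
import tempfile

def read_utf8_text(path):
    """Read a whole text file, decoding it explicitly as UTF-8 (hypothetical helper)."""
    # encoding= overrides the platform default codec (cp1252 in the traceback),
    # so the UTF-8 bytes of the Tamil text are decoded correctly.
    with open(path, encoding="utf-8") as f:
        return f.read()

# Demo: write a small Tamil sample to a temporary UTF-8 file, then read it back.
sample = "எட்டுத்தொகை"
tmp = tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False)
with tmp:
    tmp.write(sample)

corpus = read_utf8_text(tmp.name)
print(corpus == sample)  # True
os.remove(tmp.name)
```

Applied to the original line, this would be open('C:\\Users\\Customer\\Desktop\\DISSERTATION\\ettuthokai.txt', encoding='utf-8').read(). If the file was saved with a byte-order mark, encoding='utf-8-sig' decodes it the same way but also strips the BOM.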
your_bytes.decode("UTF-8")
to decode them into strings. - byxor
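The comment's approach can be sketched as follows. Open the file in binary mode, so read() returns raw bytes with no decoding, and then decode those bytes explicitly. The variable name your_bytes follows the comment, while the temporary file and Tamil sample are assumptions made for the demo:

```python
import os
import tempfile

# Hypothetical sample: Tamil text stored as UTF-8 bytes in a temporary file.
raw = "எட்டுத்தொகை".encode("utf-8")
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Binary mode ("rb"): read() returns bytes, so no codec is applied yet.
with open(path, "rb") as f:
    your_bytes = f.read()

# Decode explicitly as UTF-8 to get a str that nltk can work with.
corpus = your_bytes.decode("UTF-8")
print(type(your_bytes).__name__, type(corpus).__name__)  # bytes str
os.remove(path)
```

For reading a whole text file, passing encoding="utf-8" to open() gives the same str in one step. Reading bytes first is mainly useful when you want to inspect or detect the encoding before decoding.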