Unicode encoding error when writing to a file

Question


19

I am trying to write some strings to a file (the strings come from the HTML parser BeautifulSoup).

I can display them with "print", but when I use file.write() I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 6: ordinal not in range(128)

How can I resolve this?

- Ivy

3 Answers

2

The answer to your question is "use codecs". By the way, the code below also shows some gettext tricks. http://wiki.wxpython.org/Internationalization

import codecs
import gettext

import wx  # wxPython; the original snippet used wx without importing it

localedir = './locale'
langid = wx.LANGUAGE_DEFAULT  # use OS default; or use LANGUAGE_JAPANESE, etc.
domain = "MyApp"
mylocale = wx.Locale(langid)
mylocale.AddCatalogLookupPathPrefix(localedir)
mylocale.AddCatalog(domain)

translater = gettext.translation(domain, localedir,
                                 [mylocale.GetCanonicalName()], fallback=True)
# install() puts the gettext _() translation function into the builtins namespace.
# The unicode=True flag is Python 2 only; Python 3's install() does not accept it.
translater.install(unicode=True)

msg = _("A message that gettext will translate, probably putting Unicode in here")

# Use codecs.open() so Unicode strings are encoded as UTF-8 on write.
# logfile_name is assumed to be defined elsewhere as the path of the log file.
Logfile = codecs.open(logfile_name, 'w', encoding='utf-8')
Logfile.write(msg + '\n')
Logfile.close()

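Although Google returns plenty of hits for this problem, I found it hard to find this simple solution (it is actually mentioned in the Python Unicode documentation, but it is rather buried). So... hope this helps... GaJ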

- GreenAsJade

1

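"Simple"? This also shows a pile of i18n machinery that the OP does not care about. He is not trying to make sure people see text in the right language; he wants to take text in a specific language from a specific source and put it into a file. So the only relevant parts of your snippet are the first line and the last two lines. As for "hard to find", really? What did you search Google for? I tried UnicodeEncodeError: 'ascii' codec can't encode character; the results seemed helpful enough... - Karl Knechtel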
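Following the comment above, here is a minimal sketch that keeps only the codecs part of the answer. It is written for Python 3, and the message text and the temporary log path are made up for illustration; neither comes from the original answer. On recent Python 3 versions the built-in open() with encoding= is generally preferred over codecs.open().

```python
import codecs
import os
import tempfile

# Hypothetical message and log path, for illustration only.
msg = u"Total: \xa3100"
logfile_name = os.path.join(tempfile.mkdtemp(), "example.log")

# codecs.open() returns a file object that encodes text to UTF-8 on write.
with codecs.open(logfile_name, "w", encoding="utf-8") as logfile:
    logfile.write(msg + u"\n")

# Read the file back with the same encoding to confirm the round trip.
with codecs.open(logfile_name, "r", encoding="utf-8") as logfile:
    round_trip = logfile.read()
```

If the write succeeded, round_trip should equal the original message plus the trailing newline.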

2

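I tried this and it works fine.

with open(r"C:\rag\sampleoutput.txt", 'w', encoding="utf-8") as f:
    f.write(text)  # text is an assumed name for the Unicode string to save; the original snippet stopped after the with line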
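The encoding= argument to the built-in open() exists only in Python 3. As a hedged sketch, io.open() accepts the same argument on both Python 2 and Python 3 (on Python 3 it is the same function as open()). The output path and sample text below are assumptions chosen so the example can run anywhere, not the path from the answer.

```python
import io
import os
import tempfile

# Hypothetical output path and text, for illustration only.
output_path = os.path.join(tempfile.mkdtemp(), "sampleoutput.txt")
text = u"Price: \xa3 20"

# io.open() accepts encoding= on both Python 2 and Python 3.
with io.open(output_path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the text back using the same encoding.
with io.open(output_path, "r", encoding="utf-8") as f:
    saved = f.read()
```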

- Raghavasimhan Sankarambadi Ram


- yossi · Accepted Answer

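This error occurs when you pass a Unicode string containing non-ASCII characters (code points above 127) to something that expects an ASCII byte string. In Python 2, the default encoding used to convert to a byte string is ASCII, "which handles exactly 128 (English) characters". That is why trying to convert Unicode characters with code points above 127 raises the error.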
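To make this concrete, the following Python 3 sketch reproduces the same failure by encoding explicitly to ASCII, then avoids it by choosing UTF-8. The sample string is an assumption, built so that the pound sign (U+00A3) lands at index 6, as in the error message from the question.

```python
# Sample text (assumed for illustration); U+00A3 sits at index 6.
text = u"Cost: \xa3"

try:
    text.encode("ascii")      # ASCII covers only code points 0-127
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    bad_position = exc.start  # index of the first character that could not be encoded

# UTF-8 can represent every Unicode code point, so this succeeds.
encoded = text.encode("utf-8")
```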

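The unicode() constructor (Python 2) has the signature unicode(string[, encoding, errors]). All of its arguments should be 8-bit strings.

The first argument is converted to Unicode using the specified encoding; if you omit the encoding argument, ASCII is used for the conversion, so bytes with values greater than 127 are treated as errors.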
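Python 3 has no unicode() constructor; the closest equivalent is bytes.decode(encoding, errors). The sketch below uses that substitute to show the same behaviour: decoding as ASCII fails on a byte above 127, naming the right encoding works, and the errors argument changes how bad bytes are handled. The byte string is the Latin-1 form of the example name used in this answer.

```python
raw = b"La Pe\xf1a"  # Latin-1 bytes; 0xF1 is outside the ASCII range

try:
    raw.decode("ascii")  # same failure mode as unicode(raw) in Python 2
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the actual encoding decodes the byte 0xF1 to U+00F1.
decoded = raw.decode("latin-1")

# errors="replace" substitutes U+FFFD for each undecodable byte instead of raising.
replaced = raw.decode("ascii", "replace")
```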

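For example (Python 2):

s = u'La Pe\xf1a'  # Unicode string containing U+00F1
print s.encode('latin-1')  # encode explicitly instead of relying on the ASCII default

or

write(s.encode('latin-1'))  # write is the write() method of your file object

will encode the string using Latin-1.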
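In Python 3 the same approach needs a file opened in binary mode, because a text-mode file will not accept bytes. This is a minimal sketch under that assumption; the temporary path is hypothetical and chosen only so the example can run.

```python
import os
import tempfile

s = u"La Pe\xf1a"
path = os.path.join(tempfile.mkdtemp(), "latin1.txt")  # hypothetical path

# Binary mode: the file expects bytes, so encode explicitly before writing.
with open(path, "wb") as f:
    f.write(s.encode("latin-1"))

# Read the raw bytes back to see what was stored on disk.
with open(path, "rb") as f:
    data = f.read()
```

Latin-1 stores each of these characters in one byte, so the file holds 7 bytes; the same string encoded as UTF-8 would take 8. Whoever reads the file later has to know which encoding was used.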