UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2

Question

I'm using Python to create an XML file, and one of its fields needs to hold the contents of a text file. I'm doing it like this:

import xml.etree.ElementTree as ET

f = open ('myText.txt',"r")
data = f.read()
f.close()

root = ET.Element("add")
doc = ET.SubElement(root, "doc")

field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data

tree = ET.ElementTree(root)
tree.write("output.xml")

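Then I get a UnicodeDecodeError. I've already tried adding the special comment # -*- coding: utf-8 -*- at the top of the script, but I still get the error. I've also tried forcing the encoding of my variable with data.encode('utf-8'), but I still get the error. I know this is a very common problem, but none of the solutions from other questions have worked for me.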

Update:

Traceback with only the special comment on the first line of the script:

Traceback (most recent call last):
  File "D:\Python\lse\createxml.py", line 151, in <module>
    tree.write("D:\\python\\lse\\xmls\\" + items[ctr][0] + ".xml")
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 243: ordinal not in range(128)

Traceback with .encode('utf-8'):

Traceback (most recent call last):
  File "D:\Python\lse\createxml.py", line 148, in <module>
    field.text = data.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 227: ordinal not in range(128)
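In Python 2, calling .encode('utf-8') on a byte string (str) first decodes it implicitly with the default ASCII codec, and that hidden decode step is what raises the error in the second traceback. Python 3 dropped the implicit step, but decoding similar bytes explicitly as ASCII reproduces the same message. A minimal sketch, assuming Python 3; the sample bytes (a UTF-8 non-breaking space) are illustrative and not taken from myText.txt:

```python
# Illustrative bytes: "a", then U+00A0 NO-BREAK SPACE as UTF-8 (0xC2 0xA0), then "b".
raw = b"a\xc2\xa0b"

message = None
try:
    # Roughly what Python 2 does implicitly before .encode() on a byte string.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode("utf-8")
print(repr(text))
```

The position in the message is a byte offset into the string being decoded, which is how the tracebacks above point at offsets 243 and 227.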

I used .decode('utf-8') instead: the error went away and the XML file was created successfully. The problem is that my browser can't display this XML file.

- kagat-kagat

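It would be useful to see the full error message so the source of the error can be identified. In the meantime, try using decode instead of encode. - Mark Ransom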

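Note that # -*- coding: utf-8 -*- only lets you put non-ASCII characters in the Python source code. It does not affect the encoding or decoding of strings in any way. Also, if the file myText.txt is not ASCII, you should use codecs.open with the correct encoding: codecs.open('myText.txt', 'r', 'utf-8'). - Bakuriu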

Also, if your text is not plain ASCII, pass an encoding to tree.write (see the documentation). - Thomas Fenzl
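To illustrate the comment above: without an explicit encoding, ElementTree serializes to US-ASCII and turns non-ASCII characters into numeric character references; with encoding='utf-8' it writes the raw UTF-8 bytes. This sketch assumes Python 3.8+ (for the xml_declaration argument) and uses ET.tostring, which accepts the same encoding argument as ElementTree.write, so the result can be inspected in memory. The element names mirror the question; the field text is made up:

```python
import xml.etree.ElementTree as ET

root = ET.Element("add")
field = ET.SubElement(ET.SubElement(root, "doc"), "field")
field.set("name", "text")
field.text = "price: \u20ac5"  # contains a non-ASCII character (euro sign)

# Default: US-ASCII output, non-ASCII characters escaped as character references.
ascii_bytes = ET.tostring(root)

# Explicit UTF-8: the euro sign is written as UTF-8 bytes, plus an XML declaration.
utf8_bytes = ET.tostring(root, encoding="utf-8", xml_declaration=True)

print(ascii_bytes)
print(utf8_bytes)
```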

It could be a non-breaking space, just saying. On a Mac that's Option + Space, which is 0xC2 0xA0 in UTF-8. - superlukas
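The byte sequence in the comment above can be checked directly. A small sketch, assuming Python 3; the sample text is made up, and find() on the raw bytes is just one way to locate an offending lead byte at an offset like the one reported in the traceback:

```python
import unicodedata

nbsp = "\u00a0"                    # the character the comment suspects
encoded = nbsp.encode("utf-8")     # its UTF-8 byte sequence
print(encoded)                     # b'\xc2\xa0'
print(unicodedata.name(nbsp))      # NO-BREAK SPACE

# Illustrative UTF-8 bytes: locate the first 0xC2 lead byte by byte offset.
sample = "first line\nsecond\u00a0line".encode("utf-8")
position = sample.find(b"\xc2")
print(position)
```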

@kagat-kagat: Did one of the answers solve your problem? If so, please accept it to mark the question as resolved. - MERose

4 Answers

I ran into a similar error with pywikipediabot. Although the .decode method is a step in the right direction, for me it didn't work until I added the 'ignore' argument:

ignore_encoding = lambda s: s.decode('utf8', 'ignore')

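Ignoring encoding errors can lose data or produce incorrect output. But if you just want to get the job done and the details aren't critical, it may be the quicker route.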
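The same 'ignore' error handler exists for bytes.decode in Python 3. This sketch uses made-up bytes to show what gets lost, next to the 'replace' handler (an alternative not mentioned in the answer) that keeps a visible U+FFFD marker where the bad byte was:

```python
# Illustrative bytes: valid UTF-8 "café " followed by a stray 0xFF byte,
# which is never valid in UTF-8.
raw = b"caf\xc3\xa9 \xff"

ignored = raw.decode("utf8", "ignore")    # invalid byte silently dropped
replaced = raw.decode("utf8", "replace")  # invalid byte becomes U+FFFD

print(repr(ignored))
print(repr(replaced))
```

Only the undecodable byte is affected here; the valid UTF-8 sequence for "é" survives either way.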

- the

Note that ignoring encoding errors can lose data or produce incorrect output. - tripleee

Python 2

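This error occurs because ElementTree does not expect to find non-ASCII byte strings when it writes out the XML. You should use Unicode strings instead. You can create a Unicode string either with the u prefix on a literal, e.g. u'€', or by decoding a byte string with the appropriate encoding, e.g. mystr.decode('utf-8').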

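Best practice is to decode all text data as you read it, rather than partway through your program. The io module provides an open() function that decodes text data into Unicode strings as it reads.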
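A minimal sketch of decoding at the input boundary, assuming Python 3 (where io.open is the built-in open; io.open also exists in Python 2.6+). A temporary file with made-up UTF-8 content stands in for myText.txt:

```python
import io
import os
import tempfile

# Create an illustrative UTF-8 file standing in for myText.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write("caf\u00e9 \u20ac".encode("utf-8"))

# Decode once, at the boundary: io.open with an encoding returns text strings.
with io.open(path, "r", encoding="utf-8") as f:
    data = f.read()

os.remove(path)
print(type(data).__name__, repr(data))
```

From this point on the program only handles decoded text, and encoding happens once more when the XML is written.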

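If you use Unicode, ElementTree is much happier and will encode the text correctly when you call its write() method.

Also, for the best compatibility and readability, make sure ElementTree encodes to UTF-8 in write() and adds the XML declaration header.

Assuming your input file is UTF-8 encoded (0xC2 is a common UTF-8 lead byte), putting it all together and using a with statement, your code should look like this: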

import io
import xml.etree.ElementTree as ET

with io.open('myText.txt', "r", encoding='utf-8') as f:
    data = f.read()

root = ET.Element("add")
doc = ET.SubElement(root, "doc")

field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data

tree = ET.ElementTree(root)
tree.write("output.xml", encoding='utf-8', xml_declaration=True)

Output:

<?xml version='1.0' encoding='utf-8'?>
<add><doc><field name="text">data€</field></doc></add>

- Alastair McCormack

#!/usr/bin/python

# encoding=utf8

Try adding this at the top of your Python file.

- Ankit Kumar Rathod

- uhbif19 · Accepted Answer

You need to decode the input string to Unicode before using it, to avoid encoding problems.

field.text = data.decode("utf8")
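Putting the accepted answer together with the comment about tree.write: decode on the way in, then ask ElementTree for UTF-8 on the way out. This sketch assumes Python 3.8+ and writes to an in-memory io.BytesIO instead of a file; the bytes in data are a made-up stand-in for what f.read() returned in Python 2:

```python
import io
import xml.etree.ElementTree as ET

# Illustrative stand-in for the raw bytes read from the text file.
data = "caf\u00e9\u00a0menu".encode("utf-8")

root = ET.Element("add")
field = ET.SubElement(ET.SubElement(root, "doc"), "field")
field.set("name", "text")
field.text = data.decode("utf8")  # decode on the way in

# Encode on the way out, with an explicit encoding and XML declaration.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
xml_bytes = buffer.getvalue()
print(xml_bytes.decode("utf-8"))

# Parsing the output back recovers the original text unchanged.
round_trip = ET.fromstring(xml_bytes).find("doc/field").text
```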