Create a UTF-8 CSV file in Python

Question

python encoding utf-8 csv

17

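I can't create a UTF-8 CSV file in Python.

I'm reading the csv module's documentation, and in the examples section it says:

For all other encodings the following UnicodeReader and UnicodeWriter classes can be used. They take an additional encoding parameter in their constructor and make sure that the data passes the real reader or writer encoded as UTF-8:

OK. So I have this code: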

values = (unicode("Ñ", "utf-8"), unicode("é", "utf-8"))
f = codecs.open('eggs.csv', 'w', encoding="utf-8")
writer = UnicodeWriter(f)
writer.writerow(values)

I keep getting this error:

line 159, in writerow
    self.stream.write(data)
  File "/usr/lib/python2.6/codecs.py", line 686, in write
    return self.writer.write(data)
  File "/usr/lib/python2.6/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 22: ordinal not in range(128)

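Can someone help me figure out what is going wrong? I set all the encodings before calling the UnicodeWriter class, but I still get the error. Could you give me a hint? For reference, this is the UnicodeWriter class I am using: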

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

- Somebody still uses you MS-DOS

It looks like the problem is codecs.open. When I remove it and use plain open instead, it works. Why? - Somebody still uses you MS-DOS

4 Answers

1

As you have already found out, it works if you open the file with plain open.

The reason is that you tried to encode to UTF-8 twice, first in:

f = codecs.open('eggs.csv', 'w', encoding="utf-8")

and then later in UnicodeWriter.writerow:

# ... and reencode it into the target encoding
data = self.encoder.encode(data)

To check that this is the cause, take your original code and comment out that line.
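
The failure mechanism can be sketched roughly as follows. This is written for Python 3, which is an assumption, since the question uses Python 2.6. In Python 2 the stream returned by codecs.open expects unicode objects, so when it receives an already-encoded byte string it first decodes it implicitly with the ASCII codec, and that is the step that raises the error in the traceback. The helper write_like_py2_codecs_stream is hypothetical and only imitates that behavior.

```python
def write_like_py2_codecs_stream(data):
    """Imitate a Python 2 codecs stream writer: it wants text, so a byte
    string is first decoded implicitly as ASCII, then encoded as UTF-8."""
    if isinstance(data, bytes):
        data = data.decode("ascii")  # the implicit step that fails
    return data.encode("utf-8")


row_text = "Ñ,é\r\n"

# Encoding once, at the stream, works.
encoded_once = write_like_py2_codecs_stream(row_text)

# Encoding first (as UnicodeWriter does) and then handing the bytes to an
# encoding stream triggers the ASCII decode, which fails on a non-ASCII byte.
already_encoded = row_text.encode("utf-8")
try:
    write_like_py2_codecs_stream(already_encoded)
    double_encode_error = None
except UnicodeDecodeError as exc:
    double_encode_error = exc
```

Here double_encode_error ends up holding a UnicodeDecodeError from the 'ascii' codec, the same kind of error as in the question.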

Regards

- KarlsFriend

1

I ran into the csv/unicode problem a while ago and put this up on bitbucket: http://bitbucket.org/famousactress/dude_csv It might work for you if your needs are simple :)

- royal

0

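You don't need to "double-encode" everything.

Your application should work entirely in Unicode.

Do the encoding only in codecs.open, so that UTF-8 bytes are written to the external file. Do no other encoding inside your application.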
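
A minimal sketch of that advice, assuming Python 3 rather than the Python 2.6 of the question. In Python 3 the csv module accepts text directly, so rows can stay as str inside the program and the only encoding step is the encoding="utf-8" argument of the built-in open. The function name write_utf8_csv and the temporary file location are illustrative.

```python
import csv
import os
import tempfile


def write_utf8_csv(path, rows):
    """Write rows of text to path; the file object does the only encoding."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


# Usage: write one row into a temporary directory and read the raw bytes back.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "eggs.csv")
    write_utf8_csv(csv_path, [("Ñ", "é")])
    with open(csv_path, "rb") as f:
        raw_bytes = f.read()
```

Reading the file back in binary mode is only there to inspect the bytes on disk; it is not needed for normal use.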

- S.Lott

1

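The csv module doesn't support Unicode. To get my code working, I had to remove codecs.open entirely. - Somebody still uses you MS-DOS

If csv doesn't support Unicode, then you can't use it to create UTF-8, unless you want to write your own UTF-8 encoder. - S.Lott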

- Tamás · Accepted Answer

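You don't have to use codecs.open; UnicodeWriter takes Unicode input and takes care of encoding everything into UTF-8. By the time UnicodeWriter writes to the file handle you passed it, everything is already UTF-8 encoded (which is why it works with a normal file opened with open).

By using codecs.open, you first convert your Unicode objects to UTF-8 byte strings in UnicodeWriter and then try to encode those byte strings to UTF-8 again, as if they were Unicode strings. That fails.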
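
To illustrate why a plain file handle is enough, here is a hypothetical Python 3 adaptation of the UnicodeWriter idea. The class name EncodingCsvWriter is made up for this sketch, and the class from the question targets Python 2. Rows are formatted as text in an in-memory queue, encoded exactly once, and written as finished bytes to a binary stream.

```python
import codecs
import csv
import io


class EncodingCsvWriter:
    """Hypothetical Python 3 adaptation of the UnicodeWriter idea: format
    rows as text, then encode them before writing to a binary stream."""

    def __init__(self, binary_stream, dialect=csv.excel, encoding="utf-8", **kwds):
        # csv output is collected here as text; newline="" avoids translation.
        self.queue = io.StringIO(newline="")
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = binary_stream
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow(row)
        text = self.queue.getvalue()
        # The single encoding step: the stream receives finished bytes.
        self.stream.write(self.encoder.encode(text))
        # Reset the queue for the next row.
        self.queue.seek(0)
        self.queue.truncate(0)


plain_stream = io.BytesIO()  # stands in for a file opened with open(path, "wb")
EncodingCsvWriter(plain_stream).writerow(["Ñ", "é"])
written = plain_stream.getvalue()
```

Because the stream only ever receives bytes, wrapping it in a second encoding layer, as codecs.open does in the question, would ask for those bytes to be encoded again.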