Which encoding does csv.DictReader use when reading a CSV file?

Question



I have a CSV file saved in UTF-8 encoding.

It contains non-ASCII characters (umlauts).

I'm reading the file using:

```
csv.DictReader(<file>, delimiter=<delimiter>)
```

My questions are:

In which encoding is the file being read?
I noticed that in order to treat the strings as UTF-8 I need to perform:
```
str.decode('utf-8')
```
Is there a better approach than reading the file in one encoding and then converting it to another, i.e. UTF-8?

[Python version: 2.7]

- Maoritzio

This answer solved my problem: https://dev59.com/_G445IYBdhLWcg3wLXEK - ThomasW

2 Answers


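What about using instances and classes to achieve this? You could store the shared dictionary at class level and have it load Unicode text files, even using the file's BOM (byte order mark) to detect the encoding. A long time ago I wrote a simple library that overrides the default open() with a Unicode-aware one. If you do `import tendo.unicode`, it also changes the way the csv library loads files. If your file has no BOM header, the library assumes UTF-8 instead of the old ASCII encoding. You can even specify other fallback encodings if needed.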
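The BOM-detection idea can be sketched in a few lines. This is not tendo's code and does not reproduce its API; `sniff_encoding` is a hypothetical helper that inspects the first bytes of a file opened in binary mode, returns the encoding named by a byte order mark if one is present, and otherwise falls back to UTF-8, mirroring the fallback behaviour described above.

```python
import codecs

# UTF-32 BOMs are checked before UTF-16 ones because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
]


def sniff_encoding(byte_file, fallback='utf-8'):
    """Return (encoding, bom_length) based on a leading byte order mark.

    byte_file must be opened in binary mode. Its position is reset to the
    start afterwards, so the caller can seek(bom_length) to skip the BOM.
    """
    head = byte_file.read(4)
    byte_file.seek(0)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name, len(bom)
    # No BOM found: assume the fallback encoding (UTF-8 by default).
    return fallback, 0
```

Note that in Python 2.7 the csv module itself can only safely parse byte encodings without NUL bytes, such as UTF-8, so in practice only the UTF-8 result is directly usable with csv.DictReader there.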

- sorin


- Alastair McCormack · Accepted Answer

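In Python 2.7, the CSV module does not apply any decoding: it works on the raw bytes of a file opened in binary mode and returns byte strings (`str`), not `unicode` objects.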
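If you prefer to stay with the standard library, the decode-after-parsing approach from the question can be wrapped once instead of calling `.decode('utf-8')` everywhere. This is safe for UTF-8 because it is ASCII-compatible and never uses ASCII byte values inside multi-byte sequences, so splitting on an ASCII delimiter before decoding cannot cut a character in half (this would not hold for UTF-16). `decode_row` and `unicode_dict_reader` are hypothetical helper names, not part of the csv module; the reader wrapper assumes Python 2.7, while `decode_row` itself also runs on Python 3.

```python
import csv


def decode_row(row, encoding='utf-8'):
    """Decode one csv.DictReader row whose keys and values are byte strings.

    None (the default restkey/restval) is passed through unchanged, and a
    list (extra fields collected under restkey) is decoded item by item.
    """
    def dec(value):
        if value is None:
            return None
        if isinstance(value, list):
            return [item.decode(encoding) for item in value]
        return value.decode(encoding)

    return {dec(key): dec(value) for key, value in row.items()}


def unicode_dict_reader(byte_file, encoding='utf-8', **kwargs):
    """Python 2.7 only: wrap csv.DictReader and yield decoded rows.

    byte_file must be opened in binary mode ('rb'); keyword arguments such
    as delimiter are passed straight through to csv.DictReader.
    """
    for row in csv.DictReader(byte_file, **kwargs):
        yield decode_row(row, encoding)
```

Usage in Python 2.7 would look like `with open("myfile.csv", 'rb') as f:` followed by `for row in unicode_dict_reader(f, delimiter=';'):`, with the loop kept inside the `with` block so the file is still open while rows are read.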

Use https://github.com/jdunck/python-unicodecsv, which decodes on the fly.

Usage:

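```
import unicodecsv
with open("myfile.csv", 'rb') as my_file:  # binary mode is required
    r = unicodecsv.DictReader(my_file, encoding='utf-8')
    rows = list(r)  # consume r while the file is still open
```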

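Iterating over r yields dicts whose keys and values are unicode strings. Importantly, the source file must be opened in binary mode ('rb').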