Python DictReader - how to make CSV column names lowercase?


The column names in my CSV file are in uppercase. I'm reading the data with csv.DictReader, but I need the column names to be lowercase.

I found this code here: Accessing csv header white space and case insensitive

import csv

class DictReaderInsensitive(csv.DictReader):
    # This class overrides the csv.fieldnames property.
    # All fieldnames are without white space and in lower case

    @property
    def fieldnames(self):
        return [field.strip().lower() for field in super(DictReaderInsensitive, self).fieldnames]

    def __next__(self):
        # get the result from the original __next__, but store it in DictInsensitive

        dInsensitive = DictInsensitive()
        dOriginal = super(DictReaderInsensitive, self).__next__()

        # store all pairs from the old dict in the new, custom one
        for key, value in dOriginal.items():
            dInsensitive[key] = value

        return dInsensitive

class DictInsensitive(dict):
    # This class overrides the __getitem__ method to automatically strip() and lower() the input key

    def __getitem__(self, key):
        return dict.__getitem__(self, key.strip().lower())

datafile = open(self.ifs_data_file, 'rU')
csvDict = DictReaderInsensitive(datafile)
for row in csvDict:
    print row
    #self.db.ifs_data.insert(**row)
    #self.db.commit()

I get this error:

Traceback (most recent call last):
  File "D:\Development\python\supplier_review\supplier_review.py", line 239, in update_ifs_data
    for row in csvDict:
  File "D:\Python27_5\lib\csv.py", line 103, in next
    self.fieldnames
  File "D:\Development\python\supplier_review\supplier_review.py", line 288, in fieldnames
    return [field.strip().lower() for field in super(DictReaderInsensitive, self).fieldnames]
TypeError: must be type, not classobj
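The traceback is specific to Python 2, where `csv.DictReader` is an old-style class and `super()` only accepts new-style classes. For comparison, here is a sketch assuming Python 3, where every class is new-style, so the question's two classes work with `super()` essentially as written. An in-memory `io.StringIO` buffer with sample data stands in for the asker's file.

```python
import csv
import io


class DictInsensitive(dict):
    """dict whose item lookups strip and lowercase the key first."""

    def __getitem__(self, key):
        return dict.__getitem__(self, key.strip().lower())


class DictReaderInsensitive(csv.DictReader):
    """DictReader whose field names are stripped and lowercased."""

    @property
    def fieldnames(self):
        # super() works here in Python 3 because csv.DictReader is a new-style class.
        return [field.strip().lower() for field in super().fieldnames]

    def __next__(self):
        # Wrap each row produced by the base class in the case-insensitive dict.
        return DictInsensitive(super().__next__())


# Sample data only; the header deliberately mixes case and padding.
data = io.StringIO("FOO, Bar ,BAZ\n42,3.14159,Hello world!\n")
reader = DictReaderInsensitive(data)
row = next(reader)
print(row)         # {'foo': '42', 'bar': '3.14159', 'baz': 'Hello world!'}
print(row["BAZ"])  # Hello world!
```

Only `row[...]` goes through the overridden `__getitem__`; methods such as `row.get("BAZ")` still use the lowercase keys as stored.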
3 answers

You could lowercase the first line of the file before passing it to DictReader:
import csv
import itertools

def lower_first(iterator):
    return itertools.chain([next(iterator).lower()], iterator)

with open(ifs_data_file, 'rU') as datafile:
    csvDict = csv.DictReader(lower_first(datafile))
    for row in csvDict:
        print row    
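The same idea in a Python 3 sketch, assuming an in-memory `io.StringIO` stands in for the file (with a real file in Python 3, the csv docs recommend opening it with `newline=''` rather than `'rU'`). `next()` pulls off the header line, and `itertools.chain` puts a lowercased copy back in front of the untouched remaining lines. The `iter()` call is an addition so that plain lists of lines are accepted too.

```python
import csv
import io
import itertools


def lower_first(lines):
    """Return an iterator yielding the first line lowercased, then the rest unchanged."""
    iterator = iter(lines)  # accept any iterable of lines, not only iterators
    return itertools.chain([next(iterator).lower()], iterator)


# Sample data standing in for the CSV file.
data = io.StringIO("Latitude,Longitude,Date\n44.8982391,-117.7791061,2004-07-12\n")
reader = csv.DictReader(lower_first(data))
rows = list(reader)
print(reader.fieldnames)  # ['latitude', 'longitude', 'date']
print(rows[0]["date"])    # 2004-07-12
```

This assumes the header occupies exactly one physical line; a quoted header field containing a newline would only be partially lowercased.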

Four years on, this is still a useful technique that is easy to implement. - scottwed


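DictReader is an old-style class, so super() doesn't work here. You need to access the property object on the parent class directly. Also, in Python 2 you want to override the .next() method, not .__next__():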

class DictReaderInsensitive(csv.DictReader):
    # This class overrides the csv.fieldnames property.
    # All fieldnames are without white space and in lower case

    @property
    def fieldnames(self):
        return [field.strip().lower() for field in csv.DictReader.fieldnames.fget(self)]

    def next(self):
        return DictInsensitive(csv.DictReader.next(self))

Example:

>>> example = '''\
... foo,Bar,BAZ
... 42,3.14159,Hello world!'''.splitlines()
>>> csvDict = DictReaderInsensitive(example)
>>> row = next(csvDict)
>>> print row
{'bar': '3.14159', 'foo': '42', 'baz': 'Hello world!'}
>>> row['BAZ']
'Hello world!'
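The key move in this answer is calling the getter of the parent's `property` object explicitly (`csv.DictReader.fieldnames.fget(self)`) instead of going through `super()`. A minimal, csv-free sketch of that pattern follows; `Base` and `Lowered` are hypothetical class names, and the code uses Python 3 syntax, although the explicit `fget` call itself does not rely on `super()`.

```python
class Base:
    """Hypothetical parent class exposing a read-only property."""

    def __init__(self, names):
        self._names = names

    @property
    def names(self):
        return self._names


class Lowered(Base):
    """Hypothetical subclass that post-processes the parent's property."""

    @property
    def names(self):
        # Base.names is the property object itself; .fget is its getter
        # function, called here with this instance as its only argument.
        return [name.strip().lower() for name in Base.names.fget(self)]


obj = Lowered(["FOO", " Bar ", "BAZ"])
print(obj.names)  # ['foo', 'bar', 'baz']
```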

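Thank you both for the suggestions. I found another way to solve this problem, but to be honest I don't remember what it was. I did try Martijn's approach, but it didn't work for me. - PrestonDocks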
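Sorry my solution didn't work for you; if you let me know what problems you ran into, perhaps I can help you get past them. As you can see from my answer, I did test the code for you. - Martijn Pieters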


If you want something simpler, you can update the DictReader.fieldnames attribute directly before accessing the row dicts, for example:

>>> f = open('example-x-y-time.csv', 'rb')
>>> reader = csv.DictReader(f)
>>> reader.fieldnames
['Latitude', 'Longitude', 'Date']
>>> print next(reader)
{'Latitude': '44.8982391', 'Date': '2004-07-12', 'Longitude': '-117.7791061'}
>>> reader.fieldnames = [name.lower() for name in reader.fieldnames]
>>> print next(reader)
{'latitude': '44.6637001', 'date': '1964-04-03', 'longitude': '-123.5997009'}
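In the session above, the first row was read before `fieldnames` was reassigned, so only the second row came back with lowercase keys. Assuming the goal is lowercase keys for every row, a Python 3 sketch reads `reader.fieldnames` (which consumes the header line) and reassigns it before iterating. The in-memory string reuses the values shown above and stands in for `example-x-y-time.csv`.

```python
import csv
import io

data = io.StringIO(
    "Latitude,Longitude,Date\n"
    "44.8982391,-117.7791061,2004-07-12\n"
    "44.6637001,-123.5997009,1964-04-03\n"
)
reader = csv.DictReader(data)
# Reading the property parses the header row; assigning it replaces the keys
# used for every row read afterwards, including the first one.
reader.fieldnames = [name.strip().lower() for name in reader.fieldnames]
rows = list(reader)
print(rows[1]["longitude"])  # -123.5997009
```

For an empty file `reader.fieldnames` is `None`, so the list comprehension would need a guard in that case.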

Content provided by Stack Overflow.