Mapping multiple functions to CSV rows

Question

3

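I want to do some type conversion on CSV data that consists entirely of strings. My idea is to use a dictionary that maps header names to functions, and to map those functions onto each CSV row. However, I'm a bit stuck on how to map multiple functions onto a row efficiently. I considered enumerating the header and building a new dictionary of index to function: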

header_map = {'Foo':str,
              'Bar':str,
              'FooBar':float}

csv_data = [('Foo', 'Bar', 'FooBar'),
            #lots of data...
           ]

index_map = {}

#enumerate the header row and create a dictionary of index:function
for i, header in enumerate(csv_data[0]):
    index_map[i] = header_map[header]

#retrieve the function for each index and call it on the value
new_csv = [[index_map[i](value) for i, value in enumerate(row)] 
           for row in csv_data[1:]]

I'm just curious whether there is a simpler, more efficient way to do this kind of operation?

- donopj2

3 Answers

1

Not tested (no sample input), but this seems to do what you want:

heads = csv_data[0]
new_csv = [heads] + [
              tuple(header_map[head](item) for head, item in zip(heads, row))
          for row in csv_data[1:]]

- Lev Levitsky
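
Since the answer above was posted without sample input, here is a small sample run of the same idea. It is only a sketch: it assumes Python 3, and the data rows are made up for illustration (the question does not give any). Note that the header tuple is wrapped in a list so that it is prepended as a single row.

```python
header_map = {'Foo': str, 'Bar': str, 'FooBar': float}

# Illustrative sample rows (assumed; the question gives no real data).
csv_data = [('Foo', 'Bar', 'FooBar'),
            ('x', 'y', '1.5'),
            ('p', 'q', '2')]

heads = csv_data[0]
# [heads] keeps the header as one row; each converter is looked up by column name.
new_csv = [heads] + [
    tuple(header_map[head](item) for head, item in zip(heads, row))
    for row in csv_data[1:]]

print(new_csv)
```

The header row passes through unchanged, and only the `FooBar` column ends up as floats.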

0

If you know the order of the columns in the header, you can use a list of functions instead of a dictionary.

>>> header = [str, str, float]
>>> csv = [("aaa", "bbb", "3.14")] * 10
>>> map(lambda line: map(lambda f, arg: f(arg), header, line), csv)
[['aaa', 'bbb', 3.14], ['aaa', 'bbb', 3.14], ...

- Alexey Kachayev
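
The interactive session in the answer above relies on Python 2, where `map` returns a list. On Python 3, `map` returns a lazy iterator, so the same expression would not print nested lists. A minimal sketch of the same list-of-functions idea for Python 3, reusing the `header` and `csv` names from that answer (`converted` is a name introduced here for illustration):

```python
# Converters listed in column order, as in the answer above.
header = [str, str, float]
csv = [("aaa", "bbb", "3.14")] * 10

# zip pairs each converter with the value in the same column position.
converted = [[f(arg) for f, arg in zip(header, line)] for line in csv]

print(converted[0])
```

This only works when the column order is known up front; with a header-name dictionary the converters would first have to be arranged in header order.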

- unutbu · Accepted Answer

Here is one way, using_converter, which is slightly faster:

import itertools as IT

header_map = {'Foo':str,
              'Bar':str,
              'FooBar':float}

N = 20000
csv_data = [('Foo', 'Bar', 'FooBar')] + [('Foo', 'Bar', 1123.451)]*N

def original(csv_data):
    index_map = {}
    #enumerate the header row and create a dictionary of index:function
    for i, header in enumerate(csv_data[0]):
        index_map[i] = header_map[header]

    #retrieve the appropriate function for each index and call it on the value
    new_csv = [[index_map[i](value) for i, value in enumerate(row)]
               for row in csv_data[1:]]
    return new_csv

def using_converter(csv_data):
    converters = IT.cycle([header_map[header] for header in csv_data[0]])
    conv = converters.next  # Python 2 spelling; on Python 3 use converters.__next__
    new_csv = [[conv()(item) for item in row] for row in csv_data[1:]]
    return new_csv

def using_header_map(csv_data):
    heads = csv_data[0]
    new_csv = [
        tuple(header_map[head](item) for head, item in zip(heads, row))
        for row in csv_data[1:]]
    return new_csv

# print(original(csv_data))
# print(using_converter(csv_data))
# print(using_header_map(csv_data))

Benchmarks using timeit:

The original code:

% python -mtimeit -s'import test' 'test.original(test.csv_data)'
100 loops, best of 3: 17.3 msec per loop

A slightly faster version (using itertools):

% python -mtimeit -s'import test' 'test.using_converter(test.csv_data)'
100 loops, best of 3: 15.5 msec per loop

Lev Levitsky's version:

% python -mtimeit -s'import test' 'test.using_header_map(test.csv_data)'
10 loops, best of 3: 36.2 msec per loop
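
The benchmarked listing is written for Python 2. Below is a rough Python 3 sketch of `using_converter`, with small made-up rows in place of the benchmark data. It assumes every data row has exactly as many fields as the header. If a row is shorter or longer, the cycled converters drift out of step with the columns. `using_zip` is a hypothetical variant, not part of the original answer, that looks each converter up once and pairs it with its column, so it does not depend on that assumption. No timings are claimed for either function here.

```python
import itertools

header_map = {'Foo': str, 'Bar': str, 'FooBar': float}

# Illustrative rows (assumed), all strings as in the question.
csv_data = [('Foo', 'Bar', 'FooBar'),
            ('a', 'b', '1123.451'),
            ('c', 'd', '2.5')]

def using_converter(csv_data):
    # Cycle through the per-column converters in header order.
    converters = itertools.cycle([header_map[h] for h in csv_data[0]])
    # Python 3 iterators expose __next__ instead of Python 2's .next
    conv = converters.__next__
    return [[conv()(item) for item in row] for row in csv_data[1:]]

def using_zip(csv_data):
    # Hypothetical variant: resolve each converter once, then pair by column.
    converters = [header_map[h] for h in csv_data[0]]
    return [[f(item) for f, item in zip(converters, row)]
            for row in csv_data[1:]]

print(using_converter(csv_data))
print(using_zip(csv_data))
```

Both functions return the same rows for well-formed input. They differ only in how the converter is matched to each field.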