Counting entries in a list of dictionaries: for loop vs. list comprehension with map(itemgetter)

Question

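In a Python program I'm writing, I compared two ways of counting entries in dictionaries held in a list: a for loop with increment variables, and a list comprehension over map(itemgetter) combined with len(). Both take the same amount of time. Am I doing something wrong, or is there a better approach?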

Here is a greatly simplified and shortened version of the data structure:

list = [
  {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'biscuits and gravy'},
  {'key1': False, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'peaches and cream'},
  {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': False, 'filenotfound': 'Abbott and Costello'},
  {'key1': False, 'dontcare': False, 'ignoreme': True, 'key2': False, 'filenotfound': 'over and under'},
  {'key1': True, 'dontcare': True, 'ignoreme': False, 'key2': True, 'filenotfound': 'Scotch and... well... neat, thanks'}
]

Here is the for loop version:

#!/usr/bin/env python
# Python 2.6
# count the entries where key1 is True
# keep a separate count for the subset that also have key2 True

key1 = key2 = 0
for dictionary in list:
    if dictionary["key1"]:
        key1 += 1
        if dictionary["key2"]:
            key2 += 1
print "Counts: key1: " + str(key1) + ", subset key2: " + str(key2)

Output for the data above:

Counts: key1: 3, subset key2: 2

Here is another, perhaps more Pythonic, version:

#!/usr/bin/env python
# Python 2.6
# count the entries where key1 is True
# keep a separate count for the subset that also have key2 True
from operator import itemgetter
KEY1 = 0
KEY2 = 1
getentries = itemgetter("key1", "key2")
entries = map(getentries, list)
key1 = len([x for x in entries if x[KEY1]])
key2 = len([x for x in entries if x[KEY1] and x[KEY2]])
print "Counts: key1: " + str(key1) + ", subset key2: " + str(key2)

Output for the data above (same as before):

Counts: key1: 3, subset key2: 2

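I'm a little surprised that these take the same amount of time. I'd like to know whether there is a faster way; I'm sure I'm overlooking something simple.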

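One alternative I've considered is loading the data into a database and running SQL queries, but the data doesn't need to persist, I'd have to profile the overhead of the data transfer and so on, and a database may not always be available.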

I have no control over the original form of the data.

The code above is not going for style points.

- Dennis Williamson

1 Answer

- Alex Martelli · Accepted Answer

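I think you're measuring incorrectly, because the code you want to measure is swamped by a lot of overhead (running at module top level instead of inside a function, and doing output). I put the two snippets into functions named forloop and withmap, and added a * 100 to the list's definition (after the closing ]) to make the measurement a little more substantial. On my slow laptop, I see: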

$ py26 -mtimeit -s'import co' 'co.forloop()'
10000 loops, best of 3: 202 usec per loop
$ py26 -mtimeit -s'import co' 'co.withmap()'
10 loops, best of 3: 601 usec per loop

That is, the allegedly more "Pythonic" approach with map is three times slower than the plain for loop, which tells you that it's not really "more Pythonic" ;-).
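The measurement setup described above can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the answerer's actual `co` module: the list is renamed `records` to avoid shadowing the built-in `list`, the functions return their counts instead of printing, and `map` is wrapped in `list()` so the code also runs on Python 3, where `map` returns a one-shot iterator.

```python
import timeit
from operator import itemgetter

# Sample rows from the question, repeated 100 times as the answer suggests.
# The name `records` is an assumption; the original code calls it `list`.
records = [
    {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'biscuits and gravy'},
    {'key1': False, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'peaches and cream'},
    {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': False, 'filenotfound': 'Abbott and Costello'},
    {'key1': False, 'dontcare': False, 'ignoreme': True, 'key2': False, 'filenotfound': 'over and under'},
    {'key1': True, 'dontcare': True, 'ignoreme': False, 'key2': True, 'filenotfound': 'Scotch and... well... neat, thanks'},
] * 100


def forloop():
    """Count key1 entries, and the subset that also has key2, with a plain loop."""
    key1 = key2 = 0
    for dictionary in records:
        if dictionary["key1"]:
            key1 += 1
            if dictionary["key2"]:
                key2 += 1
    return key1, key2


def withmap():
    """Same counts via map(itemgetter) and two list comprehensions."""
    getentries = itemgetter("key1", "key2")
    # list(...) is needed on Python 3 because map() yields a one-shot iterator
    # and `entries` is traversed twice below.
    entries = list(map(getentries, records))
    key1 = len([x for x in entries if x[0]])
    key2 = len([x for x in entries if x[0] and x[1]])
    return key1, key2


if __name__ == "__main__":
    # Time each function in-process; absolute numbers depend on the machine.
    for func in (forloop, withmap):
        seconds = timeit.timeit(func, number=200)
        print(func.__name__, seconds / 200 * 1e6, "usec per call")
```

Keeping the output out of the timed functions matters here: printing and top-level global lookups would otherwise dominate the small amount of counting work.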

The mark of good Python is simplicity, which is why I recommend what I immodestly named...:

def thebest():
  entries = [d['key2'] for d in list if d['key1']]
  return len(entries), sum(entries)

On measurement, this saves 10% to 20% of the time compared with the for loop approach.
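Why thebest returns both counts from a single list may not be obvious at first glance. The sketch below uses a hypothetical, cut-down version of the question's data (only the two relevant keys) and takes the rows as a parameter, which is a change from the answer's version. It relies on `bool` being a subclass of `int` in Python, so `sum` over the kept `key2` values counts the `True` ones; this assumes the `key2` values really are booleans.

```python
# Hypothetical miniature of the question's data: same key1/key2 pattern,
# other keys dropped for brevity.
records = [
    {'key1': True, 'key2': True},
    {'key1': False, 'key2': True},
    {'key1': True, 'key2': False},
    {'key1': False, 'key2': False},
    {'key1': True, 'key2': True},
]


def thebest(rows):
    # Keep the key2 value only for rows where key1 is truthy.
    entries = [d['key2'] for d in rows if d['key1']]
    # len(entries) counts the key1 rows; sum(entries) counts how many of
    # those also have key2 True, because True == 1 and False == 0.
    return len(entries), sum(entries)


print(thebest(records))  # (3, 2)
```

The result matches the "Counts: key1: 3, subset key2: 2" output shown in the question.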