Python - Counting occurrences of certain ranges in a list

Question

4

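I want to count how many times floating-point values occur in a given list. For example, a user enters a list of grades (all scored out of 100), and the grades are sorted into groups of ten. How many times do scores in the 0-10, 10-20, 20-30, etc. ranges occur? It is like a test score distribution. I know I can use the count function, but since I'm not looking for specific numbers, I'm having trouble. Is there a way to combine count and range? Thanks for any help.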

- user1246457

4 Answers

7

If you are fine with using the external library NumPy, you just need to call numpy.histogram():

>>> import numpy
>>> data = [82, 85, 90, 91, 70, 87, 45]
>>> counts, bins = numpy.histogram(data, bins=10, range=(0, 100))
>>> counts
array([0, 0, 0, 0, 1, 0, 0, 1, 3, 2])
>>> bins
array([   0.,   10.,   20.,   30.,   40.,   50.,   60.,   70.,   80.,
         90.,  100.])

- Sven Marnach

4

decs = [int(x/10) for x in scores]

This maps scores of 0-9 to 0, 10-19 to 1, and so on. Then count the occurrences of 0, 1, 2, 3, etc. (with something like collections.Counter), and map back to ranges from there.
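The counting and mapping-back steps described above could be sketched roughly as follows. The sample scores are borrowed from the NumPy answer, and the output format is only an illustration, not part of the original answer.

```python
from collections import Counter

scores = [82, 85, 90, 91, 70, 87, 45]

# Map each score to its decade index: 0-9 -> 0, 10-19 -> 1, ...
decs = [int(x / 10) for x in scores]

# Count how many scores landed in each decade.
counts = Counter(decs)

# Map each decade index back to its range (lower bound inclusive,
# upper bound exclusive).
for d in sorted(counts):
    print("%d-%d: %d" % (d * 10, d * 10 + 10, counts[d]))
# 40-50: 1
# 70-80: 1
# 80-90: 3
# 90-100: 2
```

Note that with this mapping a score of exactly 100 gets index 10, i.e. a bin of its own.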

- Amber

1

Technically, there is a slight difference - for float input, x//10 produces a float result, whereas int(x/10) produces an int. - Amber

1

Sure, @RaymondHettinger - but I didn't mention list.count, so I'm not sure why you bring it up. - Amber

Yes, they are different, but I think // is great, so I bring it up whenever I can. - Bi Rico

2

This approach uses bisection, which can be more efficient, but it requires the scores to be sorted first.

from bisect import bisect
import random

scores = [random.randint(0,100) for _ in range(100)]
bins = [20, 40, 60, 80, 100]

scores.sort()
counts = []
last = 0
for range_max in bins:
    i = bisect(scores, range_max, last)
    counts.append(i - last)
    last = i

I wouldn't have you install numpy just for this, but if you already have numpy, you can use numpy.histogram.

Update

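First, using bisect is more flexible. Using [i//n for i in scores] requires all the bins to be the same size. Using bisect allows the bins to have arbitrary limits. Also, i//n means the ranges are [lo, hi). With bisect the ranges are (lo, hi], but you can use bisect_left if you want [lo, hi).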
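To make the boundary behaviour concrete, here is a small sketch with unevenly sized bins. The helper name count_bins, its inclusive_upper flag, and the sample data are made up for this illustration. It relies on bisect being an alias for bisect_right.

```python
from bisect import bisect_left, bisect_right

def count_bins(scores, upper_limits, inclusive_upper=True):
    """Count scores per bin; upper_limits are ascending upper bin edges."""
    scores = sorted(scores)
    # bisect_right puts a score equal to the limit inside the bin, (lo, hi];
    # bisect_left pushes it into the next bin, [lo, hi).
    find = bisect_right if inclusive_upper else bisect_left
    counts = []
    last = 0
    for limit in upper_limits:
        i = find(scores, limit, last)
        counts.append(i - last)
        last = i
    return counts

scores = [45, 60, 60, 72, 89, 90, 100]
limits = [60, 90, 100]  # unevenly sized bins

print(count_bins(scores, limits, inclusive_upper=True))   # [3, 3, 1]
print(count_bins(scores, limits, inclusive_upper=False))  # [1, 4, 1]
```

In the second call the score of 100 is not counted at all, because with [lo, hi) bins it lies on the excluded upper edge of the last bin.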

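Second, bisect is faster; see the timings below. I replaced scores.sort() with the slower sorted(scores) because sorting is the slowest step and I didn't want to bias the timings with a pre-sorted array. However, the OP says his/her array is already sorted, so bisect may make even more sense in that case.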

setup="""
from bisect import bisect_left
import random
from collections import Counter

def histogram(iterable, low, high, bins):
    step = (high - low) / bins
    dist = Counter(((x - low + 0.) // step for x in iterable))
    return [dist[b] for b in range(bins)]

def histogram_bisect(scores, groups):
    scores = sorted(scores)
    counts = []
    last = 0
    for range_max in groups:
        i = bisect_left(scores, range_max, last)
        counts.append(i - last)
        last = i
    return counts

def histogram_simple(scores, bin_size):
    scores = [i//bin_size for i in scores]
    return [scores.count(i) for i in range(max(scores)+1)]

scores = [random.randint(0,100) for _ in range(100)]
bins = range(10, 101, 10)
"""
from timeit import repeat
t = repeat('C = histogram(scores, 0, 100, 10)', setup=setup, number=10000)
print(min(t))
#.95
t = repeat('C = histogram_bisect(scores, bins)', setup=setup, number=10000)
print(min(t))
#.22
t = repeat('histogram_simple(scores, 10)', setup=setup, number=10000)
print(min(t))
#.36

- Bi Rico

I agree with @amber. Using bisect here is wasteful, since you can use simple division for evenly spaced bins. - Raymond Hettinger

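I think @RaymondHettinger and I were thinking of a different use of bisect than what you have done here (i.e. using bisect to find the bin that each individual score belongs to, which would be wasteful). For a large number of scores, you are right that bisect could be efficient. - Amber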

- Raymond Hettinger · Accepted Answer

9

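Bin the data by dividing by the interval width. To count the number in each group, consider using collections.Counter. Here is a worked-out example with documentation and a test: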

from collections import Counter

def histogram(iterable, low, high, bins):
    '''Count elements from the iterable into evenly spaced bins

        >>> scores = [82, 85, 90, 91, 70, 87, 45]
        >>> histogram(scores, 0, 100, 10)
        [0, 0, 0, 0, 1, 0, 0, 1, 3, 2]

    '''
    step = (high - low + 0.0) / bins
    dist = Counter((float(x) - low) // step for x in iterable)
    return [dist[b] for b in range(bins)]

if __name__ == '__main__':
    import doctest
    print(doctest.testmod())
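One edge case worth knowing: with the function above, a value exactly equal to high (for example a score of 100) gets index bins, which the returned list does not cover, so it is silently dropped. A possible variant, named histogram_closed here purely for illustration, clamps such values into the last bin and skips anything outside [low, high]. This is a sketch under those assumptions, not part of the original answer.

```python
from collections import Counter

def histogram_closed(iterable, low, high, bins):
    """Like histogram(), but a value equal to `high` goes in the last bin."""
    step = (high - low + 0.0) / bins
    dist = Counter(
        # Clamp the index so that x == high lands in bin (bins - 1).
        min(int((float(x) - low) // step), bins - 1)
        for x in iterable
        if low <= float(x) <= high  # ignore out-of-range values
    )
    return [dist[b] for b in range(bins)]

print(histogram_closed([82, 85, 90, 91, 70, 87, 45, 100], 0, 100, 10))
# [0, 0, 0, 0, 1, 0, 0, 1, 3, 3]
```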

- Raymond Hettinger

5

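It looks like you have downvoted all four answers that were written before yours. If so, even though your answer may be the best one, perhaps not all the other answers deserve a downvote, since they may still be useful (even if none of them is the best). - jcollado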

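Thanks for the reply. I'm trying to implement your solution, but I'm running into trouble with the math operators. For example, the step and dist variables won't work on a string. Sorry if I sound like a total newbie (because I am), but is there a way to coerce the scores into a list? My input is 11, 48, 13, 9, 4. Is it a string by default? - user1246457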

Sorry for the multiple comments. Here is the code I'm using: 'from collections import Counter def gradeDistribution(examScores, low, high, bins): step = (high - int(low) + 0.0) / bins dist = Counter((x - int(low)) // step for x in examScores) return [dist[b] for b in range(bins)] examScores=[raw_input("please enter scores")]gradeDistribution(examScores, 0, 100, 10)' The error message I get is: TypeError: unsupported operand type(s) for -: 'str' and 'int' Thanks for any insight anyone can provide. - user1246457

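@user1246457 When this happens, you should update your question, and if you want to make sure a particular person is notified of the update, you can leave a short comment. That way you won't hit the comment length limit and your code will be more readable. When the user enters the scores, they are stored as strings. You should convert them to floats or ints before calling the histogram function, for example [float(i) for i in examScores]. - Bi Rico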

Edited the answer to include float conversion from string input. - Raymond Hettinger
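For the comma-separated input shown in the comments, the conversion could look roughly like this. It is a minimal sketch: the helper name parse_scores is made up here, and it assumes the whole line arrives as a single string, as it would from raw_input.

```python
def parse_scores(text):
    """Turn a string such as "11, 48, 13, 9, 4" into a list of floats."""
    # Split on commas, skip empty pieces (e.g. from a trailing comma);
    # float() itself tolerates surrounding whitespace.
    return [float(part) for part in text.split(",") if part.strip()]

exam_scores = parse_scores("11, 48, 13, 9, 4")
print(exam_scores)  # [11.0, 48.0, 13.0, 9.0, 4.0]
```

The resulting list can then be passed to the histogram function in place of the raw string.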