Finding the mode of a list

Question

python mode

154

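Given a list of items, recall that the mode of the list is the item that occurs most often.

I would like to know how to write a function that finds the mode of a list but displays a message if the list has no mode (for example, when every item in the list appears only once). I want to build this function from scratch, without importing anything.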

- bluelantern

Sorry, can you explain what exactly you mean by "mode of a list"? - Vikas

8

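The "mode" is the most frequently occurring element of a data set (if there is one). Some definitions extend it to the arithmetic mean of all such elements when there is more than one. - Jeremy Roman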

1

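There are a lot of wrong answers here! For example, assert mode([1, 1, 1]) == None and assert mode([1, 2, 3, 4]) == None. For a number to be a "mode", it must occur more times than at least one other number in the list, and it must not be the only number in the list. - lifebalance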
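One way to read the definition in this comment can be sketched without any imports. The name `strict_mode` is hypothetical, and returning None for "no mode" is an assumption; the question asks for a message instead, which a caller could print whenever None comes back.

```python
def strict_mode(items):
    """Return the most frequent item, or None if no item stands out."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    if len(counts) < 2:
        return None  # empty list, or only one distinct value
    highest = max(counts.values())
    lowest = min(counts.values())
    if highest == lowest:
        return None  # every value occurs equally often
    # On a tie for the highest count, return the first item seen
    # (dicts keep insertion order in Python 3.7+).
    for item, count in counts.items():
        if count == highest:
            return item


result = strict_mode([1, 2, 3, 4])
print("no mode" if result is None else result)
```

With this reading, both `strict_mode([1, 1, 1])` and `strict_mode([1, 2, 3, 4])` come back as None, while `strict_mode([1, 2, 2, 3])` gives 2.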

27 Answers

115

You can use Counter from the collections package, which offers a mode-like method, most_common. See the official Python documentation for Counter and collections.

from collections import Counter
data = Counter(your_list_in_here)
data.most_common()   # Returns all unique items and their counts
data.most_common(1)  # Returns the highest occurring item

Note: Counter is new in Python 2.7 and is not available in earlier versions.

- Christian Witts

22

The question states that the user wants to write the function from scratch, i.e., without any imports. - abcd

5

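Your last line returns a list containing a tuple of the mode and its frequency. To get just the mode, use Counter(your_list_in_here).most_common(1)[0][0]. If there is more than one mode, this returns an arbitrary one of them. - Rory Daulton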

1

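Suppose there are n most common modes. If Counter(your_list_in_here).most_common(1)[0][0] gets you the first mode, how would you get another most common mode? Just replace the last 0 with 1? One could write a function to customize the mode to one's own needs. - user7345804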

2

If there is more than one mode, how can I return the largest of them? - Akin Hwan

If you just want the most frequent item, you can use data.most_common(1)[0][0]. - Stef

1

@AkinHwan max(data.items(), key=lambda x: (x[1], x[0])) or, using from operator import itemgetter, you can rewrite it as max(data.items(), key=itemgetter(1,0)). - Stef
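The comments above ask how to get every mode, or the largest one, out of a Counter. A minimal sketch of one way to do that follows; `all_modes` is a hypothetical helper name, not part of collections.

```python
from collections import Counter


def all_modes(items):
    """Return every item that shares the highest count, in first-seen order."""
    counts = Counter(items)
    if not counts:
        return []  # assumption: an empty input has no modes
    top = max(counts.values())
    return [item for item, count in counts.items() if count == top]


data = [1, 1, 3, 3, 2]
print(all_modes(data))       # [1, 3]
print(max(all_modes(data)))  # 3, the largest of the tied modes
```

Replacing `max` with `min` on the last line picks the smallest tied mode instead.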

77

Python 3.4 includes the function statistics.mode, so it is straightforward:

>>> from statistics import mode
>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
 3

The list can contain elements of any type, not just numbers:

>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
 'red'

- jabaldonedo

21

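Using mode([1, 1, 1, 1, 2, 3, 3, 3, 3, 4]) raises an error, because 1 and 3 are repeated the same number of times. Ideally, it should return the smallest of the numbers tied for the most repetitions. StatisticsError: no unique mode; found 2 equally common values - aman_novice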

4

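I have not used the statistics package from 3.4, but scipy.stats.mode returns the smallest value, in this case 1. In certain cases, though, I would prefer that an error be raised... - wellplayed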

3

@aman_novice, this issue was resolved in Python 3.8. https://docs.python.org/3/library/statistics.html#statistics.mode - Michael D

4

Python 3.8 also added the multimode function, which returns all of the modes when there is more than one. - stason
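For reference, here is a short sketch of how the two functions behave on the tied list from the earlier comment, assuming Python 3.8 or later (older versions raise StatisticsError from mode and have no multimode).

```python
from statistics import mode, multimode

data = [1, 1, 1, 1, 2, 3, 3, 3, 3, 4]

first = mode(data)       # Python 3.8+: the first mode encountered
every = multimode(data)  # all modes, in the order first encountered

print(first)  # 1
print(every)  # [1, 3]
```

multimode returns an empty list for empty input, so `len(multimode(data)) > 1` can serve as a test for ties.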

34

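Taking a leaf from some statistics software, namely SciPy and MATLAB: these return the smallest of the most common values, so if two values occur equally often, the smaller one is returned. Hopefully the examples below help: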

>>> from scipy.stats import mode

>>> mode([1, 2, 3, 4, 5])
(array([ 1.]), array([ 1.]))

>>> mode([1, 2, 2, 3, 3, 4, 5])
(array([ 2.]), array([ 2.]))

>>> mode([1, 2, 2, -3, -3, 4, 5])
(array([-3.]), array([ 2.]))

Is there any reason why you can't follow this convention?

- Chris

4

Why is only the smallest mode returned when there are several? - zyxue

@zyxue A simple statistical convention. - chrisfs

2

@chrisfs And how would you make it return the largest one if there are several modes? - Akin Hwan

31

There are many simple ways to find the mode of a list in Python, such as:

>>> import statistics
>>> statistics.mode([1, 2, 3, 3])
3

Or, you could find the max by its count:

max(array, key = array.count)

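The problem with these two methods is that they do not handle multiple modes. The first raises an error (in Python versions before 3.8), while the second returns only the first mode.

To find all the modes of a data set, you could use this function: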

def mode(array):
    most = max(list(map(array.count, array)))
    return list(set(filter(lambda x: array.count(x) == most, array)))

- mathwizurd

3

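Using mode gives an error when two elements occur the same number of times. - Abhishek Mishra

Sorry, I saw this comment really late. statistics.mode(array) raises an error when there are multiple modes, but the other methods do not. - mathwizurd

(1) It is described as the data point with the highest frequency (2) A data set may have more than one mode (3) For continuous values a mode may not exist (because all values are unique) (4) It can also be used for non-numeric data - thrinadhn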

7

Extending the Community answer, which does not work when the list is empty, here is working code for the mode:

def mode(arr):
        if arr==[]:
            return None
        else:
            return max(set(arr), key=arr.count)

- Kardi Teknomo

5

In case you are interested in the smallest mode, the largest mode, or all modes:

def get_small_mode(numbers, out_mode):
    counts = {k:numbers.count(k) for k in set(numbers)}
    modes = sorted(dict(filter(lambda x: x[1] == max(counts.values()), counts.items())).keys())
    if out_mode=='smallest':
        return modes[0]
    elif out_mode=='largest':
        return modes[-1]
    else:
        return modes

- tashuhka

3

A little longer, but it can return multiple modes and also works on strings or mixed data types.

def getmode(inplist):
    '''with list of items as input, returns mode
    '''
    dictofcounts = {}
    listofcounts = []
    for i in inplist:
        countofi = inplist.count(i) # count items for each item in list
        listofcounts.append(countofi) # add counts to list
        dictofcounts[i] = countofi # add counts and item in dict to get later
    maxcount = max(listofcounts) # get max count of items
    if maxcount == 1:
        print("There is no mode for this dataset, values occur only once")
    else:
        modelist = [] # if more than one mode, add to list to print out
        for key, item in dictofcounts.items(): # Python 3: items() replaces iteritems()
            if item == maxcount: # get item from original list with most counts
                modelist.append(str(key))
        print("The mode(s) are:", ' and '.join(modelist))
        return modelist

- timpjohns

3

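The mode of a data set is the member, or members, that occur most frequently in the set. If two members are tied for the most occurrences, the data set has two modes; this is called bimodal.

The following function modes() finds the mode(s) in a given list of data: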

import numpy as np; import pandas as pd

def modes(arr):
    df = pd.DataFrame(arr, columns=['Values'])
    dat = pd.crosstab(df['Values'], columns=['Freq'])
    if len(np.unique((dat['Freq']))) > 1:
        mode = list(dat.index[np.array(dat['Freq'] == max(dat['Freq']))])
        return mode
    else:
        print("There is NO mode in the data set")

Output:

# For a list of numbers in x as
In [1]: x = [2, 3, 4, 5, 7, 9, 8, 12, 2, 1, 1, 1, 3, 3, 2, 6, 12, 3, 7, 8, 9, 7, 12, 10, 10, 11, 12, 2]
In [2]: modes(x)
Out[2]: [2, 3, 12]
# For a list of repeated numbers in y as
In [3]: y = [2, 2, 3, 3, 4, 4, 10, 10]
In [4]: modes(y)
Out[4]: There is NO mode in the data set
# For a list of strings/characters in z as
In [5]: z = ['a', 'b', 'b', 'b', 'e', 'e', 'e', 'd', 'g', 'g', 'c', 'g', 'g', 'a', 'a', 'c', 'a']
In [6]: modes(z)
Out[6]: ['a', 'g']

If we do not want to import numpy or pandas, the modes() function can be written as follows to produce the same output:

def modes(arr):
    cnt = []
    for i in arr:
        cnt.append(arr.count(i))
    uniq_cnt = []
    for i in cnt:
        if i not in uniq_cnt:
            uniq_cnt.append(i)
    if len(uniq_cnt) > 1:
        m = []
        for i in list(range(len(cnt))):
            if cnt[i] == max(uniq_cnt):
                m.append(arr[i])
        mode = []
        for i in m:
            if i not in mode:
                mode.append(i)
        return mode
    else:
        print("There is NO mode in the data set")

- shubh

2

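This function returns the mode or modes of a list, however many there are, together with the frequency of the mode or modes in the data set. If there is no mode (i.e. all items occur only once), the function returns an error string. This is similar to A_nagpal's function above but is, in my humble opinion, more complete, and I think it is easier to understand for any Python novices (such as yours truly) reading this question.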

def l_mode(list_in):
    count_dict = {}
    for e in (list_in):   
        count = list_in.count(e)
        if e not in count_dict.keys():
            count_dict[e] = count
    max_count = 0 
    for key in count_dict: 
        if count_dict[key] >= max_count:
            max_count = count_dict[key]
    corr_keys = [] 
    for corr_key, count_value in count_dict.items():
        if count_dict[corr_key] == max_count:
            corr_keys.append(corr_key)
    if max_count == 1 and len(count_dict) != 1: 
        return 'There is no mode for this data set. All values occur only once.'
    else: 
        corr_keys = sorted(corr_keys)
        return corr_keys, max_count

- user4406935

I say this only because you said "the function returns an error string". The line that reads return 'There is no mode for this data set. All values occur only once.' can be turned into an error message with a traceback by raising an exception, like this:

if condition:
    raise ValueError('There is no mode for this data set. All values occur only once.')

Here is a list of the different types of errors you can raise. - user7345804
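As a sketch of what that suggestion might look like when applied to the whole function: `mode_or_raise` is a hypothetical name, the counting is done in a single pass rather than with list.count, and the extra check for an empty list is an assumption that is not in the answer above.

```python
def mode_or_raise(list_in):
    """Return (sorted modes, count), raising ValueError when there is no mode."""
    count_dict = {}
    for e in list_in:
        count_dict[e] = count_dict.get(e, 0) + 1
    if not count_dict:
        # Assumption: treat an empty list as an error as well.
        raise ValueError('Cannot take the mode of an empty list.')
    max_count = max(count_dict.values())
    if max_count == 1 and len(count_dict) != 1:
        raise ValueError('There is no mode for this data set. All values occur only once.')
    corr_keys = sorted(k for k, v in count_dict.items() if v == max_count)
    return corr_keys, max_count


try:
    mode_or_raise([1, 2, 3])
except ValueError as err:
    print(err)
```

Raising instead of returning a string means a caller cannot mistake the message for a result.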

- David Dao · Accepted Answer

198

You can use the max function with a key. Have a look at python max function using 'key' and lambda expression.

max(set(lst), key=lst.count)

- David Dao

7

This is the correct answer for the OP, considering it does not require any extra imports. Good job, David! - Jason Parham

18

It seems to me that this would run in O(n**2). Does it? - lirtosiast

11

This has quadratic runtime. - Padraic Cunningham
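If the quadratic cost matters, one possible alternative is to count every item once in a plain dict and then take the max over those counts. This is a sketch, not part of the accepted answer: `mode_linear` is a hypothetical name, items are assumed to be hashable (as set(lst) already requires), and ties go to whichever item was seen first.

```python
def mode_linear(lst):
    """Most frequent item, using one counting pass and no imports."""
    counts = {}
    for item in lst:
        counts[item] = counts.get(item, 0) + 1
    # max over the distinct items, compared by their stored counts;
    # raises ValueError on an empty list, like max(set(lst), ...) does
    return max(counts, key=counts.get)


print(mode_linear([1, 2, 2, 3, 3, 3]))  # 3
```

Each lst.count call scans the whole list, which is where the quadratic behaviour of the one-liner comes from; the dict lookup here takes constant time on average.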

24

You could also just use max(lst, key=lst.count). (And I really wouldn't call a list list.) - Stefan Pochmann

2

Can anyone explain how this works for bimodal distributions? For example: a = [22, 33, 11, 22, 11]; print(max(set(a), key=a.count)) returns 11. Will it always return the smallest mode? And if so, why? - battey
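A possible explanation, offered as a sketch rather than a definitive answer: max returns the first of several equally large candidates it encounters, and the iteration order of a set is an implementation detail, so getting the smallest mode is not guaranteed. Fixing the order before calling max makes the tie-breaking explicit; the variable names below are illustrative only.

```python
a = [22, 33, 11, 22, 11]

# Ascending order: the first item with the top count is the smallest mode.
smallest_mode = max(sorted(set(a)), key=a.count)

# Descending order: the first item with the top count is the largest mode.
largest_mode = max(sorted(set(a), reverse=True), key=a.count)

# Original list order: the first mode to appear in the list.
first_seen_mode = max(a, key=a.count)

print(smallest_mode, largest_mode, first_seen_mode)  # 11 22 22
```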