How to find the most common elements of a list?

Question

53

Given the following list:

['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 
 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 
 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 
 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 
 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 
 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 
 'Moon', 'to', 'rise.', '']

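I want to count how many times each word appears and display the top three.

However, I only want the top three among words whose first letter is capitalized, ignoring every word that does not start with a capital letter.

I'm sure there is a better way than this, but my idea was:

Put the first word in the list into another list called uniquewords
Delete the first word and all its duplicates from the original list
Add the new first word to uniquewords
Delete the first word and all its duplicates from the original list
etc...
until the original list is empty...
Count how many times each word in uniquewords appears in the original list
Find the top three and print them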

- user434180

1

This is not a duplicate of the other question, because some of the solutions there (statistics.mode) cannot solve this problem. - user202729

11 Answers

23

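If you are using an earlier version of Python, or you have a very good reason to roll your own word counter (I'd like to hear it!), you could try the following approach using a dict.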

Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> word_list = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
>>> word_counter = {}
>>> for word in word_list:
...     if word in word_counter:
...         word_counter[word] += 1
...     else:
...         word_counter[word] = 1
... 
>>> popular_words = sorted(word_counter, key = word_counter.get, reverse = True)
>>> 
>>> top_3 = popular_words[:3]
>>> 
>>> top_3
['Jellicle', 'Cats', 'and']

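Top tip: the interactive Python interpreter is your friend whenever you want to play with an algorithm like this. Just type it in, watch it run, and inspect the elements along the way.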

- johnsyweb

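Thanks for this... but how would I make it look only at words whose first letter is capitalized and ignore all the others? Also, if a word appears several times, sometimes capitalized and sometimes not, only the capitalized occurrences should be counted. - user434180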

1

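This sounds a lot like homework (and the question should be tagged as such). Simply don't add any words that begin with a lowercase letter to word_counter. If you update your question to show that this is a requirement and that you have tried to do it yourself, people are more likely to help you. - johnsyweb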

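@Johnsyweb - on the same topic, I am trying to iterate through the "popular_words" list to display each word with its count next to it... I have had no luck so far. Could you point me in the right direction? Thanks in advance. - drew

@andrew_: https://dev59.com/mXA65IYBdhLWcg3w2yk2#3594522 seems to do just that. - johnsyweb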
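
Both follow-up comments above can be addressed with a small variation of the dict-based counter. The following is a minimal sketch, not part of the original answer: it assumes Python 3 and uses a shortened, hypothetical sample list. The `word[:1].isupper()` test is the one used in the accepted answer.

```python
# Shortened sample list for illustration (an assumption, not the full list).
word_list = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,',
             'Jellicle', 'Cats', 'are', 'rather', 'small;',
             'Jellicle', 'And', 'and', '']

word_counter = {}
for word in word_list:
    # word[:1] is '' for the empty string, and ''.isupper() is False,
    # so empty strings and lowercase words are skipped.
    if word[:1].isupper():
        word_counter[word] = word_counter.get(word, 0) + 1

# Sort the words by descending count and keep the first three.
top_3 = sorted(word_counter, key=word_counter.get, reverse=True)[:3]

# Show each word with its count next to it.
for word in top_3:
    print(word, word_counter[word])
```

Looking the count up in word_counter while iterating over the sorted words is what puts the number beside each word.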

23

To just return a list containing the most common words:

from collections import Counter
words=["i", "love", "you", "i", "you", "a", "are", "you", "you", "fine", "green"]
most_common_words= [word for word, word_count in Counter(words).most_common(3)]
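print(most_common_words)

This prints:

['you', 'i', 'a']

The number 3 in "most_common(3)" specifies how many items to return. Counter(words).most_common() returns a list of tuples; the first member of each tuple is the word and the second is its frequency. The tuples are ordered by frequency. Words with equal counts have no guaranteed order before Python 3.7; from 3.7 on they keep first-seen order, so 'love' would appear in place of 'a' above.

most_common = [item for item in Counter(words).most_common()]
print(str(most_common))
[('you', 4), ('i', 2), ('a', 1), ('are', 1), ('green', 1), ('love',1), ('fine', 1)]

The "word for word, word_count in" clause extracts only the first member of each tuple.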

- unlockme

Is it possible to get the number of occurrences back from the most_common function? - almost a beginner

1

Yes, @almost a beginner, it can. Let me edit the answer to show you how. - unlockme
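
As a minimal sketch of what the comment asks for (an illustration, not the answerer's promised edit): most_common already returns the counts, so it is enough to unpack both members of each tuple. The variable name top_counts is made up for this example.

```python
from collections import Counter

words = ["i", "love", "you", "i", "you", "a", "are", "you", "you", "fine", "green"]

# most_common(n) returns (word, count) pairs, so unpack both values.
for word, count in Counter(words).most_common(2):
    print(word, count)

# The same pairs can be turned into a dict when lookups by word are needed.
top_counts = dict(Counter(words).most_common(2))
```

Only the top two are requested here because the remaining words are all tied at one occurrence.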

14

Is it not just this...

word_list=['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 
 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 
 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 
 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 
 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 
 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 
 'Moon', 'to', 'rise.', ''] 

from collections import Counter
c = Counter(word_list)
c.most_common(3)

Which should output

[('Jellicle', 6), ('Cats', 5), ('are', 3)]

- Tim Seed

7

nltk is convenient for a lot of language-processing work. It has frequency-distribution methods built in. Something like:

import nltk
fdist = nltk.FreqDist(your_list) # creates a frequency distribution from a list
most_common = fdist.max()    # returns a single element
top_three = [word for word, count in fdist.most_common(3)]  # returns a list (NLTK 3+: FreqDist is a Counter subclass)

- mmmdreg

7

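There are two standard-library ways to find the most frequent value in a list:

statistics.mode:

from statistics import mode
most_common = mode([3, 2, 2, 2, 1, 1])  # 2
most_common = mode([3, 2])  # StatisticsError: no unique mode (before Python 3.8; 3.8+ returns 3, the first mode encountered)

Raises an exception if there is no unique most frequent value (before Python 3.8)
Returns only a single most frequent value

collections.Counter.most_common:

from collections import Counter
most_common, count = Counter([3, 2, 2, 2, 1, 1]).most_common(1)[0]  # 2, 3
(most_common_1, count_1), (most_common_2, count_2) = Counter([3, 2, 2]).most_common(2)  # (2, 2), (3, 1)

Can return multiple most frequent values
Returns the element counts as well

So for this question, the second option is the right choice. As a side note, the two are identical in terms of performance.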
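
How ties are handled is the practical difference between the two approaches. The following is a minimal sketch, assuming Python 3.8 or later (where statistics.multimode is available); the helper name all_most_common is hypothetical and not part of any library.

```python
from collections import Counter
from statistics import multimode

data = [3, 3, 2, 2, 1]

# multimode (Python 3.8+) returns every value that shares the highest count,
# in the order first encountered.
modes = multimode(data)


def all_most_common(items):
    """Return all (value, count) pairs tied for the highest count."""
    counts = Counter(items)
    if not counts:
        return []
    highest = max(counts.values())
    return [(value, n) for value, n in counts.items() if n == highest]


tied = all_most_common(data)
print(modes)
print(tied)
```

Counter keeps the counts alongside the values, which multimode does not.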

- Matthew D. Scholefield

6

A simple two-line solution that needs no extra modules is the following:

lst = ['Jellicle', 'Cats', 'are', 'black', 'and','white,',
       'Jellicle', 'Cats','are', 'rather', 'small;', 'Jellicle', 
       'Cats', 'are', 'merry', 'and','bright,', 'And', 'pleasant',    
       'to','hear', 'when', 'they', 'caterwaul.','Jellicle', 
       'Cats', 'have','cheerful', 'faces,', 'Jellicle',
       'Cats','have', 'bright', 'black','eyes;', 'They', 'like',
       'to', 'practise','their', 'airs', 'and', 'graces', 'And', 
       'wait', 'for', 'the', 'Jellicle','Moon', 'to', 'rise.', '']

lst_sorted=sorted([ss for ss in set(lst) if len(ss)>0 and ss.istitle()], 
                   key=lst.count, 
                   reverse=True)
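print(lst_sorted[0:3])

Output:

['Jellicle', 'Cats', 'And']

The expression in square brackets returns all unique strings in the list that are not empty and are title-cased (str.istitle(): an uppercase first letter followed by lowercase letters). The sorted() function then orders them by how often they appear in the list (using lst.count as the key), in descending order.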
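
Note that str.istitle() is stricter than "starts with a capital letter". For the list in the question both tests select the same words, but they can differ on other input. A small sketch with made-up sample words:

```python
words = ['Jellicle', 'McDonald', 'NASA', 'cats', '']

# str.istitle() requires an uppercase letter followed only by lowercase
# letters in each word, so mixed-case and all-caps strings are rejected.
title_cased = [w for w in words if w.istitle()]

# Checking only the first character accepts any word that starts uppercase.
# w[:1] is '' for the empty string, and ''.isupper() is False.
starts_upper = [w for w in words if w[:1].isupper()]

print(title_cased)
print(starts_upper)
```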

- Chrigi

2

The simple way of doing this (assuming your list is in 'l') would be:

>>> counter = {}
>>> for i in l: counter[i] = counter.get(i, 0) + 1
>>> sorted([ (freq,word) for word, freq in counter.items() ], reverse=True)[:3]
[(6, 'Jellicle'), (5, 'Cats'), (3, 'to')]

Full sample:

>>> l = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
>>> counter = {}
>>> for i in l: counter[i] = counter.get(i, 0) + 1
... 
>>> counter
{'and': 3, '': 1, 'merry': 1, 'rise.': 1, 'small;': 1, 'Moon': 1, 'cheerful': 1, 'bright': 1, 'Cats': 5, 'are': 3, 'have': 2, 'bright,': 1, 'for': 1, 'their': 1, 'rather': 1, 'when': 1, 'to': 3, 'airs': 1, 'black': 2, 'They': 1, 'practise': 1, 'caterwaul.': 1, 'pleasant': 1, 'hear': 1, 'they': 1, 'white,': 1, 'wait': 1, 'And': 2, 'like': 1, 'Jellicle': 6, 'eyes;': 1, 'the': 1, 'faces,': 1, 'graces': 1}
>>> sorted([ (freq,word) for word, freq in counter.items() ], reverse=True)[:3]
[(6, 'Jellicle'), (5, 'Cats'), (3, 'to')]

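By simple I mean that it works in nearly every version of Python.

If you don't understand some of the functions used in this sample, you can always do this in the interpreter (after pasting the code above):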

>>> help(counter.get)
>>> help(sorted)

- jvdneste

2

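The answer from @Mark Byers is best, but if you are on a Python version older than 2.7 (but at least 2.5, which is fairly old these days), you can replicate the functionality of the Counter class very simply with defaultdict (otherwise, for Python < 2.5, three extra lines of code are needed before d[i] += 1, as in @Johnnysweb's answer).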

from collections import defaultdict
class Counter():
    ITEMS = []
    def __init__(self, items):
        d = defaultdict(int)
        for i in items:
            d[i] += 1
        self.ITEMS = sorted(d.iteritems(), reverse=True, key=lambda i: i[1])
    def most_common(self, n):
        return self.ITEMS[:n]

Then you use the class exactly as in Mark Byers's answer, i.e.:

words_to_count = (word for word in word_list if word[:1].isupper())
c = Counter(words_to_count)
print c.most_common(3)

- JJC

2

I'd like to answer this with numpy, a powerful array computation module for Python.

Here is the code snippet:

import numpy
a = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 
 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 
 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 
 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 
 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 
 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 
 'Moon', 'to', 'rise.', '']
dict(zip(*numpy.unique(a, return_counts=True)))

Output

{'': 1, 'And': 2, 'Cats': 5, 'Jellicle': 6, 'Moon': 1, 'They': 1, 'airs': 1, 'and': 3, 'are': 3, 'black': 2, 'bright': 1, 'bright,': 1, 'caterwaul.': 1, 'cheerful': 1, 'eyes;': 1, 'faces,': 1, 'for': 1, 'graces': 1, 'have': 2, 'hear': 1, 'like': 1, 'merry': 1, 'pleasant': 1, 'practise': 1, 'rather': 1, 'rise.': 1, 'small;': 1, 'the': 1, 'their': 1, 'they': 1, 'to': 3, 'wait': 1, 'when': 1, 'white,': 1}

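The output is a dictionary of (key, value) pairs, where each value is the number of times that particular word occurs.

This answer was inspired by another answer on stackoverflow; you can view it here.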

- Rushikesh

- Mark Byers · Accepted Answer

In Python 2.7 and above there is a class called Counter which can help you:

from collections import Counter
words_to_count = (word for word in word_list if word[:1].isupper())
c = Counter(words_to_count)
print(c.most_common(3))

Result:

[('Jellicle', 6), ('Cats', 5), ('And', 2)]

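"I am new to programming, so please do it in the simplest way possible."

You could instead use a dictionary with each word as a key and its count as the value. First iterate over the word list, adding each word to the dictionary if it is not yet present, or else increasing its count. To find the top three you can either use a simple O(n*log(n)) sort and take the first three elements of the result, or use an O(n) algorithm that scans the list once and remembers only the top three elements.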
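
The dictionary approach described above can be sketched as follows. This is an illustration under stated assumptions rather than the answer's own code: the sample list is shortened, and heapq.nlargest from the standard library stands in for the single scan that remembers only the top three.

```python
import heapq

# Shortened sample list for illustration (an assumption).
word_list = ['Jellicle', 'Cats', 'are', 'Jellicle', 'Cats', 'and',
             'Jellicle', 'Moon', 'to', '']

counts = {}
for word in word_list:
    if not word[:1].isupper():
        continue  # ignore lowercase words and the empty string
    if word not in counts:
        counts[word] = 1   # first time this word is seen
    else:
        counts[word] += 1  # word already present: increase its count

# Option 1: sort all (word, count) pairs, O(n log n), then take three.
by_sorting = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)[:3]

# Option 2: keep only the three largest while scanning the pairs once.
by_scanning = heapq.nlargest(3, counts.items(), key=lambda pair: pair[1])

print(by_sorting)
print(by_scanning)
```

For a list this small either option is fine; the scan only matters when there are many distinct words.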

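For beginners, the important observation is that by using built-in classes designed for the purpose you can save yourself a lot of work and/or get better performance. It is good to be familiar with the standard library and the features it offers.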