Finding the most frequent character in a string

Question

python algorithm optimization time-complexity

12

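I came across this programming problem while looking at a job posting on Stack Overflow. As a beginner Python programmer, I tried to solve it. However, my solution feels a bit... messy. Can anyone suggest ways to optimize it or make it more concise? I know it's fairly trivial, but I had fun writing it. Note: Python 2.6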

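Problem:

Write pseudocode (or actual code) for a function that takes in a string and returns the letter that appears the most in that string.

My attempt: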

import string

def find_max_letter_count(word):

    alphabet = string.ascii_lowercase
    dictionary = {}

    for letters in alphabet:
        dictionary[letters] = 0

    for letters in word:
        dictionary[letters] += 1

    dictionary = sorted(dictionary.items(), 
                        reverse=True, 
                        key=lambda x: x[1])

    for position in range(0, 26):
        print dictionary[position]
        if position != len(dictionary) - 1:
            if dictionary[position + 1][1] < dictionary[position][1]:
                break

find_max_letter_count("helloworld")

Output:

>>> 
('l', 3)

Updated example:

find_max_letter_count("balloon") 
>>>
('l', 2)
('o', 2)

- Sunandmoon

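Side note: you should read PEP 8, which documents the recommended Python coding style. Methods should use snake_case rather than mixedCase. - Chris Morgan

Possible duplicate of: How to find most common elements of a list? - kennytm

Possible duplicate of: Python most common element in a list - nawfal

19 Answers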

5

If you are using Python 2.7, you can do this quickly with the collections module. collections is a module of high-performance data structures. Read more at http://docs.python.org/library/collections.html#counter-objects.

>>> from collections import Counter
>>> x = Counter("balloon")
>>> x
Counter({'o': 2, 'a': 1, 'b': 1, 'l': 2, 'n': 1})
>>> x['o']
2

- meson10

2

My approach avoids Python's higher-level counting helpers and relies only on for-loops and if-statements.

def most_common_letter():
    string = str(input())
    letters = set(string)
    if " " in letters:         # If you want to count spaces too, ignore this if-statement
        letters.remove(" ")
    max_count = 0
    freq_letter = []
    for letter in letters:
        count = 0
        for char in string:
            if char == letter:
                count += 1
        if count == max_count:
            max_count = count
            freq_letter.append(letter)
        if count > max_count:
            max_count = count
            freq_letter.clear()
            freq_letter.append(letter)
    return freq_letter, max_count

This ensures you get every letter/character that is used the most, not just one. It also returns how often it occurs. Hope this helps :)

- soggycornflakes

2

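Here is a way to find the most common character using a dictionary

message = "hello world"
d = {}
letters = set(message)
for l in letters:
    # Map each count to a character; characters with equal counts overwrite each other
    d[message.count(l)] = l

# Dict keys are not guaranteed to be in sorted order, so pick the largest count explicitly
highest = max(d)
print(d[highest], highest)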

- kyle k

2

Here is a way using a for loop and count().

w = input()
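r = 0   # highest count so far; starting at 0 means s is set even when no character repeats
s = ''  # most frequent character so far; stays empty for empty input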
for i in w:
    p = w.count(i)
    if p > r:
        r = p
        s = i
print(s)

- Chirag Madaan

1

If you want all the characters with the maximum count, you can use a variation on one of the two ideas proposed so far:

import heapq  # Helps finding the n largest counts
import collections

def find_max_counts(sequence):
    """
    Returns an iterator that produces the (element, count)s with the
    highest number of occurrences in the given sequence.

    In addition, the elements are sorted.
    """

    if len(sequence) == 0:
        return  # Nothing to yield for an empty sequence

    counter = collections.defaultdict(int)
    for elmt in sequence:
        counter[elmt] += 1

    counts_heap = [
        (-count, elmt)  # The largest elmt counts are the smallest elmts
        for (elmt, count) in counter.items()]

    heapq.heapify(counts_heap)

    highest_count = counts_heap[0][0]

    while True:

        try:
            (opp_count, elmt) = heapq.heappop(counts_heap)
        except IndexError:
            return  # Heap exhausted

        if opp_count != highest_count:
            return  # Remaining elements have lower counts

        yield (elmt, -opp_count)

for (letter, count) in find_max_counts('balloon'):
    print((letter, count))

for (word, count) in find_max_counts(['he', 'lkj', 'he', 'll', 'll']):
    print((word, count))

For example, this yields:

lebigot@weinberg /tmp % python count.py
('l', 2)
('o', 2)
('he', 2)
('ll', 2)

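This works with any sequence: words, but also ['hello', 'hello', 'bonjour'], for instance.

The heapq structure is very efficient at finding the smallest elements of a sequence without sorting it completely. On the other hand, since the alphabet has few letters, you can probably also walk through the sorted list of counts until the maximum count is no longer found, without any serious loss of speed.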
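The sorted-list alternative mentioned above can be sketched roughly as follows. This is only an illustration, assuming Python 3; the function name find_max_counts_sorted is made up for this example and is not part of the answer's code.

```python
import collections
import itertools


def find_max_counts_sorted(sequence):
    """Return the (element, count) pairs that share the highest count."""
    counts = collections.Counter(sequence)
    # Highest count first; ties are ordered by element so the output is stable.
    ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    if not ordered:
        return []
    highest = ordered[0][1]
    # Stop at the first pair whose count drops below the maximum.
    return list(itertools.takewhile(lambda pair: pair[1] == highest, ordered))


print(find_max_counts_sorted('balloon'))  # [('l', 2), ('o', 2)]
```

Sorting costs O(k log k) for k distinct elements, which is negligible when k is bounded by the size of an alphabet.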

- Eric O. Lebigot

1

What is the most frequent character in a string? Find the character that occurs most often in the input string.

Method 1:

a = "GiniGinaProtijayi"

d = {}
for ch in a:
    d[ch] = d.get(ch, 0) + 1

# Sort the (character, count) pairs by count, highest first, and take the first pair
chh, max_count = sorted(d.items(), reverse=True, key=lambda item: item[1])[0]

print(chh)
print(max_count)

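Method 2:

a = "GiniGinaProtijayi"

max_count = 0  # renamed from max to avoid shadowing the built-in
chh = ''
count = [0] * 256  # one slot per character code; assumes code points below 256
for ch in a:
    count[ord(ch)] += 1
for ch in a:
    if count[ord(ch)] > max_count:
        max_count = count[ord(ch)]
        chh = ch

print(chh)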

Method 3:

import collections

line = 'North Calcutta Shyambazaar Soudipta Tabu  Roopa Roopi Gina Gini Protijayi  Sovabazaar Paikpara  Baghbazaar  Roopa'

bb = collections.Counter(line).most_common(1)[0][0]
print(bb)

Method 4:

line =' North Calcutta Shyambazaar Soudipta Tabu  Roopa Roopi Gina Gini Protijayi  Sovabazaar Paikpara  Baghbazaar  Roopa'


def mostcommonletter(sentence):
    letters = list(sentence)
    return (max(set(letters),key = letters.count))


print(mostcommonletter(line))

- Soudipta Dutta

1

def most_frequent(text):
    frequencies = [(c, text.count(c)) for c in set(text)]
    return max(frequencies, key=lambda x: x[1])[0]

s = 'ABBCCCDDDD'
print(most_frequent(s))

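frequencies is a list of tuples that counts the characters, in the form (character, count). We apply max to the tuples using the count and return that tuple's character. In the event of a tie, this solution picks only one of the tied characters.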
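Which tied character gets picked depends on the iteration order of set(text), which is not something to rely on. A minimal sketch of one way to make the tie-break predictable, assuming the rule "earliest character in the text wins" (that rule and the name most_frequent_first are choices made for this example, not part of the answer):

```python
def most_frequent_first(text):
    """Return the most frequent character; ties go to the earliest one in text."""
    if not text:
        raise ValueError("text must not be empty")
    # Key: higher count wins; among equal counts, the smaller first index wins.
    return max(set(text), key=lambda c: (text.count(c), -text.index(c)))


print(most_frequent_first('balloon'))  # l
```

Like the original, this calls text.count once per distinct character, so it is best suited to short strings.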

- eerock

1

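I've noticed that most of the answers return only one item, even when several characters are tied for most common. Take "iii 444 yyy 999", for example: spaces, i's, 4's, y's, and 9's all occur equally often. The solution should return all of them, not just the letter i: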

from collections import Counter

sentence = "iii 444 yyy 999"

# Take the count from the first tuple returned by Counter().most_common(),
# i.e. the largest count
largest_count: int = Counter(sentence).most_common()[0][1]

# Keep every (character, count) tuple whose count equals the largest count
most_common_list: list = [(x, y)
                         for x, y in Counter(sentence).items() if y == largest_count]

print(most_common_list)

# RETURNS
[('i', 3), (' ', 3), ('4', 3), ('y', 3), ('9', 3)]
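The snippet above builds Counter(sentence) twice. A small sketch of the same idea wrapped in a function that counts only once, assuming Python 3.7+ so that ties come out in first-seen order (the name all_most_common is hypothetical):

```python
from collections import Counter


def all_most_common(sentence):
    """Return every (character, count) pair tied for the highest count."""
    counts = Counter(sentence)  # built once and reused
    if not counts:
        return []
    largest_count = max(counts.values())
    # Counter preserves insertion order, so ties appear in first-seen order.
    return [(char, n) for char, n in counts.items() if n == largest_count]


print(all_most_common("iii 444 yyy 999"))
# [('i', 3), (' ', 3), ('4', 3), ('y', 3), ('9', 3)]
```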

- Chris Alderson

0

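Here are a few things I'd do:

Use collections.defaultdict instead of the dict you initialise manually.
Use built-in sorting and maximum functions such as max instead of working it out yourself - it's easier.

Here's my final result: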

from collections import defaultdict

def find_max_letter_count(word):
    matches = defaultdict(int)  # makes the default value 0

    for char in word:
        matches[char] += 1

    return max(matches.items(), key=lambda x: x[1])  # items() works on both Python 2 and 3

find_max_letter_count('helloworld') == ('l', 3)

- Chris Morgan

Nitpicking: letters should be letter, since it is a variable that holds exactly one letter. - Eric O. Lebigot

1

@EOL: true; I didn't rename that variable from what he had - I'd call it char myself, since it's not just a letter... - Chris Morgan

- Greg Hewgill · Accepted Answer

There are many ways to do this more briefly. For example, you can use the Counter class (in Python 2.7 or later):

import collections
s = "helloworld"
print(collections.Counter(s).most_common(1)[0])

If you don't have that, you can do the tally manually (2.5 or later has defaultdict):

d = collections.defaultdict(int)
for c in s:
    d[c] += 1
print(sorted(d.items(), key=lambda x: x[1], reverse=True)[0])

Having said that, there's nothing too terribly wrong with your implementation.
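One detail of the question's attempt is worth noting: its dictionary is pre-filled only with string.ascii_lowercase, so an uppercase letter, digit, or space in the input raises KeyError. Below is a rough sketch of a letters-only, case-insensitive variant that also returns ties, assuming Python 3. The name find_max_letters and the choice to fold case are assumptions made for this example, and str.isalpha also accepts non-ASCII letters.

```python
from collections import Counter


def find_max_letters(text):
    """Return (letters, count) for the most frequent letters, ignoring case."""
    # Keep alphabetic characters only, folded to lowercase.
    counts = Counter(ch.lower() for ch in text if ch.isalpha())
    if not counts:
        return [], 0
    highest = max(counts.values())
    letters = sorted(ch for ch, n in counts.items() if n == highest)
    return letters, highest


print(find_max_letters("Hello, World"))  # (['l'], 3)
print(find_max_letters("balloon"))       # (['l', 'o'], 2)
```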