Determine the most common word from user input [Python]

Question

3

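I tried to solve this by storing the user's input words in a list and then using .count() to see how many times each word appears in the list. The problem is that when there is a tie, I need to print all of the words that appear the most times. It does not work when one word is contained inside another word that appears the same number of times. For example, if I enter "Jimmy" and "Jim" in that order, it only prints "Jimmy".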

for value in usrinput:
    dict.append(value)
for val in range(len(dict)):
    count = dict.count(dict[val])
    print(dict[val],count)

    if (count > max):
        max = count
        common= dict[val]
    elif(count == max):
        if(dict[val] in common):
            pass
        else:
            common+= "| " + dict[val]

- Harry Harry

Should "['Jimmy', 'Jim']" count as Jimmy 1 and Jim 2? - dansalmo

Each should count as 1. - Harry Harry

2

Off topic: don't use max as a variable name; it shadows the built-in function max(). - Ashwini Chaudhary

3

More importantly: don't use dict as a variable name, especially for something that isn't a Python dictionary! - user2357112
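To illustrate the two comments above: once a name such as max is rebound to a number, the built-in is no longer reachable under that name in that scope, and calling it fails. The same applies to dict. This is a small sketch only; the function name shadow_demo is made up for illustration.

```python
def shadow_demo():
    """Show what happens after a built-in name is rebound to a number."""
    max = 0  # rebinding hides the built-in max() inside this function
    try:
        max([1, 2, 3])  # this now tries to call the integer 0
    except TypeError as exc:
        return str(exc)
    return None

print(shadow_demo())
```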

4 Answers

1

Why not use collections.defaultdict?

from collections import defaultdict

d = defaultdict(int)
for value in usrinput:
    d[value] += 1

The words sorted by number of occurrences, most common first:

print(sorted(d.items(), key=lambda x: x[1])[::-1])
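One possible way to get from these counts to the tied winners the question asks for is to take the highest count and keep every word that reaches it. This is only a sketch: the sample list is assumed input, and the helper name most_common_words is made up for illustration.

```python
from collections import defaultdict

def most_common_words(words):
    """Return (sorted list of words tied for the highest count, that count)."""
    counts = defaultdict(int)
    for word in words:
        counts[word] += 1
    if not counts:
        return [], 0
    top = max(counts.values())
    # keep every word whose count equals the highest count
    winners = sorted(w for w, c in counts.items() if c == top)
    return winners, top

usrinput = ['Jim', 'Jimmy', 'Jim', 'Jimmy', 'Matt']  # assumed sample input
print(most_common_words(usrinput))
```

Because whole words are used as dictionary keys, "Jim" and "Jimmy" are counted separately.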

- wflynny

Not if you are using a version older than Python 2.7. - wflynny

1

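From the OP's code it is clear that he is using py3.x. By the way, defaultdict is not available in versions before py2.5. - Ashwini Chaudhary

From the OP's code, he uses "dict" as the name of something that isn't even a dictionary, so we can't be sure he isn't just printing a tuple. - user2357112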

1

Here is a quick and not very elegant solution using NumPy.

import numpy as np

def print_common( usrinput ):
    '''prints the most common entry of usrinput, printing all entries if there is a tie '''
    usrinput = np.array( usrinput )
    # np.unique returns the unique elements of usrinput
    unique_inputs = np.unique( usrinput )
    # an array to store the counts of each input
    counts = np.array( [] )
    # loop over the unique inputs and store the count for each item
    for u in unique_inputs:
        ind = np.where( usrinput == u )
        counts = np.append( counts, len( usrinput[ ind ] ) )
    # find the maximum counts and indices in the original input array
    max_counts = np.max( counts )
    max_ind    = np.where( counts == max_counts )
    # if there's a tie for most common, print all of the ties
    if len( max_ind[0] ) > 1:
        for i in max_ind[0]:
            print( unique_inputs[i], counts[i] )
    #otherwise just print the maximum
    else:
        print( unique_inputs[max_ind][0], counts[max_ind][0] )

    return 1

# two test arrays which show desired results
usrinput = ['Jim','Jim','Jim', 'Jimmy','Jimmy','Matt','Matt','Matt']
print_common( usrinput )

usrinput = ['Jim','Jim','Jim', 'Jimmy','Jimmy','Matt','Matt']
print_common( usrinput )

- Brian Hayden

1

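Rather than testing "Jim" in "Fred|Jimmy|etc" with common as a concatenated string, it is better to store the words found with the maximum count in a list and print "|".join(commonlist).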
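A minimal sketch of that suggestion, assuming the input is already a list of words. The names tied_most_common, best, and commonlist are illustrative. Membership is tested against a list, so "Jim" is no longer treated as already present just because "Jimmy" is.

```python
def tied_most_common(words):
    """Return the words sharing the highest count, in first-seen order."""
    best = 0         # highest count seen so far (avoids shadowing max)
    commonlist = []  # words that reach that count
    for word in words:
        count = words.count(word)
        if count > best:
            # a new highest count replaces the earlier candidates
            best = count
            commonlist = [word]
        elif count == best and word not in commonlist:
            # list membership compares whole words, not substrings
            commonlist.append(word)
    return commonlist

print("|".join(tied_most_common(['Jimmy', 'Jim'])))  # prints Jimmy|Jim
```

Calling .count() for every word keeps the loop close to the question's code but is quadratic in the number of words; for large inputs a single counting pass would scale better.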

- Steve Barnes

- Sukrit Kalra · Accepted Answer

Use the collections.Counter class. I'll give you a hint.

>>> from collections import Counter
>>> a = Counter()
>>> a['word'] += 1
>>> a['word'] += 1
>>> a['test'] += 1
>>> a.most_common()
[('word', 2), ('test', 1)]

You can extract the words and their frequencies from this. Use it to get the frequencies from the user's input.

>>> userInput = raw_input("Enter Something: ")
Enter Something: abc def ghi abc abc abc ghi
>>> testDict = Counter(userInput.split(" "))
>>> testDict.most_common()
[('abc', 4), ('ghi', 2), ('def', 1)]
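Since most_common() returns the pairs ordered from highest to lowest count, one way to report every word tied for first place is to read the top count from the first pair and keep the pairs that match it. This is a sketch only, with a hard-coded string standing in for the user's input and a hypothetical helper name top_words.

```python
from collections import Counter

def top_words(text):
    """Return the (word, count) pairs that share the highest count."""
    counts = Counter(text.split())  # split() with no argument also handles repeated spaces
    if not counts:
        return []
    ranked = counts.most_common()   # pairs ordered from highest to lowest count
    top = ranked[0][1]              # count of the first (most common) pair
    return [(word, count) for word, count in ranked if count == top]

# assumed sample input in place of the text typed by the user
for word, count in top_words("abc def ghi abc ghi"):
    print(word, count)
```

Here "abc" and "ghi" both appear twice, so both are reported, while "def" is dropped.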