How can I compute frequencies from a dictionary based on its key values in Python?

Question


python python-3.x dictionary word-count word-frequency

4

Suppose I have a dictionary that contains words and phrases in the following form:

{
    ('The brown fox',): [0], ('the race',): [0], ('Apple',): [1], 
    ('a company Apple',): [1], ('iphone',): [1], ('Paris',): [2],
    ('Delhi',): [2], ('London',): [2], ('world cities',): [2], 
    ('home',): [3, 4], ('order delivery food',): [3], ('simple voice command',): [3], 
    ('dinner',): [3], ('a long day',): [3], ('work',): [3], 
    ('teams',): [4], ('goal home',): [4], ('fox world',): [5], 
    ('a world class company',): [5], ('A geyser heating system',): [6], ('a lot',): [7], 
    ('the book Python',): [7], ('an amazing language',): [7], ('i',): [8], 
    ('a good boy',): [8], ('Team Performance',): [9], ('Revolv central automation device',): [10], 
    ('the switch way',): [11], ('play children',): [12]
}

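I would like to compute the frequency of every word/phrase based on its key values.

For example, only the word "home" should have a frequency of 2 (because it appears under both key values 3 and 4). Every other word/phrase should have a frequency of 1.

I tried using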

Counter(index.values()).most_common()

Is there a way to do this calculation in Python?

- M S

2 Answers

1

You can use a dictionary comprehension to get a dictionary with the phrases as keys and the counts as values.

d = {('The brown fox',): [0], ('the race',): [0], ('Apple',): [1], ('a company Apple',): [1], ('iphone',): [1], ('Paris',): [2], ('Delhi',): [2], ('London',): [2], ('world cities',): [2], ('home',): [3, 4], ('order delivery food',): [3], ('simple voice command',): [3], ('dinner',): [3], ('a long day',): [3], ('work',): [3], ('teams',): [4], ('goal home',): [4], ('fox world',): [5], ('a world class company',): [5], ('A geyser heating system',): [6], ('a lot',): [7], ('the book Python',): [7], ('an amazing language',): [7], ('i',): [8], ('a good boy',): [8], ('Team Performance',): [9], ('Revolv central automation device',): [10], ('the switch way',): [11], ('play children',): [12]}

frequency = {k[0]: len(v) for k, v in d.items()}

print(frequency)
# {'The brown fox': 1, 'the race': 1, 'Apple': 1, 'a company Apple': 1, 'iphone': 1, 'Paris': 1, 'Delhi': 1, 'London': 1, 'world cities': 1, 'home': 2, 'order delivery food': 1, 'simple voice command': 1, 'dinner': 1, 'a long day': 1, 'work': 1, 'teams': 1, 'goal home': 1, 'fox world': 1, 'a world class company': 1, 'A geyser heating system': 1, 'a lot': 1, 'the book Python': 1, 'an amazing language': 1, 'i': 1, 'a good boy': 1, 'Team Performance': 1, 'Revolv central automation device': 1, 'the switch way': 1, 'play children': 1}

- benvc

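Thank you very much, this was helpful. - M S

Is it possible to print the frequency together with the key values, keeping the key values as they are? For example, for the word "home" the key values are [1,2] and the frequency is 2. And how can I get the frequencies in sorted order? - M S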

1

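@MishraS - In Python 3.7, dictionaries are ordered, so you could do something like {k: (len(v), v) for k, v in sorted(d.items(), key=lambda x: len(x[1]), reverse=True)} to return a dictionary that maps each original key to a tuple of the count and the list of values, sorted by count in descending order. In earlier Python versions you could use something like OrderedDict instead. - benvc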
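A minimal sketch of the approach described in the comment above, run on a three-entry subset of the question's data. The names by_count and item are illustrative, and unwrapping the one-element tuple keys with k[0] follows the answer rather than the comment, which keeps the tuple keys.

```python
# Illustrative subset in the same shape as the question's dictionary.
d = {('Apple',): [1], ('home',): [3, 4], ('teams',): [4]}

# Sort entries by how many indices they have, highest first.
# sorted() is stable, so entries with equal counts keep their original order.
by_count = {
    k[0]: (len(v), v)
    for k, v in sorted(d.items(), key=lambda item: len(item[1]), reverse=True)
}

# Each value is a (count, list_of_indices) pair.
for phrase, (count, positions) in by_count.items():
    print(phrase, count, positions)
# home 2 [3, 4]
# Apple 1 [1]
# teams 1 [4]
```

The ordering of the result relies on dictionaries preserving insertion order, which is the Python 3.7 behaviour the comment refers to.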


- TechPerson · Accepted Answer

Mishra, you could try this:

frequencies = []
for key in your_dictionary.keys():
    frequencies.append(len(your_dictionary[key]))

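That works if you only want the frequencies by themselves in a list.

Or, if you want to be able to look up the frequency from a word or phrase: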

frequency_from_phrase = {}
for key in your_dictionary.keys():
    frequency_from_phrase[key] = len(your_dictionary[key])
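
The two loops above can also be written as comprehensions. This is only a sketch: the small your_dictionary below is a stand-in for the real data, and the ranked variable with its Counter step is an addition that is not part of the answer, shown because the question asked about most_common().

```python
from collections import Counter

# Hypothetical stand-in for your_dictionary, same shape as the question's data.
your_dictionary = {('Apple',): [1], ('home',): [3, 4], ('teams',): [4]}

# Frequencies alone, in the dictionary's iteration order.
frequencies = [len(indices) for indices in your_dictionary.values()]

# Lookup from key to frequency; keys stay one-element tuples, as in the loop above.
frequency_from_phrase = {
    key: len(indices) for key, indices in your_dictionary.items()
}

print(frequencies)                       # [1, 2, 1]
print(frequency_from_phrase[('home',)])  # 2

# Counter can be built from a mapping of counts, so most_common() ranks them.
ranked = Counter(frequency_from_phrase).most_common()
print(ranked[0])                         # (('home',), 2)
```

Passing the lists themselves to Counter, as in the question's attempt, raises a TypeError because lists are not hashable. Counting their lengths first avoids that.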