Python - Sum values in a list of dictionaries by a shared key

6

I have a data structure that is a list of dictionaries:

data = [{'stat3': '5', 'stat2': '4', 'player': '1'}, 
        {'stat3': '8', 'stat2': '1', 'player': '1'}, 
        {'stat3': '6', 'stat2': '1', 'player': '3'}, 
        {'stat3': '3', 'stat2': '7', 'player': '3'}]

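I want to get a nested dictionary whose keys are the values stored under the 'player' key, and whose values are dictionaries of the aggregated stats.

The output should be: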

{'3': {'stat3': 9, 'stat2': 8, 'player': '3'}, 
 '1': {'stat3': 13, 'stat2': 5, 'player': '1'}}

Here is my code:
from collections import defaultdict
result = {}
total_stat = defaultdict(int)

for dict in data:
    total_stat[dict['player']] += int(dict['stat3'])  
    total_stat[dict['player']] += int(dict['stat2']) 
total_stat = ([{'player': info, 'stat3': total_stat[info],
                'stat2': total_stat[info]} for info in 
                 sorted(total_stat, reverse=True)])
for item in total_stat:       
    result.update({item['player']: item})
print(result)

However, I get this:
{'3': {'player': '3', 'stat3': 17, 'stat2': 17}, 
 '1': {'player': '1', 'stat3': 18, 'stat2': 18}}

How can I get this right? Are there other ways to do it?

By the way, it looks like you want a namedtuple to store the data rather than a dictionary. - Elazar
'stat3': total_stat[info], 'stat2': total_stat[info] - of course these are the same value. - Elazar
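
The second comment points at the cause: total_stat keeps a single running total per player, so both stats end up reading the same number. A minimal sketch of one possible fix that stays close to the question's defaultdict approach, keying the totals by a (player, stat) pair instead. The names totals and stat_names are illustrative and do not come from the question:

```python
from collections import defaultdict

data = [{'stat3': '5', 'stat2': '4', 'player': '1'},
        {'stat3': '8', 'stat2': '1', 'player': '1'},
        {'stat3': '6', 'stat2': '1', 'player': '3'},
        {'stat3': '3', 'stat2': '7', 'player': '3'}]

stat_names = ('stat2', 'stat3')  # assumption: the fields that should be summed

# Keep one running total per (player, stat) pair instead of one per player.
totals = defaultdict(int)
for row in data:
    for name in stat_names:
        totals[(row['player'], name)] += int(row[name])

# Rebuild the nested {player: {...}} shape the question asks for.
result = {}
for (player, name), value in totals.items():
    result.setdefault(player, {'player': player})[name] = value

print(result)
```

This is only one way to separate the totals; the answers in this thread show several others.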
7 Answers

11

Your data is essentially a DataFrame, so a natural pandas solution is:

In [34]: pd.DataFrame.from_records(data).astype(int).groupby('player').sum().T.to_dict()

Out[34]: {1: {'stat2': 5, 'stat3': 13}, 3: {'stat2': 8, 'stat3': 9}}

You can tidy this up a little. astype(int) is faster (and more readable) than applymap, and since version 0.17.0 there is an orient='index' option to specify the output format. So: pd.DataFrame.from_records(data).astype(int).groupby('player').sum().to_dict(orient='index') - miradulo

4

Just use a more deeply nested default factory:

>>> from collections import defaultdict
>>> total_stat = defaultdict(lambda : defaultdict(int))
>>> value_fields = 'stat2', 'stat3'
>>> for datum in data:
...     player_data = total_stat[datum['player']]
...     for k in value_fields:
...         player_data[k] += int(datum[k])
...
>>> from pprint import pprint
>>> pprint(total_stat)
defaultdict(<function <lambda> at 0x1023490d0>,
            {'1': defaultdict(<class 'int'>, {'stat2': 5, 'stat3': 13}),
             '3': defaultdict(<class 'int'>, {'stat2': 8, 'stat3': 9})})
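
The nested defaultdict holds the right totals, but it prints noisily and lacks the 'player' entry shown in the question's desired output. A minimal sketch of converting it to plain dictionaries, assuming that shape is wanted; the name plain is illustrative:

```python
from collections import defaultdict

data = [{'stat3': '5', 'stat2': '4', 'player': '1'},
        {'stat3': '8', 'stat2': '1', 'player': '1'},
        {'stat3': '6', 'stat2': '1', 'player': '3'},
        {'stat3': '3', 'stat2': '7', 'player': '3'}]

# Same aggregation as above: one inner defaultdict(int) per player.
total_stat = defaultdict(lambda: defaultdict(int))
for datum in data:
    for k in ('stat2', 'stat3'):
        total_stat[datum['player']][k] += int(datum[k])

# Copy each inner mapping into a plain dict and add the 'player' entry back.
plain = {player: dict(stats, player=player)
         for player, stats in total_stat.items()}

print(plain)
```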

2

This solution uses nested dictionaries. out is a {player: Counter} dictionary, and each Counter is itself a {stat: score} dictionary.

import collections

def split_player_stat(dict_object):
    """
    Split a row of data into player, stat

    >>> split_player_stat({'stat3': '5', 'stat2': '4', 'player': '1'})
    ('1', {'stat3': 5, 'stat2': 4})
    """
    key = dict_object['player']
    value = {k: int(v) for k, v in dict_object.items() if k != 'player'}
    return key, value

data = [{'stat3': '5', 'stat2': '4', 'player': '1'},
        {'stat3': '8', 'stat2': '1', 'player': '1'},
        {'stat3': '6', 'stat2': '1', 'player': '3'},
        {'stat3': '3', 'stat2': '7', 'player': '3'}]

out = collections.defaultdict(collections.Counter)
for player_stat in data:
    player, stat = split_player_stat(player_stat)
    out[player].update(stat)
print(out)

The magic of this solution lies in the collections.defaultdict and collections.Counter classes, both of which behave like dictionaries.
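
The step that does the summing is Counter.update, which adds to existing counts where dict.update would overwrite them. A small sketch of the difference, using the first two rows for player '1' as illustrative values:

```python
from collections import Counter

c = Counter({'stat3': 5, 'stat2': 4})
c.update({'stat3': 8, 'stat2': 1})  # Counter.update adds to the existing counts

d = {'stat3': 5, 'stat2': 4}
d.update({'stat3': 8, 'stat2': 1})  # dict.update replaces the existing values

print(dict(c))  # {'stat3': 13, 'stat2': 5}
print(d)        # {'stat3': 8, 'stat2': 1}
```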

1

Most of the solutions here overcomplicate the problem. Let's simplify it and make it more readable. Here is the simplified version:

In [26]: result = {}

In [27]: req_key = 'player'

In [29]: for dct in data:
    ...:     player_val = dct.pop(req_key)
    ...:     result.setdefault(player_val, {req_key: player_val})
    ...:     for k, v in dct.items():
    ...:         result[player_val][k] = result[player_val].get(k, 0) + int(v)

In [30]: result
Out[30]:
{'1': {'player': '1', 'stat2': 5, 'stat3': 13},
 '3': {'player': '3', 'stat2': 8, 'stat3': 9}}

That is the whole program, plain and simple. There is no need to import anything for a problem this simple. Now to walk through it:

result.setdefault(player_val, {'player': player_val})

If result has no such key yet, this sets the default value to "player": 3 or "player": 1.

result[player_val][k] = result[player_val].get(k, 0) + int(v)

This adds up the values of each stat key across the records that share the same player value.
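
One caveat about the loop above: dct.pop(req_key) removes 'player' from the original dictionaries in data, so running the loop a second time raises a KeyError. A minimal sketch of the same setdefault/get idea that leaves the input untouched; the function name sum_by_key is hypothetical:

```python
def sum_by_key(rows, key):
    """Sum every field except `key`, grouped by the value stored under `key`."""
    result = {}
    for row in rows:
        key_val = row[key]  # read the key instead of popping it
        totals = result.setdefault(key_val, {key: key_val})
        for k, v in row.items():
            if k != key:
                totals[k] = totals.get(k, 0) + int(v)
    return result


data = [{'stat3': '5', 'stat2': '4', 'player': '1'},
        {'stat3': '8', 'stat2': '1', 'player': '1'},
        {'stat3': '6', 'stat2': '1', 'player': '3'},
        {'stat3': '3', 'stat2': '7', 'player': '3'}]

result = sum_by_key(data, 'player')
print(result)
```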

1

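This is not the best code, nor the most Pythonic, but I think you should be able to read through it carefully and work out where your own code went wrong.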

def sum_stats_by_player(data):
    result = {}

    for dictionary in data:
        print(f"evaluating dictionary {dictionary}")

        player = dictionary["player"]
        stat3 = int(dictionary["stat3"])
        stat2 = int(dictionary["stat2"])

        # if the player isn't in our result
        if player not in result:
            print(f"\tfirst time player {player}")
            result[player] = {}  # add the player as an empty dictionary
            result[player]["player"] = player

        if "stat3" not in result[player]:
            print(f"\tfirst time stat3 {stat3}")
            result[player]["stat3"] = stat3
        else:
            print(f"\tupdating stat3 { result[player]['stat3'] + stat3}")
            result[player]["stat3"] += stat3

        if "stat2" not in result[player]:
            print(f"\tfirst time stat2 {stat2}")
            result[player]["stat2"] = stat2
        else:
            print(f"\tupdating stat2 { result[player]['stat2'] + stat2}")
            result[player]["stat2"] += stat2

    return result


data = [{'stat3': '5', 'stat2': '4', 'player': '1'},
        {'stat3': '8', 'stat2': '1', 'player': '1'},
        {'stat3': '6', 'stat2': '1', 'player': '3'},
        {'stat3': '3', 'stat2': '7', 'player': '3'}]

print(sum_stats_by_player(data))

0

Another version using Counter

import itertools
from collections import Counter

def count_group(group):
    c = Counter()
    for g in group:
        g_i = dict([(k, int(v)) for k, v in g.items() if k != 'player'])
        c.update(g_i)
    return dict(c)

sorted_data = sorted(data, key=lambda x:x['player'])
results = [(k, count_group(g)) for k, g in itertools.groupby(sorted_data, lambda x: x['player'])]

print(results)

which gives

[('1', {'stat3': 13, 'stat2': 5}), ('3', {'stat3': 9, 'stat2': 8})]

1
Note: for groupby to work, the data list needs to be sorted by subdict['player']. - juanpa.arrivillaga
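
The reason is that itertools.groupby only merges consecutive items with equal keys. A short sketch of what happens without the sort, using illustrative rows that are deliberately out of order (rows and get_player are hypothetical names):

```python
import itertools

# Illustrative input: player '1' appears before and after player '3'.
rows = [{'player': '1'}, {'player': '3'}, {'player': '1'}]


def get_player(row):
    return row['player']


# Without sorting, player '1' is split into two separate groups.
unsorted_keys = [k for k, _ in itertools.groupby(rows, get_player)]

# Sorting by the same key first yields one group per player.
sorted_rows = sorted(rows, key=get_player)
sorted_keys = [k for k, _ in itertools.groupby(sorted_rows, get_player)]

print(unsorted_keys)  # ['1', '3', '1']
print(sorted_keys)    # ['1', '3']
```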

0

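Two loops are enough to accomplish the following:

  1. Group the data by the primary key
  2. Aggregate all the secondary information

Both tasks are handled in the aggregate_statistics function shown below.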

from collections import Counter
from pprint import pprint


def main():
    data = [{'player': 1, 'stat2': 4, 'stat3': 5},
            {'player': 1, 'stat2': 1, 'stat3': 8},
            {'player': 3, 'stat2': 1, 'stat3': 6},
            {'player': 3, 'stat2': 7, 'stat3': 3}]
    new_data = aggregate_statistics(data, 'player')
    pprint(new_data)


def aggregate_statistics(table, key):
    records_by_key = {}
    for record in table:
        data = record.copy()
        records_by_key.setdefault(data.pop(key), []).append(Counter(data))
    new_data = []
    for second_key, value in records_by_key.items():
        start, *remaining = value
        for record in remaining:
            start.update(record)
        new_data.append(dict(start, **{key: second_key}))
    return new_data


if __name__ == '__main__':
    main()
