Sum values in a list of lists of dictionaries by a common key

10

How can I sum the values of duplicate elements in a list of lists of dictionaries?

Sample list:

data = [
        [
            {'user': 1, 'rating': 0},
            {'user': 2, 'rating': 10},
            {'user': 1, 'rating': 20},
            {'user': 3, 'rating': 10}
        ],
        [
            {'user': 4, 'rating': 4},
            {'user': 2, 'rating': 80},
            {'user': 1, 'rating': 20},
            {'user': 1, 'rating': 10}
        ],
    ]

Expected output:

op = [
        [
            {'user': 1, 'rating': 20},
            {'user': 2, 'rating': 10},
            {'user': 3, 'rating': 10}
        ],
        [
            {'user': 4, 'rating': 4},
            {'user': 2, 'rating': 80},
            {'user': 1, 'rating': 30},
        ],
    ]

Does this help? https://stackoverflow.com/questions/14925624/python-2-7-sum-value-on-duplicates-in-dictionary - Nico Müller
8 Answers

5

Using the pandas library:

>>> import pandas as pd
>>> [pd.DataFrame(dicts).groupby('user', as_index=False, sort=False).sum().to_dict(orient='records') for dicts in data]
[[{'user': 1, 'rating': 20},
  {'user': 2, 'rating': 10},
  {'user': 3, 'rating': 10}],
 [{'user': 4, 'rating': 4},
  {'user': 2, 'rating': 80},
  {'user': 1, 'rating': 30}]]

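Can this be done without pandas? - rahul.m
@c.grey Of course. I just think it might not be as concise. - timgeb
Please check my answer. - mangupt
@timgeb What is this? - rahul.m
1
@c.grey I'm sure there will be other, non-pandas answers. My answer is aimed at pandas users. - timgeb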

4

You can try the following:

from itertools import groupby

result = []
for lst in data:
    sublist = sorted(lst, key=lambda d: d['user'])
    grouped = groupby(sublist, key=lambda d: d['user'])
    result.append([
        {'user': name, 'rating': sum([d['rating'] for d in group])}
        for name, group in grouped])

# Sort each sublist of `result` by rating:
result = [sorted(sub, key=lambda d: d['rating']) for sub in result]

# %%timeit
# 7.54 µs ± 220 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Update (a more efficient solution):
result = []
for lst in data:
    visited = {}
    for d in lst:
        if d['user'] in visited:
            visited[d['user']]['rating'] += d['rating']
        else:
            # Copy the dict so the entries in `data` are not modified in place.
            visited[d['user']] = dict(d)

    result.append(sorted(visited.values(), key=lambda d: d['rating']))

# %% timeit
# 2.5 µs ± 54 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Result:

# print(result)
[
    [
        {'user': 2, 'rating': 10},
        {'user': 3, 'rating': 10},
        {'user': 1, 'rating': 20}
    ],
    [
        {'user': 4, 'rating': 4},
        {'user': 1, 'rating': 30},
        {'user': 2, 'rating': 80}
    ]
]

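Do you mean sorting the sublists by rating? - Shubham Sharma
A small nitpick: sorted(..) adds extra complexity overhead here, and it is only needed because groupby(..) requires sorted input. It can be eliminated by storing intermediate results against a key. If the lists are small, the OP should not see much of a difference. - UltraInstinct
@UltraInstinct Agreed, but unless the data is very large, this should not be a big problem. - Shubham Sharma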

2
op = []
for lst in data:
    rating_of_user = {}
    for e in lst:
        user, rating = e['user'], e['rating']
        rating_of_user[user] = rating_of_user.get(user, 0) + rating
    op.append([{'user': u, 'rating': r} for u, r in rating_of_user.items()])


Note: since Python 3.7, dictionaries officially preserve insertion order.
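
To illustrate the note about insertion order, here is a minimal sketch that wraps the loop above in a function (the name `sum_ratings` is made up for this example) and runs it on the sample data from the question. It assumes Python 3.7 or later, so each user appears in the output at the position where it was first seen.

```python
def sum_ratings(groups):
    """Sum 'rating' per 'user' within each sublist, keeping first-seen order."""
    result = []
    for group in groups:
        totals = {}  # user -> running total; insertion-ordered on Python 3.7+
        for entry in group:
            totals[entry['user']] = totals.get(entry['user'], 0) + entry['rating']
        result.append([{'user': u, 'rating': r} for u, r in totals.items()])
    return result


data = [
    [{'user': 1, 'rating': 0}, {'user': 2, 'rating': 10},
     {'user': 1, 'rating': 20}, {'user': 3, 'rating': 10}],
    [{'user': 4, 'rating': 4}, {'user': 2, 'rating': 80},
     {'user': 1, 'rating': 20}, {'user': 1, 'rating': 10}],
]

summed = sum_ratings(data)
print(summed[0])
print(summed[1])
```

Because new dictionaries are built for the output, the entries in `data` are left unchanged.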

1
This should work:

from collections import defaultdict

data_without_duplicates = []
for l in data:
    users_ratings = defaultdict(int)
    for d in l:
        users_ratings[d["user"]] += d["rating"]
    data_without_duplicates.append(
        [{"user": user, "rating": rating} for user, rating in users_ratings.items()]
    )

0
import pprint
data = [
        [
            {'user': 1, 'rating': 0},
            {'user': 2, 'rating': 10},
            {'user': 1, 'rating': 20},
            {'user': 3, 'rating': 10}
        ],
        [
            {'user': 4, 'rating': 4},
            {'user': 2, 'rating': 80},
            {'user': 1, 'rating': 20},
            {'user': 1, 'rating': 10}
        ],
    ]

def find(user, l):
    for i, d in enumerate(l):
        if user == d['user']:
            return i
    return -1

data_sum = []

for l in data:
    list_sum = []
    for d in l:
        idx = find(d['user'], list_sum)
        if idx == -1:
            list_sum.append(dict(d))  # copy so the dicts in `data` are not mutated
        else:
            list_sum[idx]['rating'] += d['rating']
    data_sum.append(list_sum)

pprint.pprint(data_sum)

0
data = [
        [
            {'user': 1, 'rating': 0},
            {'user': 2, 'rating': 10},
            {'user': 1, 'rating': 20},
            {'user': 3, 'rating': 10}
        ],
        [
            {'user': 4, 'rating': 4},
            {'user': 2, 'rating': 80},
            {'user': 1, 'rating': 20},
            {'user': 1, 'rating': 10}
        ],
    ]


keyname = "user"

result = []  # named `result` to avoid shadowing the built-in all()
for row in data:
    row_out = []
    for d in row:
        key = d[keyname]
        for d2 in row_out:
            if d2[keyname] == d[keyname]:
                break
        else:
            d2 = {keyname: key}
            row_out.append(d2)
        for k, v in d.items():
            if k == keyname:
                continue
            d2[k] = d2.get(k, 0) + v
    result.append(row_out)

print(result)

Gives:

[[{'user': 1, 'rating': 20}, {'user': 2, 'rating': 10}, {'user': 3, 'rating': 10}], [{'user': 4, 'rating': 4}, {'user': 2, 'rating': 80}, {'user': 1, 'rating': 30}]]

0

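Sorting should be avoided, since every item can be processed in a single pass. Any hash-based technique should do better.

Here is an alternative that uses defaultdict instead of the expensive sort/groupby or pandas.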

from collections import defaultdict
from functools import reduce

def reduce_func(state, item):
  # Build a new entry whose rating is the running total for this user.
  new_obj = {
    "user": item["user"],
    "rating": state[item["user"]]["rating"] + item["rating"]
  }
  state[item["user"]] = new_obj
  return state

output = [list(reduce(reduce_func, elem, defaultdict(lambda: {"rating": 0})).values())
          for elem in data]
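
The claim that hashing beats sorting can be checked on your own data. The following is a rough sketch, not a rigorous benchmark: the function names and the synthetic input (10,000 entries spread over 100 users) are assumptions made for illustration, and the timings it prints will vary by machine and input size.

```python
import timeit
from itertools import groupby
from operator import itemgetter


def sum_by_sorting(group):
    """Sort by user, then sum each run of equal users: O(n log n)."""
    by_user = itemgetter('user')
    return {user: sum(d['rating'] for d in run)
            for user, run in groupby(sorted(group, key=by_user), key=by_user)}


def sum_by_hashing(group):
    """Accumulate totals in a dict in one pass: O(n) on average."""
    totals = {}
    for d in group:
        totals[d['user']] = totals.get(d['user'], 0) + d['rating']
    return totals


# Hypothetical synthetic input: 10,000 entries spread over 100 users.
sample = [{'user': i % 100, 'rating': i} for i in range(10_000)]

# Both approaches must agree on the totals before timing means anything.
assert sum_by_sorting(sample) == sum_by_hashing(sample)

for func in (sum_by_sorting, sum_by_hashing):
    seconds = timeit.timeit(lambda: func(sample), number=20)
    print(f'{func.__name__}: {seconds:.4f} s for 20 runs')
```

Both helpers return a plain `{user: total}` mapping so that the comparison measures only the grouping step, not the construction of the output dictionaries.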

0
Using Counter with a list comprehension:
from collections import Counter

def totals(group):
    counts = Counter()
    for d in group:
        counts[d['user']] += d['rating']  # sum duplicates instead of overwriting
    return counts

x = [[{'user': u, 'rating': r} for u, r in totals(group).most_common()]
     for group in data]

Output:

[
    [
        {
            "rating": 20,
            "user": 1
        },
        {
            "rating": 10,
            "user": 2
        },
        {
            "rating": 10,
            "user": 3
        }
    ],
    [
        {
            "rating": 80,
            "user": 2
        },
        {
            "rating": 30,
            "user": 1
        }, 
        {
            "rating": 4, 
            "user": 4
        }
    ]
]
