How to retrieve the minimum unique values from a list?

Question


17

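I have a list of dictionaries. I want only one result for each unique api, and that result should be chosen by priority: 0, then 1, then 2. How can I do this?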

Data:

[
{'api':'test1', 'result': 0},
{'api':'test2', 'result': 1},
{'api':'test3', 'result': 2},
{'api':'test3', 'result': 0},
{'api':'test3', 'result': 1},
]

Expected output:

[
{'api':'test1', 'result': 0},
{'api':'test2', 'result': 1},
{'api':'test3', 'result': 0},
]

- UnKnown

1

Can you sort the list of dictionaries on the "api" key, so that test1, test2, etc. in the text always come in order, e.g. test1, test2, test3? - Aryan

1

@Aryan Mishra, thanks for your reply! I tried sorting and filtering as in the post below, and it worked! Thanks for your help. - UnKnown

9 Answers

8

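You can iterate over the list once and keep the best item seen so far for each group. This takes a single pass over the data and stores only one item per group, so it is cheap in both time and space.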

def get_min_unique(items, id_key, value_key):
    lowest = {}
    for item in items:
        key = item[id_key]
        if key not in lowest or lowest[key][value_key] > item[value_key]:
            lowest[key] = item
    return list(lowest.values())

For example, with your own data:

data = [
  {'api':'test1', 'result': 0},
  {'api':'test2', 'result': 1},
  {'api':'test3', 'result': 2},
  {'api':'test3', 'result': 0},
  {'api':'test3', 'result': 1},
]

assert get_min_unique(data, 'api', 'result') == [
  {'api': 'test1', 'result': 0},
  {'api': 'test2', 'result': 1},
  {'api': 'test3', 'result': 0},
]

- Cireo

7

data = [
    {'api': 'test1', 'result': 0},
    {'api': 'test3', 'result': 2},
    {'api': 'test2', 'result': 1},
    {'api': 'test3', 'result': 1},
    {'api': 'test3', 'result': 0}
]

def find(data):
    step1 = sorted(data, key=lambda k: k['result'])
    print('step1', step1)

    step2 = {}
    for each in step1:
        if each['api'] not in step2:
            step2[each['api']] = each
    print('step2', step2)

    step3 = list(step2.values())
    print('step3', step3)
    print('\n')
    return step3

find(data)

Try this; it gives you:

step1 [{'api': 'test1', 'result': 0}, {'api': 'test3', 'result': 0}, {'api': 'test2', 'result': 1}, {'api': 'test3', 'result': 1}, {'api': 'test3', 'result': 2}]
step2 {'test1': {'api': 'test1', 'result': 0}, 'test3': {'api': 'test3', 'result': 0}, 'test2': {'api': 'test2', 'result': 1}}
step3 [{'api': 'test1', 'result': 0}, {'api': 'test3', 'result': 0}, {'api': 'test2', 'result': 1}]

Sort first, then take the first entry for each "api", and that gives you your result.
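Because step1 sorts by result alone, the final list comes back in result order (test1, test3, test2) rather than the question's api order. A minimal variant, assuming the api names sort the way you want them displayed, sorts on (api, result) instead, so the first entry kept for each api is its lowest result and the output follows api order. The function name find_sorted_by_api is hypothetical:

```python
data = [
    {'api': 'test1', 'result': 0},
    {'api': 'test3', 'result': 2},
    {'api': 'test2', 'result': 1},
    {'api': 'test3', 'result': 1},
    {'api': 'test3', 'result': 0},
]

def find_sorted_by_api(data):
    # Sort by api first, then by result, so each api's lowest result comes first.
    ordered = sorted(data, key=lambda d: (d['api'], d['result']))
    first_per_api = {}
    for d in ordered:
        # setdefault only stores the first (lowest-result) entry for each api.
        first_per_api.setdefault(d['api'], d)
    return list(first_per_api.values())

print(find_sorted_by_api(data))
# [{'api': 'test1', 'result': 0}, {'api': 'test2', 'result': 1}, {'api': 'test3', 'result': 0}]
```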

- BananZ

9

You're reinventing the wheel a bit: take a look at itertools.groupby, it does exactly what your function does. - Grzegorz Skibinski

Thanks @GrzegorzSkibinski! I'm also new to Python; in my day job I still use Swift/JS. I'll definitely check out those libraries. - BananZ

4

Going for code golf:

from itertools import groupby
dut = [
    {'api':'test1', 'result': 0},
    {'api':'test2', 'result': 1},
    {'api':'test3', 'result': 2},
    {'api':'test3', 'result': 0},
    {'api':'test3', 'result': 1},
]

res = [
    next(g)
    for _,g in groupby(
        sorted(dut, key=lambda d: tuple(d.values())),
        key=lambda i: i['api']
    )
]

Result:

Out[45]:
[{'api': 'test1', 'result': 0},
 {'api': 'test2', 'result': 1},
 {'api': 'test3', 'result': 0}]

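This uses the itertools.groupby tool from the Python standard library. The iterable passed as its first argument is first sorted with sorted, by api and then by result in ascending order, and is then grouped by api only. groupby returns an iterator of pairs, each holding the group's key and an iterator over the items in that group, as shown below: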

In [56]: list(groupby(sorted(dut, key=lambda i: tuple(i.values())), key=lambda i: i['api']))
Out[56]:
[('test1', <itertools._grouper at 0x10af4c550>),
 ('test2', <itertools._grouper at 0x10af4c400>),
 ('test3', <itertools._grouper at 0x10af4cc88>)]

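In the list comprehension, because each group is already sorted by result, next takes the first (lowest-result) item of each group, and the group key is discarded.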
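The sort key tuple(d.values()) depends on every dict having been built with 'api' before 'result' (dicts keep insertion order from Python 3.7 on). If one dict were written as {'result': 2, 'api': 'test3'}, its key would be (2, 'test3'), and sorted would raise a TypeError when comparing it with a tuple that starts with a string. A sketch of the same one-liner with an explicit key, using operator.itemgetter, avoids that dependency; the reordered first dict is there only to illustrate:

```python
from itertools import groupby
from operator import itemgetter

dut = [
    {'result': 2, 'api': 'test3'},  # keys deliberately in a different order
    {'api': 'test1', 'result': 0},
    {'api': 'test2', 'result': 1},
    {'api': 'test3', 'result': 0},
    {'api': 'test3', 'result': 1},
]

# Sort explicitly by (api, result), then group by api; next(g) takes the
# first item of each group, which is the one with the lowest result.
res = [
    next(g)
    for _, g in groupby(sorted(dut, key=itemgetter('api', 'result')),
                        key=itemgetter('api'))
]
print(res)
```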

- salparadise

3

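If you need to store every priority for each API and only periodically filter out the highest one, the existing answers do the job. However, if you only ever need the highest priority for each API, I think you are using the wrong data structure.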

>>> from collections import UserDict
>>> 
>>> class DataContainer(UserDict):
...     def __setitem__(self, key, value):
...         cur = self.get(key)
...         if cur is None or value < cur:
...             super().__setitem__(key, value)
...     def __str__(self):
...         return '\n'.join(("'api': {}, 'result': {}".format(k, v) for k, v in self.items()))
... 
>>> data = DataContainer()
>>> data['test1'] = 0
>>> data['test2'] = 1
>>> data['test3'] = 2
>>> data['test3'] = 0
>>> data['test3'] = 1
>>> print(data)
'api': test1, 'result': 0
'api': test2, 'result': 1
'api': test3, 'result': 0

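This container only ever holds the highest priority (lowest result) for each API. Advantages include:

It makes explicit what you are doing
No code golf required
Keeps the memory footprint minimal
Faster than periodically sorting, grouping and filtering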

- Adam Acosta

You could subclass dict instead of UserDict. - AKX
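One caveat with subclassing dict directly: the built-in dict constructor and dict.update() do not go through an overridden __setitem__, so only plain d[key] = value assignments get the keep-the-minimum behavior. UserDict, by contrast, inherits update() from MutableMapping, which assigns each item via self[key] = value, so the override applies. A small sketch comparing the two (the class names MinDict and MinUserDict are hypothetical):

```python
from collections import UserDict

class MinDict(dict):
    """dict subclass that keeps the smallest value seen for each key."""
    def __setitem__(self, key, value):
        cur = self.get(key)
        if cur is None or value < cur:
            super().__setitem__(key, value)

class MinUserDict(UserDict):
    """Same idea built on UserDict, as in the answer above."""
    def __setitem__(self, key, value):
        cur = self.get(key)
        if cur is None or value < cur:
            super().__setitem__(key, value)

d = MinDict()
d['test3'] = 2
d['test3'] = 0
d['test3'] = 1
print(d)        # {'test3': 0}: direct assignment uses the override

d.update({'test3': 5})
print(d)        # {'test3': 5}: dict.update() bypasses __setitem__

u = MinUserDict()
u['test3'] = 0
u.update({'test3': 5})
print(dict(u))  # {'test3': 0}: UserDict.update() calls __setitem__
```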

2

This isn't as clean as the other solutions, but I think it is a step-by-step approach that is easy to follow.

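l = [
    {'api': 'test1', 'result': 0},
    {'api': 'test2', 'result': 1},
    {'api': 'test3', 'result': 2},
    {'api': 'test3', 'result': 0},
    {'api': 'test3', 'result': 1},
]

# Parallel lists: j['result'][k] is the lowest result seen for j['api'][k].
j = {'api': [], 'result': []}
for i in l:
    if i['api'] not in j['api']:
        # First time this api appears: record it with its result.
        j['api'].append(i['api'])
        j['result'].append(i['result'])
    else:
        # Seen before: keep the smaller of the stored and current result.
        index = j['api'].index(i['api'])
        if j['result'][index] > i['result']:
            j['result'][index] = i['result']

# Zip the parallel lists back into a list of dicts.
result = []
for i in range(len(j['api'])):
    result.append({'api': j['api'][i], 'result': j['result'][i]})

print(result)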

Output

[{'api': 'test1', 'result': 0},
 {'api': 'test2', 'result': 1},
 {'api': 'test3', 'result': 0}]

- sahasrara62

2

You could pick another, more efficient data structure: a dict of Counters.

It keeps the distribution of results for each API, and the code stays fairly simple:

data = [
{'api':'test1', 'result': 0},
{'api':'test2', 'result': 1},
{'api':'test3', 'result': 2},
{'api':'test3', 'result': 0},
{'api':'test3', 'result': 1},
]

from collections import Counter

results = {}
for d in data:
    counter = results.setdefault(d['api'], Counter())
    counter[d['result']] += 1

results
# {'test1': Counter({0: 1}),
#  'test2': Counter({1: 1}),
#  'test3': Counter({2: 1, 0: 1, 1: 1})}

[{'api': api, 'result':min(v.keys())} for api, v in results.items()]
# [{'api': 'test1', 'result': 0},
#  {'api': 'test2', 'result': 1},
#  {'api': 'test3', 'result': 0}]

If you want the maximum or the count of the results instead, you only need to change the last line.
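For example, using the same results dict of Counters (rebuilt here so the snippet runs on its own), swapping min for max, or summing each Counter's counts, gives the other summaries. The names highest and counts are just illustrative:

```python
from collections import Counter

data = [
    {'api': 'test1', 'result': 0},
    {'api': 'test2', 'result': 1},
    {'api': 'test3', 'result': 2},
    {'api': 'test3', 'result': 0},
    {'api': 'test3', 'result': 1},
]

# One Counter of result values per api, as in the answer above.
results = {}
for d in data:
    counter = results.setdefault(d['api'], Counter())
    counter[d['result']] += 1

# Highest result per api instead of the lowest (iterating a Counter yields its keys).
highest = [{'api': api, 'result': max(c)} for api, c in results.items()]

# Number of rows seen per api.
counts = {api: sum(c.values()) for api, c in results.items()}

print(highest)
print(counts)
```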

- Eric Duminil

1

If you are willing to use an external library, here is the cleanest solution:

import pandas as pd
df = pd.DataFrame(data)
dfMin = df.groupby(by='api').min()

dfMin is a pandas DataFrame indexed by api, with a result column holding the minimum result for each API.

- Jacob K

0

Another solution...

result = {}

for d in data:
    result[d['api']] = min(result.get(d['api'], d['result']), d['result'])

new_data = [{'api': k, 'result': v} for k, v in result.items()]

print(new_data)

Prints:

#[{'api': 'test1', 'result': 0}, {'api': 'test2', 'result': 1}, {'api': 'test3', 'result': 0}]

- Dilip Majithia

Page content provided by Stack Overflow.

- Grzegorz Skibinski · Accepted Answer

Given the input data, you can do a classic SQL-like groupby:

from itertools import groupby

# If your data is already sorted by api, skip the line below
data = sorted(data, key=lambda x: x['api'])

res = [
    {'api': g, 'result': min(v, key=lambda x: x['result'])['result']} 
    for g, v in groupby(data, lambda x: x['api'])
]

Output:

[{'api': 'test1', 'result': 0}, {'api': 'test2', 'result': 1}, {'api': 'test3', 'result': 0}]
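The accepted answer rebuilds each output dict from just api and result. If the real dicts carry more fields, a variant can return the whole lowest-result dict from each group by using min with operator.itemgetter. This is a sketch; the extra 'latency' key in the sample data is hypothetical and only there to show that the other fields survive:

```python
from itertools import groupby
from operator import itemgetter

data = [
    {'api': 'test1', 'result': 0, 'latency': 12},
    {'api': 'test2', 'result': 1, 'latency': 30},
    {'api': 'test3', 'result': 2, 'latency': 8},
    {'api': 'test3', 'result': 0, 'latency': 25},
    {'api': 'test3', 'result': 1, 'latency': 17},
]

by_api = itemgetter('api')

# groupby only merges adjacent items, so sort by api first.
res = [
    min(group, key=itemgetter('result'))  # whole dict with the lowest result
    for _, group in groupby(sorted(data, key=by_api), key=by_api)
]
print(res)
```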