How do I filter a list of dictionaries in Python?

3

I have a list of dictionaries, as shown below:

VehicleList = [
        {
            'id': '1',
            'VehicleType': 'Car',
            'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)
        },
        {
            'id': '2',
            'VehicleType': 'Bike',
            'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)
        },
        {
            'id': '3',
            'VehicleType': 'Truck',
            'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)
        },
        {
            'id': '4',
            'VehicleType': 'Bike',
            'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 00, 300012)
        },
        {
            'id': '5',
            'VehicleType': 'Car',
            'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)
        }
    ]

How can I get a list containing the latest vehicle for each 'VehicleType', based on 'CreationDate'?

The result I expect looks like this:

latestVehicles = [
        {
            'id': '5',
            'VehicleType': 'Car',
            'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)
        },
        {
            'id': '2',
            'VehicleType': 'Bike',
            'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)
        },
        {
            'id': '3',
            'VehicleType': 'Truck',
            'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)
        }
    ]

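I tried splitting the dictionaries into separate lists by their 'VehicleType' and then picking the latest one from each list.

I believe there may be a more efficient way to achieve this.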
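
For reference, a minimal sketch of the split-then-pick approach described above. The names `vehicles`, `groups` and `latest`, and the shortened sample data, are illustrative assumptions, not the original code.

```python
import datetime
from collections import defaultdict

# Shortened sample data in the same shape as VehicleList (illustrative only).
vehicles = [
    {'id': '1', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44)},
    {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21)},
    {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50)},
]

# Step 1: split the dictionaries into one list per VehicleType.
groups = defaultdict(list)
for vehicle in vehicles:
    groups[vehicle['VehicleType']].append(vehicle)

# Step 2: pick the entry with the newest CreationDate from each list.
latest = [max(group, key=lambda v: v['CreationDate']) for group in groups.values()]

print([v['id'] for v in latest])  # ['5', '2']
```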

8 Answers

4
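Use a dictionary that maps each VehicleType value to the dictionary you want in your final list. Compare the date of each item in the input list with the date already stored in your dictionary, and keep whichever entry is newer.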
latest_dict = {}

for vehicle in VehicleList:
    t = vehicle['VehicleType']
    if t not in latest_dict or vehicle['CreationDate'] > latest_dict[t]['CreationDate']:
        latest_dict[t] = vehicle

latestVehicles = list(latest_dict.values())

Not as fancy as some of the other code, but short, easy to understand, and linear in time (assuming lookups in latest_dict count as O(1)). - Ture Pålsson

2
Here is a solution using max and filter:
VehicleLatest = [
    max(
        filter(lambda _: _["VehicleType"] == t, VehicleList), 
        key=lambda _: _["CreationDate"]
    ) for t in {_["VehicleType"] for _ in VehicleList}
]

Result

print(VehicleLatest)
# [{'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)}, {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}, {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)}]
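
Because the types are collected in a set, the order of the entries in the result is not guaranteed. If a stable order matters, one option is to collect the types with dict.fromkeys, which keeps first-seen order. A sketch under that assumption; the function name `latest_per_type_ordered` and the shortened sample data are illustrative, not part of the answer above.

```python
import datetime


def latest_per_type_ordered(vehicles):
    """Return the newest vehicle per type, in first-seen type order."""
    # dict.fromkeys removes duplicates but, unlike a set, keeps first-seen order.
    types_in_order = dict.fromkeys(v['VehicleType'] for v in vehicles)
    return [
        max((v for v in vehicles if v['VehicleType'] == t),
            key=lambda v: v['CreationDate'])
        for t in types_in_order
    ]


# Shortened sample data (illustrative only).
sample = [
    {'id': '1', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 10)},
    {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15)},
    {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13)},
    {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21)},
]

print([v['id'] for v in latest_per_type_ordered(sample)])  # ['5', '2', '3']
```

Like the original, this still scans the whole list once per type.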

VehicleLatest should contain the vehicles with ids 2, 3 and 5, but your solution returns the vehicles with ids 1, 2 and 3. - Neutrino Watson
1
@NeutrinoWatson, I made a typo (I forgot the "_" in the lambda function). It is fixed now. - niko

1

I think you can use the groupby function from itertools to achieve what you want.

from itertools import groupby

# entries sorted according to the key we wish to groupby: 'VehicleType'
VehicleList = sorted(VehicleList, key=lambda x: x["VehicleType"])

latestVehicles = []

# Then the elements are grouped.
for k, v in groupby(VehicleList, lambda x: x["VehicleType"]):
    # We then append to latestVehicles the 0th entry of the
    # grouped elements after sorting according to the 'CreationDate'
    latestVehicles.append(sorted(list(v), key=lambda x: x["CreationDate"], reverse=True)[0])

1

Sort by 'VehicleType' and 'CreationDate', then build a dictionary that maps each 'VehicleType' to a vehicle, which leaves the latest vehicle of each type:

VehicleList.sort(key=lambda x: (x.get('VehicleType'), x.get('CreationDate')))
out = list(dict(zip([item.get('VehicleType') for item in VehicleList], VehicleList)).values())

Output:

[{'id': '2',
  'VehicleType': 'Bike',
  'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
 {'id': '5',
  'VehicleType': 'Car',
  'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
 {'id': '3',
  'VehicleType': 'Truck',
  'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}]
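
This works because dict() keeps the last value it sees for a repeated key, and after the sort the last entry for each type is the newest one. A tiny illustration with made-up pairs (the names `pairs` and `result` are only for this sketch):

```python
# When a key repeats, the later value replaces the earlier one,
# while the key keeps its original insertion position.
pairs = [('Bike', 'older'), ('Bike', 'newer'), ('Car', 'only')]
result = dict(pairs)

print(result)  # {'Bike': 'newer', 'Car': 'only'}
```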

0

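In pandas this is very simple. First load the list of dictionaries into a pandas DataFrame, then sort the values by date, take the first n rows (3 in the example below), and export them as dictionaries. Note that head(3) keeps the three newest rows overall rather than one row per VehicleType.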

import pandas as pd

df = pd.DataFrame(VehicleList)
df.sort_values('CreationDate', ascending=False).head(3).to_dict(orient='records')

0
A small plea for writing more readable code:
from operator import itemgetter
from itertools import groupby

vtkey = itemgetter('VehicleType')
cdkey = itemgetter('CreationDate')

latest = [
    # Get latest from each group.
    max(vs, key = cdkey)
    # Sort and group by VehicleType.
    for g, vs in groupby(sorted(VehicleList, key = vtkey), vtkey)
]

0
You can use the operator module to do this:
import operator
my_sorted_list_by_type_and_date = sorted(VehicleList, key=operator.itemgetter('VehicleType', 'CreationDate'))
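
The sorted list still contains every vehicle. One possible way to finish, sketched here as an assumption and not as part of the answer above, is to group the sorted list by type and keep the last entry of each group. The sample data and the names `sorted_vehicles` and `latest` are illustrative.

```python
import datetime
import operator
from itertools import groupby

# Shortened sample data (illustrative only).
vehicles = [
    {'id': '1', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 10, 16, 9)},
    {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8)},
    {'id': '4', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 10, 21, 1)},
    {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1)},
]

sorted_vehicles = sorted(vehicles, key=operator.itemgetter('VehicleType', 'CreationDate'))

# Within each type the entries are in ascending date order,
# so the last element of each group is that type's newest vehicle.
latest = [
    list(group)[-1]
    for _, group in groupby(sorted_vehicles, key=operator.itemgetter('VehicleType'))
]

print([v['id'] for v in latest])  # ['2', '5']
```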

0

A defaultdict variant that avoids the lengthy if condition, based on Blckknght's answer:

from collections import defaultdict
import datetime
from operator import itemgetter

latest_dict = defaultdict(lambda: {'CreationDate': datetime.datetime.min})

for vehicle in VehicleList:
    t = vehicle['VehicleType']
    latest_dict[t] = max(vehicle, latest_dict[t], key=itemgetter('CreationDate'))

latestVehicles = list(latest_dict.values())

Latest vehicles:

[{'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
 {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
 {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}]
