How to filter a list of dictionaries in Python?

Question

I have a list of dictionaries, as shown below:

import datetime

VehicleList = [
        {
            'id': '1',
            'VehicleType': 'Car',
            'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)
        },
        {
            'id': '2',
            'VehicleType': 'Bike',
            'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)
        },
        {
            'id': '3',
            'VehicleType': 'Truck',
            'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)
        },
        {
            'id': '4',
            'VehicleType': 'Bike',
            'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 00, 300012)
        },
        {
            'id': '5',
            'VehicleType': 'Car',
            'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)
        }
    ]

How can I get a list containing the latest vehicle of each 'VehicleType', based on 'CreationDate'?

The result I expect is something like:

latestVehicles = [
        {
            'id': '5',
            'VehicleType': 'Car',
            'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)
        },
        {
            'id': '2',
            'VehicleType': 'Bike',
            'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)
        },
        {
            'id': '3',
            'VehicleType': 'Truck',
            'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)
        }
    ]

I tried splitting the dictionaries into separate lists according to their 'VehicleType' and then picking the latest one from each list.

I believe there is probably a more efficient way to do this.

- Neutrino Watson
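
The approach described in the question (one list per 'VehicleType', then the latest entry of each list) might look roughly like the sketch below. The name `by_type` is illustrative only, and the data is the sample from the question. Building the lists is one pass and the `max` calls together touch each vehicle once more, so the total work grows linearly with the number of vehicles.

```python
import datetime
from collections import defaultdict

# Sample data from the question.
VehicleList = [
    {'id': '1', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)},
    {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
    {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)},
    {'id': '4', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 0, 300012)},
    {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
]

# Step 1: split the dictionaries into one list per VehicleType.
by_type = defaultdict(list)
for vehicle in VehicleList:
    by_type[vehicle['VehicleType']].append(vehicle)

# Step 2: pick the vehicle with the newest CreationDate from each list.
latestVehicles = [
    max(vehicles, key=lambda v: v['CreationDate'])
    for vehicles in by_type.values()
]

print([v['id'] for v in latestVehicles])  # ['5', '2', '3']
```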

8 Answers

Here is a solution using max and filter:

VehicleLatest = [
    max(
        filter(lambda _: _["VehicleType"] == t, VehicleList), 
        key=lambda _: _["CreationDate"]
    ) for t in {_["VehicleType"] for _ in VehicleList}
]

Result:

print(VehicleLatest)
# [{'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)}, {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}, {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)}]

- niko

VehicleLatest should contain the vehicles with ids 2, 3 and 5, but your solution gives the vehicles with ids 1, 2 and 3. - Neutrino Watson

@NeutrinoWatson, that was a typo (I forgot the "_" in the lambda function). It's fixed now. - niko
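
Because the outer loop of this answer iterates over a set, the order of the result is not guaranteed and can change from one run to the next (string hashes are randomized per process). A minimal variant, assuming you want the types in the order they first appear in the list, iterates over `dict.fromkeys(...)` instead of a set. Like the original, it scans the whole list once per type.

```python
import datetime

# Sample data from the question.
VehicleList = [
    {'id': '1', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)},
    {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
    {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)},
    {'id': '4', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 0, 300012)},
    {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
]

# dict.fromkeys keeps one key per type, in first-appearance order.
types = dict.fromkeys(v['VehicleType'] for v in VehicleList)

VehicleLatest = [
    max(
        filter(lambda v: v['VehicleType'] == t, VehicleList),
        key=lambda v: v['CreationDate'],
    )
    for t in types
]

print([v['id'] for v in VehicleLatest])  # ['5', '2', '3']
```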

I think you can use the groupby function from itertools to do what you want.

from itertools import groupby

# entries sorted according to the key we wish to groupby: 'VehicleType'
VehicleList = sorted(VehicleList, key=lambda x: x["VehicleType"])

latestVehicles = []

# Then the elements are grouped.
for k, v in groupby(VehicleList, lambda x: x["VehicleType"]):
    # We then append to latestVehicles the 0th entry of the
    # grouped elements after sorting according to the 'CreationDate'
    latestVehicles.append(sorted(list(v), key=lambda x: x["CreationDate"], reverse=True)[0])

- Benjamin Rowell
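
The sort before `groupby` is essential: `groupby` only groups *consecutive* items that share a key. The sketch below, using the question's data, first shows the runs produced without sorting, then the sorted version with `max` in place of sorting each group and taking element 0 (same result, without the extra sort per group).

```python
import datetime
from itertools import groupby
from operator import itemgetter

# Sample data from the question.
VehicleList = [
    {'id': '1', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)},
    {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
    {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)},
    {'id': '4', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 0, 300012)},
    {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
]

type_key = itemgetter('VehicleType')

# Without sorting, each run of equal consecutive keys becomes its own group.
unsorted_groups = [k for k, _ in groupby(VehicleList, type_key)]
print(unsorted_groups)  # ['Car', 'Bike', 'Truck', 'Bike', 'Car']

# After sorting by type, every type forms exactly one group.
latestVehicles = [
    max(group, key=itemgetter('CreationDate'))
    for _, group in groupby(sorted(VehicleList, key=type_key), type_key)
]
print([v['id'] for v in latestVehicles])  # ['2', '5', '3']
```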

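Sort by 'VehicleType' and 'CreationDate', then build a dict that maps each 'VehicleType' to its vehicles. Because a later entry overwrites an earlier one with the same key, only the latest vehicle of each type is kept: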

VehicleList.sort(key=lambda x: (x.get('VehicleType'), x.get('CreationDate')))
out = list(dict(zip([item.get('VehicleType') for item in VehicleList], VehicleList)).values())

Output:

[{'id': '2',
  'VehicleType': 'Bike',
  'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
 {'id': '5',
  'VehicleType': 'Car',
  'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
 {'id': '3',
  'VehicleType': 'Truck',
  'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}]

- user7864386
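
This relies on two dict properties: assigning to an existing key replaces its value, and the key keeps the position of its first insertion (insertion order is guaranteed since Python 3.7). The tiny example below shows that behaviour, followed by a sketch that uses `sorted()` instead of `list.sort()`, so the original `VehicleList` is left in its original order.

```python
import datetime
from operator import itemgetter

# Later values win, but each key keeps its first-insertion position.
print(dict(zip(['a', 'b', 'a'], [1, 2, 3])))  # {'a': 3, 'b': 2}

# Sample data from the question.
VehicleList = [
    {'id': '1', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)},
    {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
    {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)},
    {'id': '4', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 0, 300012)},
    {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
]

# sorted() returns a new list; VehicleList itself is not reordered.
ordered = sorted(VehicleList, key=itemgetter('VehicleType', 'CreationDate'))
out = list({v['VehicleType']: v for v in ordered}.values())
print([v['id'] for v in out])  # ['2', '5', '3']
```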

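In pandas this is very simple. First load the list of dictionaries into a pandas DataFrame, sort the rows by date (newest first), keep the first row for each 'VehicleType', and export the result as dictionaries. Taking the first n rows with head(n) instead would return the n newest vehicles overall: with the sample data, head(3) gives ids 5, 2 and 4, i.e. two Bikes and no Truck.

import pandas as pd

df = pd.DataFrame(VehicleList)
# Newest first, then keep only the first (newest) row of each VehicleType.
df.sort_values('CreationDate', ascending=False).drop_duplicates('VehicleType').to_dict(orient='records')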

- RJ Adriaansen

A small plea for more readable code:

from operator import itemgetter
from itertools import groupby

vtkey = itemgetter('VehicleType')
cdkey = itemgetter('CreationDate')

latest = [
    # Get the latest vehicle from each group.
    max(vs, key=cdkey)
    # Sort and group by VehicleType.
    for g, vs in groupby(sorted(VehicleList, key=vtkey), vtkey)
]

- FMc

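You can use the operator module to sort the list by 'VehicleType' and then by 'CreationDate', so that within each type the latest vehicle comes last: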

import operator
my_sorted_list_by_type_and_date = sorted(VehicleList, key=operator.itemgetter('VehicleType', 'CreationDate'))

- DueSouth
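
Sorting alone does not yet select anything. One way to finish, sketched below with the question's data, is to keep every entry that ends a run of the same 'VehicleType' (its successor has a different type, or it is the last entry); since dates ascend within each run, that entry is the latest of its type. The names `successors` and `latestVehicles` are illustrative.

```python
import datetime
import operator

# Sample data from the question.
VehicleList = [
    {'id': '1', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)},
    {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
    {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)},
    {'id': '4', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 0, 300012)},
    {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
]

my_sorted_list_by_type_and_date = sorted(
    VehicleList, key=operator.itemgetter('VehicleType', 'CreationDate')
)

# Pair every entry with its successor (None after the last one) and keep
# the entries whose successor belongs to a different VehicleType.
successors = my_sorted_list_by_type_and_date[1:] + [None]
latestVehicles = [
    current
    for current, nxt in zip(my_sorted_list_by_type_and_date, successors)
    if nxt is None or nxt['VehicleType'] != current['VehicleType']
]
print([v['id'] for v in latestVehicles])  # ['2', '5', '3']
```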

A defaultdict variant of Blckknght's answer, which avoids a verbose if condition:

from collections import defaultdict
import datetime
from operator import itemgetter

latest_dict = defaultdict(lambda: {'CreationDate': datetime.datetime.min})

for vehicle in VehicleList:
    t = vehicle['VehicleType']
    latest_dict[t] = max(vehicle, latest_dict[t], key=itemgetter('CreationDate'))

latestVehicles = list(latest_dict.values())
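
One caveat with the sentinel: `datetime.datetime.min` is a naive datetime, so if 'CreationDate' held timezone-aware values, `max` would raise `TypeError` when ordering the sentinel against a real date. A sketch that avoids the sentinel with an explicit membership check is shown below; it is an illustrative adaptation (not Blckknght's original code), and the timezone-aware copies of the sample dates are an assumption made only to demonstrate the case.

```python
import datetime
from operator import itemgetter

UTC = datetime.timezone.utc

# The question's data, made timezone-aware for illustration (assumption).
VehicleList = [
    {'id': '1', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000, tzinfo=UTC)},
    {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000, tzinfo=UTC)},
    {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095, tzinfo=UTC)},
    {'id': '4', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 0, 300012, tzinfo=UTC)},
    {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095, tzinfo=UTC)},
]

created = itemgetter('CreationDate')
latest_dict = {}
for vehicle in VehicleList:
    t = vehicle['VehicleType']
    # No sentinel: store the first vehicle of a type, then keep whichever is newer.
    if t not in latest_dict or created(vehicle) > created(latest_dict[t]):
        latest_dict[t] = vehicle

latestVehicles = list(latest_dict.values())
print([v['id'] for v in latestVehicles])  # ['5', '2', '3']
```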