Using binary search to find an item in a list of dicts in Python

Question

I have a list of dicts, something like this:

test_data = [
    { 'offset':0, 'data':1500 },
    { 'offset':1270, 'data':120 },
    { 'offset':2117, 'data':30 },
    { 'offset':4055, 'data':30000 },
]

The dicts are sorted in the list by their 'offset' value. The real data could be much longer.

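What I want to do is look up an item in the list given a particular offset value. That value isn't exactly one of the offsets in the list, but it falls within the range of one of the entries. So a binary search is what I want.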

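I'm now aware of Python's bisect module, which is a ready-made binary search. That's great, but it isn't directly usable for this case. I'm wondering what the easiest way is to adapt bisect to my needs. Here is what I came up with: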

import bisect

class dict_list_index_get_member(object):
    def __init__(self, dict_list, member):
        self.dict_list = dict_list
        self.member = member
    def __getitem__(self, index):
        return self.dict_list[index][self.member]
    def __len__(self):
        return len(self.dict_list)

test_data_index_get_offset = dict_list_index_get_member(test_data, 'offset')
print(bisect.bisect(test_data_index_get_offset, 1900))

It prints:

2

My question is: is this the best way to do what I want, or is there a simpler, better way?

- Craig McQueen

7 answers

When you say the real data could be much longer, does that prevent you from keeping a list of the offset values on hand?

offset_values = [i['offset'] for i in test_data]
bisect.bisect(offset_values, 1900)
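
Here is a minimal sketch of this approach that returns the dict itself. It assumes each entry covers the offsets from its own 'offset' up to, but not including, the next entry's. `bisect.bisect` returns an insertion point, so the entry that contains the offset sits one position to the left. `find_item` is a hypothetical helper name:

```python
import bisect

test_data = [
    {'offset': 0, 'data': 1500},
    {'offset': 1270, 'data': 120},
    {'offset': 2117, 'data': 30},
    {'offset': 4055, 'data': 30000},
]

# Built once in O(n); every lookup afterwards is O(log n).
offset_values = [item['offset'] for item in test_data]


def find_item(offset):
    """Hypothetical helper: return the dict whose range contains `offset`."""
    # bisect (== bisect_right) gives the index of the first offset > `offset`,
    # so the entry covering `offset` is one position to the left.
    i = bisect.bisect(offset_values, offset)
    if i == 0:
        raise ValueError(f"offset {offset} is before the first entry")
    return test_data[i - 1]


print(find_item(1900))  # {'offset': 1270, 'data': 120}
```

The trade-off is that `offset_values` must be rebuilt or updated whenever `test_data` changes.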

Your approach looks fine to me, though.

- sykora

What you could do is something like this:

class OffsetWithAttributes( object ):
    def __init__( self, offset, **kw ):
        self.offset= offset
        self.attributes= kw
    def __eq__( self, other ):
        return self.offset == other.offset
    def __lt__( self, other ):
        return self.offset < other.offset
    def __le__( self, other ):
        return self.offset <= other.offset
    def __gt__( self, other ):
        return self.offset > other.offset
    def __ge__( self, other ):
        return self.offset >= other.offset
    def __ne__( self, other ):
        return self.offset != other.offset

This lets you create a simple list of OffsetWithAttributes instances. The bisect algorithm should be perfectly happy to use the defined operators.

You can then use someOWA.attributes['data'].

Or

    def __getattr__( self, key ):
        return self.attributes[key]

This makes OffsetWithAttributes more dict-like, because you can then write someOWA.data.
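
Here is a minimal usage sketch in Python 3. `bisect` only compares elements with `<`, so this trimmed variant of the class keeps just `__eq__` and `__lt__`. It searches with a probe instance that carries only an offset. Its `__getattr__` raises `AttributeError` rather than `KeyError` for unknown names. That is an addition to the answer's version so that `hasattr()` behaves normally. `find` is a hypothetical helper:

```python
import bisect


class OffsetWithAttributes:
    def __init__(self, offset, **kw):
        self.offset = offset
        self.attributes = kw

    # bisect only ever uses "<"; __eq__ is kept for ordinary == checks.
    def __eq__(self, other):
        return self.offset == other.offset

    def __lt__(self, other):
        return self.offset < other.offset

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails (e.g. obj.data).
        try:
            return self.attributes[key]
        except KeyError:
            raise AttributeError(key) from None


items = [
    OffsetWithAttributes(0, data=1500),
    OffsetWithAttributes(1270, data=120),
    OffsetWithAttributes(2117, data=30),
    OffsetWithAttributes(4055, data=30000),
]


def find(offset):
    """Hypothetical helper: return the item whose range contains `offset`."""
    # The probe carries only an offset; its attributes are never compared.
    i = bisect.bisect(items, OffsetWithAttributes(offset))
    if i == 0:
        raise ValueError(f"offset {offset} is before the first item")
    return items[i - 1]


print(find(1900).data)  # 120
```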

- S.Lott

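The usual pattern here is similar to sorting by an attribute: decorate, operate, and undecorate. In this case you would just need to decorate and then call bisect. However, you'd want to avoid that, because decorating is O(n), whereas you want this to be O(log n). Therefore I'd consider your method best.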
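
Here is a sketch of the decorate / bisect / undecorate pattern. It assumes the decorated list is rebuilt on every query, which is what makes it O(n). It decorates with `(offset, index)` pairs rather than `(offset, dict)`. That way a comparison never falls through to a dict, which would raise `TypeError` in Python 3 when offsets tie. `find_item_decorated` is a hypothetical name:

```python
import bisect

test_data = [
    {'offset': 0, 'data': 1500},
    {'offset': 1270, 'data': 120},
    {'offset': 2117, 'data': 30},
    {'offset': 4055, 'data': 30000},
]


def find_item_decorated(data, offset):
    """Hypothetical helper: decorate, bisect, undecorate."""
    # Decorate: O(n) on every call; this is the cost the answer warns about.
    decorated = [(item['offset'], i) for i, item in enumerate(data)]
    # Operate: O(log n). The probe's second element, len(data), is larger than
    # any real index, so the probe sorts after every entry with the same offset.
    pos = bisect.bisect(decorated, (offset, len(data)))
    if pos == 0:
        raise ValueError(f"offset {offset} is before the first entry")
    # Undecorate: map the decorated entry back to the original dict.
    return data[decorated[pos - 1][1]]


print(find_item_decorated(test_data, 1900))  # {'offset': 1270, 'data': 120}
```

Caching `decorated` between queries removes the O(n) step. At that point this is essentially the parallel offset list suggested earlier.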

- Alex Gaynor

As of Python 3.10, you can pass a key function as a keyword argument to the bisect functions:

>>> bisect.bisect(test_data, 1900, key=lambda x: x["offset"])
2

- Will Ayd

ducks works well for range queries on a list of dicts. It is fast, like a binary search, because it builds a tree-based index.

pip install ducks

from ducks import Dex

test_data = [
    { 'offset':0, 'data':1500 },
    { 'offset':1270, 'data':120 },
    { 'offset':2117, 'data':30 },
    { 'offset':4055, 'data':30000 },
]

# build index on 'offset'
dex = Dex(test_data, ['offset'])

dex[{'offset': {'>': 1900}}] 
# result: [{'offset': 2117, 'data': 30}, {'offset': 4055, 'data': 30000}]

ducks can also search on multiple attributes, for example:

# build a Dex on 'offset' and 'data'
dex = Dex(test_data, ['offset', 'data'])
dex[{'offset': {'>': 1900}, 'data': {'<': 50}}]
# result: [{'offset': 2117, 'data': 30}]

- manimino

If you can use tuples instead, they work with bisect directly.

import bisect

offset = 0
data = 1
test_data = [
    (0, 1500),
    (1270, 120),
    (2117, 30),
    (4055, 30000),
]

i = bisect.bisect(test_data, (1900,0))
test_data.insert(i, (1900,0))
print(test_data[i][data])

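However, tuples are compared lexicographically (left to right, until one element differs from the other), so you need to consider whether that is the behavior you want: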

>>> bisect.insort(test_data, (2117,29))
>>> print(test_data)
[(0, 1500), (1270, 120), (2117, 29), (2117, 30), (4055, 30000)]
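
When you only want to look up by offset, one way to sidestep the lexicographic comparison is to probe with a tuple whose second element sorts after any real data value. This sketch assumes `data` is numeric so that `math.inf` compares with it. `find_row` is a hypothetical helper:

```python
import bisect
import math

DATA = 1  # index of the data field, as in the answer above
test_data = [(0, 1500), (1270, 120), (2117, 29), (2117, 30), (4055, 30000)]


def find_row(rows, offset):
    """Hypothetical helper: the last row whose offset is <= `offset`."""
    # (offset, math.inf) sorts after every (offset, data) row with finite
    # numeric data. A bare (offset,) would sort *before* those rows, so an
    # exact match would resolve to the previous offset instead.
    i = bisect.bisect(rows, (offset, math.inf))
    if i == 0:
        raise ValueError(f"offset {offset} is before the first row")
    return rows[i - 1]


print(find_row(test_data, 1900)[DATA])  # 120
print(find_row(test_data, 2117)[DATA])  # 30
```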

- uosjead

Page content provided by Stack Overflow.

- GrantJ · Accepted Answer

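You could also use one of Python's many SortedDict implementations to manage your test_data. A sorted dict keeps its keys in sorted order and maintains a mapping from each key to a value. Some implementations also support a bisect operation on the keys. For example, the Python sortedcontainers module has a SortedDict that meets your requirements.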

In your case, it would look like this:

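from sortedcontainers import SortedDict
offset_map = SortedDict((item['offset'], item['data']) for item in test_data)
# bisect (== bisect_right) returns the insertion point; step back one to get
# the entry whose offset range contains 1275
index = offset_map.bisect(1275) - 1
key = offset_map.keys()[index]  # SortedDict.iloc was removed in sortedcontainers 2.0
print(offset_map[key])
# 120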

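The SortedDict type has a bisect function that returns the bisection (insertion) index for the desired key. With that index you can look up the actual key, and with that key you can get the value.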

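All of these operations are very fast in sortedcontainers, which is also conveniently implemented in pure Python. There is also a performance comparison that discusses other options and provides benchmark data.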