Sorted set of tuples with different hash and sort key

Question


I have the following data structure (with example data):

edgeID (unique key) | timeStep (ordering key,            | value
                    |     can have multiple occurrences) | 
-----------------------------------------------------------------
"edge1"             | 15                                 | 12.1
"edge3"             | 18                                 | 17.32
"edge2"             | 23                                 | 15.1
"edge5"             | 23                                 | 65.6

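I want to be able to perform the following tasks on this data structure efficiently:

Add a new data entry whose timeStep is higher than any timeStep already stored. If the maxNumber of data entries (e.g. 20) is reached, the data entry with the lowest timeStep should be deleted.
Merge two data sets, keeping the maxNumber (e.g. 20) entries with the highest timeStep while making sure each edgeID appears at most once (if there are two entries for one edge, the one with the higher timeStep should be used).

How can I implement this data structure in Python?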

I tried one approach that works:

One dict that stores the data, one SortedSet that stores the keys according to the sort key:

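from sortedcontainers import SortedSet

data = {}  # edgeID -> (timeStep, value)
# edgeIDs ordered by the timeStep currently stored in data
dataOrder = SortedSet(key=lambda x: data[x][0])
maxDataSize = 20

def addData(edgeID, dataTuple):
    if edgeID in data:
        # drop the old position before the sort key changes
        dataOrder.discard(edgeID)
    elif len(data) >= maxDataSize:
        # remove oldest value
        key = dataOrder.pop(0)
        del data[key]
    # add
    data[edgeID] = dataTuple
    dataOrder.add(edgeID)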

addData("edge1", (15, 12.1))

The downside of this approach is that I store the edgeID twice and that I always have to update both data structures.

I tried one approach that does not work:

Only one SortedSet that stores the whole data and sorts according to the sort key:
```
from sortedcontainers import SortedSet

# entries are (edgeID, timeStep, value) tuples, ordered by timeStep
data = SortedSet(key=lambda x: x[1])
maxDataSize = 20

def addData(dataTuple):
    if len(data) >= maxDataSize:
        # remove oldest value
        data.pop(0)
    # add
    data.add(dataTuple)

addData(("edge1", 15, 12.1))
```
This approach does not work because it lets me insert the same edgeID twice with different timeSteps: (I think) the set hashes the whole tuple and not just the edgeID. Unfortunately, I cannot pass a hash function to the SortedSet constructor. This leads me to a third approach that I think must work:
Instead of using tuples as data entries, I could define a class whose __hash__() method hashes only the edgeID. Then I could store objects of this class in a SortedSet.
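
To make the third idea concrete, here is a minimal sketch of such a class. The name `EdgeEntry` and its attributes are hypothetical, and the demo uses a plain built-in `set` rather than a SortedSet so that it runs without extra packages. Note that `__hash__()` alone is not enough: Python sets also compare elements with `__eq__()`, so both have to agree on the edgeID.

```python
class EdgeEntry:
    """Hypothetical wrapper whose identity is the edgeID, not the whole tuple."""

    def __init__(self, edge_id, time_step, value):
        self.edge_id = edge_id
        self.time_step = time_step
        self.value = value

    def __hash__(self):
        # Only the edgeID takes part in hashing.
        return hash(self.edge_id)

    def __eq__(self, other):
        # Without this, two entries for the same edge would still be distinct.
        if not isinstance(other, EdgeEntry):
            return NotImplemented
        return self.edge_id == other.edge_id


entries = set()
entries.add(EdgeEntry("edge1", 15, 12.1))
entries.add(EdgeEntry("edge1", 23, 65.6))  # same edgeID: treated as a duplicate

kept = next(iter(entries))  # the entry that survived
```

One caveat of this design: adding an equal element leaves the existing one in place, so the set above still holds the entry with timeStep 15. To end up with the newer entry, the old one would have to be removed before the new one is added.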

Is this third approach really the best? Do you have any suggestions?

- MoRe

1 Answer


- labheshr · Answer 1

What you need is a heapq ordered by timeStep.

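See https://docs.python.org/2/library/heapq.html. Python's heap is a min-heap, so the entry with the smallest timeStep sits at the top of the heap and can be read in O(1) time. Before inserting an element, check whether the heap already holds 20 or more entries; if it does, remove the entry with the smallest timeStep first (popping it costs O(log n)).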
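
A minimal sketch of that check-then-insert step, assuming entries are stored as (timeStep, edgeID, value) tuples so that the heap orders them by timeStep. The function name `add_entry` and the `max_entries` parameter are made up for illustration, and this version does not yet enforce unique edgeIDs.

```python
import heapq

MAX_ENTRIES = 20  # assumed limit, taken from the question's example


def add_entry(heap, time_step, edge_id, value, max_entries=MAX_ENTRIES):
    """Insert an entry, evicting the smallest timeStep first if the heap is full."""
    evicted = None
    if len(heap) >= max_entries:
        # heap[0] is always the entry with the smallest timeStep.
        evicted = heapq.heappop(heap)
    # Tuples compare element by element, so timeStep comes first.
    heapq.heappush(heap, (time_step, edge_id, value))
    return evicted


heap = []
samples = [(15, "edge1", 12.1), (18, "edge3", 17.32), (23, "edge2", 15.1)]
for step, edge, val in samples:
    # A small limit of 2 is used here only to make the eviction visible.
    add_entry(heap, step, edge, val, max_entries=2)
```

After the loop the heap holds the entries for timeSteps 18 and 23; the entry with timeStep 15 was evicted when the third entry arrived.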

You can coordinate the heap with another dict to get faster access to the remaining entries by whichever key you prefer.
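
One possible way to combine the two, sketched under assumptions: the class name `EdgeStore` and its methods are hypothetical, the dict is keyed by edgeID, and outdated heap pairs are skipped lazily at eviction time instead of being removed from the middle of the heap. The merge simply feeds the other store's entries through `add`, which keeps the higher timeStep per edge and evicts the lowest timeSteps once the limit is exceeded.

```python
import heapq


class EdgeStore:
    """Hypothetical container: dict for edgeID lookup, heap for timeStep order."""

    def __init__(self, max_entries=20):
        self.max_entries = max_entries
        self.data = {}  # edgeID -> (timeStep, value)
        self.heap = []  # (timeStep, edgeID); may contain stale pairs

    def add(self, edge_id, time_step, value):
        current = self.data.get(edge_id)
        if current is not None and current[0] >= time_step:
            return  # keep the stored entry with the higher timeStep
        self.data[edge_id] = (time_step, value)
        heapq.heappush(self.heap, (time_step, edge_id))
        while len(self.data) > self.max_entries:
            self._evict_oldest()

    def _evict_oldest(self):
        # Pop until a pair matches the dict; stale pairs are just dropped.
        while self.heap:
            time_step, edge_id = heapq.heappop(self.heap)
            current = self.data.get(edge_id)
            if current is not None and current[0] == time_step:
                del self.data[edge_id]
                return

    def merge(self, other):
        # Re-adding every entry keeps one entry per edgeID and at most
        # max_entries entries with the highest timeSteps.
        for edge_id, (time_step, value) in other.data.items():
            self.add(edge_id, time_step, value)


store = EdgeStore(max_entries=3)
store.add("edge1", 15, 12.1)
store.add("edge3", 18, 17.32)
store.add("edge2", 23, 15.1)

other = EdgeStore(max_entries=3)
other.add("edge5", 23, 65.6)
other.add("edge3", 25, 1.0)  # made-up newer reading for edge3

store.merge(other)
```

In this example the merged store drops edge1 (lowest timeStep) and keeps edge2, edge5, and the newer edge3 entry. The trade-off is that the heap can temporarily hold stale pairs for updated edges, which costs some memory in exchange for not having to delete from the middle of the heap.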