edgeID (unique key) | timeStep (ordering key, can have multiple occurrences) | value
------------------------------------------------------------------------------------
"edge1"             | 15                                                     | 12.1
"edge3"             | 18                                                     | 17.32
"edge2"             | 23                                                     | 15.1
"edge5"             | 23                                                     | 65.6
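I want to be able to perform the following tasks efficiently on this data structure:
- Add a new data entry whose timeStep is higher than any other stored timeStep. If the maxNumber of data entries (e.g. 20) has been reached, the data entry with the lowest timeStep should be deleted.
- Merge two data sets, keeping the maxNumber (e.g. 20) entries with the highest timeStep, while making sure that each edgeID appears at most once (if there are two entries for one edge, the one with the highest timeStep should be used).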
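To make the two requirements concrete, here is a minimal reference sketch of the intended behaviour using only a plain dict. It is deliberately naive (linear scans and a full sort) and is meant as a specification of the semantics, not as the efficient solution I am looking for. The function names add_entry and merge and the max_number parameter are made up for illustration, and ties on timeStep are broken arbitrarily.

```python
MAX_NUMBER = 20  # the maxNumber from the description above


def add_entry(data, edge_id, time_step, value, max_number=MAX_NUMBER):
    """Insert (or overwrite) one entry, then evict lowest-timeStep entries
    until at most max_number entries remain."""
    data[edge_id] = (time_step, value)
    while len(data) > max_number:
        # linear scan for the entry with the lowest timeStep
        oldest = min(data, key=lambda k: data[k][0])
        del data[oldest]


def merge(a, b, max_number=MAX_NUMBER):
    """Return a new dict holding, per edgeID, the entry with the highest
    timeStep, truncated to the max_number entries with the highest timeStep."""
    merged = dict(a)
    for edge_id, entry in b.items():
        if edge_id not in merged or entry[0] > merged[edge_id][0]:
            merged[edge_id] = entry
    # sort by timeStep, newest first, and keep only the first max_number
    newest = sorted(merged.items(), key=lambda kv: kv[1][0], reverse=True)
    return dict(newest[:max_number])


# usage with a small capacity so that eviction is visible
a = {"edge1": (15, 12.1), "edge3": (18, 17.32)}
b = {"edge1": (23, 15.1), "edge5": (23, 65.6)}
result = merge(a, b, max_number=2)
```

With a capacity of 2, result keeps edge1 (with its newer timeStep 23) and edge5, and drops edge3.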
How can I implement this data structure in Python?
I have tried one approach that works:
One dict that stores the data, one SortedSet that stores the keys according to the sort key:
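    # assumes SortedSet comes from the third-party sortedcontainers package
    from sortedcontainers import SortedSet

    data = {}
    # edgeIDs ordered by the timeStep stored in data[edgeID][0]
    dataOrder = SortedSet(key=lambda x: data[x][0])
    maxDataSize = 20

    def addData(edgeID, dataTuple):
        if len(data) >= maxDataSize:
            # remove oldest value
            key = dataOrder.pop(0)
            del data[key]
        # add
        data[edgeID] = dataTuple
        dataOrder.add(edgeID)

    addData("edge1", (15, 12.1))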
The downside of this approach is that I store the edgeID twice and that I always have to update both data structures.
Only one SortedSet that stores the whole data and sorts according to the sort key:
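    # assumes SortedSet comes from the third-party sortedcontainers package
    from sortedcontainers import SortedSet

    # whole (edgeID, timeStep, value) tuples, ordered by timeStep
    data = SortedSet(key=lambda x: x[1])
    maxDataSize = 20

    def addData(dataTuple):
        if len(data) >= maxDataSize:
            # remove oldest value
            data.pop(0)
        # add
        data.add(dataTuple)

    addData(("edge1", 15, 12.1))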
This approach does not work because it lets me enter the same edgeID twice with different timeSteps: (I think) it hashes the whole tuple and not just the edgeID. Unfortunately I cannot define a hash function in the SortedSet constructor. This leads me to my third approach, which I think must work:
Instead of using tuples as data entries, I could define a class that implements the __hash__() function, which would hash only the edgeID. Then I could store objects of this class in a SortedSet.
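A minimal sketch of what I have in mind for this third approach. It uses the built-in set as a stand-in, on the assumption (which I have not verified) that SortedSet decides membership the same way, through __hash__ and __eq__. The class name Entry and its attribute names are made up for illustration. Note that __hash__ has to return an int, so it hashes the edgeID rather than returning it, and that __eq__ is needed as well, because objects with equal hashes are otherwise still treated as distinct.

```python
class Entry:
    """Hypothetical wrapper: identity is the edgeID only;
    timeStep and value are just payload."""

    def __init__(self, edge_id, time_step, value):
        self.edge_id = edge_id
        self.time_step = time_step
        self.value = value

    def __hash__(self):
        # must return an int, so hash the edgeID instead of returning it
        return hash(self.edge_id)

    def __eq__(self, other):
        # two entries count as the same element if their edgeIDs match
        if not isinstance(other, Entry):
            return NotImplemented
        return self.edge_id == other.edge_id

    def __repr__(self):
        return f"Entry({self.edge_id!r}, {self.time_step}, {self.value})"


# Plain tuples: same edgeID but different timeStep gives two distinct elements
as_tuples = {("edge1", 15, 12.1), ("edge1", 23, 15.1)}

# Entry objects: same edgeID gives a single element
as_entries = set()
as_entries.add(Entry("edge1", 15, 12.1))
as_entries.add(Entry("edge1", 23, 15.1))  # no-op: an equal element already exists
kept = next(iter(as_entries))
```

One caveat this makes visible: with the built-in set, adding an element that compares equal to an existing one leaves the old element in place, so kept still has timeStep 15. To let the newer timeStep win, the old entry would presumably have to be discarded before the new one is added.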