Python: Finding the running median with a max-heap and a min-heap

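I'm trying to return the running median of a series of streaming values. To do that, I use a max-heap (which stores the values in the lower half of the series) and a min-heap (which stores the values in the upper half of the series).
In particular, I'm using the built-in min-heap data structure from Python's (2.0) heapq module (https://docs.python.org/2/library/heapq.html). To build the max-heap, I simply push the negative of each number that needs to go into the heap.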
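The negation trick described above can be sketched as follows. This is a minimal illustration, not part of the original post; the name `max_heap` and the sample values are assumptions chosen for the example.

```python
import heapq

# heapq only provides a min-heap, so a max-heap is simulated by storing
# each value negated: the smallest stored number is the largest original.
max_heap = []  # hypothetical name, for illustration only
for x in [3, 1, 4, 1, 5]:
    heapq.heappush(max_heap, -x)

largest = -max_heap[0]              # peek at the maximum without removing it
popped = -heapq.heappop(max_heap)   # remove and return the maximum
next_largest = -max_heap[0]         # the new maximum after the pop
```

Every read from the heap has to undo the negation, which is why the code in this question writes `-maxh[0]` wherever it inspects the top of the max-heap.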
My Python code is as follows:
import heapq

maxh = []
minh = []
vals=[1,2,3,4,5,6,7,8,9,10]
for val in vals:

    # Initialize the data-structure and insert/push the 1st streaming value
    if not maxh and not minh:
        heapq.heappush(maxh,-val)
        print float(val)
    elif maxh:

        # Insert/push the other streaming values
        if val>-maxh[0]:
            heapq.heappush(minh,val)
        elif val<-maxh[0]:
            heapq.heappush(maxh,-val)

        # Calculate the median
        if len(maxh)==len(minh):
            print float(-maxh[0]+minh[0])/2
        elif len(maxh)==len(minh)+1:
            print float(-maxh[0])
        elif len(minh)==len(maxh)+1:
            print float(minh[0])

        # If min-heap and max-heap grow unbalanced we rebalance them by
        # removing/popping one element from a heap and inserting/pushing
        # it into the other heap, then we calculate the median
        elif len(minh)==len(maxh)+2:
            heapq.heappush(maxh,-heapq.heappop(minh))
            print float(-maxh[0]+minh[0])/2
        elif len(maxh)==len(minh)+2:
            heapq.heappush(minh,-heapq.heappop(maxh))
            print float(-maxh[0]+minh[0])/2

Here is the full list of test cases I built to check my code:
vals=[1,2,3,4,5,6,7,8,9,10] # positive numbers, increasing series
vals=[10,9,8,7,6,5,4,3,2,1] # positive numbers, decreasing series
vals=[10,9,11,8,12,7,13,6,14,5] # positive numbers, jumping series (keeping
                                # heaps balanced)

vals=[-10,-9,-8,-7,-6,-5,-4,-3,-2,-1] # negative numbers, increasing series
vals=[-1,-2,-3,-4,-5,-6,-7,-8,-9,-10] # negative numbers, decreasing series
vals=[-10,-9,-11,-8,-12,-7,-13,-6,-14,-5] # negative numbers
                                          # jumping series (keeping heaps
                                          # balanced)

vals=[-5,-4,-3,-2,-1,0,1,2,3,4,5] # mixed positive-negative numbers,
                                  # increasing series
vals=[5,4,3,2,1,0,-1,-2,-3,-4,-5] # mixed positive-negative numbers,
                                  # decreasing series
vals=[0,-1,1,-2,2,-3,3,-4,4,-5,5] # mixed positive-negative numbers,
                                  # jumping series (keeping heaps balanced)
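One way to check series like these automatically is to compare the heap-based output against a slow but obviously correct reference. The sketch below is an assumption-laden illustration, not something from the original post: `naive_running_medians` is a hypothetical helper that re-sorts everything seen so far at each step.

```python
def naive_running_medians(values):
    """Reference implementation: re-sort the prefix at every step.

    Far too slow for large inputs, but simple enough to trust as a
    baseline when testing a heap-based version.
    """
    seen = []
    medians = []
    for val in values:
        seen.append(val)
        ordered = sorted(seen)
        n = len(ordered)
        mid = n // 2
        if n % 2 == 1:
            # Odd count: the middle element is the median.
            medians.append(float(ordered[mid]))
        else:
            # Even count: average the two middle elements.
            medians.append((ordered[mid - 1] + ordered[mid]) / 2.0)
    return medians
```

Note that none of the series listed above contains a repeated value, so a reference check is most useful when it is also run on inputs with duplicates.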

My code looks fine to me, but I fail 4 of the 10 test cases on the online judge (https://www.hackerrank.com/challenges/ctci-find-the-running-median/problem).

Do you have any hints?


The problem states that the first number indicates how many values will be entered. Have you accounted for that? - Christopher Bottoms
Your code will likely fail if there are duplicate values. If the next item is equal to the value currently at the top of maxh. - Jim Mischel
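You might not need that information to find the solution, but if you don't have direct control over the input when you submit your code for evaluation, it could be a cause of failure. That said, it sounds like @JimMischel has found a more important issue for you to worry about. - Christopher Bottoms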
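Somehow I truncated my comment. If the next item is equal to the value currently at the top of maxh, it won't be added to either heap. The test case [1,1,2] should reveal the error. - Jim Mischel
Great! Thanks for the hint, @JimMischel; silly of me not to think of it :). I added the block "elif val==-maxh[0]: heapq.heappush(minh,val)" and now I pass all the test cases! - SergeGardien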
1 Answer

The problem is here:
    # Insert/push the other streaming values
    if val>-maxh[0]:
        heapq.heappush(minh,val)
    elif val<-maxh[0]:
        heapq.heappush(maxh,-val)

If val == -maxh[0], neither branch runs and the item is never pushed onto either heap. The test case [1,1,2] reveals the error.
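The dropped value can be observed directly by counting what ends up in the heaps. The sketch below isolates the insertion step from the question in a hypothetical helper, `push_original`, which is an assumption made for illustration; the branch logic itself is copied from the question.

```python
import heapq

def push_original(maxh, minh, val):
    """Insertion step as written in the question (no branch for equality)."""
    if val > -maxh[0]:
        heapq.heappush(minh, val)
    elif val < -maxh[0]:
        heapq.heappush(maxh, -val)

maxh, minh = [], []
heapq.heappush(maxh, -1)        # first value (1) initializes the max-heap
push_original(maxh, minh, 1)    # equal to the top of maxh: silently dropped
push_original(maxh, minh, 2)    # larger than the top: goes to the min-heap

stored = len(maxh) + len(minh)  # number of values actually kept
```

Three values were fed in, but `stored` counts only the ones that reached a heap, so every median printed after the duplicate is computed from an incomplete series.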
A simple fix is:
    # Insert/push the other streaming values
    if val >= -maxh[0]:
        heapq.heappush(minh,val)
    else:
        heapq.heappush(maxh,-val)
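For reference, here is a self-contained sketch of the whole two-heap approach with duplicates handled. It is a variant, not the exact code from the question or this answer: it sends ties to the lower half, rebalances before computing each median, and returns a list instead of printing. The names `running_medians`, `low`, and `high` are assumptions for illustration, and the code is written to run under both Python 2 and Python 3.

```python
import heapq

def running_medians(values):
    """Return the median after each value in `values` is seen."""
    low = []    # max-heap (values stored negated): lower half of the data
    high = []   # min-heap: upper half of the data
    medians = []
    for val in values:
        # Every value goes to exactly one heap; ties go to the lower half.
        if not low or val <= -low[0]:
            heapq.heappush(low, -val)
        else:
            heapq.heappush(high, val)

        # Rebalance so that len(low) is len(high) or len(high) + 1.
        if len(low) > len(high) + 1:
            heapq.heappush(high, -heapq.heappop(low))
        elif len(high) > len(low):
            heapq.heappush(low, -heapq.heappop(high))

        if len(low) == len(high):
            # Even count: average the two middle values.
            medians.append((-low[0] + high[0]) / 2.0)
        else:
            # Odd count: the extra element sits on top of the lower half.
            medians.append(float(-low[0]))
    return medians
```

For example, `running_medians([1, 1, 2])` yields `[1.0, 1.0, 1.0]`. Using a plain `if`/`else` for the insertion step, as in the fix above, means no value can fall through without being stored.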
