我在Python中使用什么来实现最大堆?

436

Python中包含heapq模块用于min-heaps,但我需要一个max-heap。在Python中应该使用什么来实现max-heap?

19个回答

4
扩展int类并覆盖__lt__是其中一种方法。
import queue
class MyInt(int):
    def __lt__(self, other):
        return self > other

def main():
    q = queue.PriorityQueue()
    q.put(MyInt(10))
    q.put(MyInt(5))
    q.put(MyInt(1))
    while not q.empty():
        print (q.get())


if __name__ == "__main__":
    main()

2
可以实现,但我觉得这会大大减慢速度并使用大量额外的内存。MyInt 也不能在堆结构之外使用。但是感谢您编写示例,很有趣。 - Leo Ufimtsev
1
哈!就在我发表这条评论的一天后,我遇到了需要将自定义对象放入堆中并需要一个最大堆的情况。我实际上重新搜索了这篇文章,并找到了您的答案,并以此为基础解决了问题。(自定义对象是具有x,y坐标和__lt__重写比较与中心距离的点)。谢谢您发表这篇文章,我已经点赞了! - Leo Ufimtsev

2
允许您选择任意数量的最大或最小项目。
import heapq
heap = [23, 7, -4, 18, 23, 42, 37, 2, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
heapq.heapify(heap)
print(heapq.nlargest(3, heap))  # [42, 42, 37]
print(heapq.nsmallest(3, heap)) # [-4, -4, 2]

3
需要解释一下。 - Peter Mortensen
2
我的回答比问题更长。您想要添加什么解释? - jasonleonhard
1
https://wikipedia.org/wiki/Min-max_heap 和 https://docs.python.org/3.0/library/heapq.html 可能也会有所帮助,这是关于编程方面的内容。 - jasonleonhard
8
这样可以得到正确的结果,但实际上并没有使用堆来提高效率。文档规定nlargest和nsmallest每次都对列表进行排序。 - RossFabricant
看起来你在这个问题上花了很多时间,编辑了几个答案,发表了许多评论,我用 ctrl-f 搜索,今天你的名字出现了 17 次。下次不要只是抱怨我的输入(还有其他人),考虑添加一些东西到我的答案中。这样做可能需要更少的时间,并且会使每个人都受益。 - jasonleonhard
显示剩余2条评论

1
我已经创建了一个堆包装器,它倒转值以创建一个最大堆,以及一个最小堆的包装类,使库更像面向对象编程。这里是要点。有三个类:Heap(抽象类)、HeapMin和HeapMax。
方法:
isempty() -> bool; obvious
getroot() -> int; returns min/max
push() -> None; equivalent to heapq.heappush
pop() -> int; equivalent to heapq.heappop
view_min()/view_max() -> int; alias for getroot()
pushpop() -> int; equivalent to heapq.pushpop

1

heapq模块拥有实现最大堆所需的一切功能。 它只执行最大堆的heappush功能。 我在下面演示了如何克服这个问题。

将此函数添加到heapq模块中:

def _heappush_max(heap, item):
    """Push item onto heap, maintaining the heap invariant."""
    heap.append(item)
    _siftdown_max(heap, 0, len(heap)-1)

最后,加入这个:
try:
    from _heapq import _heappush_max
except ImportError:
    pass

完成了!

附言 - 要进入heapq函数。首先在您的编辑器中编写“import heapq”,然后右键单击“heapq”并选择“转到定义”。


关于 *"右键点击 'heapq' 并选择转到定义"*:您使用的是哪个编辑器? - Peter Mortensen

1

关于{{link1:Apoorv Patne的回答}},这里提供了通用情况下的完整文档化、注释和经过测试的Python 3实现。

from __future__ import annotations  # To allow "MinHeap.push -> MinHeap:"
from typing import Generic, List, Optional, TypeVar
from heapq import heapify, heappop, heappush, heapreplace


T = TypeVar('T')


class MinHeap(Generic[T]):
    '''
    MinHeap provides a nicer API around heapq's functionality.
    As it is a minimum heap, the first element of the heap is always the
    smallest.
    >>> h = MinHeap([3, 1, 4, 2])
    >>> h[0]
    1
    >>> h.peek()
    1
    >>> h.push(5)  # N.B.: the array isn't always fully sorted.
    [1, 2, 4, 3, 5]
    >>> h.pop()
    1
    >>> h.pop()
    2
    >>> h.pop()
    3
    >>> h.push(3).push(2)
    [2, 3, 4, 5]
    >>> h.replace(1)
    2
    >>> h
    [1, 3, 4, 5]
    '''
    def __init__(self, array: Optional[List[T]] = None):
        if array is None:
            array = []
        heapify(array)
        self.h = array
    def push(self, x: T) -> MinHeap:
        heappush(self.h, x)
        return self  # To allow chaining operations.
    def peek(self) -> T:
        return self.h[0]
    def pop(self) -> T:
        return heappop(self.h)
    def replace(self, x: T) -> T:
        return heapreplace(self.h, x)
    def __getitem__(self, i) -> T:
        return self.h[i]
    def __len__(self) -> int:
        return len(self.h)
    def __str__(self) -> str:
        return str(self.h)
    def __repr__(self) -> str:
        return str(self.h)


class Reverse(Generic[T]):
    '''
    Wrap around the provided object, reversing the comparison operators.
    >>> 1 < 2
    True
    >>> Reverse(1) < Reverse(2)
    False
    >>> Reverse(2) < Reverse(1)
    True
    >>> Reverse(1) <= Reverse(2)
    False
    >>> Reverse(2) <= Reverse(1)
    True
    >>> Reverse(2) <= Reverse(2)
    True
    >>> Reverse(1) == Reverse(1)
    True
    >>> Reverse(2) > Reverse(1)
    False
    >>> Reverse(1) > Reverse(2)
    True
    >>> Reverse(2) >= Reverse(1)
    False
    >>> Reverse(1) >= Reverse(2)
    True
    >>> Reverse(1)
    1
    '''
    def __init__(self, x: T) -> None:
        self.x = x
    def __lt__(self, other: Reverse) -> bool:
        return other.x.__lt__(self.x)
    def __le__(self, other: Reverse) -> bool:
        return other.x.__le__(self.x)
    def __eq__(self, other) -> bool:
        return self.x == other.x
    def __ne__(self, other: Reverse) -> bool:
        return other.x.__ne__(self.x)
    def __ge__(self, other: Reverse) -> bool:
        return other.x.__ge__(self.x)
    def __gt__(self, other: Reverse) -> bool:
        return other.x.__gt__(self.x)
    def __str__(self):
        return str(self.x)
    def __repr__(self):
        return str(self.x)


class MaxHeap(MinHeap):
    '''
    MaxHeap provides an implement of a maximum-heap, as heapq does not provide
    it. As it is a maximum heap, the first element of the heap is always the
    largest. It achieves this by wrapping around elements with Reverse,
    which reverses the comparison operations used by heapq.
    >>> h = MaxHeap([3, 1, 4, 2])
    >>> h[0]
    4
    >>> h.peek()
    4
    >>> h.push(5)  # N.B.: the array isn't always fully sorted.
    [5, 4, 3, 1, 2]
    >>> h.pop()
    5
    >>> h.pop()
    4
    >>> h.pop()
    3
    >>> h.pop()
    2
    >>> h.push(3).push(2).push(4)
    [4, 3, 2, 1]
    >>> h.replace(1)
    4
    >>> h
    [3, 1, 2, 1]
    '''
    def __init__(self, array: Optional[List[T]] = None):
        if array is not None:
            array = [Reverse(x) for x in array]  # Wrap with Reverse.
        super().__init__(array)
    def push(self, x: T) -> MaxHeap:
        super().push(Reverse(x))
        return self
    def peek(self) -> T:
        return super().peek().x
    def pop(self) -> T:
        return super().pop().x
    def replace(self, x: T) -> T:
        return super().replace(Reverse(x)).x


if __name__ == '__main__':
    import doctest
    doctest.testmod()

https://gist.github.com/marccarre/577a55850998da02af3d4b7b98152cf4


0

Python 中有一个内置的堆,但是这里是如何自己构建它。算法是可行的,但关于效率我不确定。

class Heap:

    def __init__(self):
        self.heap = []
        self.size = 0

    def add(self, heap):
        self.heap = heap
        self.size = len(self.heap)

    def heappush(self, value):
        self.heap.append(value)
        self.size += 1

    def heapify(self, heap, index=0):

        mid = int(self.size /2)
        """
            If you want to travel great value from the bottom to the top, you need to repeat swapping by the height of the tree.
            I don't know how I can get the height of the tree. That's why I use sezi/2.
            You can find the height by this formula:
            2^(x) = size+1  Why 2^x? Because the tree is growing exponentially
            xln(2) = ln(size+1)
            x = ln(size+1)/ln(2)
        """

        for i in range(mid):
            self.createTee(heap, index)

        return heap

    def createTee(self, heap, shiftindex):

        """
        """
        """

            This 'pos' variable refer to the index of the parent, only parent with children
                    (1)
                (2)      (3)           Here the size of the list is 7/2 = 3
            (4)   (5)  (6)  (7)        The number of parents is 3, but we use {2, 1, 0} in a 'while' loop.
                                       That is why a set 'pos' to -1.

        """
        pos = int(self.size /2) -1
        """
            This if you want to sort this heap list. We should swap the maximum value in the root of the tree with the last
            value in the list and if you want to repeat this until sort all list, you will need to prevent the function from
            change what we already sorted. I should decrease the size of the list. That will heapify on it.

        """

        newsize = self.size - shiftindex
        while pos >= 0:
            left_child = pos * 2 + 1
            right_child = pos * 2 + 2
            # This means that left child is exist
            if left_child < newsize:
                if right_child < newsize:

                    # If the right child exits, we want to check if the left
                    # child > rightchild.
                    #
                    # If the right child doesn't exist, we can check that
                    # we will get error out of range.
                    if heap[pos] < heap[left_child] and heap[left_child]  > heap[right_child]:
                        heap[left_child], heap[pos] = heap[pos], heap[left_child]
                # Here if the right child doesn't exist
                else:
                    if heap[pos] < heap[left_child]:
                        heap[left_child], heap[pos] = heap[pos], heap[left_child]
            # If the right child exists
            if right_child < newsize:
                if heap[pos] < heap[right_child]:
                    heap[right_child], heap[pos] = heap[pos], heap[right_child]
            pos -= 1

        return heap

    def sort(self):
        k = 1
        for i in range(self.size -1, 0, -1):
            """
            Because this is max-heap, we swap root with last element in the list

            """
            self.heap [0], self.heap[i] = self.heap[i], self.heap[0]
            self.heapify(self.heap, k)
            k += 1

        return self.heap


h = Heap()
h.add([5, 7, 0, 8, 9, 10, 20, 30, 50, -1])
h.heappush(-2)
print(" before heapify ")
print(h.heap)
print(" after heapify ")
print(h.heapify(h.heap, 0))
print(" after sort ")
print(h.sort())

输出

堆化之前

[5, 7, 0, 8, 9, 10, 20, 30, 50, -1, -2]

堆化之后

[50, 30, 20, 8, 9, 10, 0, 7, 5, -1, -2]

排序之后

[-2, -1, 0, 5, 7, 8, 9, 10, 20, 30, 50]


0
arr = [3, 4, 5, 1, 2, 3, 0, 7, 8, 90, 67, 31, 2, 5, 567]
# max-heap sort will lead the array to ascending order
def maxheap(arr, p):

    for i in range(len(arr)-p):
        if i > 0:
            child = i
            parent = (i + 1)//2 - 1

            while arr[child]> arr[parent] and child != 0:
                arr[child], arr[parent] = arr[parent], arr[child]
                child = parent
                parent = (parent + 1)//2 -1


def heapsort(arr):
    for i in range(len(arr)):
        maxheap(arr, i)
        arr[0], arr[len(arr)-i-1] = arr[len(arr)-i-1], arr[0]

    return arr


print(heapsort(arr))

试试这个。


谁说要排序了?这和什么相关?为什么是必要的?它是从哪里复制过来的? - Peter Mortensen

0
我创建了一个名为heap_class的包,它实现了最大堆,并将各种堆函数封装到与列表兼容的环境中。
>>> from heap_class import Heap
>>> h = Heap([3, 1, 9, 20], max=True)
>>> h.pop()
20

>>> h.peek()  # The same as h[0]
9

>>> h.push(17)  # Or h.append(17)
>>> h[0]  # The same as h.peek()
17

>>> h[1]  # Inefficient, but it works
9

从最大堆中获取一个最小堆。

>>> y = reversed(h)
>>> y.peek()
1

>>> y  # The representation is inefficient, but correct
Heap([1, 3, 9, 17], max=False)

>>> 9 in y
True

>>> y.raw()  # Underlying heap structure
[1, 3, 17, 9]

如其他人所提到的,在使用 heapq 时,在最大堆中处理字符串和复杂对象是相当困难的,因为存在不同形式的取反。但是使用 heap_class 实现则很容易:
>>> h = Heap(('aa', 4), ('aa', 5), ('zz', 2), ('zz', 1), max=True)
>>> h.pop()
('zz', 2)

自定义键被支持,并且可以与后续的推送/追加和弹出一起使用:

>>> vals = [('Adam', 'Smith'), ('Zeta', 'Jones')]
>>> h = Heap(vals, key=lambda name: name[1])
>>> h.peek()  # Jones comes before Smith
('Zeta', 'Jones')

>>> h.push(('Aaron', 'Allen'))
>>> h.peek()
('Aaron', 'Allen')

(该实现基于heapq函数构建,因此除了heappush和heapreplace在最大堆上使用Python之外,其余均为C或带有C包装器。)


0

如果您想使用最大堆获取最大的K个元素,可以使用以下技巧:

nums= [3,2,1,5,6,4]
k = 2  #k being the kth largest element you want to get
heapq.heapify(nums) 
temp = heapq.nlargest(k, nums)
return temp[-1]

2
不幸的是,这个算法的时间复杂度为O(MlogM),其中M = len(nums),这违背了使用heapq的初衷。 请参见此处nlargest的实现和注释 -> https://github.com/python/cpython/blob/7dc72b8d4f2c9d1eed20f314fd6425eab66cbc89/Lib/heapq.py#L521 - Arthur S
1
感谢您的有益评论,我会确保查看附加链接。 - RowanX

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接