Descending order using heapq

12

I am using Python's heapq module to retrieve data in ascending and descending order.

For ascending order, I use a min-heap, and the following works fine:

>>> from heapq import heapify, heappop
>>> heap = [9, 3, 1, 5, 6, 2, 7]
>>> heapify(heap)
>>> heappop(heap)
1
>>> heappop(heap)
2
>>> heappop(heap)
3

For descending order, I have tried several approaches, but each has a drawback:

  1. Using negated values as the priority to get a reverse sort. I have to use a separate list to keep the original data reusable. If the original list is big, keeping a copy of it is costly.

    >>> from heapq import heapify, heappop
    >>> heap = [9, 3, 1, 5, 6, 2, 7]
    >>> heap_neg = [-x for x in heap]
    >>> heapify(heap_neg)
    >>> -heappop(heap_neg)
    9
    >>> -heappop(heap_neg)
    7
    >>> -heappop(heap_neg)
    6
    
  2. Using tuples with the negated value as the priority. This also wastes space; I would rather not store a list of ints as a list of tuples.

    >>> from heapq import heapify, heappop
    >>> heap = [(-9, 9), (-3, 3), (-1, 1), (-5, 5), (-6, 6), (-2,2), (-7,7)]
    >>> heapify(heap)
    >>> heappop(heap)[1]
    9
    >>> heappop(heap)[1]
    7
    >>> heappop(heap)[1]
    6
    
  3. Passing a key to heapify is not supported. I would like something like:

    >>> from heapq import heapify, heappop
    >>> heap = [9, 3, 1, 5, 6, 2, 7]
    >>> heapify(heap, key=lambda x: -x)  # This doesn't work, as heapify has no key parameter
    
  4. If I use heapq._heapify_max(heap), I have to call _heapify_max again after each pop. Like:

    >>> from heapq import _heapify_max, heappop
    >>> heap = [9, 3, 1, 5, 6, 2, 7]
    >>> _heapify_max(heap)
    >>> heappop(heap)
    9
    >>> heappop(heap)  # popping without _heapify_max gives wrong result
    1
    >>> _heapify_max(heap)
    >>> heappop(heap) # popping after _heapify_max gives correct result
    7
    
Is there any way to get descending order the same way I get ascending order? :)

1
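If you only need a specific number of the largest elements, try heapq.nlargest (there is also heapq.nsmallest, which does what your first block of code does). It is very efficient when the number of items you care about is much smaller than the number of items in the original list. - Blckknght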
2
If you are willing to use the undocumented max-heap functions in heapq, there is a _heappop_max(heap) to go with _heapify_max(heap). - AChampion
1
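Can you show your median code? I'm not sure how copying data for negation comes into play when you're working with a data stream. Don't you start with two empty heaps in that case? - Blckknght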
1
@AChampion: Alas, there is no _heappush_max, so using the undocumented functions is imperfect too. - Blckknght
1
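That code doesn't work because you never call _heapify_max on it. You need to heapify once at the start; otherwise the list doesn't have the heap property and the other operations won't do what you want. A single call (before the first pop) gives the expected result. - Blckknght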
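
The heapq.nlargest suggestion from the comments above can be sketched as follows. This is only an illustration using the list from the question; it helps when a fixed number of the largest items is needed, not when items are pushed and popped over time.

```python
import heapq

data = [9, 3, 1, 5, 6, 2, 7]

# The three largest values, returned in descending order.
top_three = heapq.nlargest(3, data)
print(top_three)  # [9, 7, 6]

# The input list is not modified, so it stays reusable.
print(data)  # [9, 3, 1, 5, 6, 2, 7]
```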
3 Answers

3

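As we discussed in the comments, your concern about copying data when negating values to flip a min-heap into a max-heap doesn't matter when you start with an empty heap and add values as you go. Since that is the use case when finding the running median of a stream of values, negating each value as you add it should work fine.

Here's a running median generator I wrote, just to double-check that it works the way I expected: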

import heapq

def running_median(iterable):
    left_q = [] # heap of smaller-than-median elements, stored negated
    right_q = [] # heap of larger-than-median elements

    for value in iterable:
        if len(left_q) == len(right_q): # push to left_q when they're equal size
            if len(right_q) > 0 and value > right_q[0]:
                value = heapq.heapreplace(right_q, value)
            heapq.heappush(left_q, -value)
        else: # push to right_q only when it's (strictly) smaller
            if value < -left_q[0]:
                value = -heapq.heapreplace(left_q, -value)
            heapq.heappush(right_q, value)

        # len(left_q) is always >= len(right_q) so we never yield right_q[0]
        if len(left_q) > len(right_q):
            yield -left_q[0]
        else:
            yield (-left_q[0] + right_q[0]) / 2

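The left_q heap stores the values less than or equal to the median. Each value is negated when it is pushed, so applying the normal min-heap operations to it makes it behave like a max-heap. We just need to remember to negate any value we take out again, to restore the original sign.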
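
The negate-on-push, negate-on-pop idea described above can be wrapped in a small class so the sign handling lives in one place. This is a minimal sketch; MaxHeap and its method names are hypothetical helpers, not part of heapq, and it assumes the stored values are numbers that can be negated.

```python
import heapq


class MaxHeap:
    """Hypothetical helper: a max-heap of numbers built on heapq by negation."""

    def __init__(self):
        self._data = []  # holds the negated values

    def push(self, value):
        # Negate on the way in so the largest value sits at index 0.
        heapq.heappush(self._data, -value)

    def pop(self):
        # Negate on the way out to restore the original sign.
        return -heapq.heappop(self._data)

    def peek(self):
        return -self._data[0]

    def __len__(self):
        return len(self._data)


heap = MaxHeap()
for value in [9, 3, 1, 5, 6, 2, 7]:
    heap.push(value)

popped = [heap.pop() for _ in range(3)]
print(popped)  # [9, 7, 6]
```

Because the heap is filled by pushing values one at a time, no second copy of an existing list is needed.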


2

This can be done with private methods (tested on Python 3.8):

import heapq


if __name__ == '__main__':
    a = [1, 3, 2, 5]

    heapq._heapify_max(a)

    for _ in range(len(a)):
        print("sorted heap ", heapq._heappop_max(a))

The result is

sorted heap  5
sorted heap  3
sorted heap  2
sorted heap  1
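
A hedged sketch of how to avoid the underscore-prefixed names: to the best of my knowledge, recent Python releases (3.14) expose public heapq.heapify_max and heapq.heappop_max, but treat that as an assumption to check against your interpreter's documentation. The hypothetical helper pop_all_descending below uses them only if they exist and otherwise falls back to negation.

```python
import heapq


def pop_all_descending(values):
    """Hypothetical helper: return numeric values in descending order."""
    items = list(values)  # work on a copy; the caller's list is untouched

    # Assumption: these public names exist only on newer Python versions.
    heapify_max = getattr(heapq, "heapify_max", None)
    heappop_max = getattr(heapq, "heappop_max", None)

    if heapify_max is not None and heappop_max is not None:
        heapify_max(items)
        return [heappop_max(items) for _ in range(len(items))]

    # Fallback for older versions: negate, use the min-heap, negate back.
    negated = [-x for x in items]
    heapq.heapify(negated)
    return [-heapq.heappop(negated) for _ in range(len(negated))]


print(pop_all_descending([1, 3, 2, 5]))  # [5, 3, 2, 1]
```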

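But using private methods may not feel right to some people. For that reason, we can change the ordering by putting the objects in a wrapper with modified comparison.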

class DescOrder:
    def __init__(self, entity):
        self.entity = entity

    def __lt__(self, o):
        # Reversed comparison: the larger entity sorts first in a min-heap
        return self.entity > o.entity

    def __repr__(self):
        return str(self.entity)

def check_sorting(a, b):
    new_heap = []

    for element in a:
        heapq.heappush(new_heap, DescOrder(element))

    for index in range(0, len(b)):
        assert heapq.heappop(new_heap).entity == b[index]


if __name__ == '__main__':
    check_sorting([5, 1, -1, 3, 2], [5, 3, 2, 1, -1])
    check_sorting([5, 2, -1, 3, 1], [5, 3, 2, 1, -1])
    check_sorting([-1, 2, 5, 3, 1], [5, 3, 2, 1, -1])
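
One reason to prefer a wrapper over negation is that it also works for values that cannot be negated, such as strings. The following is a minimal sketch of that idea; Reverse is a hypothetical name, and functools.total_ordering is used only to fill in the remaining comparison methods.

```python
import heapq
from functools import total_ordering


@total_ordering
class Reverse:
    """Hypothetical wrapper that inverts the ordering of the wrapped value."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __lt__(self, other):
        # Inverted on purpose: "less than" means the wrapped value is greater.
        return self.value > other.value


words = ["pear", "apple", "fig", "plum"]
heap = [Reverse(w) for w in words]
heapq.heapify(heap)

descending = [heapq.heappop(heap).value for _ in range(len(heap))]
print(descending)  # ['plum', 'pear', 'fig', 'apple']
```

The cost is one small wrapper object per item, which is the same kind of space overhead the question raises for tuples.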
 

2

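I think what you are looking for in this case is a sorted linked list. I modified some code I found here so that it inserts in descending order, with the largest value at the head (I added a pop function; for some reason the code didn't have one, but I think you may need it):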

# Python program to insert in sorted list

# Node class 
class Node:

    # Constructor to initialize the node object
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:

    # Function to initialize head
    def __init__(self):
        self.head = None

    def sortedInsert(self, new_node):

        # Special case for the empty linked list 
        if self.head is None:
            new_node.next = self.head
            self.head = new_node

        # Special case for head at end
        elif self.head.data <= new_node.data:
            new_node.next = self.head
            self.head = new_node

        else :

            # Locate the node before the point of insertion
            current = self.head
            while(current.next is not None and
                 current.next.data > new_node.data):
                current = current.next

            new_node.next = current.next
            current.next = new_node

    # Function to insert a new node at the beginning
    def push(self, new_data):
        new_node = Node(new_data)
        new_node.next = self.head
        self.head = new_node

    # Utility function to print the linked list
    def printList(self):
        temp = self.head
        while temp:
            print(temp.data, end=" ")
            temp = temp.next
        print()

    def pop(self):
        val = self.head.data
        self.head = self.head.next
        return val


# Driver program
llist = LinkedList()
new_node = Node(5)
llist.sortedInsert(new_node)
new_node = Node(10)
llist.sortedInsert(new_node)
new_node = Node(7)
llist.sortedInsert(new_node)
new_node = Node(3)
llist.sortedInsert(new_node)
new_node = Node(1)
llist.sortedInsert(new_node)
new_node = Node(9)
llist.sortedInsert(new_node)
print("Create Linked List")
llist.printList()

As you can see, just changing >= to <= does the job perfectly.


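Thank you very much for your answer, but I would like to use something built in. Also, I want to use a heap because it has O(log n) insertion, whereas a linked list needs O(n). - iamrishap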
1
@rishap, have you considered a tree? An OrderedTreeSet, to be precise? There is some documentation here: http://knuth.luther.edu/~leekent/CS2Plus/chap6/chap6.html - developer_hatch
