测试加权快速合并算法?

4
我正在通过Coursera学习Algorithms, Part I,并希望测试Quick Find、Quick Union和Weighted Quick Union算法的运行时间。该课程使用Java语言,而我不熟悉Java,因此我已经尝试用我更熟悉的Python重新编写了这些算法。

现在,我计划测试每个函数以验证其运行时间/复杂度。我一直在考虑使用timeit库,但似乎会产生错误的结果,例如,Weighted Quick Union所需的时间比QuickUnion还长。

我该如何验证Weighted Quick Union确实是O(log n),并且比Quick Union更快呢?以下是我迄今为止创建和尝试过的内容:

O(N**2) - 慢

class QuickFind_Eager:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]

    # Joins two nodes into a component
    def union(self, first_node, second_node):
        for pos, val in enumerate(self.array):
            if self.array[pos] == self.array[first_node]:
                self.array[pos] = self.array[second_node]

    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.array[first_node] == self.array[second_node]

O(N) - 仍然太慢 - 避免使用

class QuickUnion_Lazy:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]

    # Follows parent pointers to actual root
    def root(self, parent):
        while parent != self.array[parent]:
            parent = self.array[parent]
        return parent

    # Joins two nodes into a component
    def union(self, first_node, second_node):
        self.array[first_node] = self.array[second_node]

    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.root(first_node) == self.root(second_node)

O(log N) - 相当快

class WeightedQuickUnion:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]
        self.weight = [num for num in range(nodes)]

    # Follows parent pointers to actual root
    def root(self, parent):
        while parent != self.array[parent]:
            parent = self.array[parent]
        return parent

    # Joins two nodes into a component
    def union(self, first_node, second_node):
        if self.root(first_node) == self.root(second_node):
            return

        if (self.weight[first_node] < self.weight[second_node]):
            self.array[first_node] = self.root(second_node)
            self.weight[second_node] += self.weight[first_node]
        else:
            self.array[second_node] = self.root(first_node)
            self.weight[first_node] += self.weight[second_node]

    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.root(first_node) == self.root(second_node)

O(N + M lg* N) - 非常快

class WeightedQuickUnion_PathCompression:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]
        self.weight = [num for num in range(nodes)]

    # Follows parent pointers to actual root
    def root(self, parent):
        while parent != self.array[parent]:
            self.array[parent] = self.array[self.array[parent]]
            parent = self.array[parent]
        return parent

    # Joins two nodes into a component
    def union(self, first_node, second_node):
        if self.root(first_node) == self.root(second_node):
            return

        if self.weight[first_node] < self.weight[second_node]:
            self.array[first_node] = self.root(second_node)
            self.weight[second_node] += self.weight[first_node]
        else:
            self.array[second_node] = self.root(first_node)
            self.weight[first_node] += self.weight[second_node]

    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.root(first_node) == self.root(second_node)

测试运行时间

def test_quickfind(quickfind):
    t = quickfind(100)
    t.union(1,2)
    t.connected(1,2)
    t.union(4,2)
    t.union(3,4)
    t.connected(0,2)
    t.connected(1,4)
    t.union(0,3)
    t.connected(0,4)

import timeit

t = timeit.timeit(stmt="test_quickfind(QuickFind_Eager)", setup="from __main__ import QuickFind_Eager; from __main__ import test_quickfind", number=100000)
print(t)
# 11.4380569069981
t = timeit.timeit(stmt="test_quickfind(QuickUnion_Lazy)", setup="from __main__ import QuickUnion_Lazy; from __main__ import test_quickfind", number=100000)
print(t)
# 1.4744456350017572
t = timeit.timeit(stmt="test_quickfind(WeightedQuickUnion)", setup="from __main__ import WeightedQuickUnion; from __main__ import test_quickfind", number=100000)
print(t)
# 2.738758583996969
t = timeit.timeit(stmt="test_quickfind(WeightedQuickUnion_PathCompression)", setup="from __main__ import WeightedQuickUnion_PathCompression; from __main__ import test_quickfind", number=100000)
print(t)
# 3.0113827050008695

更新 添加了来自timeit的结果。


你的 timeit 性能分析方法有什么问题? - Cory Kramer
抱歉,我在原文中没有包含这个信息。我已经更新了内容以显示运行时间。 Quick Union 和 Weighted Union 的运行时间逐渐增加。 - HammerMeetNail
2个回答

2
你需要将算法的运行时间制表,作为问题规模的函数,例如针对不同的问题规模调用quickfind(比如100、200、300、400、500;请注意,对于简单的O(n^2)算法,后者至少需要运行3分钟)。
你仍然无法保证观察到渐进运行时间函数(这就是O符号的含义:实际上,O(f)描述了一组函数g_ig_i = a_i * f(n) + b_i; a_i, b_i: const [有点滥用符号]),因为你的某些实现可能会遇到资源耗尽(即没有更多RAM可用),导致性能显著下降超出你的实现范围。

好的,只是为了确认我的理解,当你说不同的问题大小时,你指的是test_quickfind函数中联合和连接操作的总数吗? - HammerMeetNail
不,我的意思是您数据结构中的元素数量,即您“quickfind”例程的参数。 - collapsar

0
QuickFindEager类中union函数的实现不正确。 在循环之前,应将self.array[first_node]self.array[second_node]添加到变量中,然后在循环中从变量中更改。
def union(self, first_node, second_node):
    pid = self.array[first_node]
    qid = self.array[second_node]
    for pos, val in enumerate(self.array):
        if self.array[pos] == pid:
            self.array[pos] = qid

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接