高效队列在Haskell中的实现

Question

高效队列在Haskell中的实现

algorithmdata-structureshaskelllistqueue

31

如何高效地实现一个列表数据结构，其中我可以有两个视图，分别指向列表的头部和尾部，而不需要昂贵的反转操作。

start x = []
end x = reverse start -- []
start1 = [1,2,3] ++ start
end start1 -- [3,2,1]

end应该能够在不调用'reverse'的情况下完成此操作，只需从列表存在反向自动化的角度观察给定列表即可。如果我通过连接创建新列表来开始，同样应该适用。

- TheOne

1

在Haskell中，您无法更改值。 start始终是空列表，而end始终是该列表的reverse（即空列表）。如果您想保留状态，则应查看State monad。 - R. Martinho Fernandes

1

更正：我指的是重新绑定。 - TheOne

1

@Absolute：无论你如何称呼它，都不能改变Haskell中无法更改事物的最终真相（IO单子除外）。你无法重新绑定事物。 - R. Martinho Fernandes

4

@Martinho：我所说的“更改”是指基于旧文件创建一个新文件并分配一个新名称，不确定为什么这不清楚。 - TheOne

1

因为你没有表述清楚。更新和重新绑定可能意味着不同的事情。 - Rayne

2

我强烈推荐Apocalisp提到的纯函数数据结构。这本书中有很多“啊哈”时刻。 - Godeke

4个回答

7

当我在谷歌搜索“Haskell队列”时，这个问题出现在第一页的第三个结果，但之前给出的信息是误导性的。因此，我觉得有必要澄清一些事情。（而第一个搜索结果是一个包含粗心实现的博客文章...）

以下所有内容基本上都来自于Okasaki在1995年发表的论文Simple and efficient purely functional queues and deques或他的book。

好的，让我们开始吧。

A persistent queue implementation with amortised O(1) time complexity is possible. The trick is to reverse the list representing the rear part of a queue as long as the front part is long enough to amortise the cost of reverse operation. So, instead of reversing the rear part when the front part is empty, we reverse it when the front part is shorter than the rear part. The following code is from the appendix of Okasaki's book

data BQueue a = BQ !Int [a] !Int [a]

check :: Int -> [a] -> Int -> [a] -> BQueue a
check lenf fs lenr rs =
  if lenr <= lenf 
  then BQ lenf fs lenr rs 
  else BQ (lenr+lenf) (fs ++ reverse rs) 0 [] 

head :: BQueue a -> a
head (BQ _ []    _ _) = error "empty queue"
head (BQ _ (x:_) _ _) = x

(|>) :: BQueue a -> a -> BQueue a 
(BQ lenf fs lenr rs) |> x = check lenf fs (lenr + 1) (x:rs)

tail :: BQueue a -> BQueue a
tail (BQ lenf (x:fs) lenr rs) = check (lenf-1) fs lenr rs

And why is this amortised O(1) even used persistently? Haskell is lazy, so reverse rs does not actually happen until it is needed. To force reverse rs, it has to take |fs| steps before reaching the reverse rs. If we repeat tail before reaching the suspension reverse rs, then the result will be memorised so at the second time it takes only O(1). On the other hand, if we use the version before placing the suspension fs ++ reverse rs, then again it has to go through fs steps before reaching reverse rs. A formal proof using (modified) Banker's method is in Okasaki's book.
The answer by @Apocalisp

When the dequeue list is empty, refill it by reversing the enqueue list

is the implementation in Ch 5 of his book with a warning in the very beginning

Unfortunately, the simple view of amortization presented in this chapter breaks in the presence of persistence

Okasaki described his amortised O(1) persistent queue in Ch 6.
So far, we have been talking about amortised time complexity only. It is actually possible to eliminate amortisation completely to achieve the worst-case O(1) time complexity for persistent queue. The trick is that reverse has to be forced incrementally every time a de/enqueue is called. The actual implementation is a bit hard to explain here, though.

再次强调，所有内容都已经在他的论文中了。

- L.-T. Chen

5

你是否正在寻找Data.Dequeue？

虽然它没有reverse，但你可以很容易地添加它并将补丁发送给作者。

- ephemient

6

我知道这个库，但我想自己实现它，作为一种有趣的思维实验。 - TheOne

或者您可以使用已经在标准库中的 Data.Sequence 模块。 - newacct

或者...他们可以将其作为有趣的思维实验自己实现，就像他们已经确定要做的那样。这样做没有任何问题。 - semicolon

1

我并不是一个 Haskell 用户，但我找到了一篇博客文章，声称描述了一个可以在摊销常数时间内操作的 Haskell 队列。它基于 Chris Okasaki 的优秀著作《纯函数数据结构》中的设计。

- bcat

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Apocalisp · Accepted Answer

你可以始终使用 Data.Sequence。

或者，一个众所周知的纯函数队列实现是使用两个列表。一个用于入队（enqueue），另一个用于出队（dequeue）。入队简单地在入队列表上执行 cons 操作。出队则取出出队列表的头部。当出队列表短于入队列表时，通过反转入队列表来重新填充出队列表。详见 Chris Okasaki 的《纯函数数据结构》Purely Functional Datastructures.。

虽然这个实现中使用了reverse，但其均摊时间复杂度是可忽略的。每次入队都会导致出队列表需要 Θ(1) 的时间进行重新填充。因此，出队的期望时间最多是入队时间的两倍。这是一个常数因子，因此这两个操作的最坏情况时间复杂度都是 O(1)。