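While researching how to merge k sorted contiguous arrays/vectors, and how that differs from merging k sorted linked lists, I found two relatively simple naive solutions for merging k contiguous arrays, plus an optimized approach based on pairwise merging that mimics how mergeSort() works. The two naive solutions I implemented appear to have the same complexity, but in the large randomized test I ran, one of them is far less efficient than the other.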
Naive merging
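My naive merging method works as follows. We create an output vector<int> and set it to the first of the k vectors we are given. We then merge in the second vector, then the third, and so on. Since a typical merge() method that takes two vectors and returns one is asymptotically linear in both time and space in the number of elements in the two vectors, the total complexity will be O(n + 2n + 3n + ... + kn), where n is the average number of elements in each list. Since we are adding 1n + 2n + 3n + ... + kn = n*k(k+1)/2, I believe the total complexity is O(n*k^2). Consider the following code: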
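vector<int> mergeInefficient(const vector<vector<int> >& multiList) {
    // Guard: multiList[0] is undefined behavior on an empty input
    if (multiList.empty()) return vector<int>();
    vector<int> finalList = multiList[0];
    // Fold each remaining vector into the running result, one at a time
    for (size_t j = 1; j < multiList.size(); ++j) {
        finalList = mergeLists(multiList[j], finalList);
    }
    return finalList;
}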
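The mergeLists() helper is not shown here. A minimal sketch, assuming it is an ordinary two-way merge that takes two ascending-sorted vectors and returns a new sorted vector, might look like the following. The signature is an assumption inferred from the call above; std::merge does the actual work.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: mergeLists() is called above but never shown.
// Assumes both inputs are sorted ascending; returns a new sorted vector.
std::vector<int> mergeLists(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());  // one allocation, linear in the input size
    // std::merge walks both ranges once, so the cost is O(a.size() + b.size()).
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

With a helper like this, step j of the loop touches roughly j*n elements, which is where the n*k^2 term comes from.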
Naive selection
My second naive solution works as follows:
/**
* The logic behind this algorithm is fairly simple and inefficient.
* Basically we want to start with the first values of each of the k
* vectors, pick the smallest value and push it to our finalList vector.
* We then need to be looking at the next value of the vector we took the
* value from so we don't keep taking the same value. A vector of vector
* iterators is used to hold our position in each vector. While all iterators
* are not at the .end() of their corresponding vector, we maintain a minValue
* variable initialized to INT_MAX, and a minValueIndex variable and iterate over
* each of the k vector iterators and if the current iterator is not an end position
* we check to see if it is smaller than our minValue. If it is, we update our minValue
* and set our minValue index (this is so we later know which iterator to increment after
* we iterate through all of them). We do a check after our iteration to see if minValue
* still equals INT_MAX. If it does, all iterators are at the .end() position, and we have
* exhausted every vector and can stop iterating over all k of them. Regarding the complexity
* of this method, we are iterating over `k` vectors so long as at least one value has not been
* accounted for. Since there are `nk` values where `n` is the average number of elements in each
* list, the time complexity = O(nk^2) like our other naive method.
*/
vector<int> mergeInefficientV2(const vector<vector<int> >& multiList) {
    vector<int> finalList;
    vector<vector<int>::const_iterator> iterators(multiList.size());
    // Set all iterators to the beginning of their corresponding vectors in multiList
    for (size_t i = 0; i < multiList.size(); ++i) iterators[i] = multiList[i].begin();
    int minValue;
    size_t minValueIndex = 0;
    while (true) {
        minValue = INT_MAX;  // sentinel from <climits>; real INT_MAX values would be dropped
        // Scan the current head of every non-exhausted vector for the smallest value
        for (size_t i = 0; i < iterators.size(); ++i) {
            if (iterators[i] == multiList[i].end()) continue;
            if (*iterators[i] < minValue) {
                minValue = *iterators[i];
                minValueIndex = i;
            }
        }
        // Check before advancing: if nothing was found, every iterator is at .end(),
        // and incrementing one of them would be undefined behavior
        if (minValue == INT_MAX) break;
        ++iterators[minValueIndex];
        finalList.push_back(minValue);
    }
    return finalList;
}
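Random simulation
In short, I built a simple random simulation that constructs a multidimensional vector<vector<int>>. The multidimensional vector starts as 2 vectors of size 2 each and ends as 600 vectors of size 600 each. Every vector is sorted, and the sizes of both the outer container and each sub-vector grow by two elements per iteration. I time how long each algorithm takes on each input: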
clock_t clock_a_start = clock();
finalList = mergeInefficient(multiList);
clock_t clock_a_stop = clock();
clock_t clock_b_start = clock();
finalList = mergeInefficientV2(multiList);
clock_t clock_b_stop = clock();
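For reference, here is a sketch of how inputs like those described above might be generated, and how a pair of clock() readings converts to seconds. The helper names makeSortedLists and elapsedSeconds, the value range, and the fixed seed are all assumptions for illustration, not the code actually used for the test.

```cpp
#include <algorithm>
#include <ctime>
#include <random>
#include <vector>

// Hypothetical generator: builds `size` vectors of `size` random ints each,
// every one sorted ascending (the sizes in the test run 2, 4, ..., 600).
std::vector<std::vector<int>> makeSortedLists(int size, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);  // assumed value range
    std::vector<std::vector<int>> lists(size, std::vector<int>(size));
    for (auto& list : lists) {
        for (int& value : list) value = dist(rng);
        std::sort(list.begin(), list.end());
    }
    return lists;
}

// Converts a pair of clock() readings into elapsed CPU seconds.
double elapsedSeconds(std::clock_t start, std::clock_t stop) {
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

A driver loop would call makeSortedLists(size) for each size and record elapsedSeconds(clock_a_start, clock_a_stop) and elapsedSeconds(clock_b_start, clock_b_stop) per size.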
I then plotted the following graph:
O(nklog(k)), I found there are three approaches: 1) sort all nk elements, 2) merge pairwise, 3) use a heap, as you said. - Dominic Farolino
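The heap option in the list above can be sketched as follows. This is a minimal illustration, assuming every inner vector is sorted ascending; the name mergeWithHeap is hypothetical and does not come from the thread. Each of the nk elements is pushed and popped once on a heap holding at most k entries, which gives O(nk log(k)).

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Sketch of a heap-based k-way merge. Assumes each inner vector is sorted ascending.
std::vector<int> mergeWithHeap(const std::vector<std::vector<int>>& multiList) {
    // Each heap entry is (value, index of the list it came from).
    using Entry = std::pair<int, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> minHeap;

    // positions[i] is the index of the next unread element in multiList[i].
    std::vector<std::size_t> positions(multiList.size(), 0);
    std::size_t total = 0;
    for (std::size_t i = 0; i < multiList.size(); ++i) {
        total += multiList[i].size();
        if (!multiList[i].empty()) minHeap.emplace(multiList[i][0], i);
    }

    std::vector<int> finalList;
    finalList.reserve(total);
    while (!minHeap.empty()) {
        Entry smallest = minHeap.top();
        minHeap.pop();
        finalList.push_back(smallest.first);
        std::size_t i = smallest.second;
        // Refill the heap from the list that just gave up an element.
        if (++positions[i] < multiList[i].size()) {
            minHeap.emplace(multiList[i][positions[i]], i);
        }
    }
    return finalList;
}
```

Unlike the INT_MAX sentinel in mergeInefficientV2, this version tracks exhaustion through the positions vector, so any int value can appear in the input.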