Complexity of merging K sorted arrays/vectors

Question

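While researching how to merge k sorted contiguous arrays/vectors, and how that differs from merging k sorted linked lists, I found two relatively simple naive solutions for merging k contiguous arrays, plus an optimized approach based on pairwise merging that mimics how mergeSort() works. The two naive solutions I implemented appear to have the same complexity, but in the large randomized tests I ran, one of them was far less efficient than the other.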
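The pairwise (mergeSort-style) approach mentioned above is not shown in the question. A minimal sketch, assuming a hypothetical helper mergeTwo built on std::merge; mergePairwise is likewise a hypothetical name:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: linear-time merge of two sorted vectors using std::merge.
std::vector<int> mergeTwo(const std::vector<int>& a, const std::vector<int>& b) {
  std::vector<int> out;
  out.reserve(a.size() + b.size());
  std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
  return out;
}

// Pairwise (mergeSort-style) k-way merge. Each round merges neighbouring
// vectors, roughly halving their count, so there are about log2(k) rounds,
// and each round touches at most all n*k elements: O(n*k*log k) overall.
std::vector<int> mergePairwise(std::vector<std::vector<int>> lists) {
  if (lists.empty()) return {};
  while (lists.size() > 1) {
    std::vector<std::vector<int>> next;
    next.reserve((lists.size() + 1) / 2);
    for (std::size_t i = 0; i + 1 < lists.size(); i += 2) {
      next.push_back(mergeTwo(lists[i], lists[i + 1]));
    }
    // An odd vector out is carried into the next round unchanged.
    if (lists.size() % 2 == 1) next.push_back(std::move(lists.back()));
    lists = std::move(next);
  }
  return std::move(lists.front());
}
```

Merging neighbours round by round keeps every merge between vectors of similar size, which is what avoids the ever-growing accumulator of the sequential approach.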

Naive merge

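My naive merge approach is shown below. We create an output vector<int> and set it to the first of the k vectors we are given. We then merge in the second vector, then the third, and so on. Since a typical merge() that takes two vectors and returns one is asymptotically linear in time and space in the total number of elements of the two vectors, the total complexity is O(n + 2n + 3n + ... + kn), where n is the average number of elements per list. Since we are adding 1n + 2n + 3n + ... + kn, I believe the total complexity is O(n*k^2). Consider the following code: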

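#include <vector>
using namespace std;

// mergeLists(a, b) merges two sorted vectors in linear time (its definition is not shown).
vector<int> mergeInefficient(const vector<vector<int> >& multiList) {
  if (multiList.empty()) return vector<int>();  // multiList[0] would be out of range

  // Start from the first vector, then fold in each remaining vector one at a time
  vector<int> finalList = multiList[0];
  for (size_t j = 1; j < multiList.size(); ++j) {
    finalList = mergeLists(multiList[j], finalList);
  }

  return finalList;
}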
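mergeLists() is not shown in the question. A minimal sketch of what it is assumed to be here: a standard two-pointer merge whose time and extra space are linear in a.size() + b.size(). The signature is a guess based on how it is called.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's mergeLists(): classic two-pointer merge.
std::vector<int> mergeLists(const std::vector<int>& a, const std::vector<int>& b) {
  std::vector<int> out;
  out.reserve(a.size() + b.size());
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    // Take the smaller head; on ties take from a, which keeps the merge stable.
    if (b[j] < a[i]) out.push_back(b[j++]);
    else out.push_back(a[i++]);
  }
  // At most one of these loops does any work: copy whatever is left over.
  while (i < a.size()) out.push_back(a[i++]);
  while (j < b.size()) out.push_back(b[j++]);
  return out;
}
```

With a helper like this, the j-th call in mergeInefficient builds a vector of about (j + 1)·n elements; together with the initial copy of multiList[0], that yields the n + 2n + ... + kn sum.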

Naive selection

My second naive solution is as follows:

#include <climits>  // INT_MAX
#include <vector>
using namespace std;

/**
 * The logic behind this algorithm is fairly simple and inefficient.
 * Basically we want to start with the first values of each of the k
 * vectors, pick the smallest value and push it to our finalList vector.
 * We then need to be looking at the next value of the vector we took the
 * value from so we don't keep taking the same value. A vector of vector
 * iterators is used to hold our position in each vector. While all iterators
 * are not at the .end() of their corresponding vector, we maintain a minValue
 * variable initialized to INT_MAX, and a minValueIndex variable and iterate over
 * each of the k vector iterators and if the current iterator is not an end position
 * we check to see if it is smaller than our minValue. If it is, we update our minValue
 * and set our minValue index (this is so we later know which iterator to increment after
 * we iterate through all of them). We do a check after our iteration to see if minValue
 * still equals INT_MAX. If it does, all iterators are at the .end() position, and we have
 * exhausted every vector and can stop iterating over all k of them. Regarding the complexity
 * of this method, we are iterating over `k` vectors so long as at least one value has not been
 * accounted for. Since there are `nk` values where `n` is the average number of elements in each
 * list, the time complexity = O(nk^2) like our other naive method.
 */
vector<int> mergeInefficientV2(const vector<vector<int> >& multiList) {
  vector<int> finalList;
  vector<vector<int>::const_iterator> iterators(multiList.size());

  // Set all iterators to the beginning of their corresponding vectors in multiList
  for (size_t i = 0; i < multiList.size(); ++i) iterators[i] = multiList[i].begin();

  int minValue;
  size_t minValueIndex = 0;

  while (true) {
    // INT_MAX doubles as the "all vectors exhausted" sentinel, so this
    // version assumes no input vector actually contains INT_MAX.
    minValue = INT_MAX;
    for (size_t i = 0; i < iterators.size(); ++i) {
      if (iterators[i] == multiList[i].end()) continue;

      if (*iterators[i] < minValue) {
        minValue = *iterators[i];
        minValueIndex = i;
      }
    }

    // Check for exhaustion before advancing: once every vector is used up,
    // minValueIndex is stale (or never set), and incrementing that end()
    // iterator would be undefined behavior.
    if (minValue == INT_MAX) break;

    ++iterators[minValueIndex];
    finalList.push_back(minValue);
  }

  return finalList;
}

Random simulation

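In short, I built a simple random simulation that constructs a multidimensional vector<vector<int>>. It starts with 2 vectors of size 2 each and ends with 600 vectors of size 600 each. Each vector is sorted, and on every iteration both the outer container and each sub-vector grow by two elements. I time how long each algorithm takes: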

clock_t clock_a_start = clock();
finalList = mergeInefficient(multiList);
clock_t clock_a_stop = clock();

clock_t clock_b_start = clock();
finalList = mergeInefficientV2(multiList);
clock_t clock_b_stop = clock();
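The input generator is not shown in the question. A minimal sketch of the setup described above, assuming uniformly distributed random ints in an arbitrary range and std::chrono in place of clock(); makeSortedInput and timeMs are hypothetical names:

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical generator: k sorted vectors, each holding k random ints,
// so n == k at every step of the experiment described above.
std::vector<std::vector<int>> makeSortedInput(std::size_t k, std::mt19937& rng) {
  std::uniform_int_distribution<int> dist(0, 1000000);  // arbitrary value range
  std::vector<std::vector<int>> lists(k, std::vector<int>(k));
  for (auto& v : lists) {
    for (auto& x : v) x = dist(rng);
    std::sort(v.begin(), v.end());
  }
  return lists;
}

// Elapsed wall-clock time of one call, in milliseconds. steady_clock is a
// monotonic wall clock; clock() measures processor time instead. For a
// single-threaded, CPU-bound call either should give comparable numbers.
double timeMs(const std::function<void()>& f) {
  auto start = std::chrono::steady_clock::now();
  f();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A driver would loop k from 2 to 600 in steps of 2, build one input per k, and call timeMs once per algorithm on that same input, storing each merged result in a variable declared outside the lambda so the compiler cannot discard the work.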

I then plotted the following graph:

[Plot: running time of mergeInefficient and mergeInefficientV2 across the simulation]

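My calculations show that the two naive solutions (merging and selection) have the same time complexity, but the graph above shows them to be very different. At first I rationalized this by saying that one has more overhead than the other, but then I realized that overhead should be a constant factor and should not produce a graph like this. What explains this? Is my complexity analysis wrong?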

- Dominic Farolino

1 Answer

- Abhishek Bansal · Accepted Answer

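Even if two algorithms have the same complexity (O(nk^2) in your case), their running times can differ enormously depending on the input size and the "constant" factors involved.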

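For example, if one algorithm runs in n/1000 time and another in 1000n time, both have the same asymptotic complexity, but for any "reasonable" choice of n their running times will be very different.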

In addition, effects such as caching and compiler optimizations can change the running time significantly.

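In your case, your complexity calculation seems correct, but the actual running time should be (nk^2 + nk)/2 in the first case and nk^2 in the second. Note that the division by 2 can matter, because the nk term becomes negligible as k grows.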
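Counting element operations and treating each basic operation as unit cost (an assumption that deliberately hides the per-operation constants discussed above) reproduces both figures: the sequential merge pays for the initial copy plus (j + 1)·n per step, while selection scans k heads for each of the nk outputs.

```latex
\begin{aligned}
T_{\text{merge}}(n,k)  &= n + \sum_{j=1}^{k-1} (j+1)\,n = n\,\frac{k(k+1)}{2} = \frac{nk^2 + nk}{2},\\
T_{\text{select}}(n,k) &\approx \underbrace{nk}_{\text{outputs}} \cdot \underbrace{k}_{\text{heads scanned per output}} = nk^2,\\
\frac{T_{\text{select}}}{T_{\text{merge}}} &= \frac{2k}{k+1} \;\longrightarrow\; 2 \quad (k \to \infty).
\end{aligned}
```

Under this unit-cost model the selection version should be only about twice as slow, so any gap beyond that factor points to per-operation costs such as the cache and compiler effects mentioned above.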

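For a third algorithm, you can modify naive selection to maintain a heap containing the current front (smallest not-yet-taken) element of each of the k vectors. Each selection step then takes O(log k) time, so the complexity drops to O(nk log k).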
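A minimal sketch of that heap-based selection, using std::priority_queue as a min-heap of (value, source vector, position) tuples; mergeWithHeap is a hypothetical name:

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Heap-based k-way merge: the heap holds at most one entry per input vector,
// so each push/pop costs O(log k) and merging all n*k elements costs O(n*k*log k).
std::vector<int> mergeWithHeap(const std::vector<std::vector<int>>& lists) {
  // (value, index of source vector, position within that vector)
  using Entry = std::tuple<int, std::size_t, std::size_t>;
  // std::greater turns priority_queue's default max-heap into a min-heap.
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

  std::size_t total = 0;
  for (std::size_t i = 0; i < lists.size(); ++i) {
    total += lists[i].size();
    if (!lists[i].empty()) heap.emplace(lists[i][0], i, std::size_t{0});
  }

  std::vector<int> out;
  out.reserve(total);
  while (!heap.empty()) {
    const Entry top = heap.top();
    heap.pop();
    const int value = std::get<0>(top);
    const std::size_t src = std::get<1>(top);
    const std::size_t pos = std::get<2>(top);
    out.push_back(value);
    // Replace the popped element with the next one from the same vector, if any.
    if (pos + 1 < lists[src].size()) heap.emplace(lists[src][pos + 1], src, pos + 1);
  }
  return out;
}
```

Because an exhausted vector simply stops contributing entries, this version needs no INT_MAX sentinel and handles empty input vectors naturally.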