Count the number of distinct absolute values among the elements of an array

Question

9

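I was asked an interview question: find the number of distinct absolute values among the elements of an array. I came up with the following solution (in C++), but the interviewer was not happy with the run-time efficiency of the code.

Please point out how I can improve the run-time efficiency of this code.
Also, how do I calculate the efficiency of the code below? The for loop executes A.size() times. However, I am not sure about the efficiency of the STL std::find (in the worst case it could be O(n), which makes this code O(n²)).

Here is the code: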

#include <algorithm>
#include <cstdlib>
#include <list>
#include <vector>

int countAbsoluteDistinct ( const std::vector<int> &A ) {
  using namespace std;
  list<int> x;

  vector<int>::const_iterator it;
  for(it = A.begin();it < A.end();it++)
    if(find(x.begin(),x.end(),abs(*it)) == x.end())
      x.push_back(abs(*it));
  return x.size();
}

- user7

2

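Being fed a fish a day is really not going to help you, especially for interviews. You need to read at least a couple of books on data structures and algorithms. My personal favorite for beginners is Sahni's Data Structures and Algorithms in C++, followed by Langsam/Tennenbaum's Data Structures Using C and C++. As for the question, you should know that the interviewer cares whether you can derive the big-O complexity, not whether you happen to know it. People here care about posting the right answer and collecting upvotes; they will give you what you want, not what you need. - Ajeet Ganga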

1

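Dude, do not assume you have to use the STL algorithms just because you are required to write the code in C++. They are only tools that are there to help, and you should be able to customize them when needed. This problem is exactly the kind of special case where you have to bend the standard algorithms for the sake of efficiency. - Ajeet Ganga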

13 Answers

3

std::find() has linear time complexity (O(n)). I would use a sorted associative container to handle this, specifically std::set.

#include <cstdlib>
#include <set>
#include <vector>
using namespace std;

int distinct_abs(const vector<int>& v)
{
   std::set<int> distinct_container;

   for(auto curr_int = v.begin(), end = v.end(); // no need to call v.end() multiple times
       curr_int != end;
       ++curr_int)
   {
       // std::set only allows single entries
       // since that is what we want, we don't care that this fails 
       // if the second (or more) of the same value is attempted to 
       // be inserted.
       distinct_container.insert(abs(*curr_int));
   }

   return distinct_container.size();
}

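This approach still carries some run-time penalty. Using a separate container incurs the cost of dynamic allocations as the container grows. You could do this in place and avoid that penalty, but with code at this level it is often better to be clear and explicit and let the optimizer (in the compiler) do its work.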

- Chad

Why not use the range constructor to build distinct_container in this case? std::set<int> distinct_container{v.begin(), v.end()}; return distinct_container.size(); You are still looking at O(N log N) running time, though. - Flame

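Because we are not inserting the source values directly: a transformation (abs() in this case) is applied before each insertion. If that were not the case, the range constructor would obviously be the better choice. - Chad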

3

Yes, this will be O(N²): you end up with a linear search for each element.
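To make the count concrete, here is a short worked bound. It assumes the worst case, in which every absolute value is distinct, so that the k-th call to std::find scans all k - 1 values already stored in the list:

```latex
\[
  \sum_{k=1}^{N} (k - 1) \;=\; \frac{N(N-1)}{2} \;=\; O(N^2)
\]
```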

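There are a couple of fairly obvious alternatives: std::set or std::unordered_set. If you do not have C++0x available, you can replace std::unordered_set with tr1::unordered_set or boost::unordered_set.

With std::set, each insertion is O(log N), so the overall complexity is O(N log N).

With unordered_set, each insertion has constant (expected) complexity, giving linear complexity overall.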
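A minimal sketch of the hash-based variant might look like the following. The function name count_abs_distinct_hashed is only illustrative, and the sketch assumes that no element equals INT_MIN (std::abs is undefined for that value):

```cpp
#include <cstdlib>
#include <unordered_set>
#include <vector>

// Counts distinct absolute values with a hash set: expected O(N) overall.
// Assumption: no element equals INT_MIN, for which std::abs(int) is undefined.
std::size_t count_abs_distinct_hashed(const std::vector<int>& values)
{
    std::unordered_set<int> seen;
    seen.reserve(values.size());      // avoid rehashing while inserting
    for (int value : values) {
        seen.insert(std::abs(value)); // inserting a duplicate is a no-op
    }
    return seen.size();
}
```

Swapping std::unordered_set for std::set in this sketch would give the O(N log N) version with the same interface.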

- Jerry Coffin

2

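Since I was not happy with the previous answers, here is mine. Your original question does not say how big your vector is. Suppose your std::vector<> is extremely large and has very few duplicates (why not?). This means that using another container (e.g. std::set<>) will basically double your memory consumption. Why would you do that when your goal is simply to count the non-duplicates?

I like @Flame's answer, but I was not really happy with the call to std::unique. You have spent a lot of time carefully sorting your vector, and then you simply discard the sorted array when you could re-use it afterwards.

I could not find anything really elegant in the STD library, so here is my proposal (a mixture of std::transform + std::abs + std::sort, but without touching the sorted array afterwards).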

#include <iterator>

// Count the distinct values in a sorted range (here: the sorted absolute values)
template<class ForwardIt>
typename std::iterator_traits<ForwardIt>::difference_type 
count_unique(ForwardIt first, ForwardIt last)
{
  if (first == last)
    return 0;

  typename std::iterator_traits<ForwardIt>::difference_type 
    count = 1;
  ForwardIt previous = first;
  while (++first != last) {
    if (!(*previous == *first) ) ++count;
    ++previous;
  }
  return count;
}

The bonus point is that it works with forward iterators:

#include <iostream>
#include <list>
int main()
{
  std::list<int> nums {1, 3, 3, 3, 5, 5, 7,8};
  std::cout << count_unique( std::begin(nums), std::end(nums) ) << std::endl;

  const int array[] = { 0,0,0,1,2,3,3,3,4,4,4,4};
  const int n = sizeof array / sizeof * array;
  std::cout << count_unique( array, array + n ) << std::endl;
  return 0;
}
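The answer describes the full pipeline (std::transform + std::abs + std::sort, then a counting pass) but only shows the counting step. A possible way to put the pieces together is sketched below. The helper name count_abs_distinct_in_place is hypothetical, the counting loop is an index-based equivalent of the pass above, and the sketch assumes that the caller allows the vector to be modified and that no element equals INT_MIN:

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Hypothetical helper: rewrites `values` as sorted absolute values in place,
// then counts runs of equal neighbours without calling std::unique.
// Assumption: no element equals INT_MIN, for which std::abs(int) is undefined.
std::size_t count_abs_distinct_in_place(std::vector<int>& values)
{
    std::transform(values.begin(), values.end(), values.begin(),
                   [](int x) { return std::abs(x); });
    std::sort(values.begin(), values.end());

    std::size_t count = values.empty() ? 0 : 1;
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] != values[i - 1]) ++count; // a new run starts here
    }
    return count;
}
```

After the call the vector still holds every element, sorted by absolute value, so it can be re-used, which is the point the answer makes against std::unique.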

- malat

2

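Basically, replace your std::list with a std::set. This gives you O(log(set.size())) searches plus O(1) insertions, if you do things properly. Also, for efficiency, it makes sense to cache the result of abs(*it), although this will have only a minimal effect. The efficiency of this method is about as good as you can get without either using a really good hash (std::set uses binary trees) or having more information about the values in the vector.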

- Yourpalal

1

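Two points.

std::list is very bad for searching. Each search is O(n).
Use std::set. Insertion is logarithmic, and it removes duplicates and keeps the values sorted. Insert every value, O(n log n) in total, then use set::size to find out how many values there are.

Edit:

To answer the second part of your question: the C++ standard mandates the worst case for operations on containers and algorithms.

Find: since you are using the free-function version of find, which takes iterators, it cannot assume anything about the sequence passed in, not even that the range is sorted, so it must traverse every item until it finds a match, which is O(n).

If you use set::find, on the other hand, the member function can take advantage of the structure of the set, and its performance is required to be O(log N), where N is the size of the set.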
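The difference between the two lookups can be sketched as follows. Both helper names are made up for illustration; they return the same answer and differ only in cost:

```cpp
#include <algorithm>
#include <set>

// Generic algorithm: walks the range element by element, O(N).
bool contains_linear(const std::set<int>& s, int key)
{
    return std::find(s.begin(), s.end(), key) != s.end();
}

// Member function: uses the ordering of the set, O(log N).
bool contains_logarithmic(const std::set<int>& s, int key)
{
    return s.find(key) != s.end();
}
```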

- Flame

0

I think a std::map could also be interesting:

#include <cstdlib>
#include <map>
#include <vector>
using namespace std;

int absoluteDistinct(const vector<int> &A) 
{
    map<int, char> my_map;

    for (vector<int>::const_iterator it = A.begin(); it != A.end(); it++)
    {
        my_map[abs(*it)] = 0;
    }

    return my_map.size();
}

- karlphillip

0

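Your code contains nested loops. If you scan the whole array for every element, you get O(n^2) time complexity, which is unacceptable in most scenarios. That is why the merge sort and quicksort algorithms came about: to save processing cycles and machine effort. I suggest you read up on them and redesign your program.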

- V Malhi

0

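As @Jerry said, and to improve a little on the theme of most of the other answers: instead of a std::map or std::set you could use a std::unordered_map or std::unordered_set (or the boost equivalent).

This would reduce the running time from O(n lg n) to expected O(n).

Another possibility, depending on the range of the data given, is a variant of radix sort, though nothing in the question immediately suggests it.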

- Chris Mennie

0

Radix sort the list for O(n)-ish efficiency. Then compare adjacent values.
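One way this idea could be sketched is shown below. The answer gives no code, so everything here is an assumption: the function name is hypothetical, int is assumed to be 32 bits wide, and an LSD radix sort with one byte per pass is just one of several possible variants. The magnitudes are computed in unsigned arithmetic so that INT_MIN is handled as well.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: LSD radix sort on 32-bit magnitudes, one byte per pass (4 passes),
// followed by a scan of adjacent values. Assumes int is 32 bits wide.
std::size_t count_abs_distinct_radix(const std::vector<int>& values)
{
    std::vector<std::uint32_t> keys;
    keys.reserve(values.size());
    for (int v : values) {
        // |v| in unsigned arithmetic, which is well defined even for INT_MIN.
        const std::uint32_t u = static_cast<std::uint32_t>(v);
        keys.push_back(v < 0 ? 0u - u : u);
    }

    std::vector<std::uint32_t> buffer(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::array<std::size_t, 257> offsets{}; // zero-initialised counters
        for (std::uint32_t k : keys) ++offsets[((k >> shift) & 0xFFu) + 1];
        // Prefix sums: offsets[b] becomes the first output slot for byte b.
        for (std::size_t i = 1; i < offsets.size(); ++i) offsets[i] += offsets[i - 1];
        // Stable scatter into the buffer, then make it the new input.
        for (std::uint32_t k : keys) buffer[offsets[(k >> shift) & 0xFFu]++] = k;
        keys.swap(buffer);
    }

    // Equal magnitudes are now adjacent, so count the runs.
    std::size_t count = keys.empty() ? 0 : 1;
    for (std::size_t i = 1; i < keys.size(); ++i) {
        if (keys[i] != keys[i - 1]) ++count;
    }
    return count;
}
```

The sort makes a fixed number of linear passes, so the whole function is O(n) in time, at the price of two extra buffers of n elements.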

- Michael Dorgan

- Flame · Accepted Answer

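Here is alternative code to replace your existing code.

Note that we do not want to alter the caller's vector, so we take it by value. It is better to let the compiler make the copy for us than to copy it ourselves. If destroying the values is acceptable, we can take the vector by non-const reference instead.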

#include <algorithm>
#include <cstdlib>
#include <iterator>
#include <vector>

using namespace std;

int count_distinct_abs(vector<int> v)
{
    // O(n), where n = distance(v.begin(), v.end()). A lambda is used because
    // passing the overloaded name abs directly may fail to compile.
    transform(v.begin(), v.end(), v.begin(), [](int x) { return abs(x); });

    // O(n log n) comparisons; guaranteed in the worst case since C++11.
    // Before C++11 only the average case was specified (typically quicksort),
    // and make_heap followed by sort_heap gave a worst-case guarantee.
    sort(v.begin(), v.end());

    // unique takes a sorted range, moves the first element of each run of
    // duplicates to the front, and returns an iterator to the end of that
    // unique section of the range. At most n - 1 comparisons.
    auto unique_end = unique(v.begin(), v.end());

    // Constant time for random access iterators (like vector's).
    return static_cast<int>(distance(v.begin(), unique_end));
}

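The advantage here is that we allocate and copy only once, if we decide to take the vector by value, and everything else is done in place, while still giving you an average complexity of O(n log n) in the size of v.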