C++排序和索引跟踪

Question

C++排序和索引跟踪

280

我想使用C++标准库对一系列样本进行升序排序，同时还要记住新样本的原始索引。

例如，我有一个由样本组成的集合、向量或矩阵A : [5, 2, 1, 4, 3]。我想将它们排序为B : [1, 2, 3, 4, 5]，但我还想记住这些值在原始集合'A'中的索引，以便获取另一个集合，即：C : [2, 1, 4, 3, 0 ]——它对应于每个元素在'B'中的索引，在原始集合'A'中的位置。

例如，在Matlab中，您可以执行以下操作：

 [a,b]=sort([5, 8, 7])
 a = 5 7 8
 b = 1 3 2

有人能想到一个好的方法吗？

- Mingus

16个回答

91

您可以对std::pair进行排序，而不仅仅是对整数进行排序 - 第一个整数是原始数据，第二个整数是原始索引。然后提供一个只根据第一个整数进行排序的比较器。例如：

Your problem instance: v = [5 7 8]
New problem instance: v_prime = [<5,0>, <8,1>, <7,2>]

使用比较器对新的问题实例进行排序，例如：

typedef std::pair<int,int> mypair;
bool comparator ( const mypair& l, const mypair& r)
   { return l.first < r.first; }
// forgetting the syntax here but intent is clear enough

使用该比较器对 v_prime 进行 std::sort 的结果应为：

v_prime = [<5,0>, <7,2>, <8,1>]

您可以通过遍历向量，获取每个 std::pair 中的 .second 来提取索引。

- RAC

1

这正是我所做的。基本排序函数不会跟踪旧位置和新位置，因为那会增加额外的不必要的开销。 - the_mandrill

14

该函数的缺点是需要重新分配所有值的内存空间。 - Yoav

1

这显然是可行的方法，但它有一个缺点，你必须将原始容器从“数字容器”更改为“对容器”。 - Ruslan

55

假设给定的向量为

A=[2,4,3]

创建一个新的向量

V=[0,1,2] // indicating positions

将V排序，但在排序时不要比较V的元素，而是比较对应的A的元素。

 //Assume A is a given vector with N elements
 vector<int> V(N);
 std::iota(V.begin(),V.end(),0); //Initializing
 sort( V.begin(),V.end(), [&](int i,int j){return A[i]<A[j];} );

- MysticForce

3

感谢您的评价。您甚至可以使用std::iota()函数更加优雅地初始化map。 - Nimrod Morag

可以的！感谢您的建议。 - MysticForce

5

std::iota(V.begin(), V.end(), x++) 可以改为 std::iota(V.begin(), V.end(), 0)，不需要创建或使用变量 x。 - Alex Xu

1

非常优雅的解决方案。 - Adrian

15

vector<pair<int,int> >a;

for (i = 0 ;i < n ; i++) {
    // filling the original array
    cin >> k;
    a.push_back (make_pair (k,i)); // k = value, i = original index
}

sort (a.begin(),a.end());

for (i = 0 ; i < n ; i++){
    cout << a[i].first << " " << a[i].second << "\n";
}

现在a包含排序后值的索引和它们各自的索引。

a[i].first = value在第i个位置。

a[i].second = idx在初始数组中的索引。

- nutNCracker

考虑添加代码描述，以便访问此帖子的用户了解其工作原理。 - BusyProgrammer

我实际上最喜欢这个解决方案 - 我的向量大约有4个大小，并且我被困在C++11之前，无法使用lambda。感谢Aditya Aswal。 - stephanmg

12

我写了一个通用版本的索引排序。

template <class RAIter, class Compare>
void argsort(RAIter iterBegin, RAIter iterEnd, Compare comp, 
    std::vector<size_t>& indexes) {

    std::vector< std::pair<size_t,RAIter> > pv ;
    pv.reserve(iterEnd - iterBegin) ;

    RAIter iter ;
    size_t k ;
    for (iter = iterBegin, k = 0 ; iter != iterEnd ; iter++, k++) {
        pv.push_back( std::pair<int,RAIter>(k,iter) ) ;
    }

    std::sort(pv.begin(), pv.end(), 
        [&comp](const std::pair<size_t,RAIter>& a, const std::pair<size_t,RAIter>& b) -> bool 
        { return comp(*a.second, *b.second) ; }) ;

    indexes.resize(pv.size()) ;
    std::transform(pv.begin(), pv.end(), indexes.begin(), 
        [](const std::pair<size_t,RAIter>& a) -> size_t { return a.first ; }) ;
}

使用方式与std::sort相同，除了还需要一个索引容器来接收排序后的索引。

注意：testing是原文中的示例，未作翻译处理。

int a[] = { 3, 1, 0, 4 } ;
std::vector<size_t> indexes ;
argsort(a, a + sizeof(a) / sizeof(a[0]), std::less<int>(), indexes) ;
for (size_t i : indexes) printf("%d\n", int(i)) ;

你应该得到 2 1 0 3。对于没有C++0x支持的编译器，请将lambda表达式替换为类模板：

template <class RAIter, class Compare> 
class PairComp {
public:
  Compare comp ;
  PairComp(Compare comp_) : comp(comp_) {}
  bool operator() (const std::pair<size_t,RAIter>& a, 
    const std::pair<size_t,RAIter>& b) const { return comp(*a.second, *b.second) ; }        
} ;

并将 std::sort 重写为

std::sort(pv.begin(), pv.end(), PairComp(comp)()) ;

- hkyi

你好hkyi！我们如何实例化这个模板函数？它有两个模板typename，其中一个是迭代器，使得这种情况非常罕见。你能帮忙吗？ - Scott Yang

6

我看到了这个问题，想到直接对迭代器进行排序是一种排序值并跟踪索引的方式；不需要定义一个额外的容器，其中包含（值，索引）的pair，当值是大对象时很有帮助；迭代器提供对值和索引的访问：

/*
 * a function object that allows to compare
 * the iterators by the value they point to
 */
template < class RAIter, class Compare >
class IterSortComp
{
    public:
        IterSortComp ( Compare comp ): m_comp ( comp ) { }
        inline bool operator( ) ( const RAIter & i, const RAIter & j ) const
        {
            return m_comp ( * i, * j );
        }
    private:
        const Compare m_comp;
};

template <class INIter, class RAIter, class Compare>
void itersort ( INIter first, INIter last, std::vector < RAIter > & idx, Compare comp )
{ 
    idx.resize ( std::distance ( first, last ) );
    for ( typename std::vector < RAIter >::iterator j = idx.begin( ); first != last; ++ j, ++ first )
        * j = first;

    std::sort ( idx.begin( ), idx.end( ), IterSortComp< RAIter, Compare > ( comp ) );
}

关于使用示例：

std::vector < int > A ( n );

// populate A with some random values
std::generate ( A.begin( ), A.end( ), rand );

std::vector < std::vector < int >::const_iterator > idx;
itersort ( A.begin( ), A.end( ), idx, std::less < int > ( ) );

现在，例如，在排序后的向量中第5个最小的元素将具有值**idx[ 5 ]，并且它在原始向量中的索引将是distance( A.begin( ), *idx[ 5 ] )或者简单地*idx[ 5 ] - A.begin( )。

- behzad.nouri

4

考虑使用@Ulrich Eckhardt建议的std :: multimap。只需将代码简化即可。

给定：

std::vector<int> a = {5, 2, 1, 4, 3};  // a: 5 2 1 4 3

插入排序（即在插入的同时进行排序）

std::multimap<int, std::size_t> mm;
for (std::size_t i = 0; i != a.size(); ++i)
    mm.insert({a[i], i});

检索值和原始索引

std::vector<int> b;
std::vector<std::size_t> c;
for (const auto & kv : mm) {
    b.push_back(kv.first);             // b: 1 2 3 4 5
    c.push_back(kv.second);            // c: 2 1 4 3 0
}

选择使用std::multimap而不是std::map的原因是要允许原始向量中出现相等的值。此外，请注意，与std::map不同，operator[]未定义用于std::multimap。

- aafulei

3

有另一种方式可以解决这个问题，使用地图：

vector<double> v = {...}; // input data
map<double, unsigned> m; // mapping from value to its index
for (auto it = v.begin(); it != v.end(); ++it)
    m[*it] = it - v.begin();

这将消除非唯一元素。如果这不可接受，请使用multimap：

vector<double> v = {...}; // input data
multimap<double, unsigned> m; // mapping from value to its index
for (auto it = v.begin(); it != v.end(); ++it)
    m.insert(make_pair(*it, it - v.begin()));

为了输出索引，可以遍历map或multimap：

for (auto it = m.begin(); it != m.end(); ++it)
    cout << it->second << endl;

- Ulrich Eckhardt

3

@Lukasz Wiklendt提供了优美的解决方案！虽然在我的情况下需要更通用的解决方案，所以我做了一些修改：

template <class RAIter, class Compare>
vector<size_t> argSort(RAIter first, RAIter last, Compare comp) {

  vector<size_t> idx(last-first);
  iota(idx.begin(), idx.end(), 0);

  auto idxComp = [&first,comp](size_t i1, size_t i2) {
      return comp(first[i1], first[i2]);
  };

  sort(idx.begin(), idx.end(), idxComp);

  return idx;
}

例子：通过长度对字符串向量进行排序以查找索引，但第一个元素是虚拟的。

vector<string> test = {"dummy", "a", "abc", "ab"};

auto comp = [](const string &a, const string& b) {
    return a.length() > b.length();
};

const auto& beginIt = test.begin() + 1;
vector<size_t> ind = argSort(beginIt, test.end(), comp);

for(auto i : ind)
    cout << beginIt[i] << endl;

打印：

abc
ab
a

- sigvaldm

2

我最近接触到了C++20 <ranges>中优雅的投影特性，它可以让代码变得更短、更清晰：

std::vector<std::size_t> B(std::size(A));
std::iota(begin(B), end(B), 0);
std::ranges::sort(B, {}, [&](std::size_t i){ return A[i]; });

{} 指的是通常的 std::less<std::size_t>。因此，正如您所看到的，我们定义了一个在任何比较之前调用每个元素的函数。这个投影特性实际上相当强大，因为这个函数可以是一个 lambda，也可以是一个方法或成员值。例如：

struct Item {
    float price;
    float weight;
    float efficiency() const { return price / weight; }
};

int main() {
    std::vector<Item> items{{7, 9}, {3, 4}, {5, 3}, {9, 7}};
    std::ranges::sort(items, std::greater<>(), &Item::efficiency);
    // now items are sorted by their efficiency in decreasing order:
    // items = {{5, 3}, {9, 7}, {7, 9}, {3, 4}}
}

如果我们想按价格递增排序：

std::ranges::sort(items, {}, &Item::price);

不要定义operator<或使用lambda，使用投影！

- Arthur R.

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Lukasz Wiklendt · Accepted Answer

使用 C++ 11 的lambda表达式：

#include <iostream>
#include <vector>
#include <numeric>      // std::iota
#include <algorithm>    // std::sort, std::stable_sort

using namespace std;

template <typename T>
vector<size_t> sort_indexes(const vector<T> &v) {

  // initialize original index locations
  vector<size_t> idx(v.size());
  iota(idx.begin(), idx.end(), 0);

  // sort indexes based on comparing values in v
  // using std::stable_sort instead of std::sort
  // to avoid unnecessary index re-orderings
  // when v contains elements of equal values 
  stable_sort(idx.begin(), idx.end(),
       [&v](size_t i1, size_t i2) {return v[i1] < v[i2];});

  return idx;
}

现在您可以使用返回的索引向量进行迭代，例如：

for (auto i: sort_indexes(v)) {
  cout << v[i] << endl;
}

你还可以选择提供原始索引向量、排序函数、比较器或使用额外的向量自动重新排序v在sort_indexes函数中。