How do I find the indices of duplicate elements in C++?

Question

9

In C++, is there an STL function that finds the indices of all duplicate elements in an array?

For example:

int array[] = {1,1,2,3,4};

should return 0,1.

- Raghav

3

I made a slightly naughty edit; presumably you want an array of ints, rather than an array of pointers to int and a heap of undefined behaviour? - Bathsheba

5

I think you can do this with std::sort, std::adjacent_find, and a loop. - NathanOliver

2

Is it OK to modify the array? Is it OK to modify it temporarily? Is it OK to create a temporary array? If modifying it is OK, see @NathanOlivier's comment. - pts

3

But any solution that involves sorting will lose the original indices. - juanchopanza

1

Is the element type always an integer (or another built-in type)? That could affect the design choices for the algorithm. - BiagioF

3 Answers

0

I don't think the STL has a ready-made way to do this. Here is an O(N*N) solution:

    int array[] = {1, 2, 3, 1, 4};
    constexpr std::size_t size = sizeof(array) / sizeof(array[0]); // number of elements
    bool duplicates[size] = {}; // duplicates[i] becomes true if array[i] has a duplicate

    for (std::size_t i = 0; i < size; ++i){
        if (!duplicates[i]){ /*No point in re-testing*/
            for (std::size_t j = i + 1; j < size; ++j){
                if (array[i] == array[j]){
                    duplicates[i] = duplicates[j] = true;
                }
            }
        }
    }

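A sort-based approach may be more efficient for longer arrays, but you would then need to build a table of new position -> old position in order to recover the indices of the duplicate elements.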

- Bathsheba
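
The sort-based idea mentioned in this answer could be sketched roughly as follows. This is not code from the answer: the function name `duplicate_indices_sorted` and the choice of `std::vector<int>` as input are assumptions made for illustration. Instead of sorting the values themselves, the sketch sorts a vector of indices, which serves as the "new position -> old position" table, so the original array is left untouched.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (name and signature are illustrative only).
// Returns, in ascending order, every index whose value occurs more than once in v.
std::vector<std::size_t> duplicate_indices_sorted(const std::vector<int>& v) {
    // order[k] is the original index of the element at sorted position k,
    // i.e. the "new position -> old position" table.
    std::vector<std::size_t> order(v.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&v](std::size_t a, std::size_t b) { return v[a] < v[b]; });

    std::vector<std::size_t> result;
    // Walk over runs of equal values in sorted order.
    for (std::size_t begin = 0; begin < order.size();) {
        std::size_t end = begin + 1;
        while (end < order.size() && v[order[end]] == v[order[begin]]) {
            ++end;
        }
        if (end - begin > 1) {  // a run longer than one element means duplicates
            for (std::size_t k = begin; k < end; ++k) {
                result.push_back(order[k]);
            }
        }
        begin = end;
    }
    std::sort(result.begin(), result.end());  // report indices in ascending order
    return result;
}
```

The dominant cost is the sort, so this runs in O(N log N) time with O(N) extra memory for the index table.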

That takes the complexity up to O(N^2). Is there no way to do better than that? - Raghav

1

Well, this is O(n**2), while sorting would probably be closer to O(n log n). - Ped7g

@Raghav: I can't think of a "central pivot" style approach that would get you to O(N log N). - Bathsheba

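Well, quicksort is generally considered to be n log n... :) But that is only O() notation. If the OP is interested in real-world performance, he should post the real limits of the data and structures, because for a favourable n a neat n**2 can still be faster than a messy n log n. - Ped7g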

0

Here is my take on it. I'm not quite sure about the big-O time complexity of this, but it looks like O(N):

#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Note: std::map lookups are O(log N), so the loop below is O(N log N) overall.
std::vector<std::size_t> findDuplicateIndices(std::vector<int> const & v)
{
    std::vector<std::size_t> indices;
    std::map<int, std::pair<int, std::size_t>> counts; // pair<amount, firstSeenPos>

    for (std::size_t i = 0 ; i < v.size() ; ++i)
    {
        int const amount = ++counts[v[i]].first;
        if (amount == 1) // First encounter, record the position
        {
            counts[v[i]].second = i;
            continue;
        }
        else if (amount == 2) // Second encounter, add the first encountered position
            indices.push_back(counts[v[i]].second);

        indices.push_back(i); // Second or later encounter, add the current position
    }
    return indices;
}

- Chnossos

- WhiZTiM · Accepted Answer

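To keep track of the duplicate indices efficiently, you can use a std::unordered_set (to track the duplicate indices uniquely) and a std::unordered_map (to track each unique number and its index).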

This has a time complexity of O(N * [O(1) + ... + O(1)]) ... which is approximately O(N) on average:

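#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Assumes the iterators refer to int elements.
template<typename ForwardIterator>
std::vector<int> get_duplicate_indices(ForwardIterator first, ForwardIterator last){
    std::unordered_set<int> rtn;       // indices of duplicated elements, each stored once
    std::unordered_map<int, int> dup;  // element value -> index of its first occurrence
    for(std::size_t i = 0; first != last; ++i, ++first){
        // insert() leaves the map unchanged if the value is already present
        auto iter_pair = dup.insert(std::make_pair(*first, static_cast<int>(i)));
        if(!iter_pair.second){
            rtn.insert(iter_pair.first->second);  // index of the first occurrence
            rtn.insert(static_cast<int>(i));      // index of the current occurrence
        }
    }
    return {rtn.begin(), rtn.end()};
}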

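Explanation:

Given an array A

Use a set of unique indices, rtn.
Use a KV (key-value) map, dup, where k is an element of array A and v is the index of that element in the array.
For each item a in the array with index i:
- Look up the kv in dup whose k equals a, if one exists
- If it exists,
  - insert i into rtn
  - insert v into rtn
- Otherwise, add a and i to dup as a kv
Return rtn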

See the full example: live demo on Coliru.

For the input:

int array[] = {1,1,2,3,4};

our output is:

1 0

Again, for the following input:

int array[] = {1, 1, 2, 3, 4, 1, 0, 0, 9};

our output is:

7 0 5 1 6

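The indices come out in no particular order because std::unordered_set does not keep its elements sorted. If you need the indices in order, you can simply sort the resulting array.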
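
As a rough sketch of that last step, the result can be passed through std::sort before it is returned. The wrapper name `get_sorted_duplicate_indices` is hypothetical and not part of the answer. `get_duplicate_indices` is reproduced here (with explicit casts added) only so that the sketch compiles on its own, and it assumes the elements are of type int.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Reproduced from the answer so that this sketch is self-contained.
template<typename ForwardIterator>
std::vector<int> get_duplicate_indices(ForwardIterator first, ForwardIterator last){
    std::unordered_set<int> rtn;       // indices of duplicated elements
    std::unordered_map<int, int> dup;  // element value -> index of first occurrence
    for(std::size_t i = 0; first != last; ++i, ++first){
        auto iter_pair = dup.insert(std::make_pair(*first, static_cast<int>(i)));
        if(!iter_pair.second){
            rtn.insert(iter_pair.first->second);
            rtn.insert(static_cast<int>(i));
        }
    }
    return {rtn.begin(), rtn.end()};
}

// Hypothetical wrapper (illustrative name): same result, in ascending order.
template<typename ForwardIterator>
std::vector<int> get_sorted_duplicate_indices(ForwardIterator first, ForwardIterator last){
    std::vector<int> indices = get_duplicate_indices(first, last);
    std::sort(indices.begin(), indices.end());
    return indices;
}
```

With the second input above, calling `get_sorted_duplicate_indices(std::begin(array), std::end(array))` yields the indices 0, 1, 5, 6, 7. The extra sort costs O(K log K), where K is the number of duplicate indices.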