为什么使用std::multiset作为优先队列比使用std::priority_queue更快？

Question

为什么使用std::multiset作为优先队列比使用std::priority_queue更快？

c++performancepriority-queuemultimapmultiset

21

我尝试用std::priority_queue替换std::multiset。但是我对速度结果感到失望。算法的运行时间增加了50%... 这里是相应的命令：

top() = begin();
pop() = erase(knn.begin());
push() = insert();

我对优先队列的实现速度感到惊讶，我预期结果会有所不同（对于优先队列来说更好）... 从概念上讲，multiset被用作优先队列。为什么即使使用了-O2，优先队列和multiset的性能差异如此之大？

在MSVS 2010、Win XP、32位系统中，方法findAllKNN2()的十次平均结果（请参见下文）。

MS
N           time [s]
100 000     0.5
1 000 000   8

PQ
N           time [s]
100 000     0.8
1 000 000   12

这些结果的原因是什么？源代码没有做出其他更改... 谢谢您的帮助...

微软实现：

template <typename Point>
struct TKDNodePriority
{
    KDNode <Point> *node;
    typename Point::Type priority;

    TKDNodePriority() : node ( NULL ), priority ( 0 ) {}
    TKDNodePriority ( KDNode <Point> *node_, typename Point::Type priority_ ) : node ( node_ ), priority ( priority_ ) {}

    bool operator < ( const TKDNodePriority <Point> &n1 ) const
    {
            return priority > n1.priority;
    }
};

template <typename Point>
struct TNNeighboursList
{
    typedef std::multiset < TKDNodePriority <Point> > Type;
};

方法：

template <typename Point>
template <typename Point2>
void KDTree2D <Point>::findAllKNN2 ( const Point2 * point, typename TNNeighboursList <Point>::Type & knn, unsigned int k, KDNode <Point> *node, const unsigned int depth ) const
{
    if ( node == NULL )
    {
            return;
    }

    if ( point->getCoordinate ( depth % 2 ) <= node->getData()->getCoordinate ( depth % 2 ) )
    {
    findAllKNN2 ( point, knn, k, node->getLeft(), depth + 1 );
    }

    else 
    {
            findAllKNN2 ( point, knn, k, node->getRight(), depth + 1 );
    }

typename Point::Type dist_q_node = ( node->getData()->getX() - point->getX() ) * ( node->getData()->getX() - point->getX() ) +
                             ( node->getData()->getY() - point->getY() ) * ( node->getData()->getY() - point->getY() );

if (knn.size() == k)
{
    if (dist_q_node < knn.begin()->priority )
    {
        knn.erase(knn.begin());
        knn.insert ( TKDNodePriority <Point> ( node,  dist_q_node ) );
    }
}

else
{
    knn.insert ( TKDNodePriority <Point> ( node,  dist_q_node ) );
}

typename Point::Type dist_q_node_straight = ( point->getCoordinate ( node->getDepth() % 2 ) - node->getData()->getCoordinate ( node->getDepth() % 2 ) ) *
                                                ( point->getCoordinate ( node->getDepth() % 2 ) - node->getData()->getCoordinate ( node->getDepth() % 2 ) ) ;

typename Point::Type top_priority =  knn.begin()->priority;
if ( knn.size() < k ||  dist_q_node_straight <  top_priority )
{
            if ( point->getCoordinate ( node->getDepth() % 2 ) < node->getData()->getCoordinate ( node->getDepth() % 2 ) )
            {
        findAllKNN2 ( point, knn, k, node->getRight(), depth + 1 );
    }

    else
    {
        findAllKNN2 ( point, knn, k, node->getLeft(), depth + 1 );
    }
}
}

PQ实现方式（为什么较慢？）

template <typename Point>
struct TKDNodePriority
{
    KDNode <Point> *node;
    typename Point::Type priority;

    TKDNodePriority() : node ( NULL ), priority ( 0 ) {}
    TKDNodePriority ( KDNode <Point> *node_, typename Point::Type priority_ ) : node ( node_ ), priority ( priority_ ) {}

    bool operator < ( const TKDNodePriority <Point> &n1 ) const
    {
            return priority > n1.priority;
    }
};


template <typename Point>
struct TNNeighboursList
{ 
    typedef std::priority_queue< TKDNodePriority <Point> > Type;
};

方法：

template <typename Point>
template <typename Point2>
void KDTree2D <Point>::findAllKNN2 ( const Point2 * point, typename TNNeighboursList <Point>::Type & knn, unsigned int k, KDNode <Point> *node, const unsigned int depth ) const
{

    if ( node == NULL )
    {
            return;
    }

    if ( point->getCoordinate ( depth % 2 ) <= node->getData()->getCoordinate ( depth % 2 ) )
    {
    findAllKNN2 ( point, knn, k, node->getLeft(), depth + 1 );
    }

    else 
    {
            findAllKNN2 ( point, knn, k, node->getRight(), depth + 1 );
    }

typename Point::Type dist_q_node = ( node->getData()->getX() - point->getX() ) * ( node->getData()->getX() - point->getX() ) +
                             ( node->getData()->getY() - point->getY() ) * ( node->getData()->getY() - point->getY() );

if (knn.size() == k)
{
    if (dist_q_node < knn.top().priority )
    {
        knn.pop();

        knn.push ( TKDNodePriority <Point> ( node,  dist_q_node ) );
    }
}

else
{
    knn.push ( TKDNodePriority <Point> ( node,  dist_q_node ) );
}

typename Point::Type dist_q_node_straight = ( point->getCoordinate ( node->getDepth() % 2 ) - node->getData()->getCoordinate ( node->getDepth() % 2 ) ) *
                                                ( point->getCoordinate ( node->getDepth() % 2 ) - node->getData()->getCoordinate ( node->getDepth() % 2 ) ) ;

typename Point::Type top_priority =  knn.top().priority;
if ( knn.size() < k ||  dist_q_node_straight <  top_priority )
{
            if ( point->getCoordinate ( node->getDepth() % 2 ) < node->getData()->getCoordinate ( node->getDepth() % 2 ) )
            {
        findAllKNN2 ( point, knn, k, node->getRight(), depth + 1 );
    }

    else
    {
        findAllKNN2 ( point, knn, k, node->getLeft(), depth + 1 );
    }
}
}

- Johny

1

这是一次性的观察还是持续性的观察？ - DumbCoder

1

@unapersson：好的，multiset 作为一种搜索树，可能可以用作优先队列。我只想知道调用了哪些成员函数，并调用了多少次。 - Fred Foo

3

标准并没有规定有序容器（例如multiset和其他相关容器）的具体实现方式，尽管其复杂度要求强制它们使用平衡树。标准确立了priority_queue必须使用标准堆算法的规定（由于底层容器是受保护成员，使用其他算法将破坏队列的可见行为）。 - Mike Seymour

@larsman 我的意思是 - 几乎任何容器都可以被构想成为一个优先队列。 - user2100815

1

你是如何进行测量的？请使用Google Benchmark！有一个在线网站提供此工具：http://quick-bench.com/ - Marek R

显示剩余5条评论

4个回答

8

看起来编译器的优化设置对性能可能有很大影响。在下面的代码中，multiset比基于vector和deque的priority_queue更快。然而，使用“-O3”优化后，基于vector的priority_queue胜过所有其他方法。这些实验是在Linux上使用GCC运行的，因此在Windows上可能会得到不同的结果。我认为启用优化可能会消除STL的vector中许多错误检查行为。

没有优化： pq-w-vector: 79.2997毫秒 pq-w-deque: 362.366毫秒 pq-w-multiset: 34.649毫秒

使用-O2优化： pq-w-vector: 8.88154毫秒 pq-w-deque: 17.5233毫秒 pq-w-multiset: 12.5539毫秒

使用-O3优化： pq-w-vector: 7.92462毫秒 pq-w-deque: 16.8028毫秒 pq-w-multiset: 12.3208毫秒

测试工具（不要忘记链接-lrt）：

#include <iostream>
#include <queue>
#include <deque>
#include <set>
#include <ctime>
#include <cstdlib>
#include <unistd.h>

using namespace std;

template <typename T>
double run_test(T& pq, int size, int iterations)
{
    struct timespec start, end;

    for(int i = 0; i < size; ++i)
        pq.push(rand());

    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
    for(int i = 0; i < iterations; ++i)
    {
        if(rand()%2)
            pq.pop();
        else
            pq.push(rand());

    }
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);

    end.tv_sec -= start.tv_sec;
    end.tv_nsec -= start.tv_nsec;
    if (end.tv_nsec < 0)
    {
        --end.tv_sec;
        end.tv_nsec += 1000000000ULL;
    }

    return (end.tv_sec*1e3 + end.tv_nsec/1e6);
}

template <class T>
class multiset_pq: public multiset<T>
{
public:
    multiset_pq(): multiset<T>() {};
    void push(T elm) { this->insert(elm); }
    void pop() { if(!this->empty()) this->erase(this->begin()); }
    const T& top() { return *this->begin(); }
};

int main(void)
{
    const int size = 5000;
    const int iterations = 100000;

    priority_queue<int, vector<int> > pqv;
    priority_queue<int, deque<int> > pqd;
    multiset_pq<int> pqms;

    srand(time(0));

    cout<<"pq-w-vector: "<<run_test(pqv, size, iterations)<<"ms"<<endl;
    cout<<"pq-w-deque: "<<run_test(pqd, size, iterations)<<"ms"<<endl;
    cout<<"pq-w-multiset: "<<run_test(pqms, size, iterations)<<"ms"<<endl;

    return 0;
}

- Glenn

请查看Matt Godbolt的编译器浏览器：https://godbolt.org/。您可以查看编译器的反汇编，并比较在不同编译器优化时产生的结果。 - stackunderflow

插入rand()实际上并不代表真实情况，特别是基于作者的代码示例。此外，默认情况下，priority_queue是基于vector的。 - Liastre

3

根据我理解，priority_queue的实现是罪魁祸首。在底层实现上，priority_queue被实现为一个专门的vector或deque，因为它需要具有随机访问迭代器。当您将项目推入priority_queue中或弹出项目时，队列中剩余的项目需要复制到空间中，插入时也会发生同样的情况。multi_set基于键值存储元素。

编辑：非常抱歉，multi_set不是基于哈希键的。由于某种原因，我将其与multi_map混淆了。但是multi_set是一种多重排序关联容器，根据键（与值相同）存储元素。由于multi_set中元素的存储方式，插入新元素不会使指向现有元素的迭代器失效。从multi_set中删除元素也不会使任何迭代器失效，除了指向正在被删除的元素的迭代器。

-- 引用自SGI文档。

这意味着multi_set的存储不是线性的，因此可以提高性能。

- Vite Falcon

4

“multisets” 不基于哈希表，而 “unordered_multisets” 则是。 - user2100815

无论是插入还是删除，时间复杂度都为O(lg n)，而multiset不是哈希表。-1。 - Fred Foo

2

@BradTilley：你知道这是在C++11被批准之前写的吗？如果那是你主要的动机，你能否请取消你的反对票？我已经从我的错误中吸取了教训 :) 这个答案是在2011年5月发布的，而C++11是在2011年8月批准的。谢谢。 - Vite Falcon

0

这个非合成基准测试是从使用priority_queue的实际情况中获取的。将this file作为标准输入来运行基准测试。

// TOPOSORT 2
// This short function computes the lexicographically smallest toposort.
// priority_queue vs multiset benchmark
#include <vector>
#include <queue>
#include <set>
#include <unordered_set>
#include <ctime>
#include <chrono>
#include <iostream>
// https://dev59.com/eWYr5IYBdhLWcg3wTYq5#13772771
#ifdef _WIN32
#include <intrin.h>
#else
#include <x86intrin.h>
#endif
using namespace std;
constexpr int MAXN = 100001;
struct Tail {
  int observation, number;
};
typedef vector<vector<Tail>> AdjList;

int N, M;
void computeNumIncomingEdges(int observationID, AdjList adjacency_list,
                             int *numIncomingEdges) {
  for (int node = 0; node <= N; ++node) {
    numIncomingEdges[node] = 0;
  }
  for (int node = 1; node <= N; ++node) {
    for (Tail tail : adjacency_list[node]) {
      if (tail.observation <= observationID) {
        numIncomingEdges[tail.number]++;
      }
    }
  }
}
template<class T>
vector<int> toposort2_PQ(int observationID, AdjList adjacency_list) {
  vector<int> sortedElements;
  priority_queue<int, T, std::greater<int>> S;
  static int numIncomingEdges[MAXN];
  computeNumIncomingEdges(observationID, adjacency_list, numIncomingEdges);
  for (int node = 1; node <= N; ++node) {
    if (numIncomingEdges[node] == 0)
      S.push(node);
  }
  while (!S.empty()) {
    auto n = S.top();
    S.pop();
    sortedElements.push_back(n);
    for (int _ = adjacency_list[n].size() - 1; _ >= 0; --_) {
      Tail m = adjacency_list[n][_];
      if (m.observation <= observationID) {
        adjacency_list[n].pop_back();
        numIncomingEdges[m.number]--;
        if (numIncomingEdges[m.number] == 0)
          S.push(m.number);
      }
    }
  }
  bool graphStillHasEdges = false;
  for (int node = 1; node <= N; ++node)
    if (numIncomingEdges[node] > 0) {
      graphStillHasEdges = true;
      break;
    }
  return sortedElements;
}

vector<int> toposort2_multiset(int observationID, AdjList adjacency_list) {
  vector<int> sortedElements;
  multiset<int, std::greater<int>> S;
  static int numIncomingEdges[MAXN];
  computeNumIncomingEdges(observationID, adjacency_list, numIncomingEdges);
  for (int node = 1; node <= N; ++node) {
    if (numIncomingEdges[node] == 0)
      S.insert(node);
  }
  while (!S.empty()) {
    int n = *S.begin();
    S.erase(S.begin());
    sortedElements.push_back(n);
    for (int _ = adjacency_list[n].size() - 1; _ >= 0; --_) {
      Tail m = adjacency_list[n][_];
      if (m.observation <= observationID) {
        adjacency_list[n].pop_back();
        numIncomingEdges[m.number]--;
        if (numIncomingEdges[m.number] == 0)
          S.insert(m.number);
      }
    }
  }
  bool graphStillHasEdges = false;
  for (int node = 1; node <= N; ++node)
    if (numIncomingEdges[node] > 0) {
      graphStillHasEdges = true;
      break;
    }
  return sortedElements;
}

int main() {
  scanf("%d %d", &N, &M);
  AdjList adjacency_list(MAXN);
  for (int observation = 0; observation < M; ++observation) {
    int observationSize;
    scanf("%d", &observationSize);
    int head;
    scanf("%d", &head);
    for (int i = 0; i < observationSize - 1; ++i) {
      int tail;
      scanf("%d", &tail);
      Tail to_insert;
      to_insert.observation = observation;
      to_insert.number = tail;
      adjacency_list[head].push_back(to_insert);
      head = tail;
    }
  }
  for (int i = 0; i < 5; ++i) {
  auto start_pq = std::chrono::high_resolution_clock::now();
  toposort2_PQ<vector<int>>(3182, adjacency_list);
  auto end_pq =  std::chrono::high_resolution_clock::now();
  auto start_pq_dq = std::chrono::high_resolution_clock::now();
  toposort2_PQ<deque<int>>(3182, adjacency_list);
  auto end_pq_dq =  std::chrono::high_resolution_clock::now();
  auto start_ms =  std::chrono::high_resolution_clock::now();
  toposort2_multiset(3182, adjacency_list);
  auto end_ms =  std::chrono::high_resolution_clock::now();
  std::cout << std::chrono::duration_cast<std::chrono::microseconds>(end_pq-start_pq).count() << ' ' << std::chrono::duration_cast<std::chrono::microseconds>(end_pq_dq-start_pq_dq).count() << ' ' << std::chrono::duration_cast<std::chrono::microseconds>(end_ms-start_ms).count() << endl;
  }
}

使用带有 -O2 的 clang++ 能够给我带来以下效果：

31622 37891 54884
27092 33919 54878
27324 35870 51427
27961 35348 53170
26746 34753 54191

总的来说，使用vector实现的priority_queue表现最佳。其次是deque实现的priority_queue，而multiset则表现最差。

- noɥʇʎԀʎzɐɹƆ

当使用的类型是较大的类型而不是int时，会发生什么情况？原始问题使用具有两个字段（指针和某种数字（int/float/double））的结构体。由于要复制的数据大小更大，这可能会增加PQ的执行时间。 - 1201ProgramAlarm

你的帖子实际上没有回答主题的问题，为什么一个数据结构比另一个更快？你只是在说：“看，它在我的环境中使用这个特定的数据和算法很快”。 - Liastre

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Liastre · Accepted Answer

首先，作者没有提供导致性能下降的代码最小示例。其次，这个问题是8年前问的，我相信编译器在性能方面取得了巨大的提升。

我制作了一个基准测试示例，在循环中使用数组元素计数kNodesCount和kRunsCount迭代，从队列中取出第一个元素，然后使用另一个优先级将其推回去（模拟推入新元素而不创建元素），并将priority_queue与multiset和multimap进行比较。我决定包括multimap以进行更精确的比较。它的简单测试非常接近作者的用例，我还尝试重现他在代码示例中使用的结构体。

#include <set>
#include <type_traits>
#include <vector>
#include <chrono>
#include <queue>
#include <map>
#include <iostream>

template<typename T>
struct Point {
    static_assert(std::is_integral<T>::value || std::is_floating_point<T>::value, "Incompatible type");
    using Type = T;

    T x;
    T y;
};

template<typename T>
struct Node {
    using Type = T;

    Node<T> * left;
    Node<T> * right;
    T data;
};

template <typename T>
struct NodePriority {
    using Type = T;
    using DataType = typename T::Type;

    Node<T> * node = nullptr;
    DataType priority = static_cast<DataType>(0);

    bool operator < (const NodePriority<T> & n1) const noexcept {
        return priority > n1.priority;
    }

    bool operator > (const NodePriority<T> & n1) const noexcept {
        return priority < n1.priority;
    }
};

// descending order by default
template <typename T>
using PriorityQueueList = std::priority_queue<T>;

// greater used because of ascending order by default
template <typename T>
using MultisetList = std::multiset<T, std::greater<T>>;

// greater used because of ascending order by default
template <typename T>
using MultimapList = std::multimap<typename T::DataType, T, std::greater<typename T::DataType>>;

struct Inner {
    template<template <typename> class C, typename T>
    static void Operate(C<T> & list, std::size_t priority);

    template<typename T>
    static void Operate(PriorityQueueList<T> & list, std::size_t priority) {
        if (list.size() % 2 == 0) {
            auto el = std::move(list.top());
            el.priority = priority;
            list.push(std::move(el));
        }
        else {
            list.pop();
        }
    }

    template<typename T>
    static void Operate(MultisetList<T> & list, std::size_t priority) {
        if (list.size() % 2 == 0) {
            auto el = std::move(*list.begin());
            el.priority = priority;
            list.insert(std::move(el));
        }
        else {
            list.erase(list.begin());
        }
    }

    template<typename T>
    static void Operate(MultimapList<T> & list, std::size_t priority) {
        if (list.size() % 2 == 0) {
            auto el = std::move(*list.begin());
            auto & elFirst = const_cast<int&>(el.first);
            elFirst = priority;
            el.second.priority = priority;
            list.insert(std::move(el));
        }
        else {
            list.erase(list.begin());
        }
    }
};

template<typename T>
void doOperationOnPriorityList(T & list) {
    for (std::size_t pos = 0, len = list.size(); pos < len; ++pos) {
        // move top element and update priority
        auto priority = std::rand() % 10;
        Inner::Operate(list, priority);
    }
}

template<typename T>
void measureOperationTime(T & list, std::size_t runsCount) {
    std::chrono::system_clock::time_point t1, t2;
    std::uint64_t totalTime(0);
    for (std::size_t i = 0; i < runsCount; ++i) {
        t1 = std::chrono::system_clock::now();
        doOperationOnPriorityList(list);
        t2 = std::chrono::system_clock::now();
        auto castedTime = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
        std::cout << "Run " << i << " time: " << castedTime << "\n";
        totalTime += castedTime;
    }

    std::cout << "Average time is: " << totalTime / runsCount << " ms" << std::endl;
}

int main() {
    // consts
    const int kNodesCount = 10'000'000;
    const int kRunsCount = 10;

    // prepare data
    PriorityQueueList<NodePriority<Point<int>>> neighboursList1;
    MultisetList<NodePriority<Point<int>>> neighboursList2;
    MultimapList<NodePriority<Point<int>>> neighboursList3;
    std::vector<Node<Point<int>>> nodes;
    nodes.reserve(kNodesCount);
    for (auto i = 0; i < kNodesCount; ++i) {
        nodes.emplace_back(decltype(nodes)::value_type{ nullptr, nullptr, { 0,0 } });
        auto priority = std::rand() % 10;
        neighboursList1.emplace(decltype(neighboursList1)::value_type{ &nodes.back(), priority });
        neighboursList2.emplace(decltype(neighboursList2)::value_type{ &nodes.back(), priority });
        neighboursList3.emplace(decltype(neighboursList3)::value_type{ priority, { &nodes.back(), priority } });
    }

    // do operation on data
    std::cout << "\nPriority queue\n";
    measureOperationTime(neighboursList1, kRunsCount);
    std::cout << "\nMultiset\n";
    measureOperationTime(neighboursList2, kRunsCount);
    std::cout << "\nMultimap\n";
    measureOperationTime(neighboursList3, kRunsCount);

    return 0;
}

我已经使用VS v15.8.9进行了/Ox的发布构建。查看10次运行中10,000,000个项目的结果：

Priority queue
Run 0 time: 764
Run 1 time: 933
Run 2 time: 920
Run 3 time: 813
Run 4 time: 991
Run 5 time: 862
Run 6 time: 902
Run 7 time: 1277
Run 8 time: 774
Run 9 time: 771
Average time is: 900 ms

Multiset
Run 0 time: 2235
Run 1 time: 1811
Run 2 time: 1755
Run 3 time: 1535
Run 4 time: 1475
Run 5 time: 1388
Run 6 time: 1482
Run 7 time: 1431
Run 8 time: 1347
Run 9 time: 1347
Average time is: 1580 ms

Multimap
Run 0 time: 2197
Run 1 time: 1885
Run 2 time: 1725
Run 3 time: 1671
Run 4 time: 1500
Run 5 time: 1403
Run 6 time: 1411
Run 7 time: 1420
Run 8 time: 1409
Run 9 time: 1362
Average time is: 1598 ms

嗯嗯，你看multiset和multimap的性能是一样的，而priority_queue是最快的（大约快43%）。那么为什么会这样呢？

让我们从priority_queue开始，C++标准并没有告诉我们如何实现一个容器或结构，但在大多数情况下，它基于二叉堆来实现（查找msvc和gcc的实现）！对于priority_queue，除了顶部之外，您无法访问任何元素，不能通过索引迭代，甚至不能获取最后一个元素（这使得一些空间可以优化）。二叉堆的平均插入是O（1），只有最坏情况是O（log n），而删除是O（log n），因为我们从底部取出元素，然后搜索下一个高优先级。

那么multimap和multiset呢？它们通常都是基于红黑树实现的（查找msvc和gcc的实现），其中平均插入是O（log n），删除也是O（log n）。

从这个角度来看，priority_queue 绝不会比 multiset 或 multimap 慢。所以，回到你的问题，作为优先队列，multiset 不比 priority_queue 本身更快。可能有很多原因，包括旧编译器上的原始 priority_queue 实现或对该结构的错误使用（问题中没有包含最小可行示例），除此之外，作者没有提到编译标志或编译器版本，有时优化会带来显著的变化。 更新1 根据 @noɥʇʎԀʎzɐɹƆ 的要求

不幸的是，我现在无法访问 Linux 环境，但我已经安装了 mingw-w64，版本信息：g++.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0。使用的处理器与 Visual Studio 相同：Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz，2001 Mhz，4 核心，8 逻辑处理器。

因此，g++ -O2 的结果是：

Priority queue
Run 0 time: 775
Run 1 time: 995
Run 2 time: 901
Run 3 time: 807
Run 4 time: 930
Run 5 time: 765
Run 6 time: 799
Run 7 time: 1151
Run 8 time: 760
Run 9 time: 780
Average time is: 866 ms

Multiset
Run 0 time: 2280
Run 1 time: 1942
Run 2 time: 1607
Run 3 time: 1344
Run 4 time: 1319
Run 5 time: 1210
Run 6 time: 1129
Run 7 time: 1156
Run 8 time: 1244
Run 9 time: 992
Average time is: 1422 ms

Multimap
Run 0 time: 2530
Run 1 time: 1958
Run 2 time: 1670
Run 3 time: 1390
Run 4 time: 1391
Run 5 time: 1235
Run 6 time: 1088
Run 7 time: 1198
Run 8 time: 1071
Run 9 time: 963
Average time is: 1449 ms

你可能会注意到，这张图片与msvc的几乎相同。

更新2 感谢@JorgeBellon

一个quick-bench.com的在线基准测试链接，请自行检查！

希望看到我文章中的任何补充，干杯！