Boost thread overhead

Question

c++ multithreading performance threadpool boost-thread

3

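In the simple program below, I find that calling a function through a Boost thread costs several orders of magnitude more time than calling it directly. Is there any way to reduce this overhead and speed up the threadFoo() call?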

#include <iostream>
#include <stdint.h>   // uint64_t
#include <time.h>
#include <boost/thread.hpp>
#include <boost/date_time.hpp>

typedef uint64_t tick_t;

// Read the x86 time-stamp counter (GCC inline assembly).
#define rdtscll(val) do { \
    unsigned int __a, __d; \
    __asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
    (val) = ((unsigned long long)__a) | (((unsigned long long)__d) << 32); \
} while (0)

class baseClass {
public:
    void foo() {
        // Do nothing
    }

    // Start a thread that runs foo(), then wait for it to finish.
    void threadFoo() {
        threadObjOne = boost::thread(&baseClass::foo, this);
        threadObjOne.join();
    }

private:
    boost::thread threadObjOne;
};

int main() {
    std::cout << "main startup" << std::endl;
    baseClass baseObj;
    tick_t startTime, endTime;

    rdtscll(startTime);
    baseObj.foo();
    rdtscll(endTime);
    std::cout << "native foo() call takes " << endTime - startTime << " clock cycles" << std::endl;

    rdtscll(startTime);
    baseObj.threadFoo();
    rdtscll(endTime);
    std::cout << "Thread foo() call takes " << endTime - startTime << " clock cycles" << std::endl;
}

You can compile it with g++ -lboost_thread-mt main.cpp. Here is sample output from my machine:

main startup
native foo() call takes 2187 clock cycles
Thread foo() call takes 29630434 clock cycles

- ARH

1

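Call the Boost thread version first and the native one second, and tell us the result. It looks as if you hit a context switch: about 30 million cycles at 3 GHz is about 10 ms, which looks a lot like a scheduler time quantum. - Lyth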
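To make the arithmetic in the comment above explicit, here is a small sketch that converts a cycle count into milliseconds. The helper name cycles_to_ms and the 3 GHz clock rate are assumptions for illustration; the real TSC frequency of the asker's machine is not stated.

```cpp
#include <cstdint>

// Hypothetical helper: convert a TSC cycle count to milliseconds,
// assuming the counter ticks at a constant `hz` cycles per second.
double cycles_to_ms(std::uint64_t cycles, double hz) {
    return static_cast<double>(cycles) / hz * 1000.0;
}
```

With the figure from the question, cycles_to_ms(29630434, 3.0e9) comes out a little under 10 ms.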

It degraded to four orders of magnitude: main startup / Thread foo() call takes 13418779 clock cycles / native foo() call takes 2197 clock cycles - ARH

2

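We know that starting a thread has a cost. You have to ask the OS for stack space and set up a whole bunch of other things. As Lyth said, 10 ms is not that bad. So the order of magnitude is not really relevant, because you would not create a thread every time you want to run a function in parallel. The way to speed this up is to create one thread, call foo() a billion times inside it, and compare that with the cost of calling it a billion times normally. The thread set-up cost then becomes insignificant. - Martin York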
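The comparison described in the comment above could be sketched roughly as follows. This is an illustration under stated assumptions, not the asker's code: it uses std::thread from C++11 instead of boost::thread so that it needs no external library, foo() increments a counter so the calls are observable, and the names thread_per_call, one_thread_many_calls and time_us are made up for this sketch.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Stand-in for the question's empty foo(); it bumps a counter so the
// calls have an observable effect.
inline void foo(std::uint64_t& counter) { ++counter; }

// Strategy 1: pay the thread start-up cost on every call.
std::uint64_t thread_per_call(std::uint64_t calls) {
    std::uint64_t counter = 0;
    for (std::uint64_t i = 0; i < calls; ++i) {
        std::thread t([&counter] { foo(counter); });
        t.join();  // joined before the next thread starts, so no data race
    }
    return counter;
}

// Strategy 2: pay the start-up cost once and loop inside the thread.
std::uint64_t one_thread_many_calls(std::uint64_t calls) {
    std::uint64_t counter = 0;
    std::thread t([&counter, calls] {
        for (std::uint64_t i = 0; i < calls; ++i) foo(counter);
    });
    t.join();
    return counter;
}

// Hypothetical timing helper: wall-clock microseconds taken by fn(calls).
template <typename Fn>
long long time_us(Fn fn, std::uint64_t calls) {
    auto start = std::chrono::steady_clock::now();
    fn(calls);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Comparing time_us(thread_per_call, n) with time_us(one_thread_many_calls, n) for the same n shows how the per-thread set-up cost is spread over the calls; the actual numbers depend on the machine and the optimization level.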

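I see your point, Loki. However, this supposedly trivial thread overhead kills parallel processing in my application, because in my case each parallel function body is very simple. As a result, calling the functions sequentially beats the multithreaded implementation. If there is no other way to reduce this overhead, I think I should give up on multithreading. - ARH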

In that case, take a look at thread pools; they reduce the creation overhead to a single call. - KillianDS

3

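Using threads incorrectly will often slow an application down rather than speed it up, but that should not stop us from trying to get you to use them correctly and get a speed-up. The point is not to start a new thread for each function. You want to start only a small number of threads (the exact number varies by situation, but (1.5->2)*<cpu count> is a good starting point) and then distribute your functions across them, so that each thread executes a set of functions sequentially. By using a thread pool (as @KillianDS suggested), you can distribute the load dynamically. - Martin York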
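A static version of that idea, where a small number of threads each run a contiguous group of calls, might look like the sketch below. It is written with std::thread rather than boost::thread to stay self-contained, and the names run_in_chunks and suggested_threads are hypothetical. The task passed in must be safe to call from several threads at once.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Run task(i) for every i in [0, count), split into contiguous chunks
// across a small, fixed number of threads. Each thread runs its chunk
// sequentially; `task` must be safe to call concurrently.
template <typename Task>
void run_in_chunks(std::size_t count, unsigned threads, Task task) {
    if (threads == 0) threads = 1;
    const std::size_t chunk = (count + threads - 1) / threads;  // ceiling division
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = std::min(count, t * chunk);
        const std::size_t end = std::min(count, begin + chunk);
        if (begin == end) break;  // no work left for further threads
        workers.emplace_back([begin, end, &task] {
            for (std::size_t i = begin; i < end; ++i) task(i);
        });
    }
    for (std::thread& w : workers) w.join();
}

// Upper end of the range suggested in the comment: 2x the CPU count.
// hardware_concurrency() may return 0 when the count is unknown.
unsigned suggested_threads() {
    const unsigned cpus = std::max(1u, std::thread::hardware_concurrency());
    return cpus * 2;
}
```

A call such as run_in_chunks(n, suggested_threads(), work) creates each thread once for the whole batch. Unlike a thread pool, the split is fixed up front, so it does not rebalance when some calls take longer than others.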

1 Answer

- Martin York · Accepted Answer

What you really need is a thread pool:

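#include "threadpool.hpp"

// Placeholder for the small unit of work. The original snippet leaves
// task undefined, so this empty functor is only an assumption.
struct task
{
    explicit task(int id) : id(id) {}
    void operator()() const { /* do the work for item id here */ }
    int id;
};

int main()
{
    boost::threadpool::pool threadpool(8);  // I have 4 CPUs.
                                            // Might be overkill; you need to time it
                                            // to get exact numbers, and it depends on
                                            // blocking and other factors.

    for (int loop = 0; loop < 100000; ++loop)
    {
        // Schedule 100,000 small tasks to be run on the 8 threads in the pool.
        threadpool.schedule(task(loop));
    }

    // The destructor of threadpool
    // forces the main thread to wait
    // for all tasks to complete before exiting.
}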
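If the threadpool header used above is not available, the same pattern can be sketched with the C++11 standard library alone. The class below is a hypothetical minimal pool written for illustration (the name SimplePool and its interface are assumptions); it is not the boost::threadpool implementation, and it leaves out error handling and return values.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: a fixed set of workers pulling tasks from a queue.
class SimplePool {
public:
    explicit SimplePool(std::size_t threads) {
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { work(); });
    }

    // Lets the workers drain the queue, then joins them, so destruction
    // waits for every scheduled task to finish.
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    SimplePool(const SimplePool&) = delete;
    SimplePool& operator=(const SimplePool&) = delete;

    // Queue a task; no thread is created here.
    void schedule(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

private:
    void work() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

Used like the answer's loop, one would construct SimplePool pool(8) and call pool.schedule(...) 100,000 times. The eight threads are created once, and each scheduled task only costs a queue push and a wake-up.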