Measuring execution time of a function in C++

Question

235

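I want to find out how much time a certain function takes to execute on Linux. Afterwards, I want to make a speed comparison. I looked at several timing functions and ended up with Chrono from the Boost library: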

process_user_cpu_clock, captures user-CPU time spent by the current process

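Now, I am not sure: if I use the above function, will I get only the time the CPU spent on that function?

Secondly, I could not find any example of using the above function. Can anyone please show me how to use it?

P.S.: Right now, I am using std::chrono::system_clock::now() to get the time in seconds, but this gives me different results each time because the CPU load varies.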

- Xara

2

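On Linux, use clock_gettime. gcc defines the other clocks as: typedef system_clock steady_clock; typedef system_clock high_resolution_clock; On Windows, use QueryPerformanceCounter. - Brandon

Isn't this question a duplicate of this one, or does the scenario make the solutions different? - northerner

I have two implementations of a function and would like to find out which one performs better. - northerner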

2

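Very important: make sure you enable optimization. Unoptimized code has different bottlenecks than normally optimized code and does not tell you anything meaningful. See "C loop optimization help (with compiler optimization disabled)". In general, microbenchmarking has many pitfalls, especially failing to run a warm-up loop first to account for CPU frequency and page faults: see "Idiomatic way of performance evaluation?". See also this answer. - Peter Cordes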

1

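See also "How would you benchmark the performance of a function?" for Google Benchmark, which avoids many of the pitfalls of rolling your own microbenchmark. Also see "Simple for() loop benchmark takes the same time with any loop bound" for more about how optimization interacts with benchmark loops, and what to do about it. - Peter Cordes

14 Answers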

33

Here is a function that measures the execution time of any function passed to it as an argument:

#include <chrono>
#include <utility>

typedef std::chrono::high_resolution_clock::time_point TimeVar;

#define duration(a) std::chrono::duration_cast<std::chrono::nanoseconds>(a).count()
#define timeNow() std::chrono::high_resolution_clock::now()

template<typename F, typename... Args>
double funcTime(F func, Args&&... args){
    TimeVar t1=timeNow();
    func(std::forward<Args>(args)...);
    return duration(timeNow()-t1);
}

Example usage:

#include <iostream>
#include <algorithm>
#include <string>

typedef std::string String;

//first test function doing something
int countCharInString(String s, char delim){
    int count=0;
    String::size_type pos = s.find_first_of(delim);
    while ((pos = s.find_first_of(delim, pos)) != String::npos){
        count++;pos++;
    }
    return count;
}

//second test function doing the same thing in different way
int countWithAlgorithm(String s, char delim){
    return std::count(s.begin(),s.end(),delim);
}


int main(){
    std::cout<<"norm: "<<funcTime(countCharInString,"precision=10",'=')<<"\n";
    std::cout<<"algo: "<<funcTime(countWithAlgorithm,"precision=10",'=');
    return 0;
}

Output:

norm: 15555
algo: 2976
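
For comparison, a macro-free variant of the same idea might look like the sketch below. The name time_call is hypothetical and not part of the answer above; the sketch assumes std::chrono::steady_clock is acceptable for interval timing and returns a typed duration instead of a double.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper (not from the answer above): times one call of
// func(args...) and returns the elapsed time as a typed duration.
template <typename F, typename... Args>
std::chrono::nanoseconds time_call(F&& func, Args&&... args) {
    using clock = std::chrono::steady_clock;  // monotonic clock for intervals
    const auto start = clock::now();
    std::forward<F>(func)(std::forward<Args>(args)...);
    const auto stop = clock::now();
    assert(stop >= start);  // holds because steady_clock never goes backwards
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

Because the result is a std::chrono::nanoseconds value, the caller decides how to print it, for example with .count() or after a duration_cast to a coarser unit.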

- Jahid

4

The concrete implementation of "high_resolution_clock" may be "system_clock" (wall clock), "steady_clock", or a third independent clock; see here for details. For a CPU clock, "std::clock" may be used. - Jahid

2

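Two macros and a global typedef, none of which saves a single keystroke, is certainly nothing I'd call elegant. Also, passing a function object and perfectly forwarding the arguments separately is a bit of overkill (and even inconvenient in the case of overloaded functions) when you can simply require the timed code to be put in a lambda. But fine, as long as passing arguments is optional. - MikeMB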

2

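And this is a justification for violating every single guideline about the naming of macros? You don't prefix them, you don't use capital letters, you pick a very common name that is highly likely to collide with some local symbol, and most of all: why use a macro at all (instead of a function)? And while we are at it: why return the duration as a double representing nanoseconds in the first place? We should probably agree that we disagree. My original point stands: "This is not what I'd call elegant code". - MikeMB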

4

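@MikeMB: Good point, making this a header would definitely be a bad idea. In the end, though, it's just an example; if you have complex needs, you have to think about standard practices and adapt the code accordingly. For example, when writing code, I make it convenient for the work at hand, and when it needs to move elsewhere I take every necessary step to make it robust so that I don't have to look at it again. I think every programmer who isn't a complete novice thinks more broadly when the time comes. Hope I clarified my point :D. - Jahid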

2

@Jahid: Thanks. In that case, consider my comments null and void. - MikeMB

14

I found an example of a generic lambda expression in Scott Meyers' book that can be used to measure function execution time. (C++14)

auto timeFuncInvocation = 
    [](auto&& func, auto&&... params) {
        // get time before function invocation
        const auto& start = std::chrono::high_resolution_clock::now();
        // function invocation using perfect forwarding
        std::forward<decltype(func)>(func)(std::forward<decltype(params)>(params)...);
        // get time after function invocation
        const auto& stop = std::chrono::high_resolution_clock::now();
        return stop - start;
     };

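The problem is that you measure only one execution, so the results can vary a lot. To get a reliable result, you should measure a large number of executions.

According to Andrei Alexandrescu's talk at the code::dive 2015 conference, Writing Fast Code I: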

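Measured time: tm = t + tq + tn + to

where:

tm - measured (observed) time

t - the actual time of interest

tq - time added by quantization noise

tn - time added by various sources of noise

to - overhead time (measuring, looping, calling functions)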

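According to what he says later in the talk, you should take the minimum of this large number of executions as your result. I encourage you to watch the talk, in which he explains why.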
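
A minimal sketch of the "take the minimum" idea follows. The name min_time_of and the use of steady_clock are assumptions of this sketch, not something taken from the talk. One way to read the formula above is that every noise term only adds time, so the smallest observation is the one closest to t.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs func() `runs` times, timing each run separately,
// and returns the smallest elapsed time that was observed.
template <typename F>
std::chrono::nanoseconds min_time_of(int runs, F&& func) {
    assert(runs > 0);  // at least one measurement is required
    using clock = std::chrono::steady_clock;
    auto best = std::chrono::nanoseconds::max();
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        func();
        const auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
            clock::now() - start);
        if (elapsed < best) {
            best = elapsed;  // keep the least noisy observation so far
        }
    }
    return best;
}
```

Timing each run on its own keeps the loop overhead out of the measurement, at the cost of two clock reads per run, which matters when the function itself is very short.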

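Also, there is a very good library from Google: https://github.com/google/benchmark. The library is simple to use and powerful. You can watch some of Chandler Carruth's talks on YouTube in which he uses this library in practice, for example CppCon 2017: Chandler Carruth "Going Nowhere Faster".

Example usage: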

#include <iostream>
#include <chrono>
#include <utility>
#include <vector>
auto timeFuncInvocation = 
    [](auto&& func, auto&&... params) {
        // get time before function invocation
        const auto& start = std::chrono::high_resolution_clock::now();
        // function invocation using perfect forwarding
        for(auto i = 0; i < 100000/*largeNumber*/; ++i) {
            std::forward<decltype(func)>(func)(std::forward<decltype(params)>(params)...);
        }
        // get time after function invocation
        const auto& stop = std::chrono::high_resolution_clock::now();
        // average time per invocation
        return (stop - start)/100000/*largeNumber*/;
     };

void f(std::vector<int>& vec) {
    vec.push_back(1);
}

void f2(std::vector<int>& vec) {
    vec.emplace_back(1);
}
int main()
{
    std::vector<int> vec;
    std::vector<int> vec2;
    std::cout << timeFuncInvocation(f, vec).count() << std::endl;
    std::cout << timeFuncInvocation(f2, vec2).count() << std::endl;
    std::vector<int> vec3;
    vec3.reserve(100000);
    std::vector<int> vec4;
    vec4.reserve(100000);
    std::cout << timeFuncInvocation(f, vec3).count() << std::endl;
    std::cout << timeFuncInvocation(f2, vec4).count() << std::endl;
    return 0;
}

EDIT: Of course, you always need to remember that your compiler may or may not optimize something out. Tools like perf can be useful in such cases.
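
One common way to keep a result from being discarded is to store it in a volatile object, as in the sketch below. The names sum_to, g_sink and time_sum_to are hypothetical. This only guarantees that the result is produced and stored; the compiler may still simplify how it is computed, so inspecting the generated code or using a library helper such as Google Benchmark's DoNotOptimize remains advisable.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical workload: sums the integers 1..n.
std::uint64_t sum_to(std::uint64_t n) {
    assert(n < (std::uint64_t{1} << 32));  // keeps n*(n+1)/2 within 64 bits
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// A write to a volatile object is an observable side effect, so the value
// passed to it cannot simply be thrown away by the optimizer.
volatile std::uint64_t g_sink = 0;

std::chrono::nanoseconds time_sum_to(std::uint64_t n) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    g_sink = sum_to(n);  // the result must exist in order to be stored
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```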

- Krzysztof Sommerfeld

Interesting. What's the benefit of using a lambda here over a function template? - user48956

1

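The main difference is that it is a callable object, but you can indeed get something very similar with a variadic template and std::result_of_t. - Krzysztof Sommerfeld

@KrzysztofSommerfeld How do I do this for member functions? When I pass timing(Object.Method1), it returns the error "non-standard syntax; use '&' to create a pointer to member". - RobinAtTech

timeFuncInvocation([&objectName](auto&&... args){ objectName.methodName(std::forward<decltype(args)>(args)...); }, arg1, arg2,...); or omit the & before objectName (then you will have a copy of the object). - Krzysztof Sommerfeld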

12

A simple program to find the time taken by a function's execution.

#include <iostream>
#include <ctime> // time_t
#include <cstdio>

void function()
{
     for(long int i=0;i<1000000000;i++)
     {
        // do nothing
     }
}

int main()
{

time_t begin,end; // time_t is a datatype to store time values.

time (&begin); // note time before execution
function();
time (&end); // note time after execution

double difference = difftime (end,begin);
printf ("time taken for function() %.2lf seconds.\n", difference );

return 0;
}

- Abdullah Farweez

9

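It's very inaccurate: it shows only seconds, no milliseconds. - user25

You should rather use something like clock_gettime and process the results within a struct timespec. But this is a C solution rather than a C++ one. - Victor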

8

An easy way for older C++, or for C:

#include <time.h> // includes clock_t and CLOCKS_PER_SEC

int main() {

    clock_t start, end;

    start = clock();
    // ...code to measure...
    end = clock();

    double duration_sec = double(end-start)/CLOCKS_PER_SEC;
    return 0;
}

The timing precision in seconds is 1.0/CLOCKS_PER_SEC.

- v.chaplin

3

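This is not portable. It measures processor time on Linux and clock time on Windows. - BugSquasher

Start and end time are always the same, even though I added an array of 512 elements... under Win64/Visual Studio 17. - Maverick

I'm not sure what would cause that, but if you're using C++ it's best to switch to the standard <chrono> methods. - v.chaplin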

6

#include <iostream>
#include <chrono>

void function()
{
    // code here;
}

int main()
{
    auto t1 = std::chrono::high_resolution_clock::now();
    function();
    auto t2 = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();

    std::cout << duration << "\n";
    return 0;
}

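This worked for me.

Note:

high_resolution_clock is not implemented consistently across different standard library implementations, and its use should be avoided. It is often just an alias for std::chrono::steady_clock or std::chrono::system_clock, but which one it is depends on the library or configuration. When it is a system_clock, it is not monotonic (for example, the time can go backwards).

For example, for gcc's libstdc++ it is system_clock, for MSVC it is steady_clock, and for clang's libc++ it depends on configuration.

Generally, one should use std::chrono::steady_clock or std::chrono::system_clock directly instead of std::chrono::high_resolution_clock: use steady_clock for duration measurements and system_clock for wall-clock time.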
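
As a rough illustration of that division of labour, the sketch below records when a run started with system_clock and how long it took with steady_clock. The names TimedRun and run_timed are hypothetical and only serve the example.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <utility>

// Hypothetical result type for this sketch.
struct TimedRun {
    std::time_t started_at;             // calendar time, taken from system_clock
    std::chrono::microseconds elapsed;  // interval, taken from steady_clock
};

template <typename F>
TimedRun run_timed(F&& work) {
    static_assert(std::chrono::steady_clock::is_steady,
                  "steady_clock is required to be monotonic");
    // Wall-clock timestamp: meaningful as a date, but may be adjusted.
    const std::time_t started_at =
        std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
    // Interval measurement: unaffected by adjustments to the system time.
    const auto t1 = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto t2 = std::chrono::steady_clock::now();
    assert(t2 >= t1);
    return {started_at,
            std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1)};
}
```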

- Ranjeet R Patil

4

Here is an excellent header-only class template to measure the elapsed time of a function or any code block:

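#ifndef EXECUTION_TIMER_H
#define EXECUTION_TIMER_H

#include <chrono>
#include <iostream>
#include <sstream>      // std::ostringstream
#include <type_traits>  // std::conditional_t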

template<class Resolution = std::chrono::milliseconds>
class ExecutionTimer {
public:
    using Clock = std::conditional_t<std::chrono::high_resolution_clock::is_steady,
                                     std::chrono::high_resolution_clock,
                                     std::chrono::steady_clock>;
private:
    const Clock::time_point mStart = Clock::now();

public:
    ExecutionTimer() = default;
    ~ExecutionTimer() {
        const auto end = Clock::now();
        std::ostringstream strStream;
        strStream << "Destructor Elapsed: "
                  << std::chrono::duration_cast<Resolution>( end - mStart ).count()
                  << std::endl;
        std::cout << strStream.str() << std::endl;
    }    

    inline void stop() {
        const auto end = Clock::now();
        std::ostringstream strStream;
        strStream << "Stop Elapsed: "
                  << std::chrono::duration_cast<Resolution>(end - mStart).count()
                  << std::endl;
        std::cout << strStream.str() << std::endl;
    }

}; // ExecutionTimer

#endif // EXECUTION_TIMER_H

Here are some uses of it:

int main() {
    { // empty scope to display ExecutionTimer's destructor's message
         // displayed in milliseconds
         ExecutionTimer<std::chrono::milliseconds> timer;

         // function or code block here

         timer.stop();

    } 

    { // same as above
        ExecutionTimer<std::chrono::microseconds> timer;

        // code block here...

        timer.stop();
    }

    {  // same as above
       ExecutionTimer<std::chrono::nanoseconds> timer;

       // code block here...

       timer.stop();

    }

    {  // same as above
       ExecutionTimer<std::chrono::seconds> timer;

       // code block here...

       timer.stop();

    }              

    return 0;
}

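Since the class is a template, we can very easily specify how we want the time to be measured and displayed. This is a handy utility class template for benchmarking and is very easy to use.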

- Francis Cugler

Personally, I think the stop() member function isn't needed, because the destructor stops the timer for you. - Casey

1

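@Casey The design of the class doesn't necessarily need the stop function, but it is there for a specific reason. The default constructor starts the timer when the object is created, before your test code. Then, after your test code, you explicitly use the timer object and call its stop method. You have to invoke it manually when you want to stop the timer. The class doesn't take any parameters. Also, if you use this class just as I've shown, you will see that there is a minimal lapse of time between the call to obj.stop and its destructor. - Francis Cugler

@Casey ... This also allows having multiple timer objects within the same scope; not that one would really need that, but it is another viable option. - Francis Cugler

This example cannot be compiled in the form presented. The error is related to "no match for operator<<"! - Celdor

@Celdor Do you have the appropriate includes, such as <chrono>? - Francis Cugler

It was a missing include. I cannot check now, but I think it was <iomanip>. I tried adding ostream, iostream, and so on; it was hard to find the right one! - Celdor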

3

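If you want to save time and lines of code, you can turn measuring the function execution time into a one-line macro:

a) Implement a time-measuring class as already suggested above (here is my implementation for Android):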

class MeasureExecutionTime{
private:
    const std::chrono::steady_clock::time_point begin;
    const std::string caller;
public:
    MeasureExecutionTime(const std::string& caller):caller(caller),begin(std::chrono::steady_clock::now()){}
    ~MeasureExecutionTime(){
        const auto duration=std::chrono::steady_clock::now()-begin;
        LOGD("ExecutionTime")<<"For "<<caller<<" is "<<std::chrono::duration_cast<std::chrono::milliseconds>(duration).count()<<"ms";
    }
};

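b) Add a convenient macro that uses the current function name as the tag (using a macro here is important, otherwise __FUNCTION__ would evaluate to MeasureExecutionTime instead of the function you want to measure):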

#ifndef MEASURE_FUNCTION_EXECUTION_TIME
#define MEASURE_FUNCTION_EXECUTION_TIME const MeasureExecutionTime measureExecutionTime(__FUNCTION__);
#endif

c) Write your macro at the beginning of the function you want to measure. Example:

 void DecodeMJPEGtoANativeWindowBuffer(uvc_frame_t* frame_mjpeg,const ANativeWindow_Buffer& nativeWindowBuffer){
        MEASURE_FUNCTION_EXECUTION_TIME
        // Do some time-critical stuff 
}

This results in the following output:

ExecutionTime: For DecodeMJPEGtoANativeWindowBuffer is 54ms

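Note that this (like all the other suggested solutions) measures the time between when your function was called and when it returned, not necessarily the time your CPU spent executing the function. However, if you don't give the scheduler any chance to suspend your running code by calling sleep() or similar, there is no difference between the two.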
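
To make the distinction concrete, the sketch below measures the same call twice: once with steady_clock for elapsed real time and once with std::clock for processor time. The names Timings, measure_wall_and_cpu and sleepy_work are hypothetical. On Linux, a function that only sleeps should show a large wall-clock time and a processor time near zero; as noted in a comment on another answer, std::clock reportedly behaves differently on Windows, so the result there may not show this gap.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>
#include <utility>

// Hypothetical result type for this sketch.
struct Timings {
    double wall_ms;  // elapsed real time, from steady_clock
    double cpu_ms;   // processor time used by the process, from std::clock
};

template <typename F>
Timings measure_wall_and_cpu(F&& work) {
    const std::clock_t c0 = std::clock();
    assert(c0 != static_cast<std::clock_t>(-1));  // -1: processor time unavailable
    const auto w0 = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto w1 = std::chrono::steady_clock::now();
    const std::clock_t c1 = std::clock();
    return {std::chrono::duration<double, std::milli>(w1 - w0).count(),
            1000.0 * static_cast<double>(c1 - c0) / CLOCKS_PER_SEC};
}

// Hypothetical workload: waits for 50 ms without doing any computation.
inline void sleepy_work() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
```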

- Constantin Geier

2

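There is a very easy-to-use method in C++11.

We can use std::chrono::high_resolution_clock from the <chrono> header.

We can write a method that prints the execution time in a more readable form.

For example, finding all the prime numbers between 1 and 100 million takes approximately 1 minute and 40 seconds. So the execution time gets printed as: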

Execution Time: 1 Minutes, 40 Seconds, 715 MicroSeconds, 715000 NanoSeconds

Here is the code:

#include <iostream>
#include <chrono>

using namespace std;
using namespace std::chrono;

typedef high_resolution_clock Clock;
typedef Clock::time_point ClockTime;

void findPrime(long n, string file);
void printExecutionTime(ClockTime start_time, ClockTime end_time);

int main()
{
    long n = long(1E+8);  // N = 100 million

    ClockTime start_time = Clock::now();

    // Write all the prime numbers from 1 to N to the file "prime.txt"
    findPrime(n, "C:\\prime.txt"); 

    ClockTime end_time = Clock::now();

    printExecutionTime(start_time, end_time);
}

void printExecutionTime(ClockTime start_time, ClockTime end_time)
{
    auto execution_time_ns = duration_cast<nanoseconds>(end_time - start_time).count();
    auto execution_time_ms = duration_cast<microseconds>(end_time - start_time).count();
    auto execution_time_sec = duration_cast<seconds>(end_time - start_time).count();
    auto execution_time_min = duration_cast<minutes>(end_time - start_time).count();
    auto execution_time_hour = duration_cast<hours>(end_time - start_time).count();

    cout << "\nExecution Time: ";
    if(execution_time_hour > 0)
    cout << "" << execution_time_hour << " Hours, ";
    if(execution_time_min > 0)
    cout << "" << execution_time_min % 60 << " Minutes, ";
    if(execution_time_sec > 0)
    cout << "" << execution_time_sec % 60 << " Seconds, ";
    if(execution_time_ms > 0)
    cout << "" << execution_time_ms % long(1E+3) << " MicroSeconds, ";
    if(execution_time_ns > 0)
    cout << "" << execution_time_ns % long(1E+6) << " NanoSeconds, ";
}
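
Another way to split a duration into readable parts is to peel off each unit in turn and subtract it, so every component is a true remainder of the previous one. The sketch below assumes millisecond resolution is enough; the name format_duration and the output format are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Hypothetical helper: formats a non-negative duration as "H h M min S s MS ms".
std::string format_duration(std::chrono::milliseconds d) {
    using namespace std::chrono;
    assert(d >= milliseconds::zero());
    const auto h = duration_cast<hours>(d);    // whole hours
    d -= h;
    const auto m = duration_cast<minutes>(d);  // whole minutes of the remainder
    d -= m;
    const auto s = duration_cast<seconds>(d);  // whole seconds of the remainder
    d -= s;                                    // d now holds the leftover milliseconds
    return std::to_string(h.count()) + " h " + std::to_string(m.count()) + " min " +
           std::to_string(s.count()) + " s " + std::to_string(d.count()) + " ms";
}
```

With this helper, a run of 100715 ms is rendered as "0 h 1 min 40 s 715 ms".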

- Pratik Patil

2

I recommend using steady_clock, which is guaranteed to be monotonic, unlike high_resolution_clock. Make it your clock of choice.

#include <iostream>
#include <chrono>

using namespace std;

unsigned int stopwatch()
{
    static auto start_time = chrono::steady_clock::now();

    auto end_time = chrono::steady_clock::now();
    auto delta    = chrono::duration_cast<chrono::microseconds>(end_time - start_time);

    start_time = end_time;

    return delta.count();
}

int main() {
  stopwatch(); //Start stopwatch
  std::cout << "Hello World!\n";
  cout << stopwatch() << endl; //Time to execute last line
  for (int i=0; i<1000000; i++)
      string s = "ASDFAD";
  cout << stopwatch() << endl; //Time to execute for loop
}

Output:

Hello World!
62
163514

- Gillespie


- Victor · Accepted Answer

This is a very easy-to-use method in C++11. You have to use std::chrono::high_resolution_clock from the <chrono> header.

Use it like so:

#include <chrono>

/* Only needed for the sake of this example. */
#include <iostream>
#include <thread>
    
void long_operation()
{
    /* Simulating a long, heavy operation. */

    using namespace std::chrono_literals;
    std::this_thread::sleep_for(150ms);
}

int main()
{
    using std::chrono::high_resolution_clock;
    using std::chrono::duration_cast;
    using std::chrono::duration;
    using std::chrono::milliseconds;

    auto t1 = high_resolution_clock::now();
    long_operation();
    auto t2 = high_resolution_clock::now();

    /* Getting number of milliseconds as an integer. */
    auto ms_int = duration_cast<milliseconds>(t2 - t1);

    /* Getting number of milliseconds as a double. */
    duration<double, std::milli> ms_double = t2 - t1;

    std::cout << ms_int.count() << "ms\n";
    std::cout << ms_double.count() << "ms\n";
    return 0;
}

This measures the duration of the function long_operation. Possible output:

150ms
150.068ms

Working example: https://godbolt.org/z/oe5cMd