Why is std::async slow compared to simple detached threads?

Question

c++ multithreading c++11 asynchronous stdasync

20

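I have been told several times that I should use std::async with the std::launch::async policy for fire-and-forget tasks (so that the task runs on a new thread of execution).

Encouraged by these statements, I wanted to see how std::async compares to:

- sequential execution
- a simple detached std::thread
- my simple async "implementation"

My naive async implementation looks like this: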

template <typename F, typename... Args>
auto myAsync(F&& f, Args&&... args) -> std::future<decltype(f(args...))>
{
    std::packaged_task<decltype(f(args...))()> task(std::bind(std::forward<F>(f), std::forward<Args>(args)...));
    auto future = task.get_future();

    std::thread thread(std::move(task));
    thread.detach();

    return future;
}

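Nothing fancy here: it packs the function object f and its arguments into a std::packaged_task, launches that on a new std::thread, detaches the thread, and returns the task's std::future.

Here is the code that measures the execution time with std::chrono::high_resolution_clock: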
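As a small usage sketch, the helper can also carry a return value back through the future. The helper is repeated so the snippet compiles on its own; `add` and `sumViaMyAsync` are hypothetical names introduced only for illustration.

```cpp
#include <functional>
#include <future>
#include <thread>
#include <utility>

// The helper from the question, repeated so the snippet is self-contained.
template <typename F, typename... Args>
auto myAsync(F&& f, Args&&... args) -> std::future<decltype(f(args...))>
{
    std::packaged_task<decltype(f(args...))()> task(
        std::bind(std::forward<F>(f), std::forward<Args>(args)...));
    auto future = task.get_future();

    std::thread thread(std::move(task));
    thread.detach();

    return future;
}

// Hypothetical task used only for illustration.
int add(int a, int b)
{
    return a + b;
}

// Starts add(2, 3) on a detached thread and blocks until its result is ready.
int sumViaMyAsync()
{
    std::future<int> result = myAsync(add, 2, 3);
    return result.get();  // get() waits for the task running on the detached thread
}
```

Calling `get()` (or `wait()`) is the only way the caller learns that the detached thread has finished its task; if the future is dropped, nothing waits for it.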

int main(void)
{
    constexpr unsigned short TIMES = 1000;

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < TIMES; ++i)
    {
        someTask();
    }
    auto dur = std::chrono::high_resolution_clock::now() - start;

    auto tstart = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < TIMES; ++i)
    {
        std::thread t(someTask);
        t.detach();
    }
    auto tdur = std::chrono::high_resolution_clock::now() - tstart;

    std::future<void> f;
    auto astart = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < TIMES; ++i)
    {
        f = std::async(std::launch::async, someTask);
    }
    auto adur = std::chrono::high_resolution_clock::now() - astart;

    auto mastart = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < TIMES; ++i)
    {
        f = myAsync(someTask);
    }
    auto madur = std::chrono::high_resolution_clock::now() - mastart;

    std::cout << "Simple: " << std::chrono::duration_cast<std::chrono::microseconds>(dur).count() <<
    std::endl << "Threaded: " << std::chrono::duration_cast<std::chrono::microseconds>(tdur).count() <<
    std::endl << "std::async: " << std::chrono::duration_cast<std::chrono::microseconds>(adur).count() <<
    std::endl << "My async: " << std::chrono::duration_cast<std::chrono::microseconds>(madur).count() << std::endl;

    return EXIT_SUCCESS;
}

Here someTask() is a simple function that waits briefly to simulate some work being done:

void someTask()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
}

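The results (in microseconds):

Simple: 1263615
Threaded: 47111
std::async: 821441
My async: 30784

Can anyone explain these results? It seems that std::async is much slower than my naive implementation or plain detached std::threads. Why is that? After these results, is there any reason left to use std::async?

(Note that I ran this benchmark with both clang++ and g++, and the results were very similar.)

Update:

After reading Dave S's answer, I updated my little benchmark as follows: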

std::future<void> f[TIMES];
auto astart = std::chrono::high_resolution_clock::now();
for (int i = 0; i < TIMES; ++i)
{
    f[i] = std::async(std::launch::async, someTask);
}
auto adur = std::chrono::high_resolution_clock::now() - astart;

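So the std::future is no longer destroyed (and therefore joined) on every iteration. With this change, std::async produces results similar to my implementation and to the detached std::threads.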

- krispet krispet

4

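I'm sure this isn't the issue, but for completeness I have to ask: are you measuring a debug (unoptimized) build or a release (optimized) build? I assume you are testing an optimized build, since otherwise any measurement would be meaningless, but I still need to confirm. - Jesper Juhl

@JesperJuhl A perfectly valid question, but I measured with -O2. - krispet krispet

2 Answers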

8

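std::async returns a special std::future. This future has a ~future that does a .wait().

So your examples are fundamentally different. The slow ones actually run the tasks during the timed interval. The fast ones just queue up the tasks and give up any way of knowing when they are done. Because the behavior of a program that lets threads outlive main is unpredictable, this should be avoided.

The right way to compare them is to store the resulting futures as they are created and, before the timer stops, either .wait()/.join() on all of them, or else avoid destroying the objects until after the timer has stopped. That last option, however, makes the sequential version look worse than it is.

You do need to join/wait before starting the next test, because otherwise the leftover threads steal resources from its timing.

Note that moving a future takes the wait with it: the moved-from future no longer waits in its destructor.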
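The blocking destructor can be observed directly. The following is a minimal sketch, assuming a hypothetical `slowTask` that sleeps for 50 ms; `discardedAsyncMillis` is likewise a name made up for this example.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical task: sleeps for 50 ms to stand in for real work.
void slowTask()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}

// Returns how long, in milliseconds, the caller is held up when a future
// obtained from std::async goes out of scope right after the launch.
long long discardedAsyncMillis()
{
    auto start = std::chrono::steady_clock::now();
    {
        std::future<void> f = std::async(std::launch::async, slowTask);
    }  // ~future blocks here until slowTask has finished
    auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```

The measured time should come out at roughly the task's own duration or more, even though the caller never asked for the result.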
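One way to apply this to the std::async case is sketched below: every future is kept alive in a container, and the clock stops only after all tasks have completed. The function name `timeAsyncIncludingCompletion` is an assumption made for this example, and std::chrono::steady_clock is used here instead of the question's high_resolution_clock.

```cpp
#include <chrono>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Same stand-in task as in the question.
void someTask()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
}

// Launches `count` tasks with std::async, keeps every future alive, and stops
// the clock only after all tasks have finished. Returns microseconds.
long long timeAsyncIncludingCompletion(std::size_t count)
{
    auto start = std::chrono::steady_clock::now();

    std::vector<std::future<void>> futures;
    futures.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        futures.push_back(std::async(std::launch::async, someTask));
    }
    for (auto& f : futures)
    {
        f.wait();  // completion is part of the measured interval
    }

    auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
}
```

The detached-thread variants would need an equivalent way to signal completion (for example joinable threads that are joined before the clock stops) for the numbers to be comparable.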
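A short sketch of that last point, assuming a hypothetical 200 ms `slowTask`; `MoveDemoResult` and `launchAndMove` are names invented for this example. After the move, the original future has no shared state, so destroying it does not wait.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical task: sleeps for 200 ms to stand in for real work.
void slowTask()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
}

struct MoveDemoResult
{
    bool sourceValidAfterMove;  // does the moved-from future still own a state?
    long long launchMillis;     // time until the moved-from future was destroyed
};

MoveDemoResult launchAndMove()
{
    MoveDemoResult result{};
    std::vector<std::future<void>> keep;

    auto start = std::chrono::steady_clock::now();
    {
        std::future<void> f = std::async(std::launch::async, slowTask);
        keep.push_back(std::move(f));  // the shared state now belongs to keep.back()
        result.sourceValidAfterMove = f.valid();
    }  // f is destroyed here, but it has no shared state left to wait on
    auto elapsed = std::chrono::steady_clock::now() - start;
    result.launchMillis =
        std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();

    keep.back().wait();  // wait explicitly so the task does not outlive the caller
    return result;
}
```

Here `sourceValidAfterMove` is false, and `launchMillis` should stay well below the task's 200 ms because the wait has moved into the vector's element.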

- Yakk - Adam Nevraumont

- Dave S · Accepted Answer

19

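One key difference is that the future returned by async joins the thread (that is, waits for the task to finish) when it is destroyed or replaced with a new value.

This means it has to execute someTask() and join the thread, both of which take time. None of your other tests do that; they simply spawn the tasks independently.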
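The effect on the original loop can be sketched as follows. Each assignment to `f` first launches the new task and then releases the previous shared state, which blocks until the previously launched task is done, so the loop cannot run far ahead of the tasks. The name `timeReassignedFuture` is hypothetical.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Same stand-in task as in the question.
void someTask()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
}

// Re-creates the question's loop: one future is overwritten on every
// iteration. Returns the loop's duration in microseconds.
long long timeReassignedFuture(int count)
{
    std::future<void> f;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i)
    {
        // Move assignment drops the old shared state, which waits for the
        // task launched in the previous iteration.
        f = std::async(std::launch::async, someTask);
    }
    auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
    // f is destroyed on return, which waits for the last task as well.
}
```

Because at most two tasks are in flight at any moment, the loop time grows with the number of tasks instead of staying near the pure launch cost.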

- Dave S

7

"the future returned by async joins the thread when the future is destroyed": and this is the principal reason to never use std::async. - Nicol Bolas

3

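If you wanted that, you should have called future::wait explicitly, just as you would for every other future object. The problem is that the future returned by async behaves differently from every other future. The problem is the inconsistency, not the behavior itself. This inconsistent and non-reproducible behavior makes it easy to do the wrong thing by accident, as happened here. - Nicol Bolas

Thanks, this solves the mystery. But what is the rationale behind this behavior? Could you point me to some articles or something similar on the subject? - krispet krispet

@krispetkrispet This is partly answered in this answer (disclaimer: I wrote it). Scott Meyers also wrote an article about it. Either way, this aspect of futures was discussed extensively in 2012/2013, and I'm not sure what the final conclusion was. - Zeta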
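The inconsistency can be illustrated with a future that does not come from std::async. In this sketch, which assumes a hypothetical 200 ms `slowTask`, a future obtained from a std::packaged_task is destroyed while the task is still running, and its destructor does not wait. `packagedTaskFutureDestructionMillis` is a made-up name.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Hypothetical task: sleeps for 200 ms to stand in for real work.
void slowTask()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
}

// Returns how long, in milliseconds, it takes to start the task and destroy a
// future that came from a std::packaged_task rather than from std::async.
long long packagedTaskFutureDestructionMillis()
{
    std::packaged_task<void()> task(slowTask);
    std::thread worker;

    auto start = std::chrono::steady_clock::now();
    {
        std::future<void> f = task.get_future();
        worker = std::thread(std::move(task));
    }  // ~future returns without waiting for slowTask
    auto elapsed = std::chrono::steady_clock::now() - start;

    worker.join();  // join explicitly so the thread does not outlive the caller
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```

With such a future, waiting happens only when the caller asks for it through `wait()` or `get()`, which is the behavior the comment argues every future should have.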