What is the best way to access a thread-specific index?

Question


c++ multithreading optimization memory thread-local-storage


My program spawns n threads. Each thread starts in the same "main thread function". When spawning the threads, the code passes each one a unique thread_index:

void worker_main_func(int thread_index);

vector<thread> spawn_workers(int n) {
    vector<thread> workers;
    workers.reserve(n);
    for (int i = 0; i < n; ++i) {
        workers.emplace_back(worker_main_func, i);
    }
    return workers;
}

Each worker thread needs access to its own dedicated queue. The queues are all preallocated. To access its queue, a thread needs to know its thread_index:

static vector<my_queue_t> g_queues;

void do_some_work();

void worker_main_func(int thread_index) {
   do_some_work();
}

void do_some_work() {
  // ...
  g_queues[get_this_thread_index_somehow()].some_operation_on_queue();
  // ...
}

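I can't pass thread_index directly to do_some_work, because that would mean changing almost the entire codebase: every function would need an extra parameter. It could also cost performance. Functions that currently receive all their arguments in registers might, with the extra parameter, have to pass some of them on the stack.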

void do_some_work(int thread_index);
void calculate(int thread_index, /* params */);
void fetch_data(int thread_index, /* params */);
void implementation1(int thread_index, /* params */);
void blablabla(int thread_index, /* params */);

So instead I store thread_index in a thread_local variable and read it every time:

thread_local int g_thread_index;

void worker_main_func(int thread_index) {
   g_thread_index = thread_index;
   do_some_work();
}

void do_some_work() {
  // ...
  g_queues[g_thread_index].some_operation_on_queue();
  // ...
}

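This works, but it isn't optimal. The code the compiler generates reads g_thread_index from memory (or cache) every time it is used, and sometimes wraps it in extra initialization guards. At the same time, all the work a thread performs happens inside worker_main_func, which means worker_main_func and its parameter are always available, at the bottom of the stack: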

--- inner_most_call ----
  ...
  param2
  param1
--- fetch_data ---------
  param3
  param2
  param1
--- calculate1 ---------
--- do_some_work -------
  thread_index
--- worker_main_func ---

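So in principle the compiler could read thread_index directly from a fixed offset relative to the current thread's stack, instead of loading it from a thread_local variable.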

I considered using std::this_thread::get_id() instead of my thread_index, but that generates a call to pthread_self and would need some kind of mapping from the opaque thread::id to an index in [0..n).

- janekb04

Use std::async plus a lambda that captures your thread index. The returned std::future is a good way to get values and errors back from the worker thread. - Pepijn Kramer

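@JesperJuhl Thanks for the suggestion. However, the size of the change is not the only problem with updating the codebase. I'm also worried that adding an extra parameter to every function could make some of them slower. For example, functions that currently take all their arguments in registers might have to use the stack once they gain an additional parameter. - janekb04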

@PepijnKramer I considered lambda capture, but I don't see how it helps here. How would a function deep in the call stack access the lambda object? - janekb04

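@PepijnKramer Passing data to the thread isn't the problem. The problem is passing data from the thread's main function to its callees, and to their callees, and so on. - janekb04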

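Regarding your idea of reading the thread index from a fixed offset: that would require every function to know its position in the call stack. Otherwise you would need to pass some piece of information (such as a pointer) between functions, and you would be back to the original problem. It would mean that the state of the call stack at the point where the index is read is statically known, and C++ has no way to express that. If this approach can work for you at all, it would require at least inline assembly, with all the complexity that comes with it. - François Andrieux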


1 Answer


- Pepijn Kramer · Answer 1

Here is how I would do it with std::async. Live demo: https://onlinegdb.com/Z2-syPOJW

#include <chrono>
#include <future>
#include <iostream>
#include <thread>
#include <vector>

// using namespace std; <== don't do this.

// ideally this should be in a class
thread_local int thread_index{ 0 };

void some_existing_function()
{
    // sleep (not recommended) to clean up the output a bit
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    std::cout << "some_existing_function : " << thread_index << "\n";
    
}

int main()
{
    std::vector<std::future<int>> results(8); // we want 8 parallel actions
    int count{ 0 };

    for (auto& result : results)
    {
        // start asynchronous functions and store their std::future objects.
        // this will spawn underlying threads for you (all in all std::async is just a nice abstraction)
        result = std::async(std::launch::async, [index = count++]
            {
                thread_index = index;
                some_existing_function();   
                return 40 + index;
            });
    }

    std::this_thread::sleep_for(std::chrono::milliseconds(3000));

    for (auto& result : results)
    {
        // result.get() will block until background thread has done the work.
        std::cout << result.get() << "\n";
    }

    return 0;
}