Parallelizing the inner loop with OpenMP

Question

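Suppose we have two nested loops. The inner loop should be parallelized, but the outer loop needs to run sequentially. The following code then does what we want: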

for (int i = 0; i < N; ++i) {
  #pragma omp parallel for schedule(static)
  for (int j = first(i); j < last(i); ++j) {
    // Do some work
  }
}

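Now suppose each thread needs to obtain a thread-local object to do the work in the inner loop, and that obtaining these thread-local objects is expensive. We therefore do not want to do the following, which fetches the object on every inner iteration: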

for (int i = 0; i < N; ++i) {
  #pragma omp parallel for schedule(static)
  for (int j = first(i); j < last(i); ++j) {
    ThreadLocalObject &obj = GetTLO(omp_get_thread_num()); // Costly!
    // Do some work with the help of obj
  }
}

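How can I solve this problem?

Each thread should request its local object only once.
The inner loop should be parallelized across all threads.
The iterations of the outer loop should be executed one after another.

My idea is the following, but does it really do what I want?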

#pragma omp parallel
{
  ThreadLocalObject &obj = GetTLO(omp_get_thread_num()); // Acquired once per thread
  for (int i = 0; i < N; ++i) {
    #pragma omp for schedule(static)
    for (int j = first(i); j < last(i); ++j) {
      // Do some work with the help of obj
    }
  }
}
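
The pattern above can be checked with a small self-contained sketch. Everything named here is an assumption made for illustration: `run_hoisted`, the `acquisitions` counter, the stand-in `ThreadLocalObject` struct, and the fixed bounds `0..M` used in place of `first(i)` and `last(i)`. It records how often each thread acquires its object and makes every outer iteration depend on the previous one, so an ordering violation would show up in the result. The serial fallbacks only exist so the sketch also builds without OpenMP enabled.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds without OpenMP enabled.
static inline int omp_get_thread_num() { return 0; }
static inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical stand-in for the expensive per-thread object.
struct ThreadLocalObject {
  long work_done = 0;
};

// Runs N outer iterations in order; the inner loop over [0, M) is shared
// among the threads of a single team. acquisitions[t] counts how often
// thread t fetched its object.
std::vector<int> run_hoisted(int N, int M, std::vector<int>& acquisitions) {
  std::vector<ThreadLocalObject> objects(omp_get_max_threads());
  acquisitions.assign(omp_get_max_threads(), 0);
  std::vector<int> data(M, 0);

  #pragma omp parallel
  {
    const int tid = omp_get_thread_num();
    // Acquired once per thread, outside both loops.
    ThreadLocalObject& obj = objects[tid];
    ++acquisitions[tid];

    for (int i = 0; i < N; ++i) {
      // Work-sharing only: no new team is created here. The implicit
      // barrier at the end of the loop keeps outer iterations in order.
      #pragma omp for schedule(static)
      for (int j = 0; j < M; ++j) {
        data[j] = data[j] * 2 + 1;  // depends on the previous outer iteration
        ++obj.work_done;
      }
    }
  }
  return data;
}
```

Calling `run_hoisted(3, 1000, acq)` should leave every element at 7 (0 → 1 → 3 → 7) and no entry of `acq` above 1. Note that every thread in the team executes the outer loop itself, so in the original code `first(i)` and `last(i)` must return the same values on all threads and be safe to call concurrently.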

- user1494080

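@HighPerformanceMark, typos aside, I think the OP's question is interesting, which has been rare among recent OpenMP questions. Do you have any comments on my threadprivate solution? Did I make a mistake (I use C almost exclusively these days, and my C++ is very rusty)? - Z boson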

Your approach is fine. Can you tell me why you need a thread-local object that is initialized with the thread number? - Z boson

1 Answer

- Massimiliano · Accepted Answer

I don't really understand why the complication of threadprivate is needed when you can simply use an object pool. The basic idea should be along these lines:

#pragma omp parallel
{
  // Will hold a handle to the object pool
  auto pool = std::shared_ptr<ObjectPool>(nullptr);
  #pragma omp single copyprivate(pool)
  {
    // A single thread creates a pool of num_threads objects
    // Copyprivate broadcasts the handle
    pool = create_object_pool(omp_get_num_threads());
  }
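  for (int i = 0; i < N; ++i)
  {
    // Work-sharing loop on the existing team; a nested "parallel for"
    // here would start a new parallel region per thread instead
    #pragma omp for schedule(static)
    for (int j = first(i); j < last(i); ++j)
    {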
        // The object is not re-created, just a reference to it
        // is returned from the pool
        auto & r = pool->get( omp_get_thread_num() );
        // Do work with r
    }
  }
}
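
For readers who want to try the pool idea, here is a minimal self-contained sketch. The answer does not define `ObjectPool` or `create_object_pool`, so the versions below are hypothetical stand-ins, as are `Worker`, `run_with_pool`, and the fixed bounds `0..M`. The sketch takes the pooled reference once per thread before the loops, uses `#pragma omp for` for the inner loop, and relies on `copyprivate` to copy-assign the `std::shared_ptr` handle to every thread.

```cpp
#include <memory>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds without OpenMP enabled.
static inline int omp_get_thread_num() { return 0; }
static inline int omp_get_num_threads() { return 1; }
#endif

// Hypothetical pooled object: one instance per thread.
struct Worker {
  int id = -1;
  long calls = 0;
};

// Hypothetical pool: owns n workers and hands out references by index.
class ObjectPool {
 public:
  explicit ObjectPool(int n) : objects_(n) {
    for (int i = 0; i < n; ++i) objects_[i].id = i;
  }
  Worker& get(int tid) { return objects_[tid]; }

 private:
  std::vector<Worker> objects_;
};

std::shared_ptr<ObjectPool> create_object_pool(int n) {
  return std::make_shared<ObjectPool>(n);
}

// Returns the total number of inner iterations performed by all threads.
long run_with_pool(int N, int M) {
  long total = 0;
  #pragma omp parallel reduction(+ : total)
  {
    std::shared_ptr<ObjectPool> pool;  // private handle, empty at first
    #pragma omp single copyprivate(pool)
    {
      // One thread builds the pool; copyprivate broadcasts the handle.
      pool = create_object_pool(omp_get_num_threads());
    }
    // Looked up once per thread; the object itself is never re-created.
    Worker& w = pool->get(omp_get_thread_num());

    for (int i = 0; i < N; ++i) {
      #pragma omp for schedule(static)
      for (int j = 0; j < M; ++j) {
        ++w.calls;  // placeholder for real work using the pooled object
      }
    }
    total += w.calls;
  }
  return total;
}
```

With these definitions, `run_with_pool(4, 1000)` should return 4000 regardless of the number of threads, since every inner iteration is executed by exactly one thread.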