Custom reduction variable in OpenMP

3

I was assigned to implement the idea of a reduction variable without using the reduction clause. I set up this basic code to test it.

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
for (int i = 0; i < n; ++i)
{
    val += 1;
}
sum += val;

At the end, sum == n.

Each thread should have val as a private variable, and the addition to sum should then be a critical section where all the threads converge, e.g.:

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel for private(i, val) shared(n) num_threads(nthreads)
for (int i = 0; i < n; ++i)
{
    val += 1;
}
#pragma omp critical
{
    sum += val;
}

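I can't figure out how to keep the private instances of val alive for the critical section. I tried wrapping the whole thing in a larger pragma, e.g.: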

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel private(val) shared(sum)
{
#pragma omp parallel for private(i) shared(n) num_threads(nthreads)
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
#pragma omp critical
    {
        sum += val;
    }
}

But I don't get the correct answer. How should I set up the directives and clauses to do this?


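I'm not sure this syntax is correct. Is it supposed to create multiple "val"s for the different threads? I doubt it, which would mean "val" is accessed and written by different threads at the same time. - stefan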
3
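I'd suggest using an array of size nthreads, adding to array[omp_get_thread_num()], and then computing the sum of all the values in the array. That's more straightforward ;-) - stefan
Yes, there are more practical ways of doing this, but that's not the point of the exercise. - user41500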
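For reference, the array-based approach suggested in the comment above could look roughly like the sketch below. The function name `array_sum` and its parameters are made up for illustration, it assumes `nthreads` is at least 1, and the `#else` branch is only there so the code still builds when OpenMP is not enabled.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch still builds without OpenMP support. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: counts to n with one accumulator slot per thread.
   Assumes nthreads >= 1. */
double array_sum(int n, int nthreads)
{
    double sum = 0.0;
    /* One zero-initialized slot per requested thread. */
    double *partial = calloc((size_t)nthreads, sizeof *partial);
    if (partial == NULL)
        return -1.0; /* allocation failed */

    #pragma omp parallel num_threads(nthreads)
    {
        /* The thread number is below the team size, which is at most nthreads. */
        int tid = omp_get_thread_num();
        #pragma omp for
        for (int i = 0; i < n; ++i)
        {
            /* Each thread writes only its own slot, so no locking is needed.
               Neighbouring slots can share a cache line, which may cost
               performance in real workloads. */
            partial[tid] += 1;
        }
    }

    /* The parallel region has ended; combine the slots serially. */
    for (int t = 0; t < nthreads; ++t)
        sum += partial[t];

    free(partial);
    return sum;
}
```

Calling `array_sum(1000, 4)` should return 1000.0 no matter how many threads are actually granted, because slots that are never used stay at zero.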
2 Answers

6
Your programs have quite a few flaws. Let's look at each program (the flaws are written as comments).
Program one:
int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel for private(i, val) shared(n) num_threads(nthreads)
for (int i = 0; i < n; ++i)
{
    val += 1;
}
// The parallel region ends here, and with it all the extra OpenMP threads.
// "pragma omp parallel for" creates the threads, and their scope only
// extends to the end of that for loop, so the private copies of val are gone too.
// Only one thread (the main thread) enters the critical section, and it adds the original val.
#pragma omp critical
{
    sum += val;
}

Program two:
int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel private(val) shared(sum)
 // pragma omp parallel creates the threads
{
#pragma omp parallel for private(i) shared(n) num_threads(nthreads)
  // There is no need to create another set of threads.
  // Note that "pragma omp parallel" always starts a new parallel region.
  // Each outer thread now starts its own nested region and runs the whole loop, which is wrong.
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
#pragma omp critical
    {
        sum += val;
    }
}

The best solution would be:
int n = 100000000;
double sum = 0.0;
int nThreads = 5;
#pragma omp parallel shared(sum, n) num_threads(nThreads) // Create the omp threads, and declare the shared and private variables here.
// Also declare the number of threads to request.
// Note that num_threads(nThreads) doesn't guarantee that nThreads omp threads are created.
// It only sets an upper limit on the size of the team.
{
    double val = 0.0;  // val can be declared as local variable (for each thread) 
#pragma omp for nowait       // only "omp for" here: the threads already exist, so no "omp parallel"
    // nowait drops the barrier at the end of the loop, so a thread that finishes its share can go straight to the critical section
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
#pragma omp critical
    {
        sum += val;
    }
}
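To check the comment about num_threads being only an upper limit, the solution above can be wrapped in functions and the granted team size queried at run time. This is a sketch under assumptions: `critical_sum` and `granted_team_size` are hypothetical names, `nthreads` is assumed to be at least 1, and the `#else` branch only exists so the code builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch still builds without OpenMP support. */
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: the critical-section pattern from the answer above. */
double critical_sum(int n, int nthreads)
{
    double sum = 0.0;

    #pragma omp parallel shared(sum, n) num_threads(nthreads)
    {
        double val = 0.0; /* declared inside the region, so private to each thread */

        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
        {
            val += 1;
        }

        #pragma omp critical
        {
            sum += val; /* one thread at a time adds its partial count */
        }
    }
    return sum;
}

/* Hypothetical helper: reports how many threads a request for nthreads yields. */
int granted_team_size(int nthreads)
{
    int team = 1;

    #pragma omp parallel num_threads(nthreads)
    {
        #pragma omp single
        team = omp_get_num_threads(); /* may be smaller than nthreads */
    }
    return team;
}
```

`critical_sum(1000, 4)` should return 1000.0, while `granted_team_size(4)` returns some value between 1 and 4 depending on the runtime and its settings.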

1
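Note that a nested parallel region only spawns a new team of threads when nested parallelism is explicitly enabled, and according to the OpenMP specification, nested parallelism is not enabled by default. - Hristo Iliev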
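The point made in this comment can be observed by counting how often the body of a nested region runs. The sketch below is an illustration only: `inner_body_runs` is a hypothetical name, it relies on the OpenMP 3.0 routines `omp_get_max_active_levels` and `omp_set_max_active_levels`, and the `#else` branch is a serial fallback for builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds without OpenMP support. */
static int omp_get_max_active_levels(void) { return 1; }
static void omp_set_max_active_levels(int levels) { (void)levels; }
#endif

/* Hypothetical helper: counts how many times the body of a 2 x 2 nested
   region executes when at most max_levels regions may be active at once. */
int inner_body_runs(int max_levels)
{
    int runs = 0;
    int saved = omp_get_max_active_levels();
    omp_set_max_active_levels(max_levels);

    #pragma omp parallel num_threads(2)
    {
        /* Every outer thread encounters this second parallel construct. */
        #pragma omp parallel num_threads(2)
        {
            #pragma omp atomic
            runs += 1;
        }
    }

    omp_set_max_active_levels(saved); /* restore the previous setting */
    return runs;
}
```

With `inner_body_runs(1)` only one level can be active, so the body runs at most twice rather than four times: each outer thread gets an inner "team" consisting of itself. Passing a larger limit may allow up to four runs, depending on the implementation and its nesting settings.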

2

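In OpenMP you don't need to specify shared variables explicitly, because variables from the enclosing scope are shared by default (unless a default(none) clause is specified). Since private variables have undefined initial values, the private copy should be zeroed before the accumulation loop. Loop counters are recognized automatically and made private, so there is no need to declare them private explicitly. Also, since you are only updating a single value, you should use the atomic construct, as it is more lightweight than a full critical section.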

int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel private(val) num_threads(nthreads)
{
    val = 0.0;
    #pragma omp for
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
    #pragma omp atomic update
    sum += val;
}

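The update clause was added to the atomic construct in OpenMP 3.1, so if your compiler conforms to an earlier OpenMP version (for example MSVC++, which only supports OpenMP 2.0 even in VS2012), you have to remove the update clause. Since the val variable is not used outside the parallel region, it could be declared in an inner scope as in veda's answer, and it then automatically becomes private.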
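Putting both remarks together, a variant with val declared in the inner scope and a plain atomic (no update clause) might look like the sketch below. The function name `atomic_sum` and its parameters are hypothetical, and `nthreads` is assumed to be at least 1.

```c
/* Hypothetical helper: the counting exercise with a block-scope accumulator
   and a plain atomic, which predates the OpenMP 3.1 "update" clause. */
double atomic_sum(int n, int nthreads)
{
    double sum = 0.0;

    #pragma omp parallel num_threads(nthreads)
    {
        double val = 0.0; /* block-scope variable: private without any clause */

        #pragma omp for
        for (int i = 0; i < n; ++i)
        {
            val += 1;
        }

        /* Atomically add this thread's partial count to the shared total. */
        #pragma omp atomic
        sum += val;
    }
    return sum;
}
```

`atomic_sum(1000, 4)` should return 1000.0. Built without OpenMP, the pragmas are ignored and the function simply runs serially with the same result.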
Note that parallel for is a shortcut for nesting two OpenMP constructs, parallel and for:
#pragma omp parallel for sharing_clauses scheduling_clauses
for (...) {
}

is equivalent to:

#pragma omp parallel sharing_clauses
#pragma omp for scheduling_clauses
for (...) {
}

This also holds for the other two combined constructs: parallel sections and parallel workshare (Fortran only).
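As a small check of that equivalence, the two spellings can be applied to the same loop and the results compared. This is only a sketch: the function names are made up for illustration, `n` is assumed to be positive, and schedule(static) stands in for the scheduling clauses.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: fills out[i] = 2 * i using the combined construct. */
void fill_combined(int *out, int n)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
        out[i] = 2 * i;
}

/* Hypothetical helper: the same loop with the two constructs written separately. */
void fill_split(int *out, int n)
{
    #pragma omp parallel
    {
        #pragma omp for schedule(static)
        for (int i = 0; i < n; ++i)
            out[i] = 2 * i;
    }
}

/* Returns 1 when both variants produce identical arrays, 0 otherwise
   (including when allocation fails). Assumes n > 0. */
int variants_agree(int n)
{
    size_t bytes = (size_t)n * sizeof(int);
    int *a = malloc(bytes);
    int *b = malloc(bytes);
    int same = 0;

    if (a != NULL && b != NULL)
    {
        fill_combined(a, n);
        fill_split(b, n);
        same = (memcmp(a, b, bytes) == 0);
    }

    free(a);
    free(b);
    return same;
}
```

`variants_agree(1000)` should return 1, since both functions divide the same iterations among the threads of a single team.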

Content provided by Stack Overflow.