I'm trying to get into OpenMP programming. Have a look at this code fragment:
#pragma omp parallel
{
    for (i = 0; i < n; i++)
    {
        /* doing something */
    }
}
and
for (i = 0; i < n; i++)
{
    #pragma omp parallel
    {
        /* doing something */
    }
}
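Why is the first one so much slower (by roughly a factor of 5) than the second one? In theory I would expect the first one to be faster, because the parallel region is created only once rather than n times as in the second.
Can someone explain this to me?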
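One thing worth checking before comparing timings is how much work each variant actually performs. In the first fragment the loop sits inside the parallel region without a work-sharing construct, so every thread runs all n iterations; the iterations are only divided among the threads when `#pragma omp for` is added. The sketch below counts the executed iterations for both cases. The helper names `count_replicated` and `count_shared` are made up for illustration, and this is not necessarily the whole explanation for the measured difference. Note also that the loop variable is declared inside the region here so that each thread has its own copy; in the full program further down, `i` is declared before the region and is therefore shared between the threads.

```c
#include <assert.h>

/* Hypothetical helper: a plain for loop inside a parallel region.
   There is no work-sharing construct, so every thread executes
   all n iterations; the reduction sums the per-thread counts. */
long count_replicated(int n)
{
    long total = 0;
    assert(n >= 0);
    #pragma omp parallel reduction(+:total)
    {
        for (int i = 0; i < n; i++)   /* i is private: declared inside the region */
            total++;
    }
    return total;
}

/* Hypothetical helper: the same loop with "omp for", which divides
   the n iterations among the threads of the team. */
long count_shared(int n)
{
    long total = 0;
    assert(n >= 0);
    #pragma omp parallel reduction(+:total)
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            total++;
    }
    return total;
}
```

With OpenMP enabled (for example `gcc -fopenmp`) and a team of T threads, `count_shared(1000)` returns 1000 while `count_replicated(1000)` returns 1000*T. Compiled without OpenMP, the pragmas are ignored and both return 1000.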
The code I want to parallelize has the following structure:
for (i = 0; i < n; i++)             // won't be parallelizable
{
    for (j = i + 1; j < n; j++)     // will be parallelized
    {
        /* doing something */
    }
    for (j = i + 1; j < n; j++)     // will be parallelized
        for (k = i + 1; k < n; k++)
        {
            /* doing something */
        }
}
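For a structure like this, one commonly used pattern is to open the parallel region once around the sequential outer loop and to put `#pragma omp for` on each inner loop. The sketch below shows that pattern under stated assumptions: the arrays `v` and `m` and the update formulas are invented purely as a stand-in for "doing something", and the function names are hypothetical. Whether the pattern is safe for the real code depends on its data dependencies.

```c
#include <assert.h>

/* Serial reference. v has n entries; m is an n*n matrix stored row by row.
   The arithmetic is an invented placeholder workload. */
void update_serial(int n, double *v, double *m)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++)
            v[j] += 0.5 * v[i];
        for (int j = i + 1; j < n; j++)
            for (int k = i + 1; k < n; k++)
                m[j * n + k] += v[j] * v[k];
    }
}

/* Same computation with one parallel region around the outer loop. */
void update_parallel(int n, double *v, double *m)
{
    assert(n >= 0);
    #pragma omp parallel
    {
        /* Every thread runs the outer loop; i is private to each thread. */
        for (int i = 0; i < n; i++) {
            /* The j iterations are independent and are split among threads. */
            #pragma omp for
            for (int j = i + 1; j < n; j++)
                v[j] += 0.5 * v[i];
            /* Implicit barrier here: all of v is updated before it is read. */

            #pragma omp for
            for (int j = i + 1; j < n; j++)
                for (int k = i + 1; k < n; k++)   /* k is private as well */
                    m[j * n + k] += v[j] * v[k];
            /* Implicit barrier again before the next outer iteration. */
        }
    }
}
```

Each `omp for` ends with an implicit barrier, so the threads stay in step on the outer index while the thread team is created only once. Calling both functions on identically initialised arrays should give matching results, which is a simple way to check the parallel version.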
I wrote a simple program to measure the time and reproduce my results.
#include <stdio.h>
#include <omp.h>

void test( int n)
{
    int i ;
    double t_a = 0.0, t_b = 0.0 ;

    t_a = omp_get_wtime() ;
    #pragma omp parallel
    {
        for(i=0;i<n;i++)
        {
        }
    }
    t_b = omp_get_wtime() ;
    for(i=0;i<n;i++)
    {
        #pragma omp parallel
        {
        }
    }
    printf( "directive outside for-loop: %lf\n", 1000*(omp_get_wtime()-t_a)) ;
    printf( "directive inside for-loop: %lf \n", 1000*(omp_get_wtime()-t_b)) ;
}

int main(void)
{
    int i, n ;
    double t_1 = 0.0, t_2 = 0.0 ;

    printf( "n: " ) ;
    scanf( "%d", &n ) ;
    t_1 = omp_get_wtime() ;
    #pragma omp parallel
    {
        for(i=0;i<n;i++)
        {
        }
    }
    t_2 = omp_get_wtime() ;
    for(i=0;i<n;i++)
    {
        #pragma omp parallel
        {
        }
    }
    printf( "directive outside for-loop: %lf\n", 1000*(omp_get_wtime()-t_1)) ;
    printf( "directive inside for-loop: %lf \n", 1000*(omp_get_wtime()-t_2)) ;
    test(n) ;
    return 0 ;
}
If I run it with different values of n, I always get different results.
n: 30000
directive outside for-loop: 0.881884
directive inside for-loop: 0.073054
directive outside for-loop: 0.049098
directive inside for-loop: 0.011663
n: 30000
directive outside for-loop: 0.402774
directive inside for-loop: 0.071588
directive outside for-loop: 0.049168
directive inside for-loop: 0.012013
n: 30000
directive outside for-loop: 2.198740
directive inside for-loop: 0.065301
directive outside for-loop: 0.047911
directive inside for-loop: 0.012152
n: 1000
directive outside for-loop: 0.355841
directive inside for-loop: 0.079480
directive outside for-loop: 0.013549
directive inside for-loop: 0.012362
n: 10000
directive outside for-loop: 0.926234
directive inside for-loop: 0.071098
directive outside for-loop: 0.023536
directive inside for-loop: 0.012222
n: 10000
directive outside for-loop: 0.354025
directive inside for-loop: 0.073542
directive outside for-loop: 0.023607
directive inside for-loop: 0.012292
Can you help me understand this difference?!
Results obtained with your version:
Input n: 1000
[2] directive outside for-loop: 0.331396
[2] directive inside for-loop: 0.002864
[2] directive outside for-loop: 0.011663
[2] directive inside for-loop: 0.001188
[1] directive outside for-loop: 0.021092
[1] directive inside for-loop: 0.001327
[1] directive outside for-loop: 0.005238
[1] directive inside for-loop: 0.001048
[0] directive outside for-loop: 0.020812
[0] directive inside for-loop: 0.001188
[0] directive outside for-loop: 0.005029
[0] directive inside for-loop: 0.001257
[…] {}, you could just use #pragma omp parallel for (note the for in the pragma), right? - Jens Gustedt
[…] 1000*(t_2-t_1) rather than 1000*(omp_get_wtime()-t_1). - osgx
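The timing suggestion in the last comment can be sketched as follows: take one timestamp at each boundary and report the difference between neighbouring timestamps, so that the figure for the first variant no longer includes the time spent in the second. The names `now` and `time_variants` are hypothetical, and the `clock()` fallback is only there so the sketch also compiles without OpenMP. Two caveats that are assumptions about typical implementations rather than measurements: an optimizing compiler may remove the empty loops entirely, and the first parallel region executed in a program often also pays for creating the thread team.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
#endif

/* Time in seconds: omp_get_wtime() when OpenMP is enabled,
   otherwise CPU time from clock() as a rough fallback. */
static double now(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical helper: times both variants separately, in milliseconds. */
void time_variants(int n, double *ms_outside, double *ms_inside)
{
    assert(n >= 0);

    double t0 = now();
    #pragma omp parallel
    {
        for (int i = 0; i < n; i++) {   /* directive outside the loop */
        }
    }
    double t1 = now();                  /* end of variant 1, start of variant 2 */
    for (int i = 0; i < n; i++) {
        #pragma omp parallel            /* directive inside the loop */
        {
        }
    }
    double t2 = now();

    *ms_outside = 1000.0 * (t1 - t0);   /* covers only the first variant */
    *ms_inside  = 1000.0 * (t2 - t1);   /* covers only the second variant */
}
```

Running the measurement several times and discarding the first run is one way to reduce the influence of start-up effects on the comparison.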