I don't understand why code like the following is not vectorized by gcc 4.4.6.
void MyFunc(const float *pfTab, float *pfResult, int iSize, int iIndex)
{
    for (int i = 0; i < iSize; i++)
        pfResult[i] = pfResult[i] + pfTab[iIndex];
}
note: not vectorized: unhandled data-ref
However, if I write the following code:
void MyFunc(const float *pfTab, float *pfResult, int iSize, int iIndex)
{
    float fTab = pfTab[iIndex];
    for (int i = 0; i < iSize; i++)
        pfResult[i] = pfResult[i] + fTab;
}
gcc successfully auto-vectorizes this loop. If I then add an OpenMP directive:
void MyFunc(const float *pfTab, float *pfResult, int iSize, int iIndex)
{
    float fTab = pfTab[iIndex];
    #pragma omp parallel for
    for (int i = 0; i < iSize; i++)
        pfResult[i] = pfResult[i] + fTab;
}
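I get the following message again: not vectorized: unhandled data-ref.
Could you explain why the first and third snippets are not auto-vectorized?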
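One possible factor in both failing cases is that the compiler cannot prove the value added in the loop stays constant: in the first snippet, a store to pfResult[i] could in principle modify pfTab[iIndex], and with OpenMP the shared fTab may be read through a data structure passed to the outlined loop body. The sketch below shows two variants under that assumption. The names MyFuncRestrict and MyFuncOmpPrivate are hypothetical, `__restrict__` is a GCC extension, and whether either variant changes the vectorizer report on gcc 4.4.6 has not been verified here.

```c
/* Hypothetical variant of the first snippet. __restrict__ (a GCC extension)
   promises that pfTab and pfResult never overlap, so the load of
   pfTab[iIndex] may be treated as loop-invariant. The caller must really
   pass non-overlapping arrays. */
void MyFuncRestrict(const float *__restrict__ pfTab,
                    float *__restrict__ pfResult,
                    int iSize, int iIndex)
{
    for (int i = 0; i < iSize; i++)
        pfResult[i] = pfResult[i] + pfTab[iIndex];
}

/* Hypothetical variant of the third snippet. firstprivate gives every
   thread its own initialized copy of the scalar and of the pointer, so
   neither has to be re-read from shared state inside the loop.
   Without -fopenmp the pragma is ignored and the loop runs serially. */
void MyFuncOmpPrivate(const float *pfTab, float *pfResult,
                      int iSize, int iIndex)
{
    float fTab = pfTab[iIndex];
    #pragma omp parallel for firstprivate(fTab, pfResult)
    for (int i = 0; i < iSize; i++)
        pfResult[i] = pfResult[i] + fTab;
}
```

Both functions compute the same result as the original MyFunc as long as the two arrays do not overlap; only the information available to the optimizer differs.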
Second question: math functions (exp, log, etc.) do not seem to be vectorized, for example in this code:
for (int i = 0; i < iSize; i++)
    pfResult[i] = exp(pfResult[i]);
This loop is not vectorized. Is that because of my gcc version?
Edit: with a newer gcc, 4.8.1, and OpenMP 2011 (echo |cpp -fopenmp -dM |grep -i open), I get the following messages for every kind of loop, even a basic one:
for (iGID = 0; iGID < iSize; iGID++)
{
    pfResult[iGID] = fValue;
}
note: not consecutive access *_144 = 5.0e-1;
note: Failed to SLP the basic block.
note: not vectorized: failed to find SLP opportunities in basic block.
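The notes above mention SLP and basic blocks, which suggests they come from basic-block (SLP) vectorization rather than from the loop vectorizer. To make the report easier to read, one option is to move the loop into a small function of its own and look only at the messages for that function. A minimal sketch, with FillValue as a hypothetical name:

```c
/* Hypothetical helper that isolates the store loop, so the vectorizer
   messages for it are not mixed with those for the rest of main(). */
void FillValue(float *pfResult, int iSize, float fValue)
{
    for (int i = 0; i < iSize; i++)
        pfResult[i] = fValue;
}
```

This does not change what the loop computes; it only narrows down which part of the code each note refers to.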
Edit 2:
#include <stdio.h>
#include <sys/time.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
#include <unistd.h>   /* getpagesize() */
#include <omp.h>

int main()
{
    int szGlobalWorkSize = 131072;
    int iGID = 0;
    int j = 0;
    omp_set_dynamic(0);
    // warmup
#if WARMUP
    #pragma omp parallel
    {
        #pragma omp master
        {
            printf("%d threads\n", omp_get_num_threads());
        }
    }
#endif
    printf("Pagesize=%d\n", getpagesize());
    float *pfResult = (float *)malloc(szGlobalWorkSize * 100 * sizeof(float));
    float fValue = 0.5f;
    struct timeval tim;
    gettimeofday(&tim, NULL);
    double tLaunch1 = tim.tv_sec + (tim.tv_usec / 1000000.0);
    double time = omp_get_wtime();
    int iChunk = getpagesize();
    int iSize = ((int)szGlobalWorkSize * 100) / iChunk;
    //#pragma omp parallel for
    for (iGID = 0; iGID < iSize; iGID++)
    {
        pfResult[iGID] = fValue;
    }
    time = omp_get_wtime() - time;
    gettimeofday(&tim, NULL);
    double tLaunch2 = tim.tv_sec + (tim.tv_usec / 1000000.0);
    printf("%.6lf Time1\n", tLaunch2 - tLaunch1);
    printf("%.6lf Time2\n", time);
    free(pfResult);
    return 0;
}
Result with:
#define _OPENMP 201107
gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)
gcc -march=native -fopenmp -O3 -ftree-vectorizer-verbose=2 test.c -lm
I get many occurrences of:
note: Failed to SLP the basic block.
note: not vectorized: failed to find SLP opportunities in basic block.
and
note: not consecutive access *_144 = 5.0e-1;
Thanks