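I have many triples of single-precision vectors (x1,y1,z1), (x2,y2,z2), (x3,y3,z3), and I want to reorder each triple into (x1,x2,x3,0, y1,y2,y3,0, z1,z2,z3,0).
The goal is to prepare the data set for SSE-based computation. I have the following code to do this: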
for (int i = 0; i < count; i++)
{
    Vect3F p0 = get_first_point(i);
    Vect3F p1 = get_second_point(i);
    Vect3F p2 = get_third_point(i);
    int idx = i * 3;
    scratch[idx]     = Vec4F(p0.x, p1.x, p2.x, 0); // These 3 rows are the slowest
    scratch[idx + 1] = Vec4F(p0.y, p1.y, p2.y, 0);
    scratch[idx + 2] = Vec4F(p0.z, p1.z, p2.z, 0);
}
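The last 3 lines of the loop are extremely slow: they take 90% of the time of my entire algorithm! Is this normal? Can I make this shuffle faster? (scratch is a static variable and is 16-byte aligned. The function is called frequently, so I don't think the scratch block should drop out of the cache.)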
float* a = (float*)(cache + i*3); a[0] = p0.x; a[1] = p1.x; a[2] = p2.x; a[4] = p0.y; a[5] = p1.y; a[6] = p2.y; a[8] = p0.z; a[9] = p1.z; a[10] = p2.z;
This helps a little, but it is still very slow. - antonfrv

int tri = triangles[i]; Vect3F p0 = points[indices[tri]]; Vect3F p1 = points[indices[tri+1]]; Vect3F p2 = points[indices[tri+2]];
- antonfrv
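
For illustration, here is a minimal sketch of the same reordering done with SSE shuffles instead of scalar writes. It loads each point as (x, y, z, 0) and transposes the three rows plus a zero row using the standard _MM_TRANSPOSE4_PS macro. The Vect3F layout (three packed floats) and the names load3 and transpose_triangle are assumptions made for this example, not part of the code above. Whether it is actually faster is not guaranteed: if the points are gathered through an index array, the time attributed to the three store lines may really be spent waiting on those loads.

```cpp
#include <xmmintrin.h>  // SSE intrinsics and _MM_TRANSPOSE4_PS

// Assumed layout: three tightly packed floats (hypothetical stand-in for the poster's type).
struct Vect3F { float x, y, z; };

// Load exactly 12 bytes as (x, y, z, 0) without reading past the end of the struct.
static inline __m128 load3(const Vect3F& p)
{
    // Low two lanes come from memory, high two lanes stay zero: (x, y, 0, 0)
    __m128 xy = _mm_loadl_pi(_mm_setzero_ps(), reinterpret_cast<const __m64*>(&p.x));
    // Single float into lane 0, other lanes zeroed: (z, 0, 0, 0)
    __m128 z = _mm_load_ss(&p.z);
    // Low half of xy followed by low half of z: (x, y, z, 0)
    return _mm_movelh_ps(xy, z);
}

// Writes 12 floats to out: (x0,x1,x2,0, y0,y1,y2,0, z0,z1,z2,0).
// out must be 16-byte aligned, as scratch is said to be in the question.
void transpose_triangle(const Vect3F& p0, const Vect3F& p1, const Vect3F& p2, float* out)
{
    __m128 r0 = load3(p0);
    __m128 r1 = load3(p1);
    __m128 r2 = load3(p2);
    __m128 r3 = _mm_setzero_ps();        // padding row, becomes the 4th lane of each output
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);   // rows become columns, in registers
    _mm_store_ps(out,     r0);           // x0 x1 x2 0
    _mm_store_ps(out + 4, r1);           // y0 y1 y2 0
    _mm_store_ps(out + 8, r2);           // z0 z1 z2 0
}
```

Inside the original loop this would be called as transpose_triangle(p0, p1, p2, (float*)(scratch + idx)), assuming Vec4F is a plain 16-byte block of four floats.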