Vector summation performs differently in nearly identical code

Question

3

When compiled with g++ and the -O3 flag, these functions for summing an array and a vector appear to differ in performance:

float sum1(float* v, int length) {
    float sum = 0;
    for(int i = 0; i < length; i++) {
        sum += v[i];
    }
    return sum;
}

float sum2(std::vector<float> v) {
    return sum1(&v[0], v.size());
}

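When I call sum1 with an array of length 100,000 and sum2 with a vector of the same length and contents, sum2 runs about 10% slower than sum1 in my tests. The measured run times are: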

sum1: 0.279816 ms
sum2: 0.307811 ms

So where does this overhead come from? The complete test code is below, in case I have made a mistake somewhere.

[Update] With call by reference (float sum2(std::vector<float>& v)), a performance difference of about 3.7% remains. That helps, but is some performance still being lost elsewhere?

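[Update 2] The remainder appears to be dominated by statistical noise, as running more iterations shows. So the real difference is indeed call by reference!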

Complete test code (compiled with g++ using the -O3 flag, and also tested with clang++):

#include <iostream>
#include <chrono>
#include <cstdlib>  // rand, srand, RAND_MAX
#include <vector>

using namespace std;

std::vector<float> fill_vector(int length) {
    std::vector<float> ret;

    for(int i = 0; i < length; i++) {
        float r = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
        ret.push_back(r);
    }

    return ret;
}

float sum1(float* v, int length) {
    float sum = 0;
    for(int i = 0; i < length; i++) {
        sum += v[i];
    }
    return sum;
}

float sum2(std::vector<float> v) {
    return sum1(&v[0], v.size());
}

int main() {
    int iterations = 10000;
    int vector_size = 100000;

    srand(42);
    std::vector<float> v1 = fill_vector(vector_size);

    float* v2;
    v2 = &v1[0];

    std::chrono::duration<double, std::milli> duration_sum1(0);
    for(int i = 0; i < iterations; i++) {
        auto t1 = std::chrono::high_resolution_clock::now();
        float res = sum1(v2, vector_size);
        auto t2 = std::chrono::high_resolution_clock::now();
        cout << "Result sum1: " << res << endl;
        duration_sum1 += t2 - t1;
    }
    duration_sum1 /= iterations;

    std::chrono::duration<double, std::milli> duration_sum2(0);
    for(int i = 0; i < iterations; i++) {
        auto t1 = std::chrono::high_resolution_clock::now();
        float res = sum2(v1);
        auto t2 = std::chrono::high_resolution_clock::now();
        cout << "Result sum2: " << res << endl;
        duration_sum2 += t2 - t1;
    }
    duration_sum2 /= iterations;

    cout << "Durations:" << endl;
    cout << "sum1: " << duration_sum1.count() << " ms" << endl;
    cout << "sum2: " << duration_sum2.count() << " ms" << endl;
}

- 2xB

Try changing the order of the tests... call sum2 before sum1. - user2717954

5

Try changing it to float sum2(std::vector<float>& v). - user9400869

4

@2xB float sum2(std::vector<float> v) -- C++ has several ways of passing arguments. Unfortunately, you chose pass by value rather than pass by reference. It should be: float sum2(std::vector<float>& v) - PaulMcKenzie

3

Passing by value triggers a deep copy of the vector. - PaulMcKenzie

4

Also: prefer v.data() over &v[0]. And consider float sum3(std::vector<float>& v) { return std::accumulate(v.begin(), v.end(), 0.0f); }. - Caleth
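A minimal sketch of what this comment suggests. The name sum_data is hypothetical, and taking the vector by const reference (rather than by non-const reference as in the comment) is an assumption made here so that no copy is taken and the input cannot be modified:

```cpp
#include <numeric>
#include <vector>

// std::accumulate over the vector's iterators; the 0.0f initial value
// keeps the accumulation in float rather than int or double.
float sum3(const std::vector<float>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0f);
}

// Same sum over the raw buffer, obtained with data() instead of &v[0].
// Unlike &v[0], data() may also be called on an empty vector.
float sum_data(const std::vector<float>& v) {
    const float* first = v.data();
    return std::accumulate(first, first + v.size(), 0.0f);
}
```

Both functions add the elements in the same order, so for the same input they should return the same float value.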

3 Answers

1

Your function sum2() takes a std::vector<float> object by value:

float sum2(std::vector<float> v) {
    return sum1(&v[0], v.size());
}

So in a case like this:

std::vector<float> vec;
// ...
sum2(vec); // copies vec

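the parameter object v is copy-initialized from the argument vec passed to sum2(). If you want to reduce the overhead of calling sum2(), you have the following options: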

- Making sum2() accept a reference to std::vector<float> instead, i.e., std::vector<float>&:
```
float sum2(std::vector<float>& v) {
   return sum1(&v[0], v.size());
}
```
  In this case just a reference to the vector is passed to the function, not the whole vector, so no copy of the vector is created.
- Calling sum2() in a way so that its parameter object v is move initialized from the passed argument (as opposed to copy initialized – what you are currently doing) if you don't need the contents of vec anymore after the call to sum2():
```
sum2(std::move(vec)); // move instead of copy
```
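The difference between the three ways of passing the vector can be made visible by checking whether the callee sees the caller's buffer. This is only an illustrative sketch; the helper names shares_buffer_by_value and shares_buffer_by_reference are hypothetical and not part of the question's code:

```cpp
#include <vector>

// Takes the vector by value: v is a new object, initialized by copy or by
// move depending on how the caller passes the argument.
bool shares_buffer_by_value(std::vector<float> v, const float* caller_data) {
    return v.data() == caller_data;
}

// Takes the vector by reference: v is the caller's own object.
bool shares_buffer_by_reference(const std::vector<float>& v,
                                const float* caller_data) {
    return v.data() == caller_data;
}
```

With a non-empty vec and p = vec.data() taken beforehand, shares_buffer_by_value(vec, p) is false because the copy owns a freshly allocated buffer, while shares_buffer_by_reference(vec, p) is true. shares_buffer_by_value(std::move(vec), p) is also true, since the buffer is handed over rather than copied, but vec is empty afterwards.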

- JFMR

sum2(std::move(vec)) returns 0, so there is a problem with using it. - 2xB

1

@2xB That is because you are calling sum2(std::move(vec)) in a loop. After the first iteration, vec is empty. - JFMR

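@2xB Your loop is only there for benchmarking. If you have a vector that you are going to destroy right after computing the sum of its elements, then it is better to move it rather than copy it when passing it as an argument to sum2(). - JFMR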

For this case, I think moving is wrong. There is no reason for a sum to consume its input. It does not even modify the input. - user9400869

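@generic_opto_guy I was just trying to illustrate the concept and explain what is happening. If the OP really wants sum2() to take a std::vector<float> by value and no longer needs the argument afterwards, then this approach helps speed up the call. Of course there is no reason to consume the input, but if you do not need the vector afterwards, you do not really care. - JFMR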

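Note that the real problem is having sum2() take the vector by value. When calling sum2(), the programmer has to request the move very explicitly, i.e., by using std::move(), because the call may then steal the contents of a named object. - JFMR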

1

To add to the existing answers, which point out that using a reference avoids an expensive copy of the vector:

When using a reference, you may want to use a const reference.

You would need to change

float sum1(float* v, int length)

to

float sum1(const float* v, int length)

and

float sum2(std::vector<float> v)

to

float sum2(const std::vector<float>& v)

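Using a reference means that you do not copy the vector, but it also allows sum2 to modify the vector. Since your only reason for using a reference is to avoid the copy, I think it is good for sum2 to state in its interface that it will not change the vector.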

The same reasoning about const applies to sum1, and it becomes relevant there because a const vector only yields pointers to const.
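Put together, the const-correct versions might look like the sketch below. The signatures are the ones given in this answer; the use of v.data() and the explicit cast of v.size() to int are assumptions made for the example:

```cpp
#include <vector>

// Pointer to const: sum1 promises not to modify the elements it reads.
float sum1(const float* v, int length) {
    float sum = 0.0f;
    for (int i = 0; i < length; i++) {
        sum += v[i];
    }
    return sum;
}

// Const reference: no copy of the vector, and no way to modify it.
// On a const vector, data() returns const float*, which is why sum1
// has to accept a pointer to const.
float sum2(const std::vector<float>& v) {
    return sum1(v.data(), static_cast<int>(v.size()));
}
```

A const reference also binds to const vectors and to temporaries, so a call such as sum2(std::vector<float>{0.5f, 0.25f}) compiles, which it would not with a non-const reference parameter.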

- user9400869

- Support Ukraine · Accepted Answer

5

I think the overhead comes from passing the vector.

Try passing a reference:

float sum2(std::vector<float>& v)

- Support Ukraine

Oh, so it copies the vector there. Great, that's it, thanks, and also to @generic_opto_guy and @PaulMcKenzie! - 2xB

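With call by reference, the performance difference is only 3.7%, so the main issue here really was call by value. Still, is some performance being lost elsewhere? - 2xB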

@2xB sum2 calls sum1. Maybe the call setup is what causes it. - PaulMcKenzie

The remaining performance difference seems to be purely statistical noise. So call by reference really is the important difference here. - 2xB

If this answer solved your problem (which seems to be the case, judging by your first comment), you could consider accepting it :) - Fareanor

@Fareanor Yes, I was not sure whether to accept this answer or 眠りネロク's, since his contains more information, but this one is more direct and was posted before his. - 2xB