I am currently trying to implement my own loss layer in Caffe, and while doing so, I am using other layers as a reference. One thing that puzzles me, however, is the use of `top[0]->cpu_diff()` in `Backward_cpu`. I will use the `EuclideanLossLayer` as a reference. Here are my questions:
It is my understanding that `top[0]->cpu_diff()` holds the error derivative from the next layer, but what if there is no next layer? How is it initialised? I ask because it is used in `EuclideanLossLayer` without any checks:

```cpp
const Dtype alpha = sign * top[0]->cpu_diff()[0] / bottom[i]->num();
```
Again, in the `EuclideanLossLayer`, the derivative of the error with respect to the activations is calculated using the following code snippet:

```cpp
const Dtype alpha = sign * top[0]->cpu_diff()[0] / bottom[i]->num();
caffe_cpu_axpby(
    bottom[i]->count(),              // count
    alpha,                           // alpha
    diff_.cpu_data(),                // a
    Dtype(0),                        // beta
    bottom[i]->mutable_cpu_diff());  // b
```
If my first assumption is correct, and `top[0]->cpu_diff()` does indeed hold the error derivative from the layer above, why do we only use the first element, i.e. `top[0]->cpu_diff()[0]`, as opposed to multiplying element-wise by the whole vector, i.e. `top[0]->cpu_diff()`?
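To state that last question in chain-rule terms (my own notation, not taken from the Caffe docs): if the top blob were a vector $t$, I would expect the backward pass to need a sum over all of its elements,

```latex
% General chain rule for a vector-valued top blob t:
\frac{\partial E}{\partial b_k}
  = \sum_j \frac{\partial E}{\partial t_j}\,
           \frac{\partial t_j}{\partial b_k}
% The code instead uses only the single scalar
% dE/dt_0 = top[0]->cpu_diff()[0], i.e. it behaves as if the
% top blob contains exactly one element.
```

whereas the snippet above multiplies by only the single scalar `top[0]->cpu_diff()[0]`.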