Backward pass in Caffe Python layer is not called/working?

Question

3

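I am trying to implement a simple loss layer in Python using Caffe, so far without success. As references, I found several layers already implemented in Python, including here, here and here. Starting with the EuclideanLossLayer provided by the Caffe documentation/examples, I was not able to get it working and started debugging. The problem shows up even with this simple TestLayer: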

import caffe


class TestLayer(caffe.Layer):
    """Identity layer that prints a message from every callback."""

    def setup(self, bottom, top):
        """
        Checks the correct number of bottom inputs.

        :param bottom: bottom inputs
        :type bottom: [numpy.ndarray]
        :param top: top outputs
        :type top: [numpy.ndarray]
        """

        print('setup')

    def reshape(self, bottom, top):
        """
        Make sure all involved blobs have the right dimension.

        :param bottom: bottom inputs
        :type bottom: caffe._caffe.RawBlobVec
        :param top: top outputs
        :type top: caffe._caffe.RawBlobVec
        """

        print('reshape')
        # The top blob gets the same shape as the bottom blob.
        top[0].reshape(*bottom[0].data.shape)

    def forward(self, bottom, top):
        """
        Forward propagation.

        :param bottom: bottom inputs
        :type bottom: caffe._caffe.RawBlobVec
        :param top: top outputs
        :type top: caffe._caffe.RawBlobVec
        """

        print('forward')
        # Identity: copy the input to the output.
        top[0].data[...] = bottom[0].data

    def backward(self, top, propagate_down, bottom):
        """
        Backward pass.

        :param top: top outputs
        :type top: caffe._caffe.RawBlobVec
        :param propagate_down: whether to propagate the gradient to each bottom
        :type propagate_down: [bool]
        :param bottom: bottom inputs
        :type bottom: caffe._caffe.RawBlobVec
        """

        print('backward')
        # Identity: pass the gradient through unchanged.
        bottom[0].diff[...] = top[0].diff[...]

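I cannot get the Python layer to work. The learning task is rather simple: I am just trying to predict whether a real number is positive or negative. The corresponding data is generated as follows and written to LMDBs: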

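import numpy

N = 10000
N_train = int(0.8*N)

images = []
labels = []

for n in range(N):
    # One random value in [-1, 1), stored as a 1x1x1 "image".
    # numpy.float was removed from recent NumPy releases; float64 is the same type.
    image = (numpy.random.rand(1, 1, 1)*2 - 1).astype(numpy.float64)
    # The label is the sign of that value.
    label = int(numpy.sign(image.item()))

    images.append(image)
    labels.append(label)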

Writing the data to LMDB should be correct, as tests with the MNIST dataset provided by Caffe showed no problems. The network is defined as follows:

net.data, net.labels = caffe.layers.Data(batch_size = batch_size, backend = caffe.params.Data.LMDB,
                                         source = lmdb_path, ntop = 2)
net.fc1 = caffe.layers.Python(net.data, python_param = dict(module = 'tools.layers', layer = 'TestLayer'))
net.score = caffe.layers.TanH(net.fc1)
net.loss = caffe.layers.EuclideanLoss(net.score, net.labels)

Solving is done manually using:

for iteration in range(iterations):
    solver.step(step)

The corresponding prototxt files are as follows, starting with the solver configuration:

weight_decay: 0.0005
test_net: "tests/test.prototxt"
snapshot_prefix: "tests/snapshot_"
max_iter: 1000
stepsize: 1000
base_lr: 0.01
snapshot: 0
gamma: 0.01
solver_mode: CPU
train_net: "tests/train.prototxt"
test_iter: 0
test_initialization: false
lr_policy: "step"
momentum: 0.9
display: 100
test_interval: 100000

train.prototxt:

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "labels"
  data_param {
    source: "tests/train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "fc1"
  type: "Python"
  bottom: "data"
  top: "fc1"
  python_param {
    module: "tools.layers"
    layer: "TestLayer"
  }
}
layer {
  name: "score"
  type: "TanH"
  bottom: "fc1"
  top: "score"
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "score"
  bottom: "labels"
  top: "loss"
}

test.prototxt:

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "labels"
  data_param {
    source: "tests/test_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "fc1"
  type: "Python"
  bottom: "data"
  top: "fc1"
  python_param {
    module: "tools.layers"
    layer: "TestLayer"
  }
}
layer {
  name: "score"
  type: "TanH"
  bottom: "fc1"
  top: "score"
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "score"
  bottom: "labels"
  top: "loss"
}

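I tried to track it down by adding debug messages to the backward and forward methods of TestLayer, but only the forward method gets called during solving (note that no testing is performed here, so the calls can only be related to solving). Similarly, I added debug messages in python_layer.hpp: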

virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  LOG(INFO) << "cpp forward";
  self_.attr("forward")(bottom, top);
}
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  LOG(INFO) << "cpp backward";
  self_.attr("backward")(top, propagate_down, bottom);
}

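Again, only the forward pass is executed. When I remove the backward method from TestLayer, solving still works. When I remove the forward method, an error is thrown because forward is not implemented. I would expect the same for backward, so it seems that the backward pass is not executed at all. When I switch back to regular layers and add debug messages, everything works as expected.

I have the feeling that I am missing something simple or fundamental, but I have not been able to resolve the problem for several days now. Any help or hints are appreciated.

Thanks!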

- David Stutz

3 Answers

2

In addition to Erik B.'s answer, you can force Caffe to backpropagate by specifying

force_backward: true

in your net prototxt file.
See the comments in caffe.proto for more information.
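
If the net prototxt is generated or edited from a script, the setting can be added as plain text. The following is only a sketch: `with_force_backward` is a hypothetical helper, not part of Caffe, and it works on the text alone without validating it against caffe.proto.

```python
import re


def with_force_backward(prototxt_text):
    """Return prototxt_text with a top-level `force_backward: true` line.

    Hypothetical helper for illustration; it only edits text.
    """
    # Only unindented lines are matched, i.e. top-level fields.
    pattern = re.compile(r"^force_backward[ \t]*:[ \t]*(true|false)[ \t]*$",
                         re.MULTILINE)
    if pattern.search(prototxt_text):
        # The field is already present: make sure it is switched on.
        return pattern.sub("force_backward: true", prototxt_text)
    # The field is missing: the order of top-level fields does not matter
    # in the protobuf text format, so prepending is enough.
    return "force_backward: true\n" + prototxt_text


train = 'layer {\n  name: "data"\n  type: "Data"\n}\n'
print(with_force_backward(train))
```

Prepending keeps the rest of the file untouched, so the layer definitions stay exactly as they were written.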

- Shai

1

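Mine was not working even though I set force_backward: true as suggested by David Stutz. I found out here and here that I was forgetting to set the diff of the last layer to 1 at the index of the target class.

As Mohit Jain describes in his caffe-users answer, if you are doing ImageNet classification with a tabby cat, then after the forward pass you have to do something like: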

net.blobs['prob'].diff[0][281] = 1   # 281 is tabby cat. diff shape: (1, 1000)

Note that you need to change 'prob' to match the name of your last layer, which is usually a softmax layer named 'prob'.
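
To see why this seed matters, here is a small sketch in plain Python (not Caffe code) of a softmax backward step, assuming the standard softmax gradient. The function names, the three example scores and the target index are made up for illustration. If the diff at the output is left at zero, everything propagated downward is zero as well; a 1 at the target index yields the gradient of that class probability with respect to the scores.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_backward(prob, top_diff):
    """Gradient w.r.t. the softmax input, given the gradient w.r.t. its output."""
    dot = sum(d * p for d, p in zip(top_diff, prob))
    return [(d - dot) * p for d, p in zip(top_diff, prob)]


prob = softmax([2.0, 1.0, 0.1])
target = 0  # hypothetical target class index

zero_seed = [0.0] * len(prob)
one_hot_seed = [1.0 if i == target else 0.0 for i in range(len(prob))]

no_grad = softmax_backward(prob, zero_seed)   # every entry is zero
grad = softmax_backward(prob, one_hot_seed)   # gradient of prob[target]

print(no_grad)
print(grad)
```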

Here is an example based on mine:

deploy.prototxt (loosely based on VGG16, just to show the structure of the file; I did not test it):

name: "smaller_vgg"
input: "data"
force_backward: true
input_dim: 1
input_dim: 3
input_dim: 224
input_dim: 224
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "conv1_1"
  top: "conv1_1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1_1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "pool1"
  top: "fc1"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "fc1"
  top: "fc1"
}
layer {
  name: "drop1"
  type: "Dropout"
  bottom: "fc1"
  top: "fc1"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "fc1"
  top: "fc2"
  inner_product_param {
    num_output: 1000
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc2"
  top: "prob"
}

main.py:

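import caffe
import cv2
import numpy as np

prototxt = 'deploy.prototxt'
model_file = 'smaller_vgg.caffemodel'
# pycaffe expects the network definition first, then the weights.
net = caffe.Net(prototxt, model_file, caffe.TRAIN)  # not sure if TEST works as well

# Load as a 3-channel image and resize to the 224x224 input declared in deploy.prototxt.
image = cv2.imread('tabbycat.jpg', cv2.IMREAD_COLOR)
image = cv2.resize(image, (224, 224))

# cv2 returns HxWxC; the data blob is NxCxHxW.
net.blobs['data'].data[...] = image.transpose(2, 0, 1)[np.newaxis, ...]
net.forward()
# Seed the backward pass at the target class index, after the forward pass.
net.blobs['prob'].diff[0, 298] = 1
backout = net.backward()

# access grad from backout['data'] or net.blobs['data'].diff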

- Yamaneko

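Going by your code, why is net.blobs['prob'].diff[0, 298] no longer 1 after net.backward() has run? Does net.backward() change the value you set? - Stone

@Stone I am not sure. I only used this code once, for guided backpropagation and Grad-CAM. Maybe Caffe resets the diff after each iteration (if so, all gradients would be reset as well). Does setting net.blobs['prob'].diff[0, 298] = 1 after each backward() call solve it? - Yamaneko

Setting net.blobs['prob'].diff[0, 298] = 1 after each backward() call trivially guarantees that its value is 1. My concern is that if Caffe resets 'diff' after each iteration (as you said), then the gradients cannot be read from net.blobs[layer_name].diff after net.backward(). Moreover, if reading net.blobs[layer_name].diff after net.backward() is the right approach, then the gradient of the topmost probability layer prob (net.blobs['prob'].diff) should remain unchanged (e.g. net.blobs['prob'].diff[0, 298] = 1), because the gradient computation starts from the prob layer. - Stone

@Stone, you are right, net.blobs[layer].diff is read after backward(). In that case I do not understand why it would reset the diff of prob. Please let me know if you find anything. - Yamaneko

Maybe this is just a problem on my side. If you do not see Caffe resetting the diff after backward() on your end, then I may have missed some configuration. Thanks! - Stone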

- Erik B. · Accepted Answer

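This is the intended behavior, since there are no layers "below" your Python layer that actually need gradients to compute weight updates. Caffe notices this and skips the backward computation for such layers, because it would be a waste of time.

At net initialization time, Caffe logs for every layer whether it needs backward computation. In your case, you should see something like: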

fc1 does not need backward computation.

If you put an "InnerProduct" or "Convolution" layer below your "Python" layer (e.g. Data->InnerProduct->Python->Loss), the backward computation becomes necessary and your backward method gets called.
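
The rule can be sketched roughly as follows. This is a simplified model in plain Python, not Caffe's actual implementation: the dict-based layer description and the function name are invented for illustration, and details such as per-parameter learning rates or loss weights are ignored.

```python
def layers_needing_backward(layers, force_backward=False):
    """Return the names of layers whose backward pass would run.

    `layers` is a list of dicts in forward order with the keys
    'name', 'bottoms', 'tops' and 'has_params' (hypothetical format).
    """
    blobs_needing_backward = set()
    result = []
    for layer in layers:
        needs = (
            force_backward
            or layer["has_params"]
            or any(b in blobs_needing_backward for b in layer["bottoms"])
        )
        if needs:
            result.append(layer["name"])
            # Everything this layer produces now carries a gradient.
            blobs_needing_backward.update(layer["tops"])
    return result


# The net from the question: no layer has learnable parameters.
question_net = [
    {"name": "data", "bottoms": [], "tops": ["data", "labels"], "has_params": False},
    {"name": "fc1", "bottoms": ["data"], "tops": ["fc1"], "has_params": False},
    {"name": "score", "bottoms": ["fc1"], "tops": ["score"], "has_params": False},
    {"name": "loss", "bottoms": ["score", "labels"], "tops": ["loss"], "has_params": False},
]

# The same net with an InnerProduct layer between the data and the Python layer.
with_inner_product = [
    question_net[0],
    {"name": "ip1", "bottoms": ["data"], "tops": ["ip1"], "has_params": True},
    {"name": "fc1", "bottoms": ["ip1"], "tops": ["fc1"], "has_params": False},
    question_net[2],
    question_net[3],
]

print(layers_needing_backward(question_net))
print(layers_needing_backward(with_inner_product))
```

In this model the Python layer "fc1" only shows up once something below it has parameters, or once force_backward is switched on.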