Modifying the Caffe C++ prediction code for multiple inputs

Question


c++ machine-learning neural-network deep-learning caffe

12

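I implemented a modified version of the Caffe C++ example. It works well, but it is very slow because it only accepts one image at a time. Ideally, I would like to pass Caffe a vector of 200 images and get back the best prediction for each one. I received great help from Fanglin Wang and implemented some of his suggestions, but I am still having trouble working out how to retrieve the best result for each image.

The Classify method is now passed a vector of cv::Mat objects (the variable input_channels), which is a set of grayscale floating-point images. I have removed the preprocessing method from the code because I do not need to convert these images to floating point or subtract the mean image. I have also been trying to get rid of the N variable, because I only want to return the top prediction and its probability for each image.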

#include "Classifier.h"
using namespace caffe;
using std::string;

Classifier::Classifier(const string& model_file, const string& trained_file, const string& label_file) {
#ifdef CPU_ONLY
  Caffe::set_mode(Caffe::CPU);
#else
  Caffe::set_mode(Caffe::GPU);
#endif

  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  Blob<float>* input_layer = net_->input_blobs()[0];
  num_channels_ = input_layer->channels();
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

  /* Load labels. */
  std::ifstream labels(label_file.c_str());
  CHECK(labels) << "Unable to open labels file " << label_file;
  string line;
  while (std::getline(labels, line))
    labels_.push_back(string(line));

  Blob<float>* output_layer = net_->output_blobs()[0];
  CHECK_EQ(labels_.size(), output_layer->channels())
    << "Number of labels is different from the output layer dimension.";
}

static bool PairCompare(const std::pair<float, int>& lhs, const std::pair<float, int>& rhs) {
  return lhs.first > rhs.first;
}

/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
  std::vector<std::pair<float, int> > pairs;
  for (size_t i = 0; i < v.size(); ++i)
    pairs.push_back(std::make_pair(v[i], i));
  std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);

  std::vector<int> result;
  for (int i = 0; i < N; ++i)
    result.push_back(pairs[i].second);
  return result;
}

/* Return the top N predictions. */
std::vector<Prediction> Classifier::Classify(const std::vector<cv::Mat> &input_channels) {
  std::vector<float> output = Predict(input_channels);

    std::vector<int> maxN = Argmax(output, 1);
    int idx = maxN[0];
    predictions.push_back(std::make_pair(labels_[idx], output[idx]));
    return predictions;
}

std::vector<float> Classifier::Predict(const std::vector<cv::Mat> &input_channels, int num_images) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  input_layer->Reshape(num_images, num_channels_,
                       input_geometry_.height, input_geometry_.width);
  /* Forward dimension change to all layers. */
  net_->Reshape();

  WrapInputLayer(&input_channels);

  net_->ForwardPrefilled();

  /* Copy the output layer to a std::vector */
  Blob<float>* output_layer = net_->output_blobs()[0];
  const float* begin = output_layer->cpu_data();
  const float* end = begin + num_images * output_layer->channels();
  return std::vector<float>(begin, end);
}

/* Wrap the input layer of the network in separate cv::Mat objects (one per channel). This way we save one memcpy operation and we don't need to rely on cudaMemcpy2D. The last preprocessing operation will write the separate channels directly to the input layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
  Blob<float>* input_layer = net_->input_blobs()[0];

  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels() * num_images; ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
  }
}

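Update

Thank you so much for your help, Shai. I made the changes you recommended, but I seem to be getting some strange compilation issues that I cannot work out (I did manage to sort out a few of them).

These are the changes I made:

Header file: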

#ifndef __CLASSIFIER_H__
#define __CLASSIFIER_H__

#include <caffe/caffe.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <algorithm>
#include <iosfwd>
#include <memory>
#include <string>
#include <utility>
#include <vector>


using namespace caffe;  // NOLINT(build/namespaces)
using std::string;

/* Pair (label, confidence) representing a prediction. */
typedef std::pair<string, float> Prediction;

class Classifier {
 public:
  Classifier(const string& model_file,
             const string& trained_file,
             const string& label_file);

  std::vector< std::pair<int,float> > Classify(const std::vector<cv::Mat>& img);

 private:

  std::vector< std::vector<float> > Predict(const std::vector<cv::Mat>& img, int nImages);

  void WrapInputLayer(std::vector<cv::Mat>* input_channels, int nImages);

  void Preprocess(const std::vector<cv::Mat>& img,
                  std::vector<cv::Mat>* input_channels, int nImages);

 private:
  shared_ptr<Net<float> > net_;
  cv::Size input_geometry_;
  int num_channels_;
  std::vector<string> labels_;
};

#endif /* __CLASSIFIER_H__ */

Class file:

#define CPU_ONLY
#include "Classifier.h"

using namespace caffe;  // NOLINT(build/namespaces)
using std::string;

Classifier::Classifier(const string& model_file,
                       const string& trained_file,
                       const string& label_file) {
#ifdef CPU_ONLY
  Caffe::set_mode(Caffe::CPU);
#else
  Caffe::set_mode(Caffe::GPU);
#endif

  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  CHECK_EQ(net_->num_inputs(), 1) << "Network should have exactly one input.";
  CHECK_EQ(net_->num_outputs(), 1) << "Network should have exactly one output.";

  Blob<float>* input_layer = net_->input_blobs()[0];
  num_channels_ = input_layer->channels();
  CHECK(num_channels_ == 3 || num_channels_ == 1)
    << "Input layer should have 1 or 3 channels.";
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

  /* Load labels. */
  std::ifstream labels(label_file.c_str());
  CHECK(labels) << "Unable to open labels file " << label_file;
  string line;
  while (std::getline(labels, line))
    labels_.push_back(string(line));

  Blob<float>* output_layer = net_->output_blobs()[0];
  CHECK_EQ(labels_.size(), output_layer->channels())
    << "Number of labels is different from the output layer dimension.";
}

static bool PairCompare(const std::pair<float, int>& lhs,
                        const std::pair<float, int>& rhs) {
  return lhs.first > rhs.first;
}

/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
  std::vector<std::pair<float, int> > pairs;
  for (size_t i = 0; i < v.size(); ++i)
    pairs.push_back(std::make_pair(v[i], i));
  std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);

  std::vector<int> result;
  for (int i = 0; i < N; ++i)
    result.push_back(pairs[i].second);
  return result;
}

std::vector< std::pair<int,float> > Classifier::Classify(const std::vector<cv::Mat>& img) {
  std::vector< std::vector<float> > output = Predict(img, img.size());

  std::vector< std::pair<int,float> > predictions;
  for ( int i = 0 ; i < output.size(); i++ ) {
    std::vector<int> maxN = Argmax(output[i], 1);
    int idx = maxN[0];
    predictions.push_back(std::make_pair(labels_[idx], output[idx]));
  }
  return predictions;
}

std::vector< std::vector<float> > Classifier::Predict(const std::vector<cv::Mat>& img, int nImages) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  input_layer->Reshape(nImages, num_channels_,
                       input_geometry_.height, input_geometry_.width);
  /* Forward dimension change to all layers. */
  net_->Reshape();

  std::vector<cv::Mat> input_channels;
  WrapInputLayer(&input_channels, nImages);

  Preprocess(img, &input_channels, nImages);

  net_->ForwardPrefilled();

  /* Copy the output layer to a std::vector */

  Blob<float>* output_layer = net_->output_blobs()[0];
  std::vector <std::vector<float> > ret;
  for (int i = 0; i < nImages; i++) {
    const float* begin = output_layer->cpu_data() + i*output_layer->channels();
    const float* end = begin + output_layer->channels();
    ret.push_back( std::vector<float>(begin, end) );
  }
  return ret;
}

/* Wrap the input layer of the network in separate cv::Mat objects
 * (one per channel). This way we save one memcpy operation and we
 * don't need to rely on cudaMemcpy2D. The last preprocessing
 * operation will write the separate channels directly to the input
 * layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels, int nImages) {
  Blob<float>* input_layer = net_->input_blobs()[0];

  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels()* nImages; ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
  }
}

void Classifier::Preprocess(const std::vector<cv::Mat>& img,
                            std::vector<cv::Mat>* input_channels, int nImages) {
  for (int i = 0; i < nImages; i++) {
      vector<cv::Mat> channels;
      cv::split(img[i], channels);
      for (int j = 0; j < channels.size(); j++){
           channels[j].copyTo((*input_channels)[i*num_channels_[0]+j]);
      }
  }
}

- Jack Simpson

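Could you briefly describe your modifications? Thanks. - ypx

The answer below (together with its comments) is correct. However, in the preprocessing step you need to (i) convert the image format to the network's input format; (ii) resize the image if it differs from input_geometry_; and (iii) subtract the image mean, which you need to load from the file imagenet_mean.binaryproto. You can then split the image into separate per-channel image planes. - Josh

2 Answers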
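The last two steps Josh lists (mean subtraction and splitting into channel planes) can be sketched without OpenCV or Caffe. This is only an illustration under stated assumptions: the image is already resized and stored as interleaved floats, the mean is a single value per channel instead of the full mean image from imagenet_mean.binaryproto, and `InterleavedToPlanar` is a hypothetical name, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Caffe or OpenCV.
// Converts one interleaved image (H x W x C, e.g. BGRBGR...) into the
// planar layout (C x H x W) and subtracts a per-channel mean.
// `dst` must point at room for C*H*W floats, for example the slice of
// the input blob that belongs to this image.
void InterleavedToPlanar(const std::vector<float>& src,
                         int channels, int height, int width,
                         const std::vector<float>& channel_mean,
                         float* dst) {
  assert(src.size() ==
         static_cast<std::size_t>(channels) * height * width);
  assert(channel_mean.size() == static_cast<std::size_t>(channels));
  for (int c = 0; c < channels; ++c) {
    for (int h = 0; h < height; ++h) {
      for (int w = 0; w < width; ++w) {
        // Interleaved source: all channels of a pixel are adjacent.
        const std::size_t src_idx =
            (static_cast<std::size_t>(h) * width + w) * channels + c;
        // Planar destination: each channel is one contiguous plane.
        const std::size_t dst_idx =
            (static_cast<std::size_t>(c) * height + h) * width + w;
        dst[dst_idx] = src[src_idx] - channel_mean[c];
      }
    }
  }
}
```

For a batch, the same call would be repeated once per image with `dst` advanced by C*H*W floats each time.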

4

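Unfortunately, I do not believe that parallelization of the net's forward pass has been implemented yet. However, if you like, you could simply implement your own wrapper that repeatedly runs data through copies of the net in parallel. Take a look at the linked example: in its prototxt you only need to define an input shape like the following: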

input_shape {
  dim: 64 # number of images
  dim: 1  # channels
  dim: 28 # height
  dim: 28 # width
}

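The existing implementation can then evaluate a batch of 64 images, though not necessarily in parallel. However, when running on a GPU, processing one batch of 64 images will be faster than processing 64 batches of a single image each.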
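With a fixed batch dimension of 64 and the 200 images mentioned in the question, the images would have to be fed in chunks. A minimal sketch of the chunking arithmetic follows; `BatchRanges` is a hypothetical helper name, and the code does not call Caffe at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits `total` items into consecutive
// [begin, end) index ranges holding at most `batch_size` items each.
std::vector<std::pair<std::size_t, std::size_t> >
BatchRanges(std::size_t total, std::size_t batch_size) {
  assert(batch_size > 0);
  std::vector<std::pair<std::size_t, std::size_t> > ranges;
  for (std::size_t begin = 0; begin < total; begin += batch_size) {
    // The last range may be shorter than batch_size.
    const std::size_t end = std::min(begin + batch_size, total);
    ranges.push_back(std::make_pair(begin, end));
  }
  return ranges;
}
```

For 200 images and a batch size of 64 this gives three full batches and one final batch of 8 images, so the last forward pass would need either a reshaped input blob or padding.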

- Aidan Gomez

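Thanks for your help, Aidan. So can't I pass the equivalent of a vector of blobs and receive a vector of predictions from the network in one go? - Jack Simpson

@JackSimpson Specifying the number of images as the first blob dimension is the same as having a vector of single-image blobs. - ypx

@JackSimpson: ypx is correct; a vector of 64 blobs and a single blob with a num dimension of 64 are equivalent. - Aidan Gomez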
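The equivalence comes from the memory layout: a 4-D blob is stored row-major as N x C x H x W, so image n occupies one contiguous block of C*H*W floats. The sketch below illustrates this with plain vectors standing in for blobs; `BlobOffset` and `PackBatch` are hypothetical names, not Caffe functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: offset of element (n, c, h, w) in a row-major
// N x C x H x W buffer.
std::size_t BlobOffset(int n, int c, int h, int w,
                       int channels, int height, int width) {
  return ((static_cast<std::size_t>(n) * channels + c) * height + h)
             * width + w;
}

// Hypothetical helper: copies N single-image buffers (C*H*W floats
// each) back to back into one batch buffer.
std::vector<float> PackBatch(
    const std::vector<std::vector<float> >& images,
    int channels, int height, int width) {
  const std::size_t per_image =
      static_cast<std::size_t>(channels) * height * width;
  std::vector<float> batch;
  batch.reserve(images.size() * per_image);
  for (std::size_t n = 0; n < images.size(); ++n) {
    assert(images[n].size() == per_image);
    batch.insert(batch.end(), images[n].begin(), images[n].end());
  }
  return batch;
}
```

After packing, the first element of image n sits at `BlobOffset(n, 0, 0, 0, ...)`, which is why a pointer advanced by C*H*W floats per image can address every image in the batch.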


- Shai · Accepted Answer

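If I understand your problem correctly, you input n images and expect n pairs of (label, prob), but you get only one such pair.

I believe the following modifications should do the trick for you:

1. Classifier::Predict should return a std::vector< std::vector<float> >, that is, a vector of probability vectors, one per input image. In other words, a vector of size n in which each element is a vector of size output_layer->channels():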

std::vector< std::vector<float> > 
Classifier::Predict(const std::vector<cv::Mat> &input_channels, 
                    int num_images) {
  // same code here...

  /* changes here: Copy the output layer to a std::vector */
  Blob<float>* output_layer = net_->output_blobs()[0];
  std::vector< std::vector<float> > ret;
  for ( int i = 0 ; i < num_images ; i++ ) {
      const float* begin = output_layer->cpu_data() + i*output_layer->channels();
      const float* end = begin + output_layer->channels();
      ret.push_back( std::vector<float>(begin, end) );
  }
  return ret;
}

2. In `Classifier::Classify`, you need to process each `vector` independently through `Argmax`:

 std::vector<Prediction>
 Classifier::Classify(const std::vector<cv::Mat> &input_channels) {

   /* Predict takes the batch size as its second argument; with one
      grayscale Mat per image, that is input_channels.size(). */
   std::vector< std::vector<float> > output =
       Predict(input_channels, input_channels.size());

   std::vector<Prediction> predictions;
   for ( size_t i = 0 ; i < output.size(); i++ ) {
       std::vector<int> maxN = Argmax(output[i], 1);
       int idx = maxN[0];
       /* output[i] holds the scores of image i; keep its best class. */
       predictions.push_back(std::make_pair(labels_[idx], output[i][idx]));
   }
   return predictions;
 }
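The two steps above can be tried out in isolation, without Caffe. In the sketch below a flat `std::vector<float>` stands in for `output_layer->cpu_data()`, `std::max_element` replaces `Argmax(..., 1)`, and `TopOnePerImage` is a hypothetical name used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

/* Pair (label, confidence), as in the question's header. */
typedef std::pair<std::string, float> Prediction;

// Hypothetical stand-in for the post-processing: `scores` holds
// num_images consecutive rows of labels.size() class scores each.
std::vector<Prediction> TopOnePerImage(
    const std::vector<float>& scores, int num_images,
    const std::vector<std::string>& labels) {
  const std::size_t num_classes = labels.size();
  assert(num_images >= 0 && num_classes > 0);
  assert(scores.size() ==
         static_cast<std::size_t>(num_images) * num_classes);
  std::vector<Prediction> predictions;
  predictions.reserve(num_images);
  for (int i = 0; i < num_images; ++i) {
    // Row i of the flat buffer belongs to image i.
    const float* begin = scores.data() + i * num_classes;
    const float* end = begin + num_classes;
    const float* best = std::max_element(begin, end);
    predictions.push_back(std::make_pair(labels[best - begin], *best));
  }
  return predictions;
}
```

Using `std::max_element` avoids building and partially sorting a vector of pairs when only the single best class per image is needed.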