Evaluating a TensorFlow object detection model on a test dataset

Question


Tags: python, tensorflow, object-detection, object-detection-api


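I fine-tuned the faster_rcnn_resnet101 model from the model zoo to detect my custom objects. I split the data into a training set and an evaluation set, and used both in the config file during training. Now that training is finished, I want to test the model on unseen data, which I call the test data. I tried several scripts, but I couldn't work out which code in TensorFlow's API to use for evaluating performance on the test dataset. Here is what I tried: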

I used object_detection/metrics/offline_eval_map_corloc.py to evaluate the test dataset. The code runs without errors, but I get negative AP and AR values for medium and large bounding boxes:

Average Precision (AP) @[IoU=0.50:0.95 | area=all | maxDets=100] = 0.459
Average Precision (AP) @[IoU=0.50 | area=all | maxDets=100] = 0.601
Average Precision (AP) @[IoU=0.75 | area=all | maxDets=100] = 0.543
Average Precision (AP) @[IoU=0.50:0.95 | area=small | maxDets=100] = 0.459
Average Precision (AP) @[IoU=0.50:0.95 | area=medium | maxDets=100] = -1.000
Average Precision (AP) @[IoU=0.50:0.95 | area=large | maxDets=100] = -1.000
Average Recall (AR) @[IoU=0.50:0.95 | area=all | maxDets=1] = 0.543
Average Recall (AR) @[IoU=0.50:0.95 | area=all | maxDets=10] = 0.627
Average Recall (AR) @[IoU=0.50:0.95 | area=all | maxDets=100] = 0.628
Average Recall (AR) @[IoU=0.50:0.95 | area=small | maxDets=100] = 0.628
Average Recall (AR) @[IoU=0.50:0.95 | area=medium | maxDets=100] = -1.000
Average Recall (AR) @[IoU=0.50:0.95 | area=large | maxDets=100] = -1.000

I know that mAP and AR cannot be negative, so something has gone wrong. Why do I get negative values when I run offline evaluation on the test dataset?

These are the commands I used to run this pipeline:

SPLIT=test

echo "
label_map_path: '/training_demo/annotations/label_map.pbtxt'
tf_record_input_reader: { input_path: '/training_demo/Predictions/test.record' }
" > /training_demo/${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt

echo "
metrics_set: 'coco_detection_metrics'
" > /training_demo/${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt 

python object_detection/metrics/offline_eval_map_corloc.py \
  --eval_dir='/training_demo/test_eval_metrics' \
  --eval_config_path='/training_demo/test_eval_metrics/test_eval_config.pbtxt' \
  --input_config_path='/training_demo/test_eval_metrics/test_input_config.pbtxt'

I also tried object_detection/legacy/eval.py, but the evaluation metrics it reports are also negative:

DetectionBoxes_Recall/AR@100 (medium): -1.0
DetectionBoxes_Recall/AR@100 (small): -1.0
DetectionBoxes_Precision/mAP@.50IOU: -1.0
DetectionBoxes_Precision/mAP (medium): -1.0
and so on.

I used the following command:

python eval.py \
  --logtostderr \
  --checkpoint_dir=trained-inference-graphs/output_inference_graph/ \
  --eval_dir=test_eval_metrics \
  --pipeline_config_path=training/faster_rcnn_resnet101_coco-Copy1.config

The eval_input_reader in faster_rcnn_resnet101_coco-Copy1.config points to the test TFRecord, which contains both the ground-truth and the detection information.

I also tried evaluating with object_detection/utils/object_detection_evaluation. The results were the same as with the first approach, because it calls the same underlying function, evaluator.evaluate().

Any help would be greatly appreciated.

- Manish Rai

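After some unit testing and investigation, I found that the data used the wrong class mapping (label map). For example, suppose the label map does not contain class 4, but class 4 appears in the ground truth because of a data error. In that case the metric values will be -1.0. - Manish Rai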
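Given the label-map finding above, one quick check to run before either evaluator is to compare the class IDs in the test TFRecord against the IDs in label_map.pbtxt. The sketch below is a minimal version under a few assumptions. The TFRecord reader expects TensorFlow 2.x (eager mode) and the standard `image/object/class/label` feature key. The label map is parsed with a regex, which is a simplification: the API itself parses the file as a protobuf. All function names here are hypothetical.

```python
import re
from typing import Iterable, Set


def label_map_ids(label_map_text: str) -> Set[int]:
    """Collect every `id: N` value from the text of a label_map.pbtxt file.

    Regex parsing is a simplification; the Object Detection API parses the
    file as a protobuf message instead.
    """
    return {int(m) for m in re.findall(r"\bid\s*:\s*(\d+)", label_map_text)}


def unmapped_class_ids(gt_class_ids: Iterable[int], label_map_text: str) -> Set[int]:
    """Return the ground-truth class IDs that have no entry in the label map."""
    return set(gt_class_ids) - label_map_ids(label_map_text)


def gt_class_ids_from_tfrecord(path: str) -> Set[int]:
    """Collect all ground-truth class IDs stored in a TFRecord file.

    Assumes TensorFlow 2.x (eager iteration) and the standard
    'image/object/class/label' feature key.
    """
    import tensorflow as tf  # imported lazily so the checks above work without TF

    ids: Set[int] = set()
    for raw in tf.data.TFRecordDataset(path):
        example = tf.train.Example.FromString(raw.numpy())
        ids.update(example.features.feature["image/object/class/label"].int64_list.value)
    return ids


if __name__ == "__main__":
    # Hypothetical label map that only defines classes 1 and 2.
    label_map = "item { id: 1 name: 'cat' }\nitem { id: 2 name: 'dog' }"
    print(unmapped_class_ids([1, 2, 4], label_map))  # {4}
```

With real data, `unmapped_class_ids(gt_class_ids_from_tfrecord("/path/to/test.record"), open("label_map.pbtxt").read())` should return an empty set. Any other result means the ground truth contains classes that the label map does not define.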

3 Answers


!python eval.py --logtostderr --pipeline_config_path=<path/to/pipeline.config> --checkpoint_dir=<path/to/checkpoint_dir> --eval_dir=eval/

You can find eval.py in the legacy folder (object_detection/legacy/).

- jansary


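In my case, I changed eval_input_reader in pipeline.config to point at the test dataset and then ran model_main.py once in evaluation mode. I'm not sure this is the intended way to do it, though.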

python model_main.py \
    --alsologtostderr \
    --run_once \
    --checkpoint_dir=$path_to_model \
    --model_dir=$path_to_eval \
    --pipeline_config_path=$path_to_config

pipeline.config

eval_config: {
  metrics_set: "coco_detection_metrics"
  num_examples: 721 # number of test images
  num_visualizations: 10 # number of images visualized in TensorBoard
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/path/to/test-data.record"
  }
  label_map_path: "/path/to/label_map.pbtxt"
  shuffle: true
  num_readers: 1
}
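num_examples should match the number of records in the test TFRecord (721 in this config). Below is a minimal sketch that counts those records using only the standard library. It walks the TFRecord framing: an 8-byte little-endian length, a 4-byte CRC of that length, the payload, and then a 4-byte CRC of the payload. The CRCs are skipped rather than verified, and `count_tfrecords` is a hypothetical helper name.

```python
import struct
import sys


def count_tfrecords(path: str) -> int:
    """Count the records in a TFRecord file without importing TensorFlow.

    Each record is framed as:
        uint64 length (little-endian) | uint32 CRC of length | payload | uint32 CRC of payload
    CRCs are read past, not verified.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return count  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated length field")
            (length,) = struct.unpack("<Q", header)
            # Read past the length CRC, the serialized example, and its CRC.
            body = f.read(4 + length + 4)
            if len(body) < 4 + length + 4:
                raise ValueError("truncated record")
            count += 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python count_tfrecords.py /path/to/test-data.record
    print(count_tfrecords(sys.argv[1]))
```

If the count differs from num_examples, evaluation covers either a subset of the test set or (with some input settings) repeats examples. Either way, the reported mAP will not describe exactly the test set.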

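For me, the mAP on the validation set and on the test set was the same, so I'm not sure it's actually necessary to split the data into training, validation, and test sets.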
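Whether a separate test split pays off is a judgment call. A common argument for keeping one is that validation mAP is used to choose checkpoints and hyperparameters, so it can end up slightly optimistic, while an untouched test set does not have that bias. If you do keep a test set, the sketch below assigns each image to a split by hashing its ID, so the assignment stays stable across reruns. The helper names and the 10%/10% fractions are illustrative assumptions only.

```python
import hashlib
from typing import Dict, Iterable, List


def assign_split(image_id: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Deterministically map an image ID to 'train', 'val', or 'test'.

    Hashing the ID instead of shuffling keeps each image in the same split
    across reruns, even when new images are added later.
    """
    digest = hashlib.md5(image_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform value in [0, 1]
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def split_ids(image_ids: Iterable[str], **kwargs) -> Dict[str, List[str]]:
    """Group image IDs by their assigned split, preserving input order."""
    splits: Dict[str, List[str]] = {"train": [], "val": [], "test": []}
    for image_id in image_ids:
        splits[assign_split(image_id, **kwargs)].append(image_id)
    return splits
```

Each list can then be written to its own TFRecord. Every image lands in exactly one split, so test images never leak into training or checkpoint selection.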

- rayon


- danyfang · Accepted Answer

The evaluation metrics are in COCO format, so you can refer to the COCO API to see what these values mean.

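As the COCO API code shows, AP and AR default to -1 when there is nothing to evaluate for a given category or area range. In your case, all of the ground-truth objects fall into the "small" area range, so the "medium" and "large" rows stay at -1. Whether an object counts as "small", "medium", or "large" depends on the area in pixels that it occupies (small: below 32², medium: 32² to 96², large: above 96²), as described here.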
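The -1 rule can be reproduced in a few lines. The sketch below is a stdlib-only illustration, not pycocotools itself, and it makes three assumptions. First, boxes are given in absolute pixel coordinates as (xmin, ymin, xmax, ymax). Second, the small/medium/large bounds follow COCOeval, with both ends of each range inclusive, as I read COCOeval's check. Third, the helper names are hypothetical. The sketch lists the area ranges that contain no ground-truth box, which are the ranges COCOeval reports as -1.000.

```python
from typing import Dict, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels

# Area ranges (square pixels) behind COCOeval's small/medium/large rows.
AREA_RANGES: Dict[str, Tuple[float, float]] = {
    "small": (0.0, 32.0 ** 2),
    "medium": (32.0 ** 2, 96.0 ** 2),
    "large": (96.0 ** 2, 1e5 ** 2),
}


def box_area(box: Box) -> float:
    """Area of an (xmin, ymin, xmax, ymax) box; degenerate boxes give 0."""
    xmin, ymin, xmax, ymax = box
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def ranges_reported_as_minus_one(gt_boxes: Iterable[Box]) -> List[str]:
    """Names of area ranges that contain no ground-truth box.

    With no ground truth in a range there is nothing to compute precision
    or recall against, so COCOeval leaves that row at its -1 placeholder.
    """
    areas = [box_area(b) for b in gt_boxes]
    return [
        name
        for name, (lo, hi) in AREA_RANGES.items()
        if not any(lo <= a <= hi for a in areas)
    ]


if __name__ == "__main__":
    # Hypothetical test set where every object is smaller than 32x32 pixels.
    tiny_objects = [(10, 10, 30, 30), (0, 0, 25, 20)]
    print(ranges_reported_as_minus_one(tiny_objects))  # ['medium', 'large']
```

The detections play no part in this check: a -1 row signals an empty ground-truth range, not a model failure.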