Get the bounding box coordinates in the TensorFlow Object Detection API tutorial

29
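I am new to Python and TensorFlow. I am trying to run the object detection tutorial file from the TensorFlow Object Detection API, but I cannot find where to get the coordinates of the bounding boxes when objects are detected. Relevant code: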
 # The following processing is only for single image
 detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
 detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])

I assume this is where the bounding boxes are drawn:

 # Visualization of the results of detection.
 vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks'),
      use_normalized_coordinates=True,
      line_thickness=8)
 plt.figure(figsize=IMAGE_SIZE)
 plt.imshow(image_np)

I tried printing output_dict['detection_boxes'], but I am not sure what the numbers mean. There are a lot of them.

array([[ 0.56213236,  0.2780568 ,  0.91445708,  0.69120586],
       [ 0.56261235,  0.86368728,  0.59286624,  0.8893863 ],
       [ 0.57073039,  0.87096912,  0.61292225,  0.90354401],
       [ 0.51422435,  0.78449738,  0.53994244,  0.79437423],
......

       [ 0.32784131,  0.5461576 ,  0.36972913,  0.56903434],
       [ 0.03005961,  0.02714229,  0.47211722,  0.44683522],
       [ 0.43143299, 0.09211366,  0.58121657,  0.3509962 ]], dtype=float32)

I found some answers to similar questions, but I don't have a variable called boxes as they do. How can I get the coordinates?

3 Answers

27
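"I tried printing output_dict['detection_boxes'], but I am not sure what the numbers mean."
You can check the code yourself. visualize_boxes_and_labels_on_image_array is defined in visualization_utils.py (under research/object_detection/utils in the tensorflow/models repository).
Note that you are passing use_normalized_coordinates=True. If you trace the function calls, you will see that your numbers [0.56213236, 0.2780568, 0.91445708, 0.69120586] etc. are the values [ymin, xmin, ymax, xmax], and the image coordinates: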
(left, right, top, bottom) = (xmin * im_width, xmax * im_width, 
                              ymin * im_height, ymax * im_height)

are computed by this function:

def draw_bounding_box_on_image(image,
                           ymin,
                           xmin,
                           ymax,
                           xmax,
                           color='red',
                           thickness=4,
                           display_str_list=(),
                           use_normalized_coordinates=True):
  """Adds a bounding box to an image.
  Bounding box coordinates can be specified in either absolute (pixel) or
  normalized coordinates by setting the use_normalized_coordinates argument.
  Each string in display_str_list is displayed on a separate line above the
  bounding box in black text on a rectangle filled with the input 'color'.
  If the top of the bounding box extends to the edge of the image, the strings
  are displayed below the bounding box.
  Args:
    image: a PIL.Image object.
    ymin: ymin of bounding box.
    xmin: xmin of bounding box.
    ymax: ymax of bounding box.
    xmax: xmax of bounding box.
    color: color to draw bounding box. Default is red.
    thickness: line thickness. Default value is 4.
    display_str_list: list of strings to display in box
                      (each to be shown on its own line).
    use_normalized_coordinates: If True (default), treat coordinates
      ymin, xmin, ymax, xmax as relative to the image.  Otherwise treat
      coordinates as absolute.
  """
  draw = ImageDraw.Draw(image)
  im_width, im_height = image.size
  if use_normalized_coordinates:
    (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                                  ymin * im_height, ymax * im_height)
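
To make the conversion concrete, here is a minimal sketch that applies the same scaling to a single normalized [ymin, xmin, ymax, xmax] box. The helper name box_to_pixels and the 1024x636 image size are illustrative assumptions, not part of the Object Detection API, and rounding to integers is my own choice (the function above keeps the values as floats).

```python
def box_to_pixels(box, im_width, im_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel
    coordinates, returned as (left, right, top, bottom)."""
    ymin, xmin, ymax, xmax = box
    # x values scale with the image width, y values with the height
    left = int(round(xmin * im_width))
    right = int(round(xmax * im_width))
    top = int(round(ymin * im_height))
    bottom = int(round(ymax * im_height))
    return left, right, top, bottom


# First box from the question, on a hypothetical 1024x636 image
print(box_to_pixels([0.56213236, 0.2780568, 0.91445708, 0.69120586], 1024, 636))
```

With a NumPy image such as image_np, the height and width would come from image_np.shape[:2], in that order.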

2
OK. It looks like output_dict['detection_boxes'] contains all the overlapping boxes, which is why there are so many arrays. Thanks! - Mandy
1
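What determines how many overlapping boxes there are? Also, why are there so many overlapping boxes, and why are they passed to the visualization layer to be merged? - CMCDragonkai
I know this is an old question, but I think this might help. If you increase the min_score_thresh argument of visualize_boxes_and_labels_on_image_array, you can limit the number of overlapping boxes. By default it is set to 0.5. For my project, for example, I had to increase it to 0.8. - Web Nexus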
1
The normalized bounding box format is ymin, xmin, ymax, xmax. https://github.com/tensorflow/models/blob/3db445c7b0404f9b98cbc47616bab08bfa3d8130/research/object_detection/utils/visualization_utils.py#L1235 - mrtpk

12

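I have exactly the same story. Only one box was shown on the image, but I got an array of roughly a hundred boxes (output_dict['detection_boxes']). After digging into the code that draws the rectangles, I was able to extract the relevant part and use it in my inference.py: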

#so detection has happened and you've got output_dict as a
# result of your inference

# then assume you've got this in your inference.py in order to draw rectangles
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    output_dict['detection_boxes'],
    output_dict['detection_classes'],
    output_dict['detection_scores'],
    category_index,
    instance_masks=output_dict.get('detection_masks'),
    use_normalized_coordinates=True,
    line_thickness=8)

# This is the way I'm getting my coordinates
boxes = output_dict['detection_boxes']
# get all boxes from an array
max_boxes_to_draw = boxes.shape[0]
# get scores to get a threshold
scores = output_dict['detection_scores']
# this is set as a default but feel free to adjust it to your needs
min_score_thresh=.5
# iterate over all objects found
for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    # keep only detections whose score exceeds the threshold
    if scores is None or scores[i] > min_score_thresh:
        # boxes[i] is the box which will be drawn
        class_name = category_index[output_dict['detection_classes'][i]]['name']
        print("This box is gonna get used", boxes[i], output_dict['detection_classes'][i], class_name)
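
If the arrays are plain NumPy arrays, the same threshold check can also be written without a loop. This is a sketch under that assumption: filter_detections is a hypothetical helper, and the boxes, scores and class ids below are made-up stand-ins for output_dict['detection_boxes'], output_dict['detection_scores'] and output_dict['detection_classes'].

```python
import numpy as np


def filter_detections(boxes, scores, classes, min_score_thresh=0.5):
    """Keep only the detections whose score is above the threshold."""
    boxes = np.asarray(boxes)
    scores = np.asarray(scores)
    classes = np.asarray(classes)
    # Boolean mask with one entry per detection
    keep = scores > min_score_thresh
    return boxes[keep], scores[keep], classes[keep]


# Made-up detections: three boxes in [ymin, xmin, ymax, xmax] order
boxes = np.array([[0.1, 0.1, 0.5, 0.5],
                  [0.2, 0.2, 0.6, 0.6],
                  [0.3, 0.3, 0.9, 0.9]], dtype=np.float32)
scores = np.array([0.9, 0.3, 0.7], dtype=np.float32)
classes = np.array([1, 18, 3])

kept_boxes, kept_scores, kept_classes = filter_detections(boxes, scores, classes)
print(kept_boxes)
print(kept_scores, kept_classes)
```

Raising min_score_thresh (for example to 0.8) leaves fewer boxes, which mirrors the effect of the same argument in visualize_boxes_and_labels_on_image_array.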

2

For me, the answer above did not work, and I had to make some changes. If it does not work for you either, maybe try this.

# This is the way I'm getting my coordinates
boxes = detections['detection_boxes'].numpy()[0]
# get all boxes from an array
max_boxes_to_draw = boxes.shape[0]
# get scores to get a threshold
scores = detections['detection_scores'].numpy()[0]
# this is set as a default but feel free to adjust it to your needs
min_score_thresh=.5
# iterate over all objects found
coordinates = []
for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    if scores[i] > min_score_thresh:
        class_id = int(detections['detection_classes'].numpy()[0][i] + 1)
        coordinates.append({
            "box": boxes[i],
            "class_name": category_index[class_id]["name"],
            "score": scores[i]
        })


print(coordinates)

Here, each item (a dict) in the coordinates list is a box to be drawn on the image, with its box coordinates (normalized), class name, and score.

I got the following error: ---> 32 boxes = detections['detection_boxes'].numpy()[0] AttributeError: 'numpy.ndarray' object has no attribute 'numpy'. - Kirikkayis
@Kirikkayis That means your variable is already a NumPy array. - Shreyas Vedpathak
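
One way to sidestep that error is to convert only when needed. The sketch below uses a hypothetical helper, to_numpy, which calls .numpy() when the object provides it (as eager tensors do) and otherwise treats the value as array-like. It is an illustration, not part of the Object Detection API.

```python
import numpy as np


def to_numpy(value):
    """Return value as a NumPy array, whether it is a tensor-like object
    with a .numpy() method or already an array."""
    if hasattr(value, "numpy"):
        return value.numpy()
    return np.asarray(value)


# An ndarray has no .numpy() method, so it is passed through unchanged
already_array = np.array([[0.1, 0.2, 0.3, 0.4]], dtype=np.float32)
print(to_numpy(already_array).shape)
```

With this helper, the lines in the answer could read boxes = to_numpy(detections['detection_boxes'])[0] regardless of which type detections holds.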
