TensorFlow Object Detection API inference time too slow

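I have been using the TensorFlow Object Detection API. In my case, I am trying to detect vehicles in still images using the KITTI-trained model from the model zoo (faster_rcnn_resnet101_kitti_2018_01_28), with code adapted from the object_detection_tutorial Jupyter notebook in the GitHub repo.
I have included my modified code below, but I get the same results with the original notebook from GitHub.
When run on a Jupyter notebook server on an Amazon AWS g3x4large (GPU) instance with the Deep Learning AMI, processing a single image takes just under 4 seconds. The inference function alone takes 1.3-1.5 seconds (see the code below), which seems far too high compared with the model's reported inference time (20 ms). Although I don't expect to hit the reported figure, my timing seems out of line and is impractical for my needs: I am looking at processing over 1 million images at a time and can't afford 46 days of processing. Given that the model is used on video frame captures, I would think it should be possible to get the time per image below 1 second at least.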
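A quick check of the 46-day figure, assuming the images are processed strictly one after another at a fixed cost per image (the helper name `days_to_process` is illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_process(num_images, seconds_per_image):
    """Wall-clock days for a purely sequential run at a fixed per-image cost."""
    return num_images * seconds_per_image / SECONDS_PER_DAY


print(round(days_to_process(1_000_000, 4.0), 1))  # ~4 s per image today -> 46.3
print(round(days_to_process(1_000_000, 1.0), 1))  # 1 s per image target -> 11.6
```

Even the 1 s target still means well over a week for a million images, which is why the answers below focus on removing per-image overhead rather than on the model alone.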
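My questions are:
1) What could explain this, and what can I do to reduce the inference time?
2) Is 1.5 seconds to convert an image to numpy (before processing) excessive?
3) If this is the best performance I can expect, how much time could I save by reworking the model to process images in batches?
Thanks for any help!
Python notebook code: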
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import json
import collections
import os.path
import datetime

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")

# This is needed to display the images.
get_ipython().run_line_magic('matplotlib', 'inline')

#Setup variables
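PATH_TO_TEST_IMAGES_DIR = 'test_images'
# Prefix for the JSON files written at the end; the post never defines it, so this value is a placeholder
OUTPUTS_BASENAME = 'detections'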

MODEL_NAME = 'faster_rcnn_resnet101_kitti_2018_01_28'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'kitti_label_map.pbtxt')

NUM_CLASSES = 2

from utils import label_map_util
from utils import ops as utils_ops  # needed by the detection_masks branch below
from utils import visualization_utils as vis_util

def get_scores(
    boxes,
    classes,
    scores,
    category_index,
    min_score_thresh=.5
):

  # Build a display string (class name plus score) for every detection above the
  # threshold, keyed by the detection's index in the output arrays.
  box_to_display_str_map = collections.defaultdict(list)

  for i in range(boxes.shape[0]):
    if scores is None or scores[i] > min_score_thresh:
      if classes[i] in category_index.keys():
        class_name = category_index[classes[i]]['name']
      else:
        class_name = 'N/A'
      display_str = str(class_name)
      if scores is not None:
        # Append the confidence as a percentage; use the bare percentage if the name is empty
        if not display_str:
          display_str = '{}%'.format(int(100*scores[i]))
        else:
          display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
      box_to_display_str_map[i].append(display_str)

  return box_to_display_str_map

def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: np.expand_dims(image, 0)})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.uint8)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict

#get list of paths
exten='.jpg'
TEST_IMAGE_PATHS=[]

for dirpath, dirnames, files in os.walk(PATH_TO_TEST_IMAGES_DIR):
    for name in files:
        if name.lower().endswith(exten):
            #print(os.path.join(dirpath,name))
            TEST_IMAGE_PATHS.append(os.path.join(dirpath,name))
print(len(TEST_IMAGE_PATHS), 'Images To Process')

#load model graph for inference
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')

#setup class labeling parameters    
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

#placeholder for timings
myTimings=[]

myX = 1
myResults = collections.defaultdict(list)
for image_path in TEST_IMAGE_PATHS:
  if os.path.exists(image_path):  
    print(myX,"--------------------------------------",datetime.datetime.time(datetime.datetime.now()))
    print(myX,"Image:", image_path)
    myTimings.append((myX,"Image", image_path))
    print(myX,"Open:",datetime.datetime.time(datetime.datetime.now()))
    myTimings.append((myX,"Open",datetime.datetime.time(datetime.datetime.now()).__str__()))
    image = Image.open(image_path)
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    print(myX,"Numpy:",datetime.datetime.time(datetime.datetime.now()))
    myTimings.append((myX,"Numpy",datetime.datetime.time(datetime.datetime.now()).__str__()))
    image_np = load_image_into_numpy_array(image)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    print(myX,"Expand:",datetime.datetime.time(datetime.datetime.now()))
    myTimings.append((myX,"Expand",datetime.datetime.time(datetime.datetime.now()).__str__()))
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    print(myX,"Detect:",datetime.datetime.time(datetime.datetime.now()))
    myTimings.append((myX,"Detect",datetime.datetime.time(datetime.datetime.now()).__str__()))
    output_dict = run_inference_for_single_image(image_np, detection_graph)
    # Visualization of the results of a detection.
    print(myX,"Export:",datetime.datetime.time(datetime.datetime.now()))
    myTimings.append((myX,"Export",datetime.datetime.time(datetime.datetime.now()).__str__()))
    op=get_scores(
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      min_score_thresh=.2)
    myResults[image_path].append(op)  
    print(myX,"Done:", datetime.datetime.time(datetime.datetime.now()))
    myTimings.append((myX,"Done", datetime.datetime.time(datetime.datetime.now()).__str__()))
    myX= myX + 1

#save results    
with open((OUTPUTS_BASENAME+'_Results.json'), 'w') as fout:
    json.dump(myResults, fout)
with open((OUTPUTS_BASENAME+'_Timings.json'), 'w') as fout:
    json.dump(myTimings, fout)

Sample timings:

[1, "Image", "test_images/DE4T_11Jan2018/MFDC4612.JPG"]
[1, "Open", "19:20:08.029423"]
[1, "Numpy", "19:20:08.052679"]
[1, "Expand", "19:20:09.977166"]
[1, "Detect", "19:20:09.977250"]
[1, "Export", "19:23:13.902443"]
[1, "Done", "19:23:13.903012"]
[2, "Image", "test_images/DE4T_11Jan2018/MFDC4616.JPG"]
[2, "Open", "19:23:13.903885"]
[2, "Numpy", "19:23:13.906320"]
[2, "Expand", "19:23:15.756308"]
[2, "Detect", "19:23:15.756597"]
[2, "Export", "19:23:17.153233"]
[2, "Done", "19:23:17.153699"]
[3, "Image", "test_images/DE4T_11Jan2018/MFDC4681.JPG"]
[3, "Open", "19:23:17.154510"]
[3, "Numpy", "19:23:17.156576"]
[3, "Expand", "19:23:19.012935"]
[3, "Detect", "19:23:19.013013"]
[3, "Export", "19:23:20.323839"]
[3, "Done", "19:23:20.324307"]
[4, "Image", "test_images/DE4T_11Jan2018/MFDC4697.JPG"]
[4, "Open", "19:23:20.324791"]
[4, "Numpy", "19:23:20.327136"]
[4, "Expand", "19:23:22.175578"]
[4, "Detect", "19:23:22.175658"]
[4, "Export", "19:23:23.472040"]
[4, "Done", "19:23:23.472297"]
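Each log entry records when a stage *starts*, so per-stage durations are the gaps between consecutive entries of the same image. A small standard-library sketch over a subset of the log above (the names `LOG` and `stage_durations` are illustrative). Worked out from the full log, the numpy conversion (Numpy to Expand) takes about 1.85-1.92 s, inference (Detect to Export) about 1.3-1.4 s, and the very first Detect call about 184 s, which likely reflects one-time session and GPU initialisation.

```python
from datetime import datetime

# Subset of the log above: (image number, stage, time at which the stage started).
LOG = [
    [1, "Numpy", "19:20:08.052679"], [1, "Expand", "19:20:09.977166"],
    [1, "Detect", "19:20:09.977250"], [1, "Export", "19:23:13.902443"],
    [2, "Numpy", "19:23:13.906320"], [2, "Expand", "19:23:15.756308"],
    [2, "Detect", "19:23:15.756597"], [2, "Export", "19:23:17.153233"],
]


def stage_durations(log):
    """Map (image, stage) -> seconds until the next logged stage of the same image."""
    parsed = [(n, stage, datetime.strptime(ts, "%H:%M:%S.%f")) for n, stage, ts in log]
    durations = {}
    for (n, stage, start), (next_n, _, end) in zip(parsed, parsed[1:]):
        if n == next_n:  # skip the gap between one image and the next
            durations[(n, stage)] = (end - start).total_seconds()
    return durations


for (n, stage), secs in stage_durations(LOG).items():
    print(n, stage, round(secs, 3))
```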

In my case I do preload the session and graph, but I still can't make full use of it. @KLH, could you please tell me what else you changed? - Pramesh Bajracharya
2 Answers


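1) You could load the video directly instead of images, and change "run_inference_for_single_image()" so that it creates the session once and loads the images/video inside it (recreating the graph is very slow). You can also edit the pipeline config file to reduce the number of proposals, which directly speeds up inference. Note that you have to re-export the graph afterwards (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md). Batching also helps (sorry, I forget by how much). Finally, you can use multiprocessing to offload CPU-bound work (drawing bounding boxes, loading data) so the GPU is better utilized.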

2) Is 1.5 seconds to convert an image to numpy (before processing) excessive? <- Yes, that is extremely slow, and there is a lot of room for improvement.

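3) I don't know the exact GPU on AWS (K80?), but with all these fixes you should be able to get more than 10 fps on a GeForce 1080 Ti, which is consistent with the reported 79 ms time (where did you get 20 ms for faster_rcnn_resnet101?).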
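A minimal sketch of points 1 and 3: open the session once for the already-loaded graph and feed several images per `sess.run`. It is written against `tensorflow.compat.v1`, so it should run under TF 1.x and 2.x. The class `FrozenGraphDetector` and its methods are hypothetical, not part of the Object Detection API. It assumes a graph loaded as in the question (`detection_graph`), a model without a `detection_masks` output, and an exported `image_tensor` that accepts more than one image per call; `np.stack` also requires every image in a batch to have the same height and width.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF1-style API, also importable under TF 2.x


class FrozenGraphDetector:
    """Hypothetical helper: one tf.Session per loaded graph, reused for every call."""

    OUTPUT_KEYS = ('num_detections', 'detection_boxes',
                   'detection_scores', 'detection_classes')

    def __init__(self, graph):
        self.graph = graph
        # Session creation (and the GPU initialisation it triggers) is the costly
        # part, so it happens once here instead of once per image.
        self.sess = tf.Session(graph=graph)
        names = {out.name for op in graph.get_operations() for out in op.outputs}
        # Only look up the output tensors this particular graph actually has.
        self.tensor_dict = {key: graph.get_tensor_by_name(key + ':0')
                            for key in self.OUTPUT_KEYS if key + ':0' in names}
        self.image_tensor = graph.get_tensor_by_name('image_tensor:0')

    def detect_batch(self, images):
        """Run a single sess.run on a list of same-sized HxWx3 uint8 arrays."""
        batch = np.stack(images, axis=0)  # shape [N, H, W, 3]
        return self.sess.run(self.tensor_dict, feed_dict={self.image_tensor: batch})

    def close(self):
        self.sess.close()
```

With the question's objects this would be `detector = FrozenGraphDetector(detection_graph)` once, before the loop, then `detector.detect_batch([image_np])` per image (or per group of same-sized images). Whether larger batches actually help this Faster R-CNN export is something to measure rather than assume.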


Thanks for the suggestions. I may have been misleading: I am working only with still images, not video. - KLH
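You're right, I missed the fact that the graph was being reloaded for every inference. I believe I have rewritten my code so that I am timing the inference rather than the graph loading. The following line takes about 1.5 seconds on an AWS g3X4large instance. Unless I'm missing something, the graph is already loaded at that point. I realize the 1.8 seconds for converting the image to numpy also includes loading from file and the reshape, and I assume that is where the time goes. - KLH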
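The slow part of the question's `load_image_into_numpy_array` is most likely `np.array(image.getdata())`, since `getdata()` hands NumPy one Python tuple per pixel. A sketch of an equivalent that calls `np.asarray` on the PIL image, which reads the pixel buffer directly, plus a helper that loads several files concurrently. The helper uses a thread pool rather than the multiprocessing suggested in the first answer, on the assumption that Pillow does most of its decoding without holding the GIL. All function names here are illustrative; actual speed-ups should be measured on the real images.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from PIL import Image


def load_image_into_numpy_array_slow(image):
    # Approach from the question: one Python tuple per pixel, then a reshape.
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)


def load_image_into_numpy_array_fast(image):
    # Reads PIL's pixel buffer in one go; convert('RGB') guarantees 3 channels.
    return np.asarray(image.convert('RGB'), dtype=np.uint8)


def load_images_parallel(paths, workers=4):
    """Open and decode several image files concurrently, preserving order."""
    def load(path):
        with Image.open(path) as img:
            return load_image_into_numpy_array_fast(img)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load, paths))
```

Both loaders return the same `(height, width, 3)` uint8 array, so the fast one can replace the original without touching the rest of the notebook.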
I was also wrong about the 20 ms time; I mixed up the numbers. You're right that the reported time is 79 ms. Still, I am a long way from that. - KLH
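For scale, the reported 79 ms per image converts to throughput and to total time for the 1 million images from the question as follows (plain arithmetic, assuming strictly sequential processing and ignoring image loading; `throughput` is an illustrative name):

```python
def throughput(seconds_per_image, num_images=1_000_000):
    """Return (images per second, hours for num_images) at a fixed per-image cost."""
    fps = 1.0 / seconds_per_image
    hours = num_images * seconds_per_image / 3600
    return fps, hours


fps, hours = throughput(0.079)
print(round(fps, 1), round(hours, 1))  # 12.7 21.9
```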
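Maybe load the images into memory first, then time detection on a batch? In run_inference_for_single_image(image, graph), try restructuring the nested `with graph.as_default():` / `with tf.Session() as sess:` blocks and see whether that double `with` is slowing the code down. - Long Hoang Nguyen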
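Along the lines of that comment, inference can be timed on preloaded images, separately from file loading, and with the first call excluded: in the timing log above, image 1's detection took about 184 s while later ones took roughly 1.3-1.4 s. A small framework-agnostic helper (the name `benchmark` and its defaults are illustrative):

```python
import statistics
import time


def benchmark(fn, batch, warmup=1, repeats=5):
    """Median seconds per item for fn(batch), measured after discarding warm-up calls."""
    for _ in range(warmup):
        fn(batch)  # first calls may include one-time initialisation
    per_item = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(batch)
        per_item.append((time.perf_counter() - start) / len(batch))
    return statistics.median(per_item)
```

For example, `benchmark(lambda b: run_inference_for_single_image(b[0], detection_graph), [image_np])` would time the question's function on one preloaded image; since that function opens a new session on every call, the result includes session setup, which is exactly the overhead the first answer points at.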


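You could also try OpenVINO to improve inference performance. It reduces inference time by pruning the graph and fusing some operations. OpenVINO is optimized for Intel hardware, but it should work with any CPU (even in the cloud).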

Here are some performance benchmarks for Faster R-CNN ResNet models on various CPUs.

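Converting a TensorFlow model to OpenVINO is fairly straightforward unless you have fancy custom layers. The full tutorial can be found here. Below are some snippets.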

Install OpenVINO

The easiest way is to use pip. Alternatively, you can use this tool to find the best installation method for your case.

pip install openvino-dev[tensorflow2]

Convert the SavedModel with Model Optimizer

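Model Optimizer is a command-line tool from the OpenVINO Development Package. It converts a TensorFlow model to IR, OpenVINO's default format. You can also try FP16 precision, which should give better performance without a significant drop in accuracy (just change the data type). Run in the command line: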

mo --saved_model_dir "model" --input_shape "[1, 3, 224, 224]" --data_type FP32 --output_dir "model_ir"

Run inference

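The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (the graphics integrated into your CPU, such as Intel HD Graphics). If you don't know which is the best choice, use AUTO.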

from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get output layer
output_layer_ir = compiled_model_ir.output(0)

# Run inference on the input image
# (input_image: a NumPy array already preprocessed to the IR's input shape)
result = compiled_model_ir([input_image])[output_layer_ir]

There is even a tool called OpenVINO Model Server, which is very similar to TensorFlow Serving.

Disclaimer: I work on OpenVINO.


Source: Stack Overflow (original English post).