Deploy pre-trained GluonCV SSD Mobilenet model with SageMaker Neo

  1. Introduction

  2. Setup

    1. Import SSD Mobilenet model from MXNet GluonCV

    2. Upload model to S3

    3. Use sagemaker MXNetModel to load pretrained MXNet model

  3. Compile the pre-trained model using SageMaker Neo

  4. Deploy the compiled model and request Inferences

  5. Delete the Endpoint


Introduction

This example demonstrates how to load a pre-trained MXNet GluonCV SSD model, optimize the trained model using SageMaker Neo, and host the model.

Setup

To compile and deploy the SSD Mobilenet model on Amazon SageMaker, we need to set up and authenticate the use of AWS services.

To start, upgrade the SageMaker SDK for Python to v2.33.0 or greater, install the latest MXNet GluonCV, and then restart the kernel.

[ ]:
!~/anaconda3/envs/mxnet_p36/bin/pip install --upgrade "sagemaker>=2.33.0" gluoncv
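
Because the upgrade only takes effect after the kernel restarts, it can help to confirm the installed version afterwards. The following is a minimal sketch, not part of the original notebook: `version_tuple` and `meets_minimum` are hypothetical helper names, and pre-release suffixes such as `rc1` are simply ignored.

```python
import re


def version_tuple(version, width=3):
    """Parse the leading numeric components of a version string into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad with zeros so that "2.33" compares equal to "2.33.0"
    while len(parts) < width:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="2.33.0"):
    """Return True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


# In the notebook, after restarting the kernel:
#   import sagemaker
#   assert meets_minimum(sagemaker.__version__), sagemaker.__version__
print(meets_minimum("2.100.0"))  # compared numerically, so 100 > 33
```

Comparing integer tuples rather than raw strings avoids ranking "2.100.0" below "2.33.0".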

Next, we need an IAM role with SageMaker access. SageMaker uses this role to access your data in S3. We also create a SageMaker session.

[ ]:
import sagemaker
from sagemaker import get_execution_role

role = get_execution_role()
sess = sagemaker.Session()

We then need an S3 bucket to store the model artifacts, including the pre-trained model and the compilation output.

[ ]:
# S3 bucket and folders for saving code and model artifacts.
# Feel free to specify different bucket/folders here if you wish.
bucket = sess.default_bucket()
folder = "DEMO-ObjectDetection-SSD-MobileNet"
pretrained_model_sub_folder = folder + "/pretrained-model"
compilation_output_sub_folder = folder + "/compilation-output"

To visualize the detection outputs, we also define the following function. It draws bounding boxes for the high-confidence predictions and filters out detections whose score falls below a threshold.

[ ]:
%matplotlib inline
def visualize_detection(img_file, dets, classes=[], thresh=0.6):
    """
    Visualize detections in one image.

    Parameters
    ----------
    img_file : str
        path to the image file
    dets : sequence of array-like
        SSD detections as [class_ids, scores, bounding_boxes], each with a
        leading batch dimension; box coordinates refer to the 512x512 input
    classes : tuple or list of str
        class names
    thresh : float
        score threshold
    """
    import random
    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg
    from matplotlib.patches import Rectangle

    img = mpimg.imread(img_file)
    plt.imshow(img)
    height = img.shape[0]
    width = img.shape[1]
    colors = dict()
    klasses = dets[0][0]
    scores = dets[1][0]
    bbox = dets[2][0]
    # Iterate over the detections (not over the class names)
    for i in range(len(klasses)):
        klass = klasses[i][0]
        score = scores[i][0]
        x0, y0, x1, y1 = bbox[i]
        # Skip low-confidence detections
        if score < thresh:
            continue
        cls_id = int(klass)
        if cls_id not in colors:
            colors[cls_id] = (random.random(), random.random(), random.random())
        # Rescale the box from the 512x512 model input to the original image size
        xmin = int(x0 * width / 512)
        ymin = int(y0 * height / 512)
        xmax = int(x1 * width / 512)
        ymax = int(y1 * height / 512)
        rect = Rectangle(
            (xmin, ymin),
            xmax - xmin,
            ymax - ymin,
            fill=False,
            edgecolor=colors[cls_id],
            linewidth=3.5,
        )
        plt.gca().add_patch(rect)
        class_name = str(cls_id)
        if classes and len(classes) > cls_id:
            class_name = classes[cls_id]
        plt.gca().text(
            xmin,
            ymin - 2,
            "{:s} {:.3f}".format(class_name, score),
            bbox=dict(facecolor=colors[cls_id], alpha=0.5),
            fontsize=12,
            color="white",
        )
    plt.tight_layout(rect=[0, 0, 2, 2])
    plt.show()
[ ]:
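# Initializing object categories (the 20 Pascal VOC classes)
object_categories = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# Setting a threshold 0.20 will only plot detection results that have a confidence score greater than 0.20
threshold = 0.20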
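
The visualization function above relies on the detector reporting box coordinates in the 512x512 frame of the resized input, so each box is scaled back to the original image size before it is drawn. A minimal sketch of that arithmetic, with a hypothetical helper name `scale_box` and made-up numbers:

```python
def scale_box(box, width, height, model_size=512):
    """Map (x0, y0, x1, y1) from the model's square input to original pixel coordinates."""
    x0, y0, x1, y1 = box
    return (
        int(x0 * width / model_size),
        int(y0 * height / model_size),
        int(x1 * width / model_size),
        int(y1 * height / model_size),
    )


# A box covering the central quarter of the 512x512 input, mapped onto a 1024x768 image
print(scale_box((128, 128, 384, 384), width=1024, height=768))  # (256, 192, 768, 576)
```

The horizontal and vertical factors differ whenever the original image is not square, which is why width and height are handled separately.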

Finally, we load the test image into memory. The test image used in this notebook is from PEXELS, and the model does not see it until prediction time.

[ ]:
import PIL.Image
import numpy as np

test_file = "test.jpg"
test_image = PIL.Image.open(test_file)
test_image = np.asarray(test_image.resize((512, 512)))

Import SSD Mobilenet model from MXNet GluonCV

This example uses a pre-trained MXNet GluonCV SSD model, initially published in:

Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.

[ ]:
import numpy as np
import mxnet as mx
import gluoncv as gcv
import tarfile

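net = gcv.model_zoo.get_model("ssd_512_mobilenet1.0_voc", pretrained=True)
# Hybridize and run one forward pass so the network can be exported
net.hybridize()
net(mx.nd.ones((1, 3, 512, 512)))
# Writes model-symbol.json and model-0000.params to the current directory
net.export("model")

# Package the exported files into the archive that is uploaded to S3
tar = tarfile.open("ssd_512_mobilenet1.0_voc.tar.gz", "w:gz")
for name in ["model-0000.params", "model-symbol.json"]:
    tar.add(name)
tar.close()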
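
SageMaker takes the model files bundled in a single gzip-compressed tar archive. The sketch below shows the packaging step in isolation, using placeholder files in a temporary directory instead of real exported weights; `package_model` is a hypothetical helper name. Storing each file under its base name keeps it at the root of the archive.

```python
import os
import tarfile
import tempfile


def package_model(file_paths, archive_path):
    """Write the given files into a .tar.gz archive, stored at the archive root."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in file_paths:
            tar.add(path, arcname=os.path.basename(path))
    return archive_path


workdir = tempfile.mkdtemp()
names = ["model-0000.params", "model-symbol.json"]
paths = []
for name in names:
    path = os.path.join(workdir, name)
    with open(path, "wb") as f:
        f.write(b"placeholder")  # stand-in content, not real model data
    paths.append(path)

archive = package_model(paths, os.path.join(workdir, "model.tar.gz"))

# List the archive contents to confirm both files are present
with tarfile.open(archive, "r:gz") as tar:
    members = sorted(tar.getnames())
print(members)
```

Using `tarfile.open` as a context manager closes the archive even if adding a file fails.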

Upload model to S3

Upload the pre-trained model to the S3 bucket.

[ ]:
pretrained_model_path = sess.upload_data(
    path="ssd_512_mobilenet1.0_voc.tar.gz", bucket=bucket, key_prefix=pretrained_model_sub_folder
)

Next, we set up the S3 locations for the pre-trained model and the compilation output, where the respective model artifacts are stored.

[ ]:
# S3 location of the pre-trained model artifact
s3_pretrained_model_location = "s3://{}/{}".format(bucket, pretrained_model_sub_folder)

# S3 Location to save the model artifact after compilation
s3_compilation_output_location = "s3://{}/{}".format(bucket, compilation_output_sub_folder)
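
Both locations are plain S3 URIs built from the bucket name and a key prefix. As a small illustration (a sketch with a made-up bucket name; `join_s3_uri` and `split_s3_uri` are hypothetical helpers, not SageMaker SDK functions), the standard library is enough to compose such URIs and take them apart:

```python
from urllib.parse import urlparse


def join_s3_uri(bucket, *parts):
    """Build an s3:// URI from a bucket name and key components."""
    key = "/".join(p.strip("/") for p in parts if p)
    return "s3://{}/{}".format(bucket, key) if key else "s3://{}".format(bucket)


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError("not an S3 URI: {}".format(uri))
    return parsed.netloc, parsed.path.lstrip("/")


# "example-bucket" is a placeholder; the notebook uses sess.default_bucket()
folder = "DEMO-ObjectDetection-SSD-MobileNet"
uri = join_s3_uri("example-bucket", folder, "compilation-output")
print(uri)
print(split_s3_uri(uri))
```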

Use sagemaker MXNetModel to load pretrained MXNet model

When loading the model, you are expected to provide the entry_point script that the model requires. We also set the MMS_DEFAULT_RESPONSE_TIMEOUT environment variable to 500 for the MXNet model.

[ ]:
from sagemaker.mxnet.model import MXNetModel
from sagemaker.mxnet import MXNetPredictor

pre_trained_model = MXNetModel(
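
The arguments described above can be collected in a plain dictionary before they are passed to `MXNetModel`. This is only a sketch: the script name `entry_point.py`, the framework and Python versions, the role ARN, and the S3 path are placeholders chosen for illustration, not values taken from this notebook. Environment variable values are given as strings.

```python
# Placeholder values for illustration only; substitute your own
model_kwargs = {
    "model_data": "s3://example-bucket/pretrained-model/ssd_512_mobilenet1.0_voc.tar.gz",
    "role": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    "entry_point": "entry_point.py",  # assumed script name
    "framework_version": "1.8.0",  # assumed MXNet version
    "py_version": "py37",  # assumed Python version
    "env": {"MMS_DEFAULT_RESPONSE_TIMEOUT": "500"},
}

# With the SageMaker SDK installed and AWS credentials configured:
#   from sagemaker.mxnet.model import MXNetModel
#   pre_trained_model = MXNetModel(**model_kwargs)
for key in sorted(model_kwargs):
    print(key, "=", model_kwargs[key])
```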

Compile the pre-trained model using SageMaker Neo

After loading the pre-trained model, we can use SageMaker Neo’s compile() API to compile it. When calling compile(), you are expected to provide the correct input shapes required by the model for successful compilation. We also specify the target instance family, our IAM execution role, and the S3 location where the compiled model is stored.

For this example, we will choose ml_p3 as the target instance family while compiling the trained model.

[ ]:
import time

compiled_model = pre_trained_model.compile(
    target_instance_family="ml_p3",
    input_shape={"data": [1, 3, 512, 512]},
    output_path=s3_compilation_output_location,
    role=role,
)

Deploy the compiled model and request Inferences

We have to deploy the compiled model to an instance in the family for which it was compiled. Since we compiled for ml_p3, we can deploy to any ml.p3 instance type. For this example, we choose ml.p3.2xlarge.

[ ]:
neo_object_detector = compiled_model.deploy(initial_instance_count=1, instance_type="ml.p3.2xlarge")
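
The family used at compile time is written with an underscore (ml_p3), while instance types use dots (ml.p3.2xlarge). A small sketch of checking that an instance type belongs to the compiled family before deploying; `instance_family` and `matches_family` are hypothetical helpers, not SDK functions:

```python
def instance_family(instance_type):
    """Return the Neo-style family name for an instance type, e.g. ml.p3.2xlarge -> ml_p3."""
    parts = instance_type.split(".")
    if len(parts) != 3 or parts[0] != "ml":
        raise ValueError("unexpected instance type: {}".format(instance_type))
    return "{}_{}".format(parts[0], parts[1])


def matches_family(instance_type, target_family):
    """Check that an instance type belongs to the family the model was compiled for."""
    return instance_family(instance_type) == target_family


print(matches_family("ml.p3.2xlarge", "ml_p3"))  # True
print(matches_family("ml.c5.xlarge", "ml_p3"))  # False
```
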
[ ]:
response = neo_object_detector.predict(test_image)
[ ]:
# Visualize the detections.
visualize_detection(test_file, response, object_categories, threshold)
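
When a plot is not convenient, the same response can be summarized as text. The sketch below assumes the layout that `visualize_detection` reads, namely `[class_ids, scores, boxes]` with a leading batch dimension, and uses made-up values in place of a real endpoint response; `summarize_detections` is a hypothetical helper.

```python
def summarize_detections(dets, classes, thresh):
    """Return (class_name, score, box) tuples for detections scoring at least thresh."""
    class_ids, scores, boxes = dets[0][0], dets[1][0], dets[2][0]
    results = []
    for cls, score, box in zip(class_ids, scores, boxes):
        cls_id, score = int(cls[0]), float(score[0])
        if score < thresh:
            continue
        # Fall back to the numeric id when no class name is available
        name = classes[cls_id] if 0 <= cls_id < len(classes) else str(cls_id)
        results.append((name, score, tuple(box)))
    return results


# Made-up response with two detections; the second falls below the threshold
fake_response = [
    [[[1.0], [0.0]]],
    [[[0.9], [0.1]]],
    [[[10.0, 20.0, 110.0, 220.0], [0.0, 0.0, 5.0, 5.0]]],
]
for name, score, box in summarize_detections(fake_response, ["cat", "dog"], 0.2):
    print("{:s} {:.3f} {}".format(name, score, box))
```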

Delete the Endpoint

A running endpoint incurs costs. As an optional clean-up step, you can delete it.

[ ]:
print("Endpoint name: " + neo_object_detector.endpoint_name)
neo_object_detector.delete_endpoint()