Model Optimization with an Image Classification Example

  1. Introduction

  2. Prerequisites and Preprocessing

  3. Train the model

  4. Optimize trained model using SageMaker Neo and Deploy

  5. Request Inference

  6. Delete the Endpoint


Welcome to our model optimization example for image classification. In this demo, we will use the Amazon SageMaker Image Classification algorithm to train on the Caltech-256 dataset and then we will demonstrate Amazon SageMaker Neo’s ability to optimize models.

Prerequisites and Preprocessing


To get started, we need to define a few variables and obtain certain permissions that will be needed later in the example. These are:

  • A SageMaker session

  • IAM role to give learning, storage & hosting access to your data

  • An S3 bucket, a folder & sub folders that will be used to store data and artifacts

  • SageMaker’s specific Image Classification training image which should not be changed

We also need to upgrade the SageMaker SDK for Python to v2.33.0 or greater and restart the kernel.

[ ]:
!~/anaconda3/envs/mxnet_p36/bin/pip install --upgrade "sagemaker>=2.33.0"
[ ]:
import sagemaker
from sagemaker import session, get_execution_role

role = get_execution_role()
sagemaker_session = session.Session()
[ ]:
# S3 bucket and folders for saving code and model artifacts.
# Feel free to specify different bucket/folders here if you wish.
bucket = sagemaker_session.default_bucket()
folder = "DEMO-ImageClassification"
model_with_custom_code_sub_folder = folder + "/model-with-custom-code"
validation_data_sub_folder = folder + "/validation-data"
training_data_sub_folder = folder + "/training-data"
training_output_sub_folder = folder + "/training-output"
compilation_output_sub_folder = folder + "/compilation-output"
[ ]:
# S3 Location to save the model artifact after training
s3_training_output_location = "s3://{}/{}".format(bucket, training_output_sub_folder)

# S3 Location to save the model artifact after compilation
s3_compilation_output_location = "s3://{}/{}".format(bucket, compilation_output_sub_folder)

# S3 Location to save your custom code in tar.gz format
s3_model_with_custom_code_location = "s3://{}/{}".format(bucket, model_with_custom_code_sub_folder)
[ ]:
from sagemaker.image_uris import retrieve

aws_region = sagemaker_session.boto_region_name
training_image = retrieve(
    framework="image-classification", region=aws_region, image_scope="training"
)

Data preparation

In this demo, we use the Caltech-256 dataset, pre-converted into RecordIO format using MXNet’s im2rec tool. The Caltech-256 dataset contains 30608 images of 256 objects. The training and validation split follows this MXNet example, which randomly selects 60 images per class for training and uses the remaining data for validation. It takes around 50 seconds to convert the entire Caltech-256 dataset (~1.2GB) into RecordIO format on a p2.xlarge instance. SageMaker’s training algorithm takes RecordIO files as input. For this demo, we download the RecordIO files and upload them to S3. We also store the names of the object categories in a variable, class_labels.
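
The splitting scheme described above (60 randomly chosen images per class for training, the rest for validation) can be sketched in plain Python. This is only an illustration under assumptions: the function name split_per_class, the fixed seed and the toy file names are hypothetical, and the MXNet example may differ in its details.

```python
import random


def split_per_class(images_by_class, train_per_class=60, seed=0):
    """Randomly pick `train_per_class` images per class for training.

    Everything left over in a class goes to the validation set.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, validation = [], []
    for label, images in sorted(images_by_class.items()):
        shuffled = list(images)
        rng.shuffle(shuffled)
        train += [(label, name) for name in shuffled[:train_per_class]]
        validation += [(label, name) for name in shuffled[train_per_class:]]
    return train, validation


# Toy stand-in for the dataset: 3 classes with 80 made-up file names each.
toy = {c: [f"{c}_{i:04d}.jpg" for i in range(80)] for c in ("ak47", "bat", "zebra")}
train, validation = split_per_class(toy)
print(len(train), len(validation))  # 180 60
```

Each class contributes exactly 60 training images, so the training set size depends only on the number of classes, while the validation set absorbs whatever is left.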

[ ]:
import os
import urllib.request

def download(url):
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)
[ ]:
# Download caltech-256 data files from MXNet's website

# Upload the files to S3
s3_training_data_location = sagemaker_session.upload_data(
    "caltech-256-60-train.rec", bucket, training_data_sub_folder
)
s3_validation_data_location = sagemaker_session.upload_data(
    "caltech-256-60-val.rec", bucket, validation_data_sub_folder
)
[ ]:
class_labels = [

Train the model

Now that the setup is complete, we are ready to train our image classifier. To begin, let us create a sagemaker.estimator.Estimator object. This estimator is required to launch the training job.

We specify the following parameters while creating the estimator:

  • image_uri: This is set to the training_image uri we defined previously. Once set, this image will be used later while running the training job.

  • role: This is the IAM role which we defined previously.

  • instance_count: This is the number of instances on which to run the training. When the number of instances is greater than one, then the image classification algorithm will run in distributed settings.

  • instance_type: This indicates the type of machine on which to run the training. For this example we will use ml.p3.8xlarge.

  • volume_size: This is the size in GB of the EBS volume to use for storing input data during training. It must be large enough to hold the training data, because File mode is used.

  • max_run: This is the timeout value in seconds for training. After this amount of time SageMaker terminates the job regardless of its current status.

  • input_mode: This is set to File in this example. SageMaker copies the training dataset from the S3 location to a local directory.

  • output_path: This is the S3 path in which the training output is stored. We are assigning it to s3_training_output_location defined previously.
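
As a rough sketch, the parameters listed above could be gathered into a dictionary before being handed to the estimator. The values for instance_count, volume_size and max_run are assumptions chosen for illustration, the helper name estimator_kwargs is hypothetical, and the placeholder strings stand in for the variables defined in the setup cells.

```python
def estimator_kwargs(training_image, role, output_path):
    """Collect the Estimator arguments described in the list above."""
    return {
        "image_uri": training_image,
        "role": role,
        "instance_count": 1,  # assumption: single-instance training
        "instance_type": "ml.p3.8xlarge",
        "volume_size": 50,  # assumption: EBS volume size in GB
        "max_run": 3600,  # assumption: one-hour timeout, in seconds
        "input_mode": "File",
        "output_path": output_path,
    }


# Placeholder values so the sketch runs on its own; in the notebook these
# come from training_image, role and s3_training_output_location.
kwargs = estimator_kwargs(
    training_image="<training-image-uri>",
    role="<iam-role-arn>",
    output_path="s3://<bucket>/DEMO-ImageClassification/training-output",
)
print(sorted(kwargs))

# With the SageMaker Python SDK installed and a session available:
# ic_estimator = sagemaker.estimator.Estimator(sagemaker_session=sagemaker_session, **kwargs)
```

Keeping the arguments in one place makes it easy to review the instance type and timeout before a billable training job is launched.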

[ ]:
ic_estimator = sagemaker.estimator.Estimator(

The following algorithm-specific hyperparameters are also set:

  • num_layers: The number of layers (depth) for the network. We use 18 in this sample, but other values such as 50 or 152 can be used.

  • image_shape: The input image dimensions, ‘num_channels, height, width’, for the network. It should be no larger than the actual image size. The number of channels should be the same as in the actual image.

  • num_classes: This is the number of output classes for the new dataset. ImageNet was trained with 1000 output classes, but the number of output classes can be changed for fine-tuning. For Caltech, we use 257 because it has 256 object categories + 1 clutter class.

  • num_training_samples: This is the total number of training samples. It is set to 15420 for the Caltech dataset with the current split (60 images for each of the 257 classes).

  • mini_batch_size: The number of training samples used for each mini batch. In distributed training, the number of training samples used per batch will be N * mini_batch_size where N is the number of hosts on which training is run.

  • epochs: Number of training epochs.

  • learning_rate: Learning rate for training.

  • top_k: Report the top-k accuracy during training.

  • precision_dtype: Training datatype precision (default: float32). If set to ‘float16’, the training will be done in mixed_precision mode and will be faster than float32 mode.
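
The relationship between mini_batch_size and the number of hosts can be made concrete with a little arithmetic. The sketch below is an illustration only: the helper name batches_per_epoch and the sample numbers are hypothetical, and rounding up the last partial batch is an assumption about how an epoch is counted.

```python
import math


def batches_per_epoch(num_training_samples, mini_batch_size, num_hosts=1):
    """Return (effective batch size, batches needed to see every sample once)."""
    effective_batch = num_hosts * mini_batch_size  # N * mini_batch_size
    return effective_batch, math.ceil(num_training_samples / effective_batch)


# Illustrative numbers only: 10,000 samples with mini_batch_size=128.
print(batches_per_epoch(10_000, 128, num_hosts=1))  # (128, 79)
print(batches_per_epoch(10_000, 128, num_hosts=2))  # (256, 40)
```

Doubling the number of hosts doubles the samples consumed per step, so the learning rate may need revisiting when moving to distributed training.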

[ ]:

Next, we set up the input data_channels to be used later for training.

[ ]:
train_data = sagemaker.inputs.TrainingInput(
    s3_training_data_location, content_type="application/x-recordio", s3_data_type="S3Prefix"
)

validation_data = sagemaker.inputs.TrainingInput(
    s3_validation_data_location, content_type="application/x-recordio", s3_data_type="S3Prefix"
)

data_channels = {"train": train_data, "validation": validation_data}

After we’ve created the estimator object, we can train the model using the fit() API.

[ ]:
ic_estimator.fit(inputs=data_channels, logs=True)

Optimize trained model using SageMaker Neo and Deploy

We will use SageMaker Neo’s compile_model() API to optimize the model, specifying MXNet as the framework along with its version. When calling this API, we also specify the target instance family, the correct input shapes for the model and the S3 location in which the compiled model artifacts will be stored. For this example, we will choose ml_c5 as the target instance family.
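
The input shape given to the compiler has to agree with the image_shape used during training, with a leading batch dimension added. A minimal sketch of that conversion follows; the helper name neo_input_shape is hypothetical, and "3,224,224" is assumed to be the training image_shape, which is consistent with the [1, 3, 224, 224] shape used in the compilation cell.

```python
def neo_input_shape(image_shape, batch_size=1, input_name="data"):
    """Turn an image_shape string such as "3,224,224" into an input_shape dict."""
    dims = [int(d) for d in image_shape.split(",")]
    if len(dims) != 3:
        raise ValueError("image_shape must be 'num_channels,height,width'")
    return {input_name: [batch_size] + dims}


print(neo_input_shape("3,224,224"))  # {'data': [1, 3, 224, 224]}
```

Deriving the shape from the training hyperparameter, rather than typing it twice, avoids a mismatch between the trained network and the compiled artifact.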

[ ]:
optimized_ic = ic_estimator.compile_model(
    input_shape={"data": [1, 3, 224, 224]},

After the compiled artifacts are generated and we have a sagemaker.model.Model object, we create a sagemaker.mxnet.model.MXNetModel object, specifying the following parameters:

  • model_data: The S3 location where the compiled model artifact is stored.

  • image_uri: Neo’s inference image URI for MXNet.

  • framework_version: Set to MXNet v1.8.0.

  • role and sagemaker_session: The IAM role and SageMaker session which we defined in the setup.

  • entry_point: Points to the entry_point script. In our example, the script implements SageMaker’s hosting functions.

  • py_version: Must be set to Python 3.

  • env: A dict of environment variables. MMS_DEFAULT_RESPONSE_TIMEOUT must be set to 500.

  • code_location: The S3 location where the repacked model.tar.gz is stored. The repacked tar file consists of the compiled model artifacts and the entry_point script.

[ ]:
from sagemaker.mxnet.model import MXNetModel

optimized_ic_model = MXNetModel(

We can now deploy this sagemaker.mxnet.model.MXNetModel using the deploy() API, for which we need to use an instance_type belonging to the target_instance_family we used for compilation. For this example, we will choose an ml.c5.4xlarge instance, as we compiled for ml_c5. The API also allows us to set initial_instance_count, the number of instances that will be used for the endpoint. By default, the API uses JSONSerializer() and JSONDeserializer() for sagemaker.mxnet.model.MXNetModel, whose CONTENT_TYPE is application/json. The API creates a SageMaker endpoint that we can use to perform inference.

Note: If you compiled the model for a GPU target_instance_family, make sure to deploy to an instance_type from that same family below, and make the necessary changes in the entry point script.
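
A small guard can catch a mismatch between the compilation target and the deployment instance before an endpoint is created. This is a sketch under assumptions: the helper names are hypothetical, and the mapping relies on the naming pattern seen in this example, where the ml.c5.4xlarge instance type belongs to the ml_c5 family.

```python
def target_family(instance_type):
    """Map an instance type such as "ml.c5.4xlarge" to a family such as "ml_c5"."""
    prefix, family, _size = instance_type.split(".")
    return f"{prefix}_{family}"


def check_deploy_target(instance_type, compiled_for):
    """Raise if the deployment instance is outside the compiled family."""
    if target_family(instance_type) != compiled_for:
        raise ValueError(f"{instance_type} is not in the {compiled_for} family")
    return instance_type


print(check_deploy_target("ml.c5.4xlarge", "ml_c5"))  # ml.c5.4xlarge
```

Calling check_deploy_target("ml.p3.2xlarge", "ml_c5") would raise a ValueError instead of returning the instance type.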

[ ]:
optimized_ic_classifier = optimized_ic_model.deploy(
    initial_instance_count=1, instance_type="ml.c5.4xlarge"
)

Request Inference

Once the endpoint is InService, we can send a test image, test.jpg, and get the prediction result from the endpoint using SageMaker’s predict() API. Instead of sending the raw image to the endpoint for prediction, we prepare and send a payload in a form accepted by the API. Upon receiving the prediction result, we print the class label and probability.

[ ]:
import PIL.Image
import numpy as np
from IPython.display import Image

test_file = "test.jpg"
test_image = PIL.Image.open(test_file)
payload = np.asarray(test_image.resize((224, 224)))
[ ]:
result = optimized_ic_classifier.predict(payload)
[ ]:
index = np.argmax(result)
print("Result: label - " + class_labels[index] + ", probability - " + str(result[index]))
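
Beyond the single best label, it can be useful to look at the few most likely classes. The sketch below assumes the endpoint returns a flat sequence of per-class probabilities; the helper name top_k, the toy labels and the toy probabilities are made up for illustration and stand in for class_labels and result.

```python
def top_k(probabilities, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    return [(labels[i], probabilities[i]) for i in ranked[:k]]


# Made-up values standing in for `class_labels` and the endpoint's `result`.
toy_labels = ["ak47", "american-flag", "backpack", "baseball-bat"]
toy_result = [0.05, 0.70, 0.15, 0.10]

for label, prob in top_k(toy_result, toy_labels, k=2):
    print(f"{label}: {prob:.2f}")
# american-flag: 0.70
# backpack: 0.15
```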

Delete the Endpoint

A running endpoint incurs costs. As an optional clean-up step, you can delete it.

[ ]:
print("Endpoint name: " + optimized_ic_classifier.endpoint_name)
optimized_ic_classifier.delete_endpoint()