MNIST Training, Compilation and Deployment with MXNet Module and SageMaker Neo

The SageMaker Python SDK makes it easy to train, compile and deploy MXNet models. In this example, we train a simple neural network using the Apache MXNet Module API and the MNIST dataset. The MNIST dataset is widely used for handwritten digit classification, and consists of 70,000 labeled 28x28 pixel grayscale images of hand-written digits. The dataset is split into 60,000 training images and 10,000 test images. There are 10 classes (one for each of the 10 digits). The task at hand is to train a model using the 60,000 training images, compile the trained model using SageMaker Neo and subsequently test its classification accuracy on the 10,000 test images.


To get started, we first need to upgrade the SageMaker SDK for Python to v2.33.0 or greater and restart the kernel. Then we create a session and define a few variables that will be needed later in the example.

[ ]:
!~/anaconda3/envs/mxnet_p36/bin/pip install --upgrade sagemaker
[ ]:
import sagemaker
from sagemaker import get_execution_role
from sagemaker.session import Session

# S3 bucket and folder for saving code and model artifacts.
# Feel free to specify a different bucket/folder here if you wish.
bucket = Session().default_bucket()
folder = "DEMO-MXNet-MNIST"

# Location to save your custom code in tar.gz format.
custom_code_upload_location = "s3://{}/{}/custom-code".format(bucket, folder)

# Location where results of model training are saved.
s3_training_output_location = "s3://{}/{}/training-output".format(bucket, folder)

# Location where results of model compilation are saved.
s3_compilation_output_location = "s3://{}/{}/compilation-output".format(bucket, folder)

# IAM execution role that gives SageMaker access to resources in your AWS account.
# We can use the SageMaker Python SDK to get the role from our notebook environment.
role = get_execution_role()

Entry Point Script

The entry point script provides all the code we need for training and hosting a SageMaker model. The script we will use is adapted from the Apache MXNet MNIST tutorial.

In the training script, there are two additional functions to be used with Neo:

* model_fn(): Loads the compiled model and runs a warm-up inference on valid empty data.
* transform_fn(): Converts the incoming payload into a NumPy array, performs prediction, and converts the prediction output into the response payload.
* Alternatively, instead of transform_fn(), these three functions can be defined: input_fn(), predict_fn() and output_fn().
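As a rough illustration of the transform_fn() steps described above, the sketch below decodes an application/x-npy payload into a NumPy array, runs a stand-in model, and encodes the result as JSON. It is not the tutorial script's actual code: the fake_predict stub, the four-argument signature, the softmax step and the JSON response format are all assumptions made for illustration.

```python
import io
import json

import numpy as np


def fake_predict(batch):
    """Hypothetical stand-in for the compiled model: returns 10 raw scores."""
    # Fixed, arbitrary rule so the output is deterministic.
    return np.arange(10, dtype=np.float32) * float(batch.mean() + 1.0)


def transform_fn(model, payload, input_content_type, output_content_type):
    """Decode the request, run the model, and encode the response (sketch)."""
    if input_content_type != "application/x-npy":
        raise ValueError("Expected application/x-npy, got " + input_content_type)

    # Pre-processing: bytes in .npy format -> NumPy array.
    array = np.load(io.BytesIO(payload))

    # Prediction: `model` is any callable here (assumption for this sketch).
    scores = model(array)

    # Post-processing: softmax over the scores, then serialize as JSON.
    exp_scores = np.exp(scores - np.max(scores))
    probabilities = exp_scores / exp_scores.sum()
    return json.dumps(probabilities.tolist()), "application/json"


# Build a request body in .npy format, the format the endpoint is expected to receive.
buffer = io.BytesIO()
np.save(buffer, np.zeros((1, 28, 28), dtype=np.float32))

body, content_type = transform_fn(
    fake_predict, buffer.getvalue(), "application/x-npy", "application/json"
)
probabilities = json.loads(body)
```

Splitting the same three stages into input_fn(), predict_fn() and output_fn() would follow the comments marked pre-processing, prediction and post-processing.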

Creating SageMaker’s MXNet estimator

The SageMaker MXNet estimator allows us to run single machine or distributed training in SageMaker, using CPU or GPU-based instances.

When we create the estimator, we pass in the filename of our training script as the entry_point, the name of our IAM execution role, and the S3 locations we defined in the setup section. We also provide instance_count and instance_type, which specify the number and type of SageMaker instances that will be used for the training job. The hyperparameters parameter is a dict of values that will be passed to your training script; you can see how to access these values in the entry point script.
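To illustrate how a training script might read these values: in script mode, SageMaker passes each hyperparameter to the entry point as a command-line argument, so a dict such as {"learning-rate": 0.1} arrives as --learning-rate 0.1. The following is a minimal sketch; the parse_hyperparameters helper name and the default value are assumptions, not the tutorial script's actual code.

```python
import argparse


def parse_hyperparameters(argv):
    """Parse hyperparameters passed as command-line arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    # argparse exposes "--learning-rate" as the attribute "learning_rate".
    # The default of 0.05 is an arbitrary value chosen for this sketch.
    parser.add_argument("--learning-rate", type=float, default=0.05)
    # Ignore any arguments this sketch does not know about.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the arguments produced by hyperparameters={"learning-rate": 0.1}.
args = parse_hyperparameters(["--learning-rate", "0.1"])
print(args.learning_rate)
```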

For this example, we will choose one ml.c5.4xlarge instance.

[ ]:
from sagemaker.mxnet import MXNet

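mnist_estimator = MXNet(
    # entry_point, role, the S3 locations, instance_count and instance_type go here, as described above.
    distribution={"parameter_server": {"enabled": True}},
    hyperparameters={"learning-rate": 0.1},
)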

Running the Training Job

After we’ve constructed our MXNet object, we can fit it using data stored in S3. Below we run SageMaker training on two input channels: train and test. During training, SageMaker makes this data stored in S3 available in the local filesystem where the script is running. The script loads the train and test data from disk.
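Inside the training container, the local directory for each channel is exposed through an environment variable named SM_CHANNEL_<NAME>, conventionally under /opt/ml/input/data/. The sketch below shows one way a script could resolve the train and test directories; channel_dir is a hypothetical helper, and the environment is simulated so the example also runs outside SageMaker.

```python
import os


def channel_dir(channel, environ=None):
    """Return the local directory for an input channel (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Each channel is exposed as SM_CHANNEL_<NAME>; fall back to the
    # conventional mount point if the variable is unset.
    default = "/opt/ml/input/data/" + channel
    return environ.get("SM_CHANNEL_" + channel.upper(), default)


# Simulated environment for illustration only.
fake_env = {"SM_CHANNEL_TRAIN": "/opt/ml/input/data/train"}
train_dir = channel_dir("train", fake_env)
test_dir = channel_dir("test", fake_env)
```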

[ ]:
import boto3

region = boto3.Session().region_name
train_data_location = "s3://sagemaker-sample-data-{}/mxnet/mnist/train".format(region)
test_data_location = "s3://sagemaker-sample-data-{}/mxnet/mnist/test".format(region)

mnist_estimator.fit({"train": train_data_location, "test": test_data_location})

Optimizing the trained model with SageMaker Neo

The Neo API allows us to optimize the model for a specific hardware type. When calling the compile_model() function, we specify the target instance family, the correct input shapes for the model, the name of our IAM execution role, and the S3 location where the compiled model will be stored. We also set MMS_DEFAULT_RESPONSE_TIMEOUT to 500. For this example, we will choose ml_c5 as the target instance family.

Important: If the following command results in a permission error, scroll up and locate the value of the execution role returned by ``get_execution_role()``. The role must have access to the S3 bucket specified in ``output_path``.

[ ]:
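compiled_model = mnist_estimator.compile_model(
    # target_instance_family, role, output_path and MMS_DEFAULT_RESPONSE_TIMEOUT go here, as described above.
    input_shape={"data": [1, 28, 28]},
)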

Creating an inference Endpoint

We can deploy this compiled model using the deploy() function, for which we need to use an instance_type belonging to the target_instance_family we used for compilation. For this example, we will choose an ml.c5.4xlarge instance, as we compiled for ml_c5. The function also allows us to set initial_instance_count, the number of instances that will be used for the Endpoint. We also pass NumpySerializer(), whose CONTENT_TYPE is application/x-npy, which ensures that the endpoint receives a NumPy array as the payload during inference. The deploy() function creates a SageMaker endpoint that we can use to perform inference.
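The requirement that instance_type belong to the compiled target_instance_family can be checked with a small helper before deploying. This sketch assumes the naming pattern suggested by this example (ml_c5 for ml.c5.* instance types); target_family and matches_family are hypothetical helpers, not part of the SageMaker SDK.

```python
def target_family(instance_type):
    """Map an instance type such as "ml.c5.4xlarge" to a family such as "ml_c5"."""
    parts = instance_type.split(".")
    if len(parts) != 3 or parts[0] != "ml":
        raise ValueError("Unexpected instance type: " + instance_type)
    return parts[0] + "_" + parts[1]


def matches_family(instance_type, family):
    """Return True if the instance type belongs to the given target family."""
    return target_family(instance_type) == family


# The instance type used in this example belongs to the family we compiled for.
print(matches_family("ml.c5.4xlarge", "ml_c5"))
```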

Note: If you compiled the model for a GPU target_instance_family then please make sure to deploy to one of the same target instance_type below and also make necessary changes in

[ ]:
from sagemaker.serializers import NumpySerializer

serializer = NumpySerializer()
predictor = compiled_model.deploy(
    initial_instance_count=1, instance_type="ml.c5.4xlarge", serializer=serializer
)

Making an inference request

Now that our Endpoint is deployed and we have a predictor object, we can use it to classify handwritten digits.

To see inference in action, we load the input.npy file, which was generated using the provided script and contains the data equivalent of a hand-drawn digit 0. If you would like to draw a different digit and generate a new input.npy file, you can do so by running the provided script. A GUI-enabled device is required to run the script, which generates the input.npy file once a digit is drawn.

[ ]:
import numpy as np

numpy_ndarray = np.load("input.npy")

Now we can use the predictor object to classify the handwritten digit.

[ ]:
response = predictor.predict(data=numpy_ndarray)
print("Raw prediction result:")
print(response)

labeled_predictions = list(zip(range(10), response))
print("Labeled predictions: ")
print(labeled_predictions)

labeled_predictions.sort(key=lambda label_and_prob: 1.0 - label_and_prob[1])
print("Most likely answer: {}".format(labeled_predictions[0]))
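To make the ranking step easier to follow, here is the same logic run on made-up probabilities in place of a real endpoint response. Sorting by 1.0 minus the probability puts the most probable label first; max() with a key function, shown here as an alternative and not what the cell above uses, finds the same pair without sorting the whole list.

```python
# Made-up probabilities for digits 0-9, standing in for the endpoint response.
response = [0.90, 0.01, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]

labeled_predictions = list(zip(range(10), response))

# Sorting by (1.0 - probability) in ascending order puts the most probable label first.
by_sort = sorted(labeled_predictions, key=lambda label_and_prob: 1.0 - label_and_prob[1])[0]

# max() returns the same (label, probability) pair without a full sort.
by_max = max(labeled_predictions, key=lambda label_and_prob: label_and_prob[1])

print("Most likely answer: {}".format(by_max))
```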

(Optional) Delete the Endpoint

After you have finished with this example, remember to delete the prediction endpoint to release the instance(s) associated with it.

[ ]:
print("Endpoint name: " + predictor.endpoint_name)
predictor.delete_endpoint()