Run a SageMaker Experiment with MNIST Handwritten Digits Classification

This demo shows how you can use the SageMaker Experiments Python SDK to organize, track, compare, and evaluate your machine learning (ML) model training experiments.

You can track artifacts for experiments, including data sets, algorithms, hyperparameters, and metrics. Experiments executed on SageMaker such as SageMaker Autopilot jobs and training jobs are automatically tracked. You can also track artifacts for additional steps within an ML workflow that come before or after model training, such as data pre-processing or post-training model evaluation.

The APIs also let you search and browse your current and past experiments, compare experiments, and identify best-performing models.

We demonstrate these capabilities through an MNIST handwritten digits classification example. The experiment is organized as follows:

  1. Download and prepare the MNIST dataset.

  2. Train a Convolutional Neural Network (CNN) Model. Tune the hyperparameter that configures the number of hidden channels in the model. Track the parameter configurations and resulting model accuracy using the SageMaker Experiments Python SDK.

  3. Use the search and analytics capabilities of the SDK to search, compare, and evaluate the performance of all model versions generated by the tuning in Step 2.

  4. Trace the complete lineage of a model version: the collection of all the data pre-processing and training configurations and inputs that went into creating that model version.

Make sure you select the Python 3 (Data Science) kernel in Studio, or conda_pytorch_p36 in a notebook instance.

Runtime

This notebook takes approximately 25 minutes to run.

Contents

  1. Install modules

  2. Setup

  3. Download the dataset

  4. Step 1: Set up the Experiment

  5. Step 2: Track Experiment

  6. Compare the model training runs for an experiment

  7. Push best training job model to model registry

  8. Deploy an endpoint for the latest approved version of the model from the model registry

  9. Cleanup

  10. Contact

Install modules

[ ]:
import sys

Install the SageMaker Experiments Python SDK

[ ]:
!{sys.executable} -m pip install sagemaker-experiments==0.1.35

Install PyTorch

[ ]:
# PyTorch version needs to be the same in both the notebook instance and the training job container
# https://github.com/pytorch/pytorch/issues/25214
!{sys.executable} -m pip install torch==1.1.0
!{sys.executable} -m pip install torchvision==0.2.2
!{sys.executable} -m pip install pillow==6.2.2
!{sys.executable} -m pip install --upgrade sagemaker

Setup

[ ]:
import time

import boto3
import numpy as np
import pandas as pd
from IPython.display import set_matplotlib_formats
from matplotlib import pyplot as plt
from torchvision import datasets, transforms

import sagemaker
from sagemaker import get_execution_role
from sagemaker.session import Session
from sagemaker.analytics import ExperimentAnalytics

from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent
from smexperiments.tracker import Tracker

set_matplotlib_formats("retina")
[ ]:
sm_sess = sagemaker.Session()
sess = sm_sess.boto_session
sm = sm_sess.sagemaker_client
role = get_execution_role()
region = sess.region_name

Download the dataset

We download the MNIST handwritten digits dataset, and then apply a transformation on each image.

[ ]:
bucket = sm_sess.default_bucket()
prefix = "DEMO-mnist"
print("Using S3 location: s3://" + bucket + "/" + prefix + "/")

datasets.MNIST.urls = [
    "https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/train-images-idx3-ubyte.gz",
    "https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/train-labels-idx1-ubyte.gz",
    "https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/t10k-images-idx3-ubyte.gz",
    "https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/t10k-labels-idx1-ubyte.gz",
]

# Download the dataset to the ./mnist folder, then load and normalize the images
train_set = datasets.MNIST(
    "mnist",
    train=True,
    transform=transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
    ),
    download=True,
)

test_set = datasets.MNIST(
    "mnist",
    train=False,
    transform=transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
    ),
    download=False,
)

View an example image from the dataset.

[ ]:
plt.imshow(train_set.data[2].numpy())

After transforming the images in the dataset, we upload the dataset to S3.

[ ]:
inputs = sm_sess.upload_data(path="mnist", bucket=bucket, key_prefix=prefix)

Now let’s track the parameters from the data pre-processing step.

[ ]:
with Tracker.create(display_name="Preprocessing", sagemaker_boto_client=sm) as tracker:
    tracker.log_parameters(
        {
            "normalization_mean": 0.1307,
            "normalization_std": 0.3081,
        }
    )
    # We can log the S3 uri to the dataset we just uploaded
    tracker.log_input(name="mnist-dataset", media_type="s3/uri", value=inputs)

Step 1: Set up the Experiment

Create an experiment to track all the model training iterations. Experiments are a great way to organize your data science work. You can create an experiment to organize all the model development work for a business use case you are addressing (e.g. an experiment named “customer churn prediction”), for a data science team that owns the experiment (e.g. an experiment named “marketing analytics experiment”), or for a specific data science and ML project. Think of an experiment as a “folder” for organizing your “files”.

Create an Experiment

[ ]:
mnist_experiment = Experiment.create(
    experiment_name=f"mnist-hand-written-digits-classification-{int(time.time())}",
    description="Classification of mnist hand-written digits",
    sagemaker_boto_client=sm,
)
print(mnist_experiment)
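
The SDK can also search and browse existing experiments, as noted in the introduction. Here is a minimal sketch of listing them, assuming `Experiment.list` mirrors the sort arguments of the underlying ListExperiments API:

[ ]:
# List experiments, most recently created first; Experiment.list returns
# a generator of ExperimentSummary objects
for summary in Experiment.list(sort_by="CreationTime", sort_order="Descending"):
    print(summary.experiment_name)
    break  # just show the most recent one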

Step 2: Track Experiment

Now create a Trial for each training run to track its inputs, parameters, and metrics.

While training the CNN model on SageMaker, we experiment with several values for the number of hidden channels in the model. We create a Trial to track each training job run. We also create a TrialComponent from the tracker we created earlier and add it to the Trial. This enriches the Trial with the parameters we captured during the data pre-processing stage.

[ ]:
from sagemaker.pytorch import PyTorch, PyTorchModel
[ ]:
hidden_channel_trial_name_map = {}

If you want to run the following five training jobs in parallel, you may need to increase your resource limit. Here we run them sequentially; a sketch of the parallel variant follows the training loop.

[ ]:
preprocessing_trial_component = tracker.trial_component
[ ]:
for i, num_hidden_channel in enumerate([2, 5, 10, 20, 32]):
    # Create trial
    trial_name = f"cnn-training-job-{num_hidden_channel}-hidden-channels-{int(time.time())}"
    cnn_trial = Trial.create(
        trial_name=trial_name,
        experiment_name=mnist_experiment.experiment_name,
        sagemaker_boto_client=sm,
    )
    hidden_channel_trial_name_map[num_hidden_channel] = trial_name

    # Associate the preprocessing trial component with the current trial
    cnn_trial.add_trial_component(preprocessing_trial_component)

    # All input configurations, parameters, and metrics specified in
    # the estimator definition are automatically tracked
    estimator = PyTorch(
        py_version="py3",
        entry_point="./mnist.py",
        role=role,
        sagemaker_session=sagemaker.Session(sagemaker_client=sm),
        framework_version="1.1.0",
        instance_count=1,
        instance_type="ml.c4.xlarge",
        hyperparameters={
            "epochs": 2,
            "backend": "gloo",
            "hidden_channels": num_hidden_channel,
            "dropout": 0.2,
            "kernel_size": 5,
            "optimizer": "sgd",
        },
        metric_definitions=[
            {"Name": "train:loss", "Regex": "Train Loss: (.*?);"},
            {"Name": "test:loss", "Regex": "Test Average loss: (.*?),"},
            {"Name": "test:accuracy", "Regex": "Test Accuracy: (.*?)%;"},
        ],
        enable_sagemaker_metrics=True,
    )

    cnn_training_job_name = "cnn-training-job-{}".format(int(time.time()))

    # Associate the estimator with the Experiment and Trial
    estimator.fit(
        inputs={"training": inputs},
        job_name=cnn_training_job_name,
        experiment_config={
            "TrialName": cnn_trial.trial_name,
            "TrialComponentDisplayName": "Training",
        },
        wait=True,
    )

    # Wait two seconds before dispatching the next training job
    time.sleep(2)
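
As mentioned above, the jobs can also be dispatched in parallel. A minimal sketch, assuming your account's concurrent training job quota allows five jobs: pass `wait=False` to `estimator.fit` inside the loop, collect the job names, and then block on each one.

[ ]:
# Sketch of the parallel variant. With wait=False in estimator.fit above,
# job_names would be populated with each cnn_training_job_name in the loop.
job_names = []  # hypothetical list collected inside the loop
for job_name in job_names:
    sm_sess.wait_for_job(job_name)  # blocks until the job completes; raises on failure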

Compare the model training runs for an experiment

Now we use the analytics capabilities of the Experiments SDK to query and compare the training runs, so we can identify the best model produced by our experiment. You can retrieve trial components by using a search expression.

Some Simple Analyses

[ ]:
search_expression = {
    "Filters": [
        {
            "Name": "DisplayName",
            "Operator": "Equals",
            "Value": "Training",
        }
    ],
}
[ ]:
trial_component_analytics = ExperimentAnalytics(
    sagemaker_session=Session(sess, sm),
    experiment_name=mnist_experiment.experiment_name,
    search_expression=search_expression,
    sort_by="metrics.test:accuracy.max",
    sort_order="Descending",
    metric_names=["test:accuracy"],
    parameter_names=["hidden_channels", "epochs", "dropout", "optimizer"],
)
[ ]:
trial_component_analytics.dataframe()

To isolate and measure the impact of the number of hidden channels on model accuracy, we vary the number of hidden channels and hold the other hyperparameters fixed.
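
For a quick visual check of this relationship, we can plot the maximum test accuracy against the number of hidden channels. This is a minimal sketch; it assumes the analytics dataframe names metric columns in the "test:accuracy - Max" style:

[ ]:
df = trial_component_analytics.dataframe()
df = df.sort_values("hidden_channels")

plt.plot(df["hidden_channels"], df["test:accuracy - Max"], marker="o")
plt.xlabel("hidden channels")
plt.ylabel("max test accuracy (%)")
plt.show()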

Next let’s look at an example of tracing the lineage of a model by accessing the data tracked by SageMaker Experiments for the cnn-training-job-2-hidden-channels trial.

[ ]:
lineage_table = ExperimentAnalytics(
    sagemaker_session=Session(sess, sm),
    search_expression={
        "Filters": [
            {
                "Name": "Parents.TrialName",
                "Operator": "Equals",
                "Value": hidden_channel_trial_name_map[2],
            }
        ]
    },
    sort_by="CreationTime",
    sort_order="Ascending",
)
[ ]:
lineage_table.dataframe()

Push best training job model to model registry

Now we take the best model and push it to the model registry.

Step 1: Create a model package group.

[ ]:
model_package_group_name = "mnist-handwritten-digit-classification" + str(round(time.time()))
model_package_group_input_dict = {
    "ModelPackageGroupName": model_package_group_name,
    "ModelPackageGroupDescription": "Sample model package group",
}

create_model_package_group_response = sm.create_model_package_group(
    **model_package_group_input_dict
)
model_package_arn = create_model_package_group_response["ModelPackageGroupArn"]

print(f"ModelPackageGroup Arn : {model_package_arn}")
[ ]:
model_package_arn

Step 2: Get the best model training job from SageMaker experiments API

[ ]:
best_trial_component_name = trial_component_analytics.dataframe().iloc[0]["TrialComponentName"]
best_trial_component = TrialComponent.load(best_trial_component_name)
[ ]:
best_trial_component.trial_component_name
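
To sanity-check the selection, inspect the parameters and metrics tracked on the winning trial component. A minimal sketch; it assumes the `TrialComponent` object exposes the tracked values as `.parameters` and `.metrics`, mirroring the DescribeTrialComponent response:

[ ]:
# Hyperparameters tracked for the best run
print(best_trial_component.parameters)

# Summary statistics for each tracked metric
for metric in best_trial_component.metrics:
    print(metric.metric_name, "max:", metric.max)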

Step 3: Register the best model.

By default, the model is registered with the approval_status set to PendingManualApproval. You can then use the API to manually approve the model based on any criteria you set for model evaluation.

[ ]:
# create model object
model_data = best_trial_component.output_artifacts["SageMaker.ModelArtifact"].value
env = {
    "hidden_channels": str(int(best_trial_component.parameters["hidden_channels"])),
    "dropout": str(best_trial_component.parameters["dropout"]),
    "kernel_size": str(int(best_trial_component.parameters["kernel_size"])),
}
model = PyTorchModel(
    model_data,
    role,
    "./mnist.py",
    py_version="py3",
    env=env,
    sagemaker_session=sagemaker.Session(sagemaker_client=sm),
    framework_version="1.1.0",
    name=best_trial_component.trial_component_name,
)
[ ]:
model_package = model.register(
    content_types=["*"],
    response_types=["application/json"],
    inference_instances=["ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    description="MNIST image classification model",
    approval_status="PendingManualApproval",
    model_package_group_name=model_package_group_name,
)

Step 4: Verify model has been registered.

[ ]:
sm.describe_model_package_group(ModelPackageGroupName=model_package_group_name)
[ ]:
# Check the registered model versions
sm.list_model_packages(ModelPackageGroupName=model_package_group_name)
[ ]:
model_package_arn = sm.list_model_packages(ModelPackageGroupName=model_package_group_name)[
    "ModelPackageSummaryList"
][0]["ModelPackageArn"]
[ ]:
# Update the model status to Approved
model_package_update_input_dict = {
    "ModelPackageArn": model_package_arn,
    "ModelApprovalStatus": "Approved",
}
model_package_update_response = sm.update_model_package(**model_package_update_input_dict)
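
As an optional sanity check, confirm the status change with a describe call:

[ ]:
# The approval status should now read "Approved"
sm.describe_model_package(ModelPackageName=model_package_arn)["ModelApprovalStatus"]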

Deploy an endpoint for the latest approved version of the model from the model registry

Now we deploy the approved model version to an endpoint so it is available to perform inference.

[ ]:
from datetime import datetime

# Use a timestamp in the endpoint name; avoid rebinding the name `time`,
# which would shadow the `time` module imported earlier
timestamp = datetime.now().strftime("%m-%d-%Y-%H-%M-%S")
print("timestamp:", timestamp)
endpoint_name = f"cnn-mnist-{timestamp}"
endpoint_name
[ ]:
model_package.deploy(
    initial_instance_count=1, instance_type="ml.m5.xlarge", endpoint_name=endpoint_name
)
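
To verify the endpoint end to end, we can send one test image for inference. This is a minimal sketch, assuming the default SageMaker PyTorch serving stack (which accepts NumPy-serialized input and can return JSON); `Predictor`, `NumpySerializer`, and `JSONDeserializer` are from SageMaker Python SDK v2:

[ ]:
from sagemaker.predictor import Predictor
from sagemaker.serializers import NumpySerializer
from sagemaker.deserializers import JSONDeserializer

predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sm_sess,
    serializer=NumpySerializer(),
    deserializer=JSONDeserializer(),
)

# Send one normalized test image (shape [1, 1, 28, 28]) and compare the
# highest-scoring class against the true label
image, label = test_set[0]
scores = predictor.predict(image.numpy()[np.newaxis, ...])
print("predicted:", int(np.argmax(scores)), "actual:", int(label))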

Cleanup

Once we’re done, clean up the endpoint to prevent unnecessary billing.

[ ]:
sagemaker_client = boto3.client("sagemaker", region_name=region)
# Delete endpoint
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
[ ]:
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_name)

Trial components can exist independently of trials and experiments. You might want to keep them if you plan on further exploration. If not, delete all experiment artifacts.

[ ]:
mnist_experiment.delete_all(action="--force")

Contact

Submit any questions or issues to https://github.com/aws/sagemaker-experiments/issues or mention @aws/sagemakerexperimentsadmin.