Running multi-container endpoints on Amazon SageMaker
This notebook’s CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.
SageMaker multi-container endpoints enable customers to deploy multiple containers, serving different models, on a single SageMaker endpoint. The containers can be run in a sequence as an inference pipeline, or each container can be accessed individually by using direct invocation to improve endpoint utilization and optimize costs.
This notebook shows how to create a multi-container endpoint that hosts both a PyTorch (>=1.5) model and a TensorFlow (>=2.0) model on a single endpoint. It showcases the direct invocation behavior of multi-container endpoints, where each model container can be invoked directly rather than being called in a sequence.
This notebook is divided into the following sections:
Pre-requisites
Train a TensorFlow Model in SageMaker
Train a PyTorch Model in SageMaker
Setup Multi-container Endpoint with Direct Invocation
Inference
Clean up
Section 1: Pre-requisites
First, import the necessary libraries and set up variables, including the output paths for the models.
[ ]:
import os
import json
import time
import random
import numpy as np
from utils.mnist import mnist_to_numpy, normalize
import matplotlib.pyplot as plt
import boto3
import sagemaker
from sagemaker.tensorflow import TensorFlow
from sagemaker.pytorch import PyTorch
from sagemaker import get_execution_role
from sagemaker.s3 import S3Downloader
from sagemaker.s3 import S3Uploader
sess = sagemaker.Session()
role = get_execution_role()
bucket = sess.default_bucket()
output_prefix = "/multi-container-endpoint/output"
output_path = "s3://" + bucket + output_prefix
region = sess.boto_region_name
sm_client = sess.sagemaker_client
runtime_sm_client = sess.sagemaker_runtime_client
s3_client = boto3.client("s3")
Dataset
This notebook uses the MNIST dataset. MNIST is a widely used dataset for handwritten digit classification. It consists of 70,000 labeled 28x28 pixel grayscale images of handwritten digits. The dataset is split into 60,000 training images and 10,000 test images. There are 10 classes (one for each of the 10 digits).
Set up channels for training and testing data
Next, the framework Estimator needs to know where to find the training and testing data. It can be a link to an S3 bucket, or it can be a path in the local file system if local mode is used. For this notebook, download the MNIST data from a public S3 bucket and upload it to the default bucket created in the first cell.
NOTE: Local mode is not supported in Studio.
[ ]:
import logging
import boto3
from botocore.exceptions import ClientError
# Download training and testing data from a public S3 bucket
def download_from_s3(data_dir="/tmp/data", train=True):
    """Download MNIST dataset and convert it to numpy array

    Args:
        data_dir (str): directory to save the data
        train (bool): download training set

    Returns:
        None
    """
    if not os.path.exists(data_dir):
        os.makedirs(data_dir)

    if train:
        images_file = "train-images-idx3-ubyte.gz"
        labels_file = "train-labels-idx1-ubyte.gz"
    else:
        images_file = "t10k-images-idx3-ubyte.gz"
        labels_file = "t10k-labels-idx1-ubyte.gz"

    # download objects
    s3 = boto3.client("s3")
    bucket = f"sagemaker-example-files-prod-{region}"
    for obj in [images_file, labels_file]:
        key = os.path.join("datasets/image/MNIST", obj)
        dest = os.path.join(data_dir, obj)
        if not os.path.exists(dest):
            s3.download_file(bucket, key, dest)
    return


download_from_s3("/tmp/data", True)
download_from_s3("/tmp/data", False)
Create channels for SageMaker Training
The keys of the channels dictionary are passed to the training image, and each key creates an environment variable SM_CHANNEL_<key name>.
In this example, SM_CHANNEL_TRAINING and SM_CHANNEL_TESTING are created in the training image (check out tensorflow/code/train.py or pytorch/code/train.py to learn how these variables are accessed). For more information, see SM_CHANNEL_{channel_name}.
[ ]:
# upload to the default bucket
dataset_prefix = "multi-container-endpoint/dataset"
loc = sess.upload_data(path="/tmp/data", bucket=bucket, key_prefix=dataset_prefix)
channels = {"training": loc, "testing": loc}
Now that all the pre-requisites are set up, it is time to train the models. In the following section, a TensorFlow model is trained on the MNIST dataset.
Section 2: Train a TensorFlow model in SageMaker using the TensorFlow Estimator
The TensorFlow class allows running a training script on SageMaker infrastructure in a containerized environment.
It needs the following parameters to set up the environment:
- entry_point: A user-defined Python file to be used by the training container as the instructions for training. This file is discussed further in the next subsection.
- role: An IAM role to make AWS service requests.
- instance_type: The type of SageMaker instance on which to run the training script.
- model_dir: S3 bucket URI where the checkpoint data and models can be exported to during training (default: None). To disable having model_dir passed to the training script, set model_dir=False.
- instance_count: The number of instances needed to run the training job. Multiple instances are needed for distributed training.
- output_path: S3 bucket URI to save training output (model artifacts and output files).
- framework_version: The version of TensorFlow to use.
- py_version: The Python version to use.
For more information, see the API reference
Implement the entry point for training
The entry point for training is a Python script that provides all the code for training a TensorFlow model. It is used by the SageMaker TensorFlow Estimator (the TensorFlow class) as the entry point for running the training job.
Under the hood, the SageMaker TensorFlow Estimator downloads a docker image with runtime environments specified by the parameters used to initiate the Estimator class, and it injects the training script into the docker image to be used as the entry point to run the container.
In the rest of the notebook, training image refers to the docker image specified by the Estimator and training container refers to the container that runs the training image.
This means the training script is very similar to a training script that might run outside Amazon SageMaker, but it can access the useful environment variables provided by the training image. Check out the complete list of environment variables for a description of all the environment variables your training script has access to.
In this example, the training script at tensorflow/code/train.py is used as the entry point for the TensorFlow Estimator.
[ ]:
!pygmentize 'tensorflow/code/train.py'
Set hyperparameters
In addition, the TensorFlow Estimator allows passing command line arguments to your training script via hyperparameters. Note that TensorFlow version 2.3.1 is used for training; the same version should be used for inference to avoid errors.
[ ]:
tf_output_path = output_path + "/tensorflow"
tf_estimator = TensorFlow(
    entry_point="train.py",
    source_dir="tensorflow/code",  # directory of training script
    role=role,
    framework_version="2.3.1",
    model_dir=False,  # don't pass --model_dir to training script
    py_version="py37",
    instance_type="ml.c4.xlarge",
    instance_count=1,
    output_path=tf_output_path,
    hyperparameters={
        "batch-size": 512,
        "epochs": 1,
        "learning-rate": 1e-3,
        "beta_1": 0.9,
        "beta_2": 0.999,
    },
)
Run the training script on SageMaker
Now that the TensorFlow training container has everything it needs to execute the training script, model training can be started by calling the fit method.
[ ]:
tf_estimator.fit(inputs=channels)
Inspect and store model data
Now that the training is finished, the model artifact has been saved in the output_path.
[ ]:
tf_mnist_model_data = tf_estimator.model_data
print("Model artifact saved at:\n", tf_mnist_model_data)
Section 3: Train a PyTorch model in SageMaker using the PyTorch Estimator
In this section, a PyTorch model is trained on the same MNIST dataset.
PyTorch Estimator
The PyTorch class allows running a training script on SageMaker infrastructure in a containerized environment.
It needs the following parameters to set up the environment:
- entry_point: A user-defined Python file to be used by the training container as the instructions for training. This file is discussed further in the next subsection.
- role: An IAM role to make AWS service requests.
- instance_type: The type of SageMaker instance on which to run the training script.
- instance_count: The number of instances needed to run the training job. Multiple instances are needed for distributed training.
- output_path: S3 bucket URI to save training output (model artifacts and output files).
- framework_version: The version of PyTorch to use.
- py_version: The Python version to use.
For more information, see the API reference
Implement the entry point for training
The entry point for training is a Python script that provides all the code for training a PyTorch model. It is used by the SageMaker PyTorch Estimator (the PyTorch class above) as the entry point for running the training job.
Under the hood, the SageMaker PyTorch Estimator creates a docker image with runtime environments specified by the parameters used to initiate the Estimator class, and it injects the training script into the docker image to be used as the entry point to run the container. Here as well, the training script can access all the useful environment variables provided by the training image, as described in Section 2. The training script at pytorch/code/train.py is used as the entry point for the PyTorch Estimator.
[ ]:
!pygmentize 'pytorch/code/train.py'
Set hyperparameters
In addition, the PyTorch Estimator allows passing command line arguments to your training script via hyperparameters. Note that PyTorch version 1.8.1 is used for training; the same version should be used for inference to avoid errors.
[ ]:
pytorch_est = PyTorch(
    entry_point="train.py",
    source_dir="pytorch/code",  # directory of your training script
    role=role,
    framework_version="1.8.1",
    py_version="py3",
    instance_type="ml.c4.xlarge",
    instance_count=1,
    output_path=output_path + "/pytorch",
    hyperparameters={"batch-size": 128, "epochs": 1, "learning-rate": 1e-3, "log-interval": 100},
)
Run the training script on SageMaker
Now that the PyTorch training container has everything it needs to execute the training script, training can be started by calling the fit method.
[ ]:
pytorch_est.fit(inputs=channels)
Inspect and store model data
Now that the training is finished, the model artifact has been saved in the output_path.
[ ]:
pt_mnist_model_data = pytorch_est.model_data
print("Model artifact saved at:\n", pt_mnist_model_data)
Section 4: Set up Multi-container endpoint with Direct Invocation
In this section, a multi-container endpoint is set up.
SageMaker multi-container endpoints enable customers to deploy multiple containers, serving different models, on the same SageMaker endpoint. The containers can be run in a sequence as an inference pipeline, or each container can be accessed individually by using direct invocation to improve endpoint utilization and optimize costs.
The TensorFlow and PyTorch models trained in the earlier sections are deployed to a single SageMaker endpoint using the multi-container capability of SageMaker endpoints. This section uses the boto3 APIs.
Setting up a multi-container endpoint is a multi-step process, which looks like the following:
- Create inference container definitions for all the containers needed to deploy
- Create a SageMaker model using the create_model API. Use the Containers parameter instead of PrimaryContainer, and include more than one container in the Containers parameter.
- Create a SageMaker endpoint configuration using the create_endpoint_config API
- Create a SageMaker endpoint using the create_endpoint API, which uses the model and endpoint configuration created in the earlier steps
Create inference container definition for TensorFlow model
To create a container definition, the following must be defined:
- ContainerHostname: The value of this parameter uniquely identifies the container for the purposes of logging and metrics. The ContainerHostname parameter is required for each container in a multi-container endpoint with direct invocation. It can be skipped in the case of a serial inference pipeline, as the inference pipeline assigns a unique name automatically.
- Image: The path where the inference code is stored. This can be either in Amazon EC2 Container Registry or in a Docker registry that is accessible from the same VPC that is configured for the endpoint. If a custom algorithm is used instead of an algorithm provided by Amazon SageMaker, the inference code must meet Amazon SageMaker requirements.
- ModelDataUrl: The S3 path where the model artifacts, which result from model training, are stored. This path must point to a single GZIP compressed tar archive (.tar.gz suffix). The S3 path is required for Amazon SageMaker built-in algorithms/frameworks, but not if a custom algorithm (not provided by SageMaker) is used.
For the Image argument, supply the ECR path of the TensorFlow 2.3.1 inference image. For the deep learning images available in SageMaker, refer to Available Deep Learning Containers Images.
[ ]:
tf_ecr_image_uri = sagemaker.image_uris.retrieve(
    framework="tensorflow",
    region=region,
    version="2.3.1",
    py_version="py37",
    instance_type="ml.c5.4xlarge",
    image_scope="inference",
)

tensorflow_container = {
    "ContainerHostname": "tensorflow-mnist",
    "Image": tf_ecr_image_uri,
    "ModelDataUrl": tf_mnist_model_data,
}
Create inference container definition for PyTorch model
Now, similarly, create the container definition for the PyTorch model.
Here, in addition to the arguments defined for the TensorFlow container, one more argument needs to be defined: Environment. This is because the PyTorch model server needs to know how to load the model and make predictions. This is explained in detail in the following section.
To tell the inference image how to load the model checkpoint, it needs to implement:
- How to parse the incoming request
- How to use the trained model to make inference
- How to return the prediction to the caller of the service

To achieve this, it needs to:
- implement a function called model_fn which returns a PyTorch model
- implement a function called input_fn which handles data decoding and returns an object that can be passed to predict_fn
- implement a function called predict_fn which performs the prediction and returns an object that can be passed to output_fn
- implement a function called output_fn which performs the de-serialization of the output given by predict_fn

To achieve this, inference.py is created, which provides the implementation of all the above functions. This file must be supplied through the SAGEMAKER_PROGRAM environment variable.
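For orientation, here is a minimal sketch of what such an inference.py could look like. This is an illustration only, not the actual pytorch/code/inference.py used in this notebook (which may, for example, combine these steps into a single transform_fn); the Net class and the model.pth loading logic below are assumptions based on the description above.

import json
import os

import torch
import torch.nn as nn


class Net(nn.Module):
    # Hypothetical network; the real script would define the same architecture used in training
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

    def forward(self, x):
        return self.layers(x)


def model_fn(model_dir):
    # Load the trained weights packaged as model.pth inside the model archive (assumed to be a state_dict)
    model = Net()
    model.load_state_dict(torch.load(os.path.join(model_dir, "model.pth"), map_location="cpu"))
    model.eval()
    return model


def input_fn(request_body, request_content_type):
    # Decode the JSON payload sent by invoke_endpoint into a tensor
    data = json.loads(request_body)["inputs"]
    return torch.tensor(data, dtype=torch.float32)


def predict_fn(input_object, model):
    # Run the forward pass without tracking gradients
    with torch.no_grad():
        return model(input_object)


def output_fn(prediction, response_content_type):
    # Serialize the raw model outputs back to JSON for the caller
    return json.dumps(prediction.numpy().tolist())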
The model and inference.py also need to be wrapped together in a single tar.gz. The following steps are performed to zip the inference script and model file together:
- Download the model.tar.gz containing the trained PyTorch model
- Unzip the model.tar.gz. The model.pth file is visible after unzipping.
- GZIP the model file (.pth) and the inference.py together in a new tar.gz
- Upload the new tar.gz to an S3 location, to be referred to in the model container definition later
[ ]:
# Download the model.tar.gz containing the PyTorch model, to current dir
S3Downloader.download(pt_mnist_model_data, ".")
# unzip the tar.gz
!tar -xvf model.tar.gz
# after unzipping, remove the model.tar.gz
!rm model.tar.gz
# copy the pytorch inference script to current dir
!cp pytorch/code/inference.py .
# gzip the inference.py and model file together in a new model.tar.gz
!tar -czvf model.tar.gz model.pth inference.py
# remove the residual files
!rm inference.py model.pth
# upload the new tar.gz to s3
updated_pt_model_key = "multi-container-endpoint/output/pytorch/updated"
pt_updated_model_uri = S3Uploader.upload(
    "model.tar.gz", "s3://{}/{}".format(bucket, updated_pt_model_key)
)
# remove the new model.tar.gz from the current dir
!rm model.tar.gz
Now, everything is ready to create a container definition for the PyTorch container.
[ ]:
pt_ecr_image_uri = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region=region,
    version="1.8.1",
    py_version="py36",
    instance_type="ml.c5.4xlarge",
    image_scope="inference",
)

pytorch_container = {
    "ContainerHostname": "pytorch-mnist",
    "Image": pt_ecr_image_uri,
    "ModelDataUrl": pt_updated_model_uri,
    "Environment": {
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_SUBMIT_DIRECTORY": pt_updated_model_uri,
    },
}
Create a SageMaker Model
In the cell below, call the create_model API to create a model that contains the definitions of both the PyTorch and TensorFlow containers created above. Supply both containers under the Containers argument. Also set the Mode parameter of the InferenceExecutionConfig field to Direct for direct invocation of each container, or to Serial to use the containers as an inference pipeline. The default mode is Serial. For more details, check out Deploy multi-container endpoints.
Since this notebook focuses on the direct invocation behavior, set the value to Direct.
[ ]:
create_model_response = sm_client.create_model(
    ModelName="mnist-multi-container",
    Containers=[pytorch_container, tensorflow_container],
    InferenceExecutionConfig={"Mode": "Direct"},
    ExecutionRoleArn=role,
)
Create Endpoint Configuration
Now, create an endpoint configuration by calling the create_endpoint_config API. Here, supply the same ModelName used in the create_model API call.
[ ]:
endpoint_config = sm_client.create_endpoint_config(
    EndpointConfigName="mnist-multi-container-ep-config",
    ProductionVariants=[
        {
            "VariantName": "prod",
            "ModelName": "mnist-multi-container",
            "InitialInstanceCount": 1,
            "InstanceType": "ml.c5.4xlarge",
        },
    ],
)
Create a SageMaker Multi-container endpoint
Now, the last step is to create a SageMaker multi-container endpoint, using the create_endpoint API. The API behavior is the same as when deploying a single-container/model endpoint.
[ ]:
endpoint = sm_client.create_endpoint(
    EndpointName="mnist-multi-container-ep", EndpointConfigName="mnist-multi-container-ep-config"
)
The create_endpoint API is synchronous and returns an immediate response with the endpoint in the Creating status. It takes around 8-10 minutes for the multi-container endpoint to be InService.
In the cell below, use the describe_endpoint API to check the status of endpoint creation. It runs a simple waiter loop calling the describe_endpoint API, waiting for the endpoint to become InService.
[ ]:
describe_endpoint = sm_client.describe_endpoint(EndpointName="mnist-multi-container-ep")
endpoint_status = describe_endpoint["EndpointStatus"]

while endpoint_status != "InService":
    print("Current endpoint status is: {}, Trying again...".format(endpoint_status))
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName="mnist-multi-container-ep")
    endpoint_status = resp["EndpointStatus"]

print("Endpoint status changed to 'InService'")
Section 5: Inference
Now that the endpoint is set up, it is time to perform inference on the endpoint by specifying one of the container hostnames. First, download the MNIST data and select a random sample of images.
Use the helper functions defined in utils.mnist to download the MNIST dataset and normalize the input data.
[ ]:
%matplotlib inline
data_dir = "/tmp/data"
X, _ = mnist_to_numpy(data_dir, train=False)
# randomly sample 16 images to inspect
mask = random.sample(range(X.shape[0]), 16)
samples = X[mask]
# plot the images
fig, axs = plt.subplots(nrows=1, ncols=16, figsize=(16, 1))
for i, splt in enumerate(axs):
    splt.imshow(samples[i])
[ ]:
print(samples.shape, samples.dtype)
Invoking the TensorFlow container
Now invoke the TensorFlow container on the same endpoint. First, normalize the selected samples and then pass them to the invoke_endpoint API.
[ ]:
tf_samples = normalize(samples, axis=(1, 2))
tf_result = runtime_sm_client.invoke_endpoint(
    EndpointName="mnist-multi-container-ep",
    ContentType="application/json",
    Accept="application/json",
    TargetContainerHostname="tensorflow-mnist",
    Body=json.dumps({"instances": np.expand_dims(tf_samples, 3).tolist()}),
)

tf_body = tf_result["Body"].read().decode("utf-8")
tf_json_predictions = json.loads(tf_body)["predictions"]

# convert the returned class probabilities into predicted digit labels via argmax
tf_predictions = np.array(tf_json_predictions, dtype=np.float32)
tf_predictions = np.argmax(tf_predictions, axis=1)
[ ]:
print("Predictions: ", tf_predictions.tolist())
Invoke the PyTorch container
Now, invoke the PyTorch container. In the transform_fn of inference.py, it is declared that the parsed data is a Python dictionary with a key inputs and that its value should be a 1D array of length 784. Hence, create sample inference data in the cell below.
Before invoking the SageMaker PyTorch model server with samples, some pre-processing is needed:
- convert the data type to 32-bit floating point
- normalize each channel (only one channel for MNIST)
- add a channel dimension
[ ]:
pt_samples = normalize(samples.astype(np.float32), axis=(1, 2))
pt_result = runtime_sm_client.invoke_endpoint(
    EndpointName="mnist-multi-container-ep",
    ContentType="application/json",
    Accept="application/json",
    TargetContainerHostname="pytorch-mnist",
    Body=json.dumps({"inputs": np.expand_dims(pt_samples, axis=1).tolist()}),
)

pt_body = pt_result["Body"].read().decode("utf-8")

pt_predictions = np.argmax(np.array(json.loads(pt_body), dtype=np.float32), axis=1).tolist()
print("Predicted digits: ", pt_predictions)
Section 6: Clean up
Before leaving this exercise, it is good practice to delete the resources created.
[ ]:
sm_client.delete_endpoint(EndpointName="mnist-multi-container-ep")
sm_client.delete_endpoint_config(EndpointConfigName="mnist-multi-container-ep-config")
sm_client.delete_model(ModelName="mnist-multi-container")
Notebook CI Test Results
This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.