Using SageMaker Neo to Compile a TensorFlow U-Net Model

SageMaker Neo makes it easy to compile pre-trained TensorFlow models and build an inference-optimized container without writing any custom model serving or inference code.


U-Net is an architecture for semantic segmentation. It’s a popular model for biomedical images, including ultrasound, microscopy, CT, and MRI.

In this example, we will show how to deploy a pre-trained U-Net model to a SageMaker Endpoint with Neo compilation using the SageMaker Python SDK, and then use the deployed model to serve inference requests. We also provide a performance comparison so you can see the benefits of model compilation.


First, we need to ensure we have SageMaker Python SDK 1.x and TensorFlow 1.15.x. Then, we import the necessary Python packages.

[ ]:
!pip install -U --quiet "sagemaker<2"
!pip install -U --quiet "tensorflow==1.15.3"
[ ]:
import tarfile
import numpy as np
import sagemaker
import time
from sagemaker.utils import name_from_base

Next, we’ll get the IAM execution role and a few other SageMaker specific variables from our notebook environment, so that SageMaker can access resources in your AWS account later in the example.

[ ]:
from sagemaker import get_execution_role
from sagemaker.session import Session

role = get_execution_role()
sess = Session()
region = sess.boto_region_name
bucket = sess.default_bucket()

SageMaker Neo supports TensorFlow 1.15.x. Check your version of TensorFlow to prevent downstream framework errors.

[ ]:
import tensorflow as tf

print(tf.__version__)  # This notebook runs on TensorFlow 1.15.x or earlier

Download U-Net Model

The SageMaker Neo TensorFlow Serving Container works with any model stored in TensorFlow’s SavedModel format. This could be the output of your own training job or a model trained elsewhere. For this example, we will use a pre-trained version of the U-Net model based on this repo.

[ ]:
model_name = "unet_medical"
export_path = "export"
model_archive_name = "unet-medical.tar.gz"
model_archive_url = "{}".format(
[ ]:
!wget {model_archive_url}

The pre-trained model and its artifacts are saved in a compressed tar file (.tar.gz), so extract it first:

[ ]:
!tar -xvzf unet-medical.tar.gz

After downloading the model, we can inspect it using TensorFlow’s saved_model_cli command. In the command output, you should see:

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:


The command output should also show details of the model inputs and outputs.

[ ]:
import os

model_path = os.path.join(export_path, "Servo/1")
!saved_model_cli show --all --dir {model_path}

Next we need to create a model archive file containing the exported model.
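
As a minimal sketch of the archiving step, the snippet below packs a directory into a .tar.gz file with Python's tarfile module. It assumes the export/Servo/1 layout shown above. The helper name make_model_archive, the temporary directory, and the placeholder saved_model.pb file are hypothetical and exist only so the sketch runs without the real model.

```python
import os
import tarfile
import tempfile


def make_model_archive(source_dir, archive_path):
    """Pack source_dir into a gzip-compressed tar, keeping its top-level folder name."""
    with tarfile.open(archive_path, mode="w:gz") as archive:
        archive.add(source_dir, arcname=os.path.basename(source_dir))
    return archive_path


# Demonstrate on a throwaway directory that mimics the SavedModel layout.
work_dir = tempfile.mkdtemp()
demo_export = os.path.join(work_dir, "export")
os.makedirs(os.path.join(demo_export, "Servo", "1"))
with open(os.path.join(demo_export, "Servo", "1", "saved_model.pb"), "wb") as f:
    f.write(b"placeholder")  # stand-in for the real graph file

demo_archive = make_model_archive(demo_export, os.path.join(work_dir, "demo-model.tar.gz"))

# List the archive contents to confirm the folder structure was preserved.
with tarfile.open(demo_archive) as archive:
    archive_members = archive.getnames()
print(archive_members)
```

With the real model, the same helper could be called as make_model_archive(export_path, model_archive_name).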

Upload the model archive file to S3

We now have a suitable model archive ready in our notebook. We need to upload it to S3 before we can create a SageMaker Model from it. We’ll use the SageMaker Python SDK to handle the upload.

[ ]:
model_data = Session().upload_data(path=model_archive_name, key_prefix="model")
print("model uploaded to: {}".format(model_data))
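
upload_data returns the S3 URI of the uploaded object, in the form s3://<bucket>/<key_prefix>/<file name>. The sketch below shows one way to split such a URI back into bucket and key using only the standard library. The helper split_s3_uri and the bucket name example-bucket are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3":
        raise ValueError("not an S3 URI: {}".format(s3_uri))
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical bucket name, for illustration only.
example_uri = "s3://example-bucket/model/unet-medical.tar.gz"
example_bucket, example_key = split_s3_uri(example_uri)
print(example_bucket, example_key)
```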

Create a SageMaker Model and Endpoint

Now that the model archive is in S3, we can create an unoptimized Model and deploy it to an Endpoint.

[ ]:
from sagemaker.tensorflow.serving import Model

instance_type = "ml.c4.xlarge"
framework = "TENSORFLOW"
framework_version = "1.15.3"
[ ]:
sm_model = Model(model_data=model_data, framework_version=framework_version, role=role)
uncompiled_predictor = sm_model.deploy(initial_instance_count=1, instance_type=instance_type)

Make predictions using the endpoint

The endpoint is now up and running, and ready to handle inference requests. The deploy call above returned a predictor object. The predict method of this object handles sending requests to the endpoint. It also automatically handles JSON serialization of our input arguments, and JSON deserialization of the prediction results.
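
To make the JSON round trip concrete, here is a small sketch that builds a request and parses a response by hand, using TensorFlow Serving's REST format with its "instances" and "predictions" keys. The exact request body the SDK sends may differ, and the response string below is made up for illustration.

```python
import json

import numpy as np

# A tiny stand-in for an image batch: 1 image, 2x2 pixels, 3 channels.
batch = np.zeros((1, 2, 2, 3), dtype=np.uint8)

# TensorFlow Serving's REST "row" format wraps the inputs in an "instances" list.
request_body = json.dumps({"instances": batch.tolist()})

# A response in the same API carries the outputs under "predictions".
# This response string is invented for the example.
response_body = '{"predictions": [[[[0.1], [0.9]], [[0.8], [0.2]]]]}'
predictions = np.asarray(json.loads(response_body)["predictions"])
print(len(request_body), predictions.shape)
```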

We’ll use this sample image:


[ ]:
sample_img_fname = "cell-4.png"
sample_img_url = "{}".format(
[ ]:
!wget {sample_img_url}
[ ]:
# read the image file into a tensor (numpy array)
!pip install --quiet opencv-python
!apt-get update -q && apt-get install ffmpeg libsm6 libxext6  -y -q

import cv2

image = cv2.imread(sample_img_fname)
original_shape = image.shape
[ ]:
!pip install matplotlib
from matplotlib import pyplot as plt

plt.imshow(image, cmap="gray", interpolation="none")
[ ]:
# cv2.resize interpolates pixels; np.resize would only repeat or truncate the raw data
image = cv2.resize(image, (256, 256))
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = np.asarray(image)
image = np.expand_dims(image, axis=0)
[ ]:
start_time = time.time()

# get a prediction from the endpoint
# the image input is automatically converted to a JSON request.
# the JSON response from the endpoint is returned as a python dict
result = uncompiled_predictor.predict(image)
print("Prediction took %.2f seconds" % (time.time() - start_time))
[ ]:
# show the predicted segmentation image

cutoff = 0.4
segmentation_img = np.squeeze(np.asarray(result["predictions"])) > cutoff
segmentation_img = segmentation_img.astype(np.uint8)
segmentation_img = np.resize(segmentation_img, (original_shape[0], original_shape[1]))
plt.imshow(segmentation_img, "gray")
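
The thresholding step in the cell above can be checked on a tiny array. This sketch uses made-up probabilities rather than real model output, and assumes the prediction has shape (batch, height, width, channels) with one image and one channel.

```python
import numpy as np

# Made-up probabilities with shape (batch, height, width, channels).
probabilities = np.array([[[[0.10], [0.55]], [[0.40], [0.90]]]])

cutoff = 0.4
# squeeze drops the size-1 batch and channel axes, leaving a 2-D map.
# Values strictly above the cutoff become 1 (foreground), the rest 0.
mask = (np.squeeze(probabilities) > cutoff).astype(np.uint8)
print(mask)
```

Note that the comparison is strict, so a pixel with probability exactly equal to the cutoff is assigned to the background.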

Uncompiled Predictor Performance

[ ]:
shape_input = np.random.rand(1, 256, 256, 3)
uncompiled_results = []

for _ in range(100):
    start = time.time()
    # time one round trip to the endpoint
    uncompiled_predictor.predict(image)
    uncompiled_results.append((time.time() - start) * 1000)

print("\nPredictions for un-compiled model: \n")
print("\nP95: " + str(np.percentile(uncompiled_results, 95)) + " ms\n")
print("P90: " + str(np.percentile(uncompiled_results, 90)) + " ms\n")
print("P50: " + str(np.percentile(uncompiled_results, 50)) + " ms\n")
print("Average: " + str(np.average(uncompiled_results)) + " ms\n")
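
Because the same timing loop is needed for both endpoints, it can be factored into a helper. The sketch below is one possible version. The function measure_latencies_ms and the stand-in fake_predict are hypothetical, and fake_predict sleeps instead of calling a real endpoint so that the example runs anywhere.

```python
import time


def measure_latencies_ms(predict_fn, payload, runs=100, warmup=1):
    """Call predict_fn(payload) repeatedly and return per-call latency in ms."""
    for _ in range(warmup):
        predict_fn(payload)  # untimed calls, so one-off setup cost is excluded
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(payload)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def fake_predict(payload):
    """Hypothetical stand-in for predictor.predict; sleeps instead of calling an endpoint."""
    time.sleep(0.001)
    return {"predictions": payload}


demo_latencies = measure_latencies_ms(fake_predict, [0], runs=5)
print(len(demo_latencies), min(demo_latencies))
```

With a deployed endpoint, predict_fn would be the predict method of the predictor object, and payload the image batch.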

Compile model using SageMaker Neo

[ ]:
# Replace the value of data_shape below and
# specify the name & shape of the expected inputs for your trained model in JSON
# Note that -1 is replaced with 1 for the batch size placeholder
data_shape = {"inputs": [1, 224, 224, 3]}

instance_family = "ml_c4"

compilation_job_name = name_from_base("medical-tf-Neo")
# output path for compiled model artifact
compiled_model_path = "s3://{}/{}/output".format(bucket, compilation_job_name)
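
The comment above about replacing -1 with 1 can be expressed as a small helper. This is a sketch under the assumption that only the leading batch dimension of the model signature is dynamic. The function name neo_input_shape is hypothetical and is not part of the SageMaker SDK.

```python
import json


def neo_input_shape(input_name, signature_shape, batch_size=1):
    """Return a fixed input shape, replacing a dynamic (-1) batch dimension."""
    fixed = list(signature_shape)
    if fixed and fixed[0] == -1:
        fixed[0] = batch_size
    if -1 in fixed:
        # This sketch only handles a dynamic batch axis.
        raise ValueError("only the batch dimension may be dynamic")
    return {input_name: fixed}


# The signature shape [-1, 224, 224, 3] is assumed here for illustration.
example_shape = neo_input_shape("inputs", [-1, 224, 224, 3])
example_shape_json = json.dumps(example_shape)
print(example_shape_json)
```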
[ ]:
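# compile the model with Neo, using the variables defined above
optimized_estimator = sm_model.compile(
    target_instance_family=instance_family,
    input_shape=data_shape,
    job_name=compilation_job_name,
    role=role,
    framework=framework.lower(),
    framework_version=framework_version,
    output_path=compiled_model_path,
)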

Create Optimized Endpoint

[ ]:
optimized_predictor = optimized_estimator.deploy(
    initial_instance_count=1, instance_type=instance_type
)
[ ]:
start_time = time.time()

# get a prediction from the endpoint
# the image input is automatically converted to a JSON request.
# the JSON response from the endpoint is returned as a python dict
result = optimized_predictor.predict(image)
print("Prediction took %.2f seconds" % (time.time() - start_time))

Compiled Predictor Performance

[ ]:
compiled_results = []
test_input = {"instances": np.asarray(shape_input).tolist()}
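# Warmup inference.
optimized_predictor.predict(image)
# Inferencing 100 times.
for _ in range(100):
    start = time.time()
    # time one round trip to the endpoint
    optimized_predictor.predict(image)
    compiled_results.append((time.time() - start) * 1000)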

print("\nPredictions for compiled model: \n")
print("\nP95: " + str(np.percentile(compiled_results, 95)) + " ms\n")
print("P90: " + str(np.percentile(compiled_results, 90)) + " ms\n")
print("P50: " + str(np.percentile(compiled_results, 50)) + " ms\n")
print("Average: " + str(np.average(compiled_results)) + " ms\n")

Performance Comparison

Here we compare the inference speedup provided by SageMaker Neo. P90 is the 90th percentile latency. We include it because it represents the tail of the latency distribution (the slowest requests). More information on latency percentiles here.

[ ]:
p90 = np.percentile(uncompiled_results, 90) / np.percentile(compiled_results, 90)
p50 = np.percentile(uncompiled_results, 50) / np.percentile(compiled_results, 50)
avg = np.average(uncompiled_results) / np.average(compiled_results)

print("P90 Speedup: %.2f" % p90)
print("P50 Speedup: %.2f" % p50)
print("Average Speedup: %.2f" % avg)
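
To see how the percentile ratios behave, here is a worked example on made-up latency samples. The numbers are invented so that every optimized latency is exactly half of the corresponding baseline latency, and they are not measurements from this notebook.

```python
import numpy as np

# Made-up latency samples in milliseconds, for illustration only.
baseline_ms = np.array([10.0, 12.0, 11.0, 50.0, 13.0])
optimized_ms = baseline_ms / 2

# A speedup is the ratio of baseline latency to optimized latency.
p50_speedup = np.percentile(baseline_ms, 50) / np.percentile(optimized_ms, 50)
p90_speedup = np.percentile(baseline_ms, 90) / np.percentile(optimized_ms, 90)

# The single slow request (50 ms) pulls the mean well above the median,
# which is why percentiles are reported alongside the average.
baseline_median = np.percentile(baseline_ms, 50)
baseline_mean = np.average(baseline_ms)
print("P50 speedup: %.2f, P90 speedup: %.2f" % (p50_speedup, p90_speedup))
print("median: %.1f ms, mean: %.1f ms" % (baseline_median, baseline_mean))
```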

Additional Information

Cleaning up

To avoid incurring charges to your AWS account for the resources used in this tutorial, you need to delete the SageMaker Endpoint.

[ ]:
uncompiled_predictor.delete_endpoint()
[ ]:
optimized_predictor.delete_endpoint()