Highly Performant TensorFlow Batch Inference on Image Data Using the SageMaker Python SDK

In this notebook, we’ll show how to use SageMaker batch transform to get inferences on a large dataset. To do this, we’ll use a TensorFlow Serving model to do batch inference on a large dataset of images. We’ll show how to use the pre-processing and post-processing feature of the TensorFlow Serving container on Amazon SageMaker so that your TensorFlow model can make inferences directly on data in S3, and save post-processed inferences to S3.

The dataset we’ll be using is the “Challenge 2018/2019” subset of the Open Images V5 Dataset. This subset consists of 100,000 images in .jpg format, for a total of 10GB. For demonstration, the model we’ll be using is an image classification model based on the ResNet-50 architecture that has been trained on the ImageNet dataset, and which has been exported as a TensorFlow SavedModel.

We will use this model to predict the class that each image belongs to. We’ll write a pre- and post-processing script, package the script with our TensorFlow SavedModel, and demonstrate how to get inferences on large datasets with SageMaker batch transform quickly, efficiently, and at scale, on GPU-accelerated instances.


We’ll begin with some necessary imports, and get an Amazon SageMaker session to help perform certain tasks, as well as an IAM role with the necessary permissions.

[ ]:
import numpy as np
import os
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()
role = get_execution_role()

region = sagemaker_session.boto_region_name
bucket = sagemaker_session.default_bucket()
prefix = "sagemaker/DEMO-tf-batch-inference-jpeg-images-python-sdk"
print("Region: {}".format(region))
print("S3 URI: s3://{}/{}".format(bucket, prefix))
print("Role:   {}".format(role))

Inspecting the SavedModel

In order to make inferences, we’ll have to preprocess our image data in S3 to match the serving signature of our TensorFlow SavedModel (https://www.tensorflow.org/guide/saved_model), which we can inspect using the saved_model_cli (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/saved_model_cli.py). This is the serving signature of the ResNet-50 v2 (NCHW, JPEG) (https://github.com/tensorflow/models/tree/master/official/resnet#pre-trained-model) model:

[ ]:
!aws s3 cp s3://sagemaker-sample-data-{region}/batch-transform/open-images/model/resnet_v2_fp32_savedmodel_NCHW_jpg.tar.gz .
!tar -zxf resnet_v2_fp32_savedmodel_NCHW_jpg.tar.gz
!saved_model_cli show --dir resnet_v2_fp32_savedmodel_NCHW_jpg/1538687370/ --all

The SageMaker TensorFlow Serving Container uses the model’s SignatureDef named serving_default, which is declared when the TensorFlow SavedModel is exported. This SignatureDef says that the model accepts a string of arbitrary length as input, and responds with classes and their probabilities. With our image classification model, the input string will be a base-64 encoded string representing a JPEG image, which our SavedModel will decode.

Writing a pre- and post-processing script

We will package up our SavedModel with a Python script named inference.py, which will pre-process input data going from S3 to our TensorFlow Serving model, and post-process output data before it is saved back to S3:

[ ]:
!pygmentize code/inference.py

The input_handler intercepts inference requests, base-64 encodes the request body, and formats the request body to conform to TensorFlow Serving’s REST API (https://www.tensorflow.org/tfx/serving/api_rest). The return value of the input_handler function is used as the request body in the TensorFlow Serving request.

Binary data must use the key “b64”, according to the TFS REST API (https://www.tensorflow.org/tfx/serving/api_rest#encoding_binary_values). Because our serving signature’s input tensor has the suffix “_bytes”, the encoded image data under the key “b64” is passed to the “image_bytes” tensor. Some serving signatures accept a tensor of floats or integers instead of a base-64 encoded string, but for binary data (including image data), we recommend that your SavedModel accept a base-64 encoded string, since JSON representations of binary data can be large.

Each incoming request originally contains a serialized JPEG image in its request body. After passing through the input_handler, the request body contains the following, which TensorFlow Serving accepts for inference:

{"instances": [{"b64":"[base-64 encoded JPEG image]"}]}

The first field in the return value of output_handler is what SageMaker Batch Transform will save to S3 as this example’s prediction. In this case, our output_handler passes the content on to S3 unmodified.
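To make the two handlers concrete, here is a minimal sketch of what such a script could look like. It is not necessarily the contents of code/inference.py: the handler signatures follow our reading of the container README, the context attributes request_content_type and accept_header are assumptions about the object the container passes in, and the SimpleNamespace objects at the bottom are hypothetical stand-ins used only to exercise the functions locally.

```python
import base64
import io
import json
from types import SimpleNamespace


def input_handler(data, context):
    """Turn a raw JPEG request body into a TensorFlow Serving REST request."""
    if context.request_content_type != "application/x-image":
        raise ValueError(
            'Unsupported content type "{}"'.format(context.request_content_type)
        )
    payload = data.read()  # raw JPEG bytes from the request body
    encoded_image = base64.b64encode(payload).decode("utf-8")
    # Binary values go under the "b64" key, one entry per instance.
    return json.dumps({"instances": [{"b64": encoded_image}]})


def output_handler(response, context):
    """Pass the TensorFlow Serving response through unmodified."""
    if response.status_code != 200:
        raise ValueError(response.content.decode("utf-8"))
    # The first field is the prediction body, the second its content type.
    return response.content, context.accept_header


# Local smoke test with hypothetical stand-ins for the container's objects.
fake_context = SimpleNamespace(
    request_content_type="application/x-image", accept_header="application/json"
)
fake_jpeg = b"\xff\xd8\xff\xe0 not a real image"
request_body = input_handler(io.BytesIO(fake_jpeg), fake_context)

fake_response = SimpleNamespace(status_code=200, content=b'{"predictions": []}')
body, content_type = output_handler(fake_response, fake_context)
```

Running the handlers locally like this is a quick way to confirm that the request body has the {"instances": [{"b64": ...}]} shape before packaging the script.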

Pre- and post-processing functions let you perform inference with TensorFlow Serving on any data format, not just images. To learn more about the input_handler and output_handler, consult the SageMaker TensorFlow Serving Container README (https://github.com/aws/sagemaker-tensorflow-serving-container/blob/master/README.md).

Packaging a Model

After writing a pre- and post-processing script, you’ll need to package your TensorFlow SavedModel along with your script into a model.tar.gz file, which we’ll upload to S3 for the SageMaker TensorFlow Serving Container to use. Let’s package the SavedModel with the inference.py script and examine the expected format of the model.tar.gz file:

[ ]:
!tar -cvzf model.tar.gz code --directory=resnet_v2_fp32_savedmodel_NCHW_jpg 1538687370

1538687370 refers to the model version number of the SavedModel, and this directory contains our SavedModel artifacts. The code directory contains our pre- and post-processing script, which must be named inference.py. You can also include an optional requirements.txt file, which is used to install dependencies with pip from the Python Package Index before the Transform Job starts. We don’t need any additional dependencies in this case, so we don’t include a requirements file.
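If you want to double-check an archive before uploading it, the layout described above can be verified with Python’s tarfile module. The following is a sketch only: check_model_archive and build_demo_archive are hypothetical helper names, and the demo archive contains placeholder files rather than a real SavedModel.

```python
import io
import os
import tarfile
import tempfile


def check_model_archive(path):
    """Return (version_dirs, has_inference_script) for a model.tar.gz archive."""
    with tarfile.open(path, "r:gz") as archive:
        names = archive.getnames()
    # Normalize members that were archived with a leading "./".
    names = [name[2:] if name.startswith("./") else name for name in names]
    top_level = {name.split("/")[0] for name in names}
    version_dirs = sorted(entry for entry in top_level if entry.isdigit())
    has_inference_script = "code/inference.py" in names
    return version_dirs, has_inference_script


def build_demo_archive(path):
    """Write a tiny archive with placeholder files in the expected layout."""
    members = [
        ("code/inference.py", b"# handlers go here\n"),
        ("1538687370/saved_model.pb", b""),
    ]
    with tarfile.open(path, "w:gz") as archive:
        for name, content in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "model.tar.gz")
    build_demo_archive(demo_path)
    versions, has_script = check_model_archive(demo_path)
```

Calling check_model_archive("model.tar.gz") on the real archive should report the numeric version directory and confirm that code/inference.py is present.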

We will use this model.tar.gz when we create a SageMaker Model, which we will use to run Transform Jobs. To learn more about packaging a model, you can consult the SageMaker TensorFlow Serving Container README.

Run a Batch Transform job

Next, we’ll run a Batch Transform job using our data processing script and GPU-based Amazon SageMaker Model. More specifically, we’ll perform inference on a cluster of two instances, though we can choose more or fewer. The objects in the S3 path will be distributed across the instances.

The code below creates a SageMaker Model entity that will be used for Batch inference, and runs a Transform Job using that Model. The Model contains a reference to the TFS container, and the model.tar.gz containing our TensorFlow SavedModel and the pre- and post-processing inference.py script.

[ ]:
import os
import sagemaker
from sagemaker.tensorflow.serving import Model

s3_path = "s3://{}/{}".format(bucket, prefix)

model_data = sagemaker_session.upload_data("model.tar.gz", bucket, os.path.join(prefix, "model"))

tensorflow_serving_model = Model(
    model_data=model_data,
    role=role,
    framework_version="1.13",
    sagemaker_session=sagemaker_session,
)

input_path = "s3://sagemaker-sample-data-{}/batch-transform/open-images/jpg".format(region)

print("Model data S3 path: {}".format(model_data))
print("Input S3 path: {}".format(input_path))

Before we create a Transform Job, let’s inspect some of our input data. Here’s an example, the first image in our dataset:


The data in the input path consists of 100,000 JPEG images of varying sizes and shapes. Here is a subset:

[ ]:
!echo "Transform input path: {input_path}"
!aws s3 ls {input_path}/000 --human-readable

Now that we’ve created a SageMaker Model, we can use it to run batch predictions using Batch Transform. We specify the input S3 data, content type of the input data, the output S3 data, and instance type and count.

For improved performance, we specify two additional parameters, max_concurrent_transforms and max_payload. These control the maximum number of parallel requests that can be sent to each instance in a transform job at a time, and the maximum size of each request body in megabytes.

When performing inference on entire S3 objects that cannot be split by newline characters, such as images, it is recommended that you set max_payload to be slightly larger than the largest S3 object in your dataset, and that you experiment with the max_concurrent_transforms parameter in powers of two to find a value that maximizes throughput for your model. For example, we’ve set max_concurrent_transforms to 64 after experimenting with powers of two, and we set max_payload to 1, since the largest object in our S3 input is less than one megabyte.
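The sizing rule above can be sketched in a few lines of Python. Both helper names are hypothetical, the object sizes are made-up numbers, and treating one megabyte as 1024 * 1024 bytes is an assumption; the sketch only shows the arithmetic, not a call to any SageMaker API.

```python
MEGABYTE = 1024 * 1024  # assumption: one megabyte counted as 2**20 bytes


def suggest_max_payload_mb(object_sizes_in_bytes):
    """Smallest whole number of megabytes strictly larger than the largest object."""
    largest = max(object_sizes_in_bytes)
    return largest // MEGABYTE + 1


def concurrency_candidates(upper_limit):
    """Powers of two up to upper_limit, to try as max_concurrent_transforms."""
    candidates = []
    value = 1
    while value <= upper_limit:
        candidates.append(value)
        value *= 2
    return candidates


# Made-up object sizes, all below one megabyte as in our dataset.
sizes = [312_450, 87_912, 964_003]
payload_mb = suggest_max_payload_mb(sizes)
candidates = concurrency_candidates(64)
```

With every object under one megabyte, the suggested payload limit is 1, and the candidate list gives the values to benchmark one at a time.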

[ ]:
output_path = os.path.join(s3_path, "output")
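tensorflow_serving_transformer = tensorflow_serving_model.transformer(
    instance_count=2,
    # Assumed GPU instance type; choose one that is available in your account
    instance_type="ml.p3.2xlarge",
    max_concurrent_transforms=64,
    max_payload=1,
    output_path=output_path,
)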

print("Transform input S3 path:  {}".format(input_path))
print("Transform output S3 path: {}".format(output_path))
tensorflow_serving_transformer.transform(input_path, content_type="application/x-image")

After our transform job finishes, we find one S3 object in the output path for each object in the input path. This object contains the inferences from our model for that object, and has the same name as the corresponding input object, but with .out appended to it.
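The naming convention is simple enough to capture in two small helpers, which can be handy when joining predictions back to their source images. The function names are hypothetical, and the sketch only handles the object name, not the full S3 key layout.

```python
import posixpath


def output_name_for(input_key):
    """Name of the output object produced for a given input object."""
    return posixpath.basename(input_key) + ".out"


def input_name_for(output_key):
    """Recover the input object name from an output object name."""
    name = posixpath.basename(output_key)
    if not name.endswith(".out"):
        raise ValueError("Not a batch transform output name: {}".format(name))
    return name[: -len(".out")]


example_output = output_name_for("batch-transform/open-images/jpg/00000b4dcff7f799.jpg")
example_input = input_name_for(example_output)
```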

[ ]:
!aws s3 ls {output_path}/000 --human-readable

Inspecting one of the output objects, we find the prediction from our TensorFlow Serving model. This is from the example image displayed above:

[ ]:
!aws s3 cp {output_path}/00000b4dcff7f799.jpg.out .
!cat 00000b4dcff7f799.jpg.out
[ ]:
import json

with open("00000b4dcff7f799.jpg.out", "r") as f:
    result = json.load(f)

prediction = result["predictions"][0]

# Subtracting 1 for "background" class
class_index = prediction["classes"] - 1
probabilities = prediction["probabilities"]

# Probability of the predicted class, indexed by the model's own class ID
print(probabilities[class_index + 1])

# Index 864 corresponds to "tow truck"
print("Class index: {}".format(class_index))
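When processing many output objects, it can help to wrap this parsing in a function. The sketch below assumes every output object has the same structure as the one inspected here, a "predictions" list whose first entry holds an integer "classes" value and a "probabilities" list. summarize_prediction is a hypothetical helper name, and the sample JSON is synthetic rather than real model output.

```python
import json


def summarize_prediction(json_text):
    """Extract the class index and its probability from one output object."""
    prediction = json.loads(json_text)["predictions"][0]
    model_class_id = prediction["classes"]
    return {
        # Subtract 1 to account for the "background" class.
        "class_index": model_class_id - 1,
        # Probabilities are indexed by the model's own class ID.
        "probability": prediction["probabilities"][model_class_id],
    }


# Synthetic example with three classes instead of the model's full output.
sample = json.dumps({"predictions": [{"classes": 2, "probabilities": [0.1, 0.2, 0.7]}]})
summary = summarize_prediction(sample)
```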


SageMaker batch transform can transform large datasets quickly and scalably. We used the SageMaker TensorFlow Serving Container to demonstrate how to quickly get inferences on a hundred thousand images using GPU-accelerated instances.

The Amazon SageMaker TFS container supports CSV and JSON data out of the box. The pre- and post-processing feature of the container lets you run transform jobs on data of any format. The same container can be used for real-time inference as well using an Amazon SageMaker hosted model endpoint.