Using a dataset product from AWS Data Exchange with an ML model from AWS Marketplace

This notebook’s CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.

This sample notebook shows how to run inference with a pre-trained ML model on a third-party dataset from AWS Data Exchange.

In this notebook, you will subscribe to a dataset listed by Shutterstock in AWS Data Exchange. You will then export the dataset to an S3 bucket and download it to your local environment. You will also subscribe to Resnet 18, an open ML model from AWS Marketplace, and deploy it as an Amazon SageMaker endpoint. Finally, you will perform inference.

Usage instructions

You can run this notebook one cell at a time (press Shift+Enter to run a cell).

Pre-requisites:

Pre-requisite 1:

This sample notebook assumes a subscription to the 500 Image & Metadata Free Sample dataset has been created and data has been exported into an S3 bucket.

If you have not done this already, please follow these steps:

Pre-requisite 2:

This sample notebook assumes a subscription to the Resnet 18 ML Model has been created and an endpoint has been deployed. If you have not done this already, please follow these steps:

Introduction

You work for a super cool startup that lets you bring your pet to the office. The startup is expanding and the culture is pretty friendly. Your office is on a large campus provided by a tech incubator, and the campus is well equipped with security cameras.

You bring your little shih-tzu dog, affectionately called Toffee, to work. Because of his friendly nature, he quickly becomes the most popular dog on the entire campus. He loves visiting all his friends, so you have to find Toffee every day before leaving work.

Since the campus is large, it is hard to physically go everywhere to look for your dog. You typically end up at the security desk, going through footage from hundreds of cameras to find Toffee before you can leave for the day.

In this workshop, you will develop skills you can use to build software that the security team can use to help people find their dogs. You don’t need to worry about finding a campus and setting up cameras: Shutterstock has provided a dataset that you will use for the analysis. As part of this notebook’s prerequisites, you should already have subscribed to the dataset and specified its S3 location in the dataset_export_location variable.

Explore dataset

Next, you will load the dataset from S3 into your local execution environment.

[ ]:

!aws s3 sync $dataset_export_location data
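If you run this outside Jupyter, where the `!aws` shell escape is not available, the download can be sketched with boto3 instead. This is a minimal sketch, not a drop-in replacement: the helper name `download_s3_prefix` is hypothetical, and unlike `aws s3 sync` it re-downloads every object rather than skipping files that are already up to date. The S3 client is passed in so you can supply `boto3.client("s3")`.

```python
import os
from urllib.parse import urlparse


def download_s3_prefix(s3_client, s3_uri, dest_dir):
    """Download every object under an s3://bucket/prefix URI into dest_dir.

    Hypothetical helper. Unlike `aws s3 sync`, it does not skip files that are
    already up to date locally. Returns the list of local file paths written.
    """
    parsed = urlparse(s3_uri)
    bucket, prefix = parsed.netloc, parsed.path.lstrip("/")
    # Treat the prefix as a "folder", as `aws s3 sync` does
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    local_paths = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            # Recreate the key's path relative to the prefix under dest_dir
            local_path = os.path.join(dest_dir, key[len(prefix):])
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3_client.download_file(bucket, key, local_path)
            local_paths.append(local_path)
    return local_paths


# Usage in the notebook (requires AWS credentials):
# import boto3
# download_s3_prefix(boto3.client("s3"), dataset_export_location, "data")
```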

Load the camera footage into a dictionary so you can easily do a lookup.

[ ]:

import os

# Assign a sequential camera ID (starting at 1) to every image file under ./data
camera_footage = {}
counter = 1
for subdir, dirs, files in os.walk("data"):
    for file in files:
        camera_footage[counter] = os.path.join(subdir, file)
        counter += 1

print("Total", counter - 1, "cameras were found")


def get_camera_id(value):
    """Return the ID of the first camera whose file path contains `value`, or None."""
    for key, val in camera_footage.items():
        if value in val:
            return key
    return None
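Because os.walk does not guarantee any particular order, the camera IDs assigned above can differ between machines, and get_camera_id scans every entry with a substring match. As an alternative, here is a sketch that sorts the paths before numbering them and builds a reverse index keyed by file name for direct lookups. The function name build_camera_index is hypothetical, and the sketch assumes file names are unique across subdirectories.

```python
import os


def build_camera_index(root):
    """Number every file under root in sorted path order.

    Returns (camera_footage, id_by_name): camera_footage maps camera ID -> path,
    and id_by_name maps file name -> camera ID. Assumes file names are unique.
    """
    paths = sorted(
        os.path.join(subdir, name)
        for subdir, _dirs, files in os.walk(root)
        for name in files
    )
    camera_footage = {camera_id: path for camera_id, path in enumerate(paths, start=1)}
    id_by_name = {
        os.path.basename(path): camera_id for camera_id, path in camera_footage.items()
    }
    return camera_footage, id_by_name
```

In the notebook you could call `camera_footage, id_by_name = build_camera_index("data")` and then look up `id_by_name["1634351818.jpg"]` instead of calling get_camera_id.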
            return key

See what the footage from one of the cameras looks like.

[ ]:

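from IPython.display import Image


def show_cam_footage(camera_id):
    # Render the image captured by the given camera inline in the notebook
    return Image(url=camera_footage[camera_id], width=400, height=800)


camera_id = get_camera_id("1634351818.jpg")
show_cam_footage(camera_id)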

It looks like this camera is located in the campus grocery store. Try footage from another camera.

[ ]:

camera_id = get_camera_id("1821728006.jpg")
show_cam_footage(camera_id)

That’s Stacy from your team giving a treat to her golden retriever!

Now you need a way to catalog all the different dogs and cats so that you can look them up easily. For this purpose, you will use an ML model that can identify 1,000 different image classes, including many popular dog and cat breeds such as those shown in the following table.

Class	Animal
redbone	dog
shih-tzu	dog
collie	dog
basset	dog
malamute	dog
beagle	dog
pug	dog
golden retriever	dog
tabby	cat
siamese cat	cat

Perform inference

As part of prerequisite 2, you have already deployed the ML model and configured the endpoint name in the endpoint_name variable. Now you are ready to perform inference.

[ ]:

import json

# The following function sends the picture from the specified camera to the ML model
# and appends the top 10 classes found to the global `predictions` list.
# runtime, endpoint_name, content_type and class_id_to_label are assumed to be
# defined earlier in the notebook (see the prerequisites).


def perform_inference(camera_id):
    with open(camera_footage[camera_id], "rb") as file:
        body = file.read()

    # Perform inference by calling the invoke_endpoint API
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name, ContentType=content_type, Body=body
    )

    # Parse the response (one probability per class) and keep the 10 most likely classes.
    prediction = json.loads(response["Body"].read())
    prediction_ids = sorted(
        range(len(prediction)), key=lambda index: prediction[index], reverse=True
    )[:10]
    for class_id in prediction_ids:
        predictions.append(
            [camera_id, class_id_to_label[class_id].lower(), 100 * prediction[class_id]]
        )
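The sorted(...)[:10] call above sorts all of the class probabilities only to keep ten of them. The same top-k selection can be sketched with heapq.nlargest, which the Python documentation describes as equivalent to sorted(iterable, key=key, reverse=True)[:n] but avoids a full sort. The helper top_k_classes is hypothetical, and the four labels in the toy example are made up.

```python
import heapq


def top_k_classes(prediction, class_id_to_label, k=10):
    """Return [(label, percent), ...] for the k most probable classes.

    prediction is a list of per-class probabilities indexed by class ID,
    like the parsed invoke_endpoint response above.
    """
    top_ids = heapq.nlargest(k, range(len(prediction)), key=lambda i: prediction[i])
    return [(class_id_to_label[i].lower(), 100 * prediction[i]) for i in top_ids]


# Toy example with four made-up classes
labels = {0: "Tabby", 1: "Shih-Tzu", 2: "Pug", 3: "Collie"}
print(top_k_classes([0.1, 0.6, 0.25, 0.05], labels, k=2))
```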

[ ]:

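import pandas as pd

# Start from an empty results list so re-running this cell does not duplicate rows
predictions = []

# Perform inference on all cameras
for cam_id in camera_footage:
    perform_inference(cam_id)

# Load the inference results into a pandas DataFrame so you can easily look them up.
df = pd.DataFrame(predictions, columns=["camera_id", "entity", "probability_measure"])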
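To get a compact catalog with a single row per camera, you could keep only each camera's highest-scoring class. A minimal sketch using pandas groupby and idxmax follows; the helper name top_class_per_camera is hypothetical, and it is shown on a small made-up DataFrame with the same columns as df. It assumes the DataFrame index is unique, which holds for the default index that pd.DataFrame creates above.

```python
import pandas as pd


def top_class_per_camera(df):
    """Return one row per camera_id holding its highest probability_measure entry."""
    best_rows = df.groupby("camera_id")["probability_measure"].idxmax()
    return df.loc[best_rows].reset_index(drop=True)


# Made-up rows with the same columns as the inference DataFrame
sample = pd.DataFrame(
    [
        [1, "tabby", 12.5],
        [1, "siamese cat", 70.1],
        [2, "golden retriever", 88.0],
        [2, "pug", 3.2],
    ],
    columns=["camera_id", "entity", "probability_measure"],
)
print(top_class_per_camera(sample))
```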

Now that the catalog of image classes for all cameras is ready, you can look up the classes that the Resnet-18 machine learning model identified for a given camera.

[ ]:

# camera_id still refers to the camera selected earlier (1821728006.jpg)
print("-------------------------------------------------")
print("Image classes summary for camera", camera_id)
print("-------------------------------------------------")
print(df[df["camera_id"] == camera_id])
show_cam_footage(camera_id)

You can see that the ML model was able to identify the golden retriever with a high probability_measure value.

[ ]:

from IPython.display import Image, display


# The following function accepts a pet category and displays footage from every camera
# where that category's probability_measure exceeds the given threshold.
def find_my_pet(category, probability_measure):
    entries = df[
        (df["entity"] == category) & (df["probability_measure"] > probability_measure)
    ].sort_values("probability_measure", ascending=False)
    for _, row in entries.iterrows():
        print(
            "Camera-id:"
            + str(row["camera_id"])
            + "   ->   "
            + str(row["probability_measure"])
        )
        display(Image(url=camera_footage[row["camera_id"]], width=400, height=800))

Now it’s time to find Toffee. Specify a pet_category and a probability_measure_threshold value to see all cameras that show the specified pet.

[ ]:

pet_category = "shih-tzu"
probability_measure_threshold = 10
find_my_pet(pet_category, probability_measure_threshold)

You can now try different values for the pet_category and probability_measure_threshold variables to see how the model behaves.
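Rather than trying thresholds one at a time, you could count how many cameras match a category at several thresholds in one pass. This is a sketch with a hypothetical helper, matches_by_threshold, run on a small made-up DataFrame with the same columns as df; it uses the same strict "greater than" comparison as find_my_pet.

```python
import pandas as pd


def matches_by_threshold(df, category, thresholds):
    """Count distinct cameras where `category` scores above each threshold."""
    rows = df[df["entity"] == category]
    return {
        t: rows.loc[rows["probability_measure"] > t, "camera_id"].nunique()
        for t in thresholds
    }


# Made-up rows with the same columns as the inference DataFrame
sample = pd.DataFrame(
    [
        [1, "shih-tzu", 62.0],
        [2, "shih-tzu", 15.0],
        [3, "shih-tzu", 4.0],
        [3, "pug", 40.0],
    ],
    columns=["camera_id", "entity", "probability_measure"],
)
print(matches_by_threshold(sample, "shih-tzu", [1, 10, 50]))
```

A steep drop in the count as the threshold rises suggests that many matches are low-confidence guesses.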

Congratulations, you have learned how a pre-trained ML model can be used to extract insights from a third-party dataset.

Next Steps

As a next step, I recommend that you:
1. Explore AWS Data Exchange and identify the dataset that will help you solve your business problems. If you can’t find the dataset you are looking for, you can also request dataset products.
2. Explore ML models from AWS Marketplace and identify which ML model can help you build differentiating features. If you have any questions or need a custom ML model, you can contact the AWS Marketplace team at aws-mp-bd-ml@amazon.com.

Cleanup

To avoid charges to your AWS account when you are not invoking the endpoint, delete it. You are not charged for keeping the endpoint configuration or the model.
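Deleting the endpoint can be scripted with the boto3 SageMaker client's delete_endpoint call, optionally followed by the endpoint_deleted waiter. A minimal sketch, assuming endpoint_name is the variable set in prerequisite 2; the helper name delete_inference_endpoint is hypothetical, and the client is passed in so you can create it with boto3.client("sagemaker").

```python
def delete_inference_endpoint(sagemaker_client, endpoint_name, wait=True):
    """Delete a SageMaker endpoint; its endpoint config and model are left in place.

    sagemaker_client is expected to be boto3.client("sagemaker").
    """
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    if wait:
        # Block until SageMaker reports that the endpoint no longer exists
        sagemaker_client.get_waiter("endpoint_deleted").wait(EndpointName=endpoint_name)


# Usage in the notebook (requires AWS credentials):
# import boto3
# delete_inference_endpoint(boto3.client("sagemaker"), endpoint_name)
```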

You can also delete the stack you created from the AWS CloudFormation console.

Finally, if the AWS Marketplace subscription was created just for this experiment and you want to unsubscribe from the product, follow the steps below. Before you cancel the subscription, ensure that you do not have any deployable model created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

Steps to unsubscribe from the product in AWS Marketplace:
1. Navigate to the Machine Learning tab on the Your Software subscriptions page.
2. Locate the listing whose subscription you want to cancel, and then choose Cancel Subscription.
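Before cancelling, you could check programmatically whether any model still references the Marketplace model package by listing your models and inspecting each container's ModelPackageName. This is a sketch using the boto3 SageMaker client's list_models and describe_model calls; the helper name models_using_package is hypothetical, and it assumes the containers record the package under exactly the ARN you pass in.

```python
def models_using_package(sagemaker_client, model_package_arn):
    """Return names of SageMaker models whose containers reference model_package_arn.

    sagemaker_client is expected to be boto3.client("sagemaker").
    """
    matches = []
    kwargs = {}
    while True:
        page = sagemaker_client.list_models(**kwargs)
        for summary in page.get("Models", []):
            name = summary["ModelName"]
            desc = sagemaker_client.describe_model(ModelName=name)
            # A model describes its image(s) in PrimaryContainer and/or Containers
            containers = list(desc.get("Containers", []))
            if "PrimaryContainer" in desc:
                containers.insert(0, desc["PrimaryContainer"])
            if any(c.get("ModelPackageName") == model_package_arn for c in containers):
                matches.append(name)
        token = page.get("NextToken")
        if not token:
            return matches
        kwargs = {"NextToken": token}  # fetch the next page of models
```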

Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.