Using dataset product from AWS Data Exchange with ML model from AWS Marketplace

This sample notebook shows how to perform machine learning on third-party datasets from AWS Data Exchange using a pre-trained ML Model.

In this notebook, you will subscribe to a dataset listed by Shutterstock in AWS Data Exchange. You will then export the dataset to an S3 bucket and download it to your local environment. You will also subscribe to Resnet 18, an open ML model from AWS Marketplace, and deploy it as an Amazon SageMaker endpoint. Finally, you will perform inference.

Usage instructions

You can run this notebook one cell at a time by pressing Shift+Enter.

Pre-requisites:

Pre-requisite 1:

This sample notebook assumes a subscription to the 500 Image & Metadata Free Sample dataset has been created and data has been exported into an S3 bucket.

If you have not done this already, please follow these steps:

Subscribe to data from AWS Data Exchange:

  1. Open the 500 Image & Metadata Free Sample dataset in the AWS Data Exchange console.

  2. Read the overview and other information such as pricing, usage, support.

  3. Choose Continue to Subscribe.

  4. If your organization agrees to the subscription terms, pricing information, and data subscription agreement, review or update the renewal settings and choose Subscribe.

  5. Once the subscription has been created (this step may take 5-10 minutes), you will find the dataset listed in the **Subscriptions** section of the console.

  6. From the subscription page, open the Shutterstock dataset and, for this use case, choose the data set: 500 Image & Metadata Free Sample dataset.

  7. Select the revision and then choose Export to Amazon S3.

  8. Select an appropriate bucket. Once the export job has completed, open the S3 bucket you chose in the preceding step, copy the S3 URL of the data folder by choosing Copy S3 URI, and specify it in the following cell.

[ ]:
# Please specify S3 location in which dataset has been exported.
dataset_export_location = ""
# dataset_export_location='s3://bucket/adx_free_data_sample/'
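A malformed location is an easy mistake at this step. As a minimal sketch (the helper name `parse_s3_uri` is hypothetical and not part of the original notebook), the value can be split into its bucket and prefix with the Python standard library before it is used:

```python
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split an S3 URI such as s3://bucket/prefix/ into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    # The path starts with "/", which is not part of the object key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


# The URI below is the placeholder from the commented-out example above.
bucket, prefix = parse_s3_uri("s3://bucket/adx_free_data_sample/")
print(bucket, prefix)
```

Calling `parse_s3_uri(dataset_export_location)` while the variable is still an empty string raises a `ValueError`, which surfaces the problem before any download is attempted.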

Pre-requisite 2:

This sample notebook assumes a subscription to the Resnet 18 ML Model has been created and an endpoint has been deployed. If you have not done this already, please follow these steps:

Subscribe and deploy ML Model from AWS Marketplace:

  1. Open the Resnet 18 ML Model listing from AWS Marketplace.

  2. Read the Highlights section and then product overview section of the listing.

  3. View usage information and then additional resources.

  4. Note the supported instance types.

  5. Next, choose Continue to subscribe.

  6. Review the End User License Agreement (EULA), support terms, and pricing information.

  7. If your organization agrees with the EULA, pricing information, and support terms, choose Accept Offer.

  8. Choose Continue to Configuration.

  9. Leave AWS CloudFormation as the selected option. If this is the first time you are using Amazon SageMaker, under Configure for AWS CloudFormation, choose Create and use a new service role and Any S3 bucket, and then select Launch CloudFormation Template.

  10. In the CloudFormation console, choose Create Stack.

  11. After you have launched the AWS CloudFormation template, wait for the status of the newly launched AWS CloudFormation stack to change to Create Complete.

  12. Open the Outputs tab of the CloudFormation stack, copy the value corresponding to EndpointName, and specify it in the following cell.

[ ]:
endpoint_name = "Endpoint-ResNet-18-1"
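Before sending images to the endpoint, it can help to confirm that it has finished deploying. The following is a sketch only: the helper names are hypothetical, and the client is passed in so the logic can be exercised without AWS credentials. It relies on the `EndpointStatus` field returned by the SageMaker `describe_endpoint` call.

```python
def endpoint_status(sagemaker_client, name):
    """Return the EndpointStatus string reported by describe_endpoint."""
    response = sagemaker_client.describe_endpoint(EndpointName=name)
    return response["EndpointStatus"]


def endpoint_is_ready(sagemaker_client, name):
    """True only when the endpoint reports the InService status."""
    return endpoint_status(sagemaker_client, name) == "InService"


# Example usage in the notebook (requires AWS credentials):
# import boto3
# print(endpoint_is_ready(boto3.client("sagemaker"), endpoint_name))
```

If the check returns False, the CloudFormation stack is probably still being created, or the endpoint name does not match the stack output.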
[ ]:
# Import necessary libraries.
import math
import re
import os
import json
import time

import glob
import pandas as pd

import boto3
import sagemaker
from sagemaker import AlgorithmEstimator
from sagemaker import get_execution_role
from IPython.display import Image, display

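# Create the AWS clients and session used throughout the notebook.
s3 = boto3.client("s3")
sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name
runtime = boto3.client("sagemaker-runtime")

# Content type of the images sent to the endpoint, and the list that collects results.
content_type = "application/x-image"
predictions = []

# Download the ImageNet label names. The first line of the file is skipped
# so that list indices line up with the class ids returned by the model.
s3_bucket = f"jumpstart-cache-prod-{region}"
s3.download_file(s3_bucket, "inference-notebook-assets/ImageNetLabels.txt", "ImageNetLabels.txt")
with open("ImageNetLabels.txt", "r") as file:
    class_id_to_label = file.read().splitlines()[1:]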

Introduction

You work for a super cool startup, which lets you bring your pet to the office. The startup is expanding and the culture is pretty friendly. Your office is on a large campus provided by a tech incubator. The campus itself is well-equipped with safety cameras.

You bring your little shih-tzu dog, affectionately called Toffee, to work. Because of his friendly nature, he quickly becomes the most popular dog on the entire campus. He loves visiting all his friends, and you have to find Toffee every day before leaving work.

Since the campus is large, it is hard to physically go everywhere and find your dog. You typically end up with security and have to go through hundreds of cameras to find Toffee before you can leave for the day.

In this workshop, you will develop new skills that you can use to build software that the security team can use to help people find their dogs. For this workshop, you don’t need to worry about finding a campus and setting up cameras. Shutterstock has provided a dataset that you will use for the analysis. As part of the pre-requisites of this notebook, you should already have subscribed to the dataset and specified the S3 location in the dataset_export_location variable.

Explore dataset

Next, you will load the dataset from S3 into your local execution environment.

[ ]:
!aws s3 sync $dataset_export_location data
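If the AWS CLI is not available in your environment, a similar download can be done from Python. This is a rough sketch, not an equivalent of `aws s3 sync` (it downloads every object each time rather than only changed ones). The function name `download_prefix` is hypothetical; it uses the boto3 `list_objects_v2` paginator and `download_file`, and takes the client as an argument.

```python
import os


def download_prefix(s3_client, bucket, prefix, dest="data"):
    """Download every object under bucket/prefix into dest, keeping subfolders."""
    downloaded = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            # Path of the object relative to the prefix; fall back to the file name.
            relative = key[len(prefix):].lstrip("/") or os.path.basename(key)
            target = os.path.join(dest, *relative.split("/"))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3_client.download_file(bucket, key, target)
            downloaded.append(target)
    return downloaded


# Example usage (requires AWS credentials; bucket and prefix are placeholders):
# import boto3
# download_prefix(boto3.client("s3"), "bucket", "adx_free_data_sample/")
```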

Load the camera footage into a dictionary so you can easily do a lookup.

[ ]:
# Walk the downloaded data folder and assign each image a sequential camera id.
camera_footage = {}
counter = 1
for subdir, dirs, files in os.walk("data"):
    for file in files:
        camera_footage[counter] = subdir + "/" + file
        counter = counter + 1

print(f"Total {counter - 1} cameras were found")


def get_camera_id(value):
    # Return the id of the first camera whose file path contains `value`,
    # or None if no path matches.
    for key, val in camera_footage.items():
        if value in val:
            return key
    return None
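The lookup above scans every entry on each call and matches on a substring of the path. If you need many lookups by exact file name, one option is a reverse index. This is a sketch under assumptions: `build_filename_index` is a hypothetical helper, and the sample paths are invented stand-ins for the real contents of `camera_footage`.

```python
import os


def build_filename_index(footage):
    """Map each image file name to its camera id for constant-time lookups."""
    return {os.path.basename(path): cam_id for cam_id, path in footage.items()}


# Toy stand-in for the camera_footage dictionary; the folder names are made up.
sample_footage = {1: "data/images/1634351818.jpg", 2: "data/images/1821728006.jpg"}
filename_to_camera = build_filename_index(sample_footage)
print(filename_to_camera.get("1821728006.jpg"))
```

Unlike `get_camera_id`, this index requires the full file name, and if two folders contain a file with the same name, the later one wins.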

See what the footage from camera #1 looks like.

[ ]:
def show_cam_footage(camera_id):
    return Image(url=camera_footage[camera_id], width=400, height=800)


camera_id = get_camera_id("1634351818.jpg")
show_cam_footage(camera_id)

It looks like this camera is located in the grocery store on campus. Try footage from another camera.

[ ]:
camera_id = get_camera_id("1821728006.jpg")
show_cam_footage(camera_id)

That’s Stacy from your team giving a treat to her golden retriever!

Now you need a way to catalog all the different dogs and cats so that you can look them up easily. For this purpose, you will use an ML model that can identify 1000 different image classes, including many popular dog and cat breeds, as shown in the following table.

Class               Category

redbone             dog
shih-tzu            dog
collie              dog
basset              dog
malamute            dog
beagle              dog
pug                 dog
golden retriever    dog
tabby               cat
siamese cat         cat

Perform inference

As part of pre-requisite 2, you have already deployed the ML model and configured the endpoint name in the endpoint_name variable. Now you are ready to perform inference.

[ ]:
# The following function sends the picture for the specified camera_id to the ML model
# and appends the top 10 classes found to the predictions list.


def perform_inference(camera_id):
    # Read the image bytes for this camera.
    with open(camera_footage[camera_id], "rb") as file:
        body = file.read()

    # Perform inference by calling the invoke_endpoint API.
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name, ContentType=content_type, Body=body
    )

    # Parse the response (one score per class) and keep the 10 highest-scoring class ids.
    prediction = json.loads(response["Body"].read())
    prediction_ids = sorted(
        range(len(prediction)), key=lambda index: prediction[index], reverse=True
    )[:10]
    for class_id in prediction_ids:
        predictions.append(
            [camera_id, class_id_to_label[class_id].lower(), 100 * prediction[class_id]]
        )
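The ranking step inside `perform_inference` can be tried without an endpoint. The sketch below isolates that logic in a hypothetical helper, `top_k_classes`, and runs it on invented scores and labels rather than real model output.

```python
def top_k_classes(scores, labels, k=10):
    """Return the k highest-scoring (label, percent) pairs, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [(labels[i].lower(), 100 * scores[i]) for i in ranked]


# Toy scores and labels, invented for illustration only.
toy_scores = [0.05, 0.70, 0.25]
toy_labels = ["Pug", "Golden Retriever", "Tabby"]
print(top_k_classes(toy_scores, toy_labels, k=2))
```

Sorting the indices rather than the scores is what lets each score be paired with its label afterwards.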
[ ]:
# Perform inference on all cameras.
for cam_id in camera_footage:
    perform_inference(cam_id)

# Load the inference results into a pandas DataFrame so you can easily look them up.
df = pd.DataFrame(predictions, columns=["camera_id", "entity", "probability_measure"])
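With ten rows per camera, a quick way to summarize the catalog is to keep only the best-scoring class for each camera. The sketch below shows one way to do that with pandas `groupby` and `idxmax`. It uses a small invented DataFrame in the same column layout as `df`, so the values are not real model output.

```python
import pandas as pd

# Toy rows in the [camera_id, entity, probability_measure] layout; values are invented.
toy_predictions = [
    [1, "golden retriever", 82.0],
    [1, "tennis ball", 3.5],
    [2, "shih-tzu", 64.0],
    [2, "lhasa", 21.0],
]
toy_df = pd.DataFrame(toy_predictions, columns=["camera_id", "entity", "probability_measure"])

# idxmax returns the row label of the highest probability_measure within each camera group.
best_rows = toy_df.groupby("camera_id")["probability_measure"].idxmax()
top_per_camera = toy_df.loc[best_rows].set_index("camera_id")
print(top_per_camera)
```

The same two lines should work on the real `df` once inference has run for all cameras.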

Now that your catalog containing image classes for all cameras is ready, you can look up the classes identified by the Resnet-18 machine learning model.

[ ]:
print("-------------------------------------------------")
print("Image classes summary for cam-", camera_id)
print("-------------------------------------------------")
print(df[df["camera_id"] == camera_id])
show_cam_footage(camera_id)

You can see that the ML model identified the golden retriever with a high probability measure.

[ ]:
# The following function accepts a pet category and displays the cameras
# whose score for that category exceeds the probability_measure threshold.
def find_my_pet(category, probability_measure):
    entries = df[
        (df["entity"] == category) & (df["probability_measure"] > probability_measure)
    ].sort_values("probability_measure", ascending=False)
    for _, entry in entries.iterrows():
        print(
            "Camera-id:"
            + str(entry["camera_id"])
            + "   ->   "
            + str(entry["probability_measure"])
        )
        display(Image(url=camera_footage[entry["camera_id"]], width=400, height=800))

Now it's time to find Toffee. Specify a pet_category and a probability_measure_threshold value to see all cameras that have the specified pet.

[ ]:
pet_category = "shih-tzu"
probability_measure_threshold = 10
find_my_pet(pet_category, probability_measure_threshold)

You can now try specifying different values for the pet_category and probability_measure_threshold variables to see how the model behaves.
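To get a feel for how the threshold changes the results, you can count matches at several thresholds before displaying any images. This is an illustrative sketch: `count_matches` is a hypothetical helper, and the DataFrame holds invented values in the same layout as `df`.

```python
import pandas as pd

# Toy data in the same column layout as df; values are invented.
toy_df = pd.DataFrame(
    [[1, "shih-tzu", 64.0], [2, "shih-tzu", 12.0], [3, "shih-tzu", 4.0], [3, "pug", 55.0]],
    columns=["camera_id", "entity", "probability_measure"],
)


def count_matches(frame, category, threshold):
    """Number of cameras whose score for category exceeds threshold."""
    mask = (frame["entity"] == category) & (frame["probability_measure"] > threshold)
    return int(frame.loc[mask, "camera_id"].nunique())


for threshold in (1, 10, 50):
    print(threshold, count_matches(toy_df, "shih-tzu", threshold))
```

A low threshold returns more cameras, including weak matches. A high threshold returns fewer but more confident ones.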

Congratulations, you have learnt how pre-trained ML models can be used to extract insights from data.

Next Steps

As a next step, I recommend that you:

  1. Explore AWS Data Exchange and identify the dataset that will help you solve your business problems. If you can’t find the dataset you are looking for, you can also request dataset products.

  2. Explore ML models from AWS Marketplace and identify which ML model can help you build differentiating features. If you have any questions or need a custom ML model, you can contact the AWS Marketplace team at aws-mp-bd-ml@amazon.com.

Cleanup

To avoid charges to your AWS account when you are not running inference, delete your endpoint. You will not be charged for keeping your endpoint config or model.

You can visit CloudFormation to delete the stack you created.
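If you prefer to clean up from code rather than the console, the stack can also be deleted with boto3. This is a sketch under assumptions: the helper name and the stack name in the usage comment are placeholders, and you need to substitute the name of the stack you actually launched. It uses the CloudFormation `delete_stack` call and the `stack_delete_complete` waiter.

```python
def delete_stack_and_wait(cloudformation_client, stack_name):
    """Request stack deletion, then block until CloudFormation reports it is gone."""
    cloudformation_client.delete_stack(StackName=stack_name)
    waiter = cloudformation_client.get_waiter("stack_delete_complete")
    waiter.wait(StackName=stack_name)


# Example usage (requires AWS credentials; the stack name is a placeholder):
# import boto3
# delete_stack_and_wait(boto3.client("cloudformation"), "my-resnet-18-stack")
```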

Finally, if the AWS Marketplace subscription was created just for this experiment and you want to unsubscribe from the product, follow the steps below. Before you cancel the subscription, ensure that you do not have any deployable model created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

Steps to unsubscribe from the product in AWS Marketplace:

  1. Navigate to the Machine Learning tab on the **Your Software subscriptions page**.

  2. Locate the listing whose subscription you want to cancel, and then choose Cancel Subscription.
