Title

The title should resemble the filename, but keep the filename concise so that it stays readable in the JupyterLab file list view.

Example title - Amazon SageMaker Processing: pre-processing images with PyTorch using a GPU instance type

  • Bad example filename: amazon_sagemaker-processing-images_with_pytorch_on_GPU.ipynb (too long & mixes case, dashes, and underscores)

  • Good example filename: processing_images_pytorch_gpu.ipynb (succinct, all lowercase, all underscores)
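
As a rough illustration of the naming rule above, a check like the following could flag filenames that mix case or separators. The function name `is_good_notebook_filename` and the 40-character limit are assumptions made for this sketch; they are not part of any official tooling.

```python
import re

# Hypothetical limit chosen for this sketch; the guidance only says "concise".
MAX_FILENAME_LENGTH = 40

# Lowercase letters and digits separated by single underscores, ending in .ipynb
_PATTERN = re.compile(r"[a-z0-9]+(_[a-z0-9]+)*\.ipynb")


def is_good_notebook_filename(filename: str) -> bool:
    """Return True if the name is short, all lowercase, and uses only underscores."""
    return (
        len(filename) <= MAX_FILENAME_LENGTH
        and _PATTERN.fullmatch(filename) is not None
    )


print(is_good_notebook_filename("processing_images_pytorch_gpu.ipynb"))  # True
print(
    is_good_notebook_filename(
        "amazon_sagemaker-processing-images_with_pytorch_on_GPU.ipynb"
    )
)  # False
```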

IMPORTANT: Use only one main heading with #, so your next subheading is ## or ### and so on.

Overview

  1. What does this notebook do?

    • What will the user learn how to do?

  2. Is this an end-to-end tutorial or is it a how-to (procedural) example?

    • Tutorial: add conceptual information, flowcharts, images

    • How to: notebook should be lean. More of a list of steps. No conceptual info, but links to resources for more info.

  3. Who is the audience?

    • What should the user be familiar with before running this?

    • Link to other examples they should have run first.

  4. How much will this cost?

    • Some estimate of both time and money is recommended.

    • List the instance types and other resources that are created.

Prerequisites

  1. Which environments does this notebook work in? Select all that apply.

    • Notebook Instances: Jupyter?

    • Notebook Instances: JupyterLab?

    • Studio?

  2. Which conda kernel is required?

  3. Is there a previous notebook that is required?

Setup

Setup Dependencies

  1. Describe any pip or conda or apt installs or setup scripts that are needed.

  2. Use flags that facilitate automatic, end-to-end running without a user prompt, so that the notebook can run in CI without any updates or special configuration.

[ ]:
# SageMaker Python SDK version 2.x is required
import sagemaker
import sys
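
One way to meet the "no user prompt" guidance above is to run pip with flags that keep it quiet and non-interactive. The sketch below is a minimal example, not a required pattern: the helper names `build_pip_command` and `install` are hypothetical, and the requirement string `sagemaker>=2,<3` is an assumed way to express "SDK version 2.x".

```python
import subprocess
import sys


def build_pip_command(requirement: str) -> list:
    """Build a pip command that runs quietly and never waits for user input."""
    return [
        sys.executable, "-m", "pip", "install",
        "--quiet",     # keep CI logs short
        "--no-input",  # fail instead of prompting
        "--upgrade",
        requirement,
    ]


def install(requirement: str) -> None:
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.run(build_pip_command(requirement), check=True)


# Assumed requirement string for SDK 2.x; adjust to the notebook's needs.
print(build_pip_command("sagemaker>=2,<3"))
```

Calling `install("sagemaker>=2,<3")` in the setup cell would then perform the upgrade without any interaction. Using `sys.executable` targets the interpreter behind the active kernel rather than whichever `pip` happens to be first on the path.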

Setup Python Modules

  1. Import modules, set options, and activate extensions.

[ ]:
# imports
import sagemaker
import numpy as np
import pandas as pd

# options
pd.options.display.max_columns = 50
pd.options.display.max_rows = 30

# extensions
if 'autoreload' not in get_ipython().extension_manager.loaded:
    %load_ext autoreload

%autoreload 2

Parameters

  1. Set up user-supplied parameters, such as custom bucket names and roles, in a separate cell and call out what their options are.

  2. Use defaults, so the notebook will still run end-to-end without any user modification.

For example, the following description and code block prompt the user to select a dataset.

To select a particular dataset, assign chosen_data_set below to be 'diabetes' or 'california', where each name corresponds to its respective dataset.

'california' : California housing data
'diabetes' : diabetes data
[ ]:
data_sets = {
    "diabetes": "load_diabetes()",
    "california": "fetch_california_housing()",
}

# Change chosen_data_set to one of the keys in data_sets above.
chosen_data_set = "california"
assert chosen_data_set in data_sets, "Unknown dataset: {!r}".format(chosen_data_set)
print("I selected the '{}' dataset!".format(chosen_data_set))
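
The same idea applies to bucket names and roles: expose them as optional parameters and fall back to a default when the user leaves them unset. The following is a minimal sketch under stated assumptions. The names `custom_bucket`, `custom_role`, and `resolve` are hypothetical, and the default factories are placeholders so that the code runs anywhere.

```python
from typing import Callable, Optional

# User-supplied parameters: leave as None to accept the default.
custom_bucket: Optional[str] = None
custom_role: Optional[str] = None


def resolve(user_value: Optional[str], default_factory: Callable[[], str]) -> str:
    """Return the user's value if set, otherwise compute the default lazily."""
    return user_value if user_value else default_factory()


# Placeholder factories (assumptions for this sketch). In a SageMaker notebook
# these would typically be sagemaker.Session().default_bucket and
# sagemaker.get_execution_role.
bucket = resolve(custom_bucket, lambda: "example-default-bucket")
role = resolve(custom_role, lambda: "example-default-role")
print(bucket, role)
```

Passing a factory rather than a value means the default is only computed when it is actually needed, which avoids an unnecessary service call when the user has supplied their own bucket or role.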

Data import

  1. Look for the data that was stored by a previous notebook run, using %store -r variableName.

  2. If that doesn’t exist, look in S3 in the user’s default bucket.

  3. If that doesn’t exist, download it from the SageMaker dataset bucket

  4. If that doesn’t exist, download it from origin

For example, the following code block pulls training and validation data that was created in a previous notebook. This lets the user experiment with features and re-run the notebook without downloading the dataset each time.

[ ]:
# Load relevant dataframes and variables from preprocessing_tabular_data.ipynb required for this notebook
%store -r X_train
%store -r X_test
%store -r X_val
%store -r Y_train
%store -r Y_test
%store -r Y_val
%store -r chosen_data_set
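
The four-step lookup order described above can be sketched as a chain of loaders that are tried in turn. This is an illustrative pattern only: `load_first_available` and the four loader functions are hypothetical stand-ins, and a real notebook would replace them with calls that read the stored variable, S3, and the origin.

```python
from typing import Any, Callable, Iterable, Tuple


def load_first_available(
    sources: Iterable[Tuple[str, Callable[[], Any]]]
) -> Tuple[str, Any]:
    """Try each (name, loader) in order and return the first that yields data."""
    errors = []
    for name, loader in sources:
        try:
            data = loader()
        except Exception as exc:  # a missing source should not stop the chain
            errors.append("{}: {}".format(name, exc))
            continue
        if data is not None:
            return name, data
        errors.append("{}: no data".format(name))
    raise RuntimeError("No data source available: " + "; ".join(errors))


# Hypothetical loaders standing in for the four sources.
def from_store():
    return None  # the variable was never stored


def from_default_bucket():
    raise FileNotFoundError("object not found")


def from_dataset_bucket():
    return [1, 2, 3]


def from_origin():
    return [4, 5, 6]


source, data = load_first_available(
    [
        ("stored variable", from_store),
        ("default bucket", from_default_bucket),
        ("SageMaker dataset bucket", from_dataset_bucket),
        ("origin", from_origin),
    ]
)
print(source, data)  # SageMaker dataset bucket [1, 2, 3]
```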

Procedure or tutorial

  1. Break up processes with Markdown blocks to explain what’s going on.

  2. Make use of visualizations to better demonstrate each step.

Cleanup

  1. Delete any endpoints or other resources that linger and might cost the user money.
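
A cleanup cell might look something like the sketch below. `delete_endpoint`, `delete_endpoint_config`, and `delete_model` are operations of the boto3 SageMaker client. The helper `delete_endpoint_resources`, the recording stand-in client, and the resource names are assumptions for illustration, included so that the example runs without AWS credentials.

```python
def delete_endpoint_resources(
    sm_client, endpoint_name, endpoint_config_name=None, model_name=None
):
    """Delete an endpoint and, if named, its endpoint config and model.

    sm_client is expected to behave like boto3.client("sagemaker").
    Returns the names of the resources that were deleted.
    """
    deleted = [endpoint_name]
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    if endpoint_config_name:
        sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
        deleted.append(endpoint_config_name)
    if model_name:
        sm_client.delete_model(ModelName=model_name)
        deleted.append(model_name)
    return deleted


class RecordingClient:
    """Stand-in for the real client; it only records which calls were made."""

    def __init__(self):
        self.calls = []

    def delete_endpoint(self, EndpointName):
        self.calls.append(("delete_endpoint", EndpointName))

    def delete_endpoint_config(self, EndpointConfigName):
        self.calls.append(("delete_endpoint_config", EndpointConfigName))

    def delete_model(self, ModelName):
        self.calls.append(("delete_model", ModelName))


client = RecordingClient()
removed = delete_endpoint_resources(
    client, "demo-endpoint", endpoint_config_name="demo-config", model_name="demo-model"
)
print(removed)
```

In a real notebook, `boto3.client("sagemaker")` would be passed in place of `RecordingClient()`, with the names of the resources that the notebook actually created.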

Next steps

  1. Wrap up with some conclusion or overview of what was accomplished.

  2. Offer another notebook or more resources or some other call to action.

References

  1. author1, article1, journal1, year1, url1

  2. author2, article2, journal2, year2, url2
