AutoGluon Tabular with Deep Learning Containers on SageMaker



AutoGluon automates machine learning tasks, enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy deep learning models on tabular, image, and text data. This example shows how to use AutoGluon-Tabular with Amazon SageMaker by applying pre-built deep learning containers.
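
To get a feel for the AutoGluon API itself before bringing SageMaker in, a standalone run looks roughly like the minimal sketch below (assuming autogluon.tabular is installed locally; train.csv and test.csv are placeholder paths for any tabular dataset with a "class" label column). The rest of this notebook performs the same training and inference inside SageMaker containers instead.

[ ]:
# Minimal local AutoGluon sketch (illustration only; not part of the SageMaker workflow below)
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")                     # placeholder: any CSV with a label column
predictor = TabularPredictor(label="class").fit(train_data)  # AutoGluon selects and ensembles models
predictions = predictor.predict(TabularDataset("test.csv"))  # placeholder: unlabeled data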

Prerequisites

[ ]:
# Ensure the most recent AutoGluon image information is available in the SageMaker Python SDK
!pip install -q -U 'sagemaker>=2.126.0'
[ ]:
import sagemaker
import pandas as pd
from ag_model import (
    AutoGluonSagemakerEstimator,
    AutoGluonNonRepackInferenceModel,
    AutoGluonSagemakerInferenceModel,
    AutoGluonRealtimePredictor,
    AutoGluonBatchPredictor,
)
from sagemaker import utils
from sagemaker.serializers import CSVSerializer
import os
import boto3

role = sagemaker.get_execution_role()
sagemaker_session = sagemaker.session.Session()
region = sagemaker_session.boto_region_name

bucket = sagemaker_session.default_bucket()
s3_prefix = f"autogluon_sm/{utils.sagemaker_timestamp()}"
output_path = f"s3://{bucket}/{s3_prefix}/output/"

Get the data

We’ll be using the Adult Census dataset for this exercise. This data was extracted from the 1994 Census bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics), with the task being to predict whether an individual earns over $50K a year.

[ ]:
!mkdir -p data
[ ]:
columns = [
    "age",
    "workclass",
    "fnlwgt",
    "education",
    "education-num",
    "marital-status",
    "occupation",
    "relationship",
    "race",
    "sex",
    "capital-gain",
    "capital-loss",
    "hours-per-week",
    "native-country",
    "class",
]
[ ]:
# Download the data locally (needed for this example); in notebooks, an S3 URL can also be passed directly to load the data from S3
s3 = boto3.client("s3")
s3.download_file(
    f"sagemaker-example-files-prod-{region}",
    "datasets/tabular/uci_adult/adult.data",
    "data/adult.data",
)
s3.download_file(
    f"sagemaker-example-files-prod-{region}",
    "datasets/tabular/uci_adult/adult.test",
    "data/adult.test",
)
[ ]:
df_train = pd.read_csv("data/adult.data", header=None, names=columns)
df_train.to_csv("data/train.csv")
[ ]:
df_test = pd.read_csv("data/adult.test", header=None, skiprows=1, names=columns)
# Labels in adult.test carry a trailing period (e.g. " <=50K."); map them to match the training labels
df_test["class"] = df_test["class"].map(
    {
        " <=50K.": " <=50K",
        " >50K.": " >50K",
    }
)
df_test.to_csv("data/test.csv")

Training

Users can create their own training/inference scripts using SageMaker Python SDK examples. The scripts we created allow passing an AutoGluon configuration as a YAML file (located in the config/ directory).
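
The contents of config-med.yaml are not reproduced in this notebook, and the exact keys it supports are defined by scripts/tabular_train.py. Purely as a hypothetical illustration of the kind of options such a file might carry (label column, evaluation metric, presets, time limit), a config could be written out from Python like this:

[ ]:
# Hypothetical illustration only: the real key names are whatever scripts/tabular_train.py
# expects and are defined in config/config-med.yaml, which is not shown in this notebook.
import yaml

hypothetical_config = {
    "ag_predictor_args": {"label": "class", "eval_metric": "accuracy"},  # assumed key names
    "ag_fit_args": {"presets": "medium_quality", "time_limit": 600},     # preset names vary by AutoGluon version
}
print(yaml.safe_dump(hypothetical_config, sort_keys=False))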

We are using official AutoGluon Deep Learning Container images with custom training scripts (see scripts/ directory).

[ ]:
ag = AutoGluonSagemakerEstimator(
    role=role,
    entry_point="scripts/tabular_train.py",
    region=region,
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    framework_version="0.6",
    py_version="py38",
    base_job_name="autogluon-tabular-train",
    disable_profiler=True,
    debugger_hook_config=False,
)

Upload the data to S3

[ ]:
s3_prefix = f"autogluon_sm/{utils.sagemaker_timestamp()}"
train_input = ag.sagemaker_session.upload_data(
    path=os.path.join("data", "train.csv"), key_prefix=s3_prefix
)
eval_input = ag.sagemaker_session.upload_data(
    path=os.path.join("data", "test.csv"), key_prefix=s3_prefix
)
config_input = ag.sagemaker_session.upload_data(
    path=os.path.join("config", "config-med.yaml"), key_prefix=s3_prefix
)

# Provide inference script so the script repacking is not needed later
# See more here: https://docs.aws.amazon.com/sagemaker/latest/dg/mlopsfaq.html
# Q. Why do I see a repack step in my SageMaker pipeline?
inference_script = ag.sagemaker_session.upload_data(
    path=os.path.join("scripts", "tabular_serve.py"), key_prefix=s3_prefix
)

Fit The Model

For local training, set instance_type to local.

For non-local training, the recommended instance type is ml.m5.2xlarge.

[ ]:
job_name = utils.unique_name_from_base("test-autogluon-image")
ag.fit(
    {
        "config": config_input,
        "train": train_input,
        "test": eval_input,
        "serving": inference_script,
    },
    job_name=job_name,
)

Model export

AutoGluon models are portable: everything needed to deploy a trained model is in the tarball created by SageMaker.

The artifact can be used locally, on EC2/ECS/EKS, or served via SageMaker Inference; a sketch of local use follows the download below.

[ ]:
!aws s3 cp {ag.model_data} .
[ ]:
!ls -alF model.tar.gz
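
As a sketch of using the artifact outside of SageMaker (assuming an AutoGluon version matching framework_version 0.6 is installed locally, and that the training script saved the predictor at the root of the archive, which depends on scripts/tabular_train.py):

[ ]:
# Sketch: load the downloaded artifact locally, outside of SageMaker.
import tarfile
from autogluon.tabular import TabularPredictor

with tarfile.open("model.tar.gz") as tar:
    tar.extractall("extracted_model")

local_predictor = TabularPredictor.load("extracted_model")  # path layout depends on the training script
print(local_predictor.predict(df_test.drop(columns="class").head()))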

Endpoint Deployment

Upload the model we trained earlier

[ ]:
endpoint_name = sagemaker.utils.unique_name_from_base("sagemaker-autogluon-serving-trained-model")

model_data = sagemaker_session.upload_data(
    path=os.path.join(".", "model.tar.gz"), key_prefix=f"{endpoint_name}/models"
)

Deploy remote or local endpoint

[ ]:
instance_type = "ml.m5.2xlarge"
# instance_type = 'local'
[ ]:
model = AutoGluonNonRepackInferenceModel(
    model_data=model_data,
    role=role,
    region=region,
    framework_version="0.6",
    py_version="py38",
    instance_type=instance_type,
    source_dir="scripts",
    entry_point="tabular_serve.py",
)
[ ]:
model.deploy(initial_instance_count=1, serializer=CSVSerializer(), instance_type=instance_type)
[ ]:
predictor = AutoGluonRealtimePredictor(model.endpoint_name)

Predict on unlabeled test data

Remove the target variable (class) from the data and get predictions for a sample of 100 rows using the deployed endpoint.

[ ]:
df = pd.read_csv("data/test.csv")
data = df[:100]
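
The CSVSerializer passed to model.deploy above converts each request into a headerless CSV payload before it is sent to the endpoint. To see what actually goes over the wire, the payload for the first two rows can be inspected locally (illustration only):

[ ]:
# Illustration only: the CSV payload the serializer would produce for the first two rows
from sagemaker.serializers import CSVSerializer

payload = CSVSerializer().serialize(data.drop(columns="class")[:2].values.tolist())
print(payload)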
[ ]:
preds = predictor.predict(data.drop(columns="class"))
preds
[ ]:
p = preds[["pred"]]
p = p.join(data["class"]).rename(columns={"class": "actual"})
p.head()
[ ]:
print(f"{(p.pred==p.actual).astype(int).sum()}/{len(p)} are correct")

Cleanup Endpoint

[ ]:
predictor.delete_endpoint()

Batch Transform

Deploying a trained model to a hosted endpoint has been available in SageMaker since launch and is a great way to provide real-time predictions to a service like a website or mobile app. But if the goal is to generate predictions from a trained model on a large dataset where minimizing latency isn’t a concern, then the batch transform functionality may be easier, more scalable, and more appropriate.

Read more about Batch Transform.

[ ]:
endpoint_name = sagemaker.utils.unique_name_from_base(
    "sagemaker-autogluon-batch_transform-trained-model"
)

model_data = sagemaker_session.upload_data(
    path=os.path.join(".", "model.tar.gz"), key_prefix=f"{endpoint_name}/models"
)
[ ]:
instance_type = "ml.m5.2xlarge"
[ ]:
model = AutoGluonSagemakerInferenceModel(
    model_data=model_data,
    role=role,
    region=region,
    framework_version="0.6",
    py_version="py38",
    instance_type=instance_type,
    entry_point="tabular_serve-batch.py",
    source_dir="scripts",
    predictor_cls=AutoGluonBatchPredictor,
)
[ ]:
transformer = model.transformer(
    instance_count=1,
    instance_type=instance_type,
    strategy="MultiRecord",
    max_payload=6,
    max_concurrent_transforms=1,
    output_path=output_path,
    accept="application/json",
    assemble_with="Line",
)

Prepare data for batch transform

[ ]:
pd.read_csv(f"data/test.csv")[:100].to_csv("data/test_no_header.csv", header=False, index=False)

Upload the data to S3 using the SageMaker session

[ ]:
test_input = transformer.sagemaker_session.upload_data(
    path=os.path.join("data", "test_no_header.csv"), key_prefix=s3_prefix
)
[ ]:
transformer.transform(
    test_input,
    input_filter="$[:14]",  # filter out the target variable
    split_type="Line",
    content_type="text/csv",
    output_filter="$['class']",  # keep only the predicted class in the output
)

transformer.wait()
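
The input_filter and output_filter arguments are JSONPath expressions that SageMaker applies to every record server-side: $[:14] keeps the first 14 fields of each CSV record so that the target variable is filtered out, and $['class'] keeps only the predicted class from the JSON the serving script returns. The effect of the input filter can be mirrored locally (illustration only):

[ ]:
# Illustration only: mirror locally what input_filter="$[:14]" does to one record.
# SageMaker applies the JSONPath filter server-side before invoking the model.
record = pd.read_csv("data/test_no_header.csv", header=None).iloc[0].tolist()
print(record[:14])  # the slice of each CSV record that the transform job passes to the model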

Download batch transform outputs

[ ]:
!aws s3 cp {transformer.output_path[:-1]}/test_no_header.csv.out .
[ ]:
p = pd.concat(
    [
        pd.read_json("test_no_header.csv.out", orient="index")
        .sort_index()
        .rename(columns={0: "preds"}),
        pd.read_csv("data/test.csv")[["class"]].iloc[:100].rename(columns={"class": "actual"}),
    ],
    axis=1,
)
p.head()
[ ]:
print(f"{(p.preds==p.actual).astype(int).sum()}/{len(p)} are correct")

Conclusion

In this tutorial we trained an AutoGluon model and explored several ways to deploy it with SageMaker. Any of the sections of this tutorial (training / endpoint inference / batch inference) can be used independently (e.g., train locally and deploy to SageMaker, or vice versa).

Next steps:

* Learn more about AutoGluon and explore its tutorials.
* Explore the SageMaker inference documentation.
