Training Graph Convolutional Matrix Completion using the Deep Graph Library with MXNet backend on Amazon SageMaker

This notebook was tested with the MXNet 1.8 Python 3.7 CPU Optimized kernel.

The Amazon SageMaker Python SDK makes it easy to train Deep Graph Library (DGL) models. In this example, you train a Graph Convolutional Matrix Completion (GCMC) network using the DMLC DGL API and the MovieLens dataset. Three datasets are supported:

* MovieLens 100K Dataset: 100,000 ratings from 1,000 users on 1,700 movies. Stable benchmark dataset.
* MovieLens 1M Dataset: 1 million ratings from 6,000 users on 4,000 movies. Stable benchmark dataset.
* MovieLens 10M Dataset: 10 million ratings and 100,000 tag applications applied to 10,000 movies by 72,000 users. Stable benchmark dataset.

Prerequisites

To get started, install the necessary packages.

[ ]:
!conda install -y boto3
!conda install -c anaconda -y botocore
[ ]:
import sagemaker
from sagemaker import get_execution_role
from sagemaker.session import Session

# Set up the SageMaker session
sess = sagemaker.Session()

# S3 bucket for saving code and model artifacts.
# Feel free to specify a different bucket here.
bucket = sess.default_bucket()

# Location to put your custom code.
custom_code_upload_location = "customcode"

# Location where results of model training are saved.
model_artifacts_location = "s3://{}/artifacts".format(bucket)

# IAM role that gives Amazon SageMaker access to resources in your AWS account.
# You can use the Amazon SageMaker Python SDK to get the role from your notebook environment.
role = get_execution_role()

The training script

The train.py script provides all the code you need for training an Amazon SageMaker model.

[ ]:
!cat src/train.py
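
The cell above prints the actual script. As a rough, hypothetical sketch (the argument names below mirror the hyperparameters set later in this notebook, not necessarily the real code), a SageMaker training entry point for this example typically has the following shape:

# Hypothetical outline of a SageMaker training entry point; the real logic
# lives in src/train.py, printed by the cell above.
import argparse
import os


def parse_args():
    parser = argparse.ArgumentParser()
    # Hyperparameters passed to the estimator arrive as command-line flags.
    parser.add_argument("--data_name", type=str, default="ml-1m")
    # SageMaker packages everything written under /opt/ml/model as the model artifact.
    parser.add_argument(
        "--save_dir", type=str, default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model/")
    )
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    # ... build the GCMC model with DGL/MXNet, train on args.data_name,
    # and save the trained parameters under args.save_dir ...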

Amazon SageMaker’s estimator class

With the Amazon SageMaker Estimator, you can run training on a single machine in Amazon SageMaker, using a CPU- or GPU-based instance.

When you create the estimator, pass in the file name of the training script and the IAM execution role. You can also set a few other parameters: instance_count and instance_type determine the number and type of Amazon SageMaker instances used for the training job, and the hyperparameters parameter is a dictionary of values passed to your training script as command-line arguments, which you can parse with argparse. You can see how these values are accessed in the train.py script above.
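
For example, the hyperparameters dictionary defined later in this notebook, {"data_name": "ml-1m", "save_dir": "/opt/ml/model/"}, results in a script invocation roughly equivalent to the following (illustrative only; the exact command is managed by SageMaker):

python train.py --data_name ml-1m --save_dir /opt/ml/model/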

In this example, you upload the whole code base (including train.py) into an Amazon SageMaker container and run GCMC training on the MovieLens dataset.

You can also add a task tag with the value ‘DGL’ to help track the task.

[ ]:
from sagemaker.mxnet.estimator import MXNet

CODE_PATH = "src"
CODE_ENTRY = "train.py"
# code_location = sess.upload_data(CODE_PATH, bucket=bucket, key_prefix=custom_code_upload_location)

# Retrieve the prebuilt MXNet training container image for this Region
region = sess.boto_session.region_name
image = sagemaker.image_uris.retrieve(
    "mxnet",
    region,
    version="1.6.0",
    py_version="py3",
    instance_type="ml.p3.2xlarge",
    image_scope="training",
)
print(image)

params = {}
params["data_name"] = "ml-1m"
# Save the trained model to the SageMaker model output directory
params["save_dir"] = "/opt/ml/model/"
task_tags = [{"Key": "ML Task", "Value": "DGL"}]

estimator = MXNet(
    entry_point=CODE_ENTRY,
    source_dir=CODE_PATH,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    image_uri=image,
    hyperparameters=params,
    tags=task_tags,
    sagemaker_session=sess,
)

Running the Training Job

After you construct the Estimator object, call fit() to run the training job on Amazon SageMaker. The MovieLens dataset is downloaded automatically during training.

[ ]:
estimator.fit()

Output

You can get the model training output from the Amazon SageMaker console by searching for the training job and looking at its ‘S3 model artifact’ location.
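
After fit() completes, you can also read the artifact location programmatically from the estimator. A minimal sketch (model_data is populated only after the training job succeeds):

[ ]:
# S3 URI of the trained model artifact (model.tar.gz), available after a successful fit()
print(estimator.model_data)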