Amazon SageMaker Lineage




Amazon SageMaker Lineage lets you trace events that happen within SageMaker through a graph structure. This data simplifies generating reports, making comparisons, and discovering relationships between events. For example, you can trace both how a model was generated and where it was deployed.

SageMaker creates the lineage graph automatically, and you can also create or modify your own graphs directly.

Key Concepts

  • Lineage Graph - A connected graph tracing your machine learning workflow end to end.

  • Artifacts - Represent a URI-addressable object or data. Artifacts are typically inputs or outputs to Actions.

  • Actions - Represent an action taken, such as a computation, transformation, or job.

  • Contexts - Provide a method to logically group other entities.

  • Associations - A directed edge in the lineage graph that links two entities.

  • Lineage Traversal - Starting from an arbitrary point, trace the lineage graph to discover and analyze relationships between steps in your workflow.

  • Experiments - Experiment entities (Experiments, Trials, and Trial Components) are also part of the lineage graph and can be associated with Artifacts, Actions, or Contexts.
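To make these concepts concrete, here is a minimal in-memory sketch of a lineage graph: entities are nodes, associations are directed edges, and traversal is a breadth-first walk along those edges. The names `Entity`, `ToyLineageGraph`, `associate`, and `downstream` are hypothetical and exist only for illustration; they are not part of the SageMaker SDK, which is used in the rest of this notebook.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A node in the toy graph (an artifact, action, or context)."""

    kind: str
    name: str


class ToyLineageGraph:
    """Illustrative stand-in for a lineage graph; NOT the SageMaker API."""

    def __init__(self):
        # Map each entity to the (association_type, destination) pairs it points at.
        self._outgoing = defaultdict(list)

    def associate(self, source, destination, association_type="AssociatedWith"):
        """Record a directed edge from source to destination."""
        self._outgoing[source].append((association_type, destination))

    def downstream(self, start):
        """Breadth-first traversal along outgoing associations."""
        seen = {start}
        order = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for _, destination in self._outgoing[node]:
                if destination not in seen:
                    seen.add(destination)
                    order.append(destination)
                    queue.append(destination)
        return order


images = Entity("Artifact", "test-images")
build = Entity("Action", "model-build")
model = Entity("Artifact", "model")

graph = ToyLineageGraph()
graph.associate(images, build, "ContributedTo")
graph.associate(build, model, "Produced")

downstream_names = [entity.name for entity in graph.downstream(images)]
print(downstream_names)  # ['model-build', 'model']
```

Because every edge is directed, the same structure answers both "what did this dataset feed into?" (follow outgoing edges) and "where did this model come from?" (follow incoming edges).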

Notebook Overview

This notebook demonstrates how to:

  • Understand the basics of lineage entities.

  • Create and associate lineage entities to track your workflow.

  • Traverse the associations between lineage entities.

Prerequisites

Select the Python 3 (Data Science) kernel in SageMaker Studio.

[ ]:
import boto3
import sagemaker

region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
default_bucket = sagemaker_session.default_bucket()
[ ]:
from datetime import datetime
from sagemaker.lineage.context import Context
from sagemaker.lineage.action import Action
from sagemaker.lineage.association import Association
from sagemaker.lineage.artifact import Artifact

unique_id = str(int(datetime.now().replace(microsecond=0).timestamp()))

print(f"Unique id is {unique_id}")
[ ]:
# create an example context

# the name must be unique across all other contexts
context_name = f"machine-learning-workflow-{unique_id}"

ml_workflow_context = Context.create(
    context_name=context_name,
    context_type="MLWorkflow",
    source_uri=unique_id,
    # properties serve as a way to store metadata on lineage entities, in addition to Tags
    properties={"example": "true"},
)
[ ]:
# list all the contexts

contexts = Context.list(sort_by="CreationTime", sort_order="Descending")

for ctx in contexts:
    print(ctx.context_name)
[ ]:
# create an example action (it is associated with the context in the next cell)

model_build_action = Action.create(
    action_name=f"model-build-step-{unique_id}",
    action_type="ModelBuild",
    source_uri=unique_id,
    properties={"Example": "Metadata"},
)
[ ]:
# Association Type can be Produced|DerivedFrom|AssociatedWith|ContributedTo
context_action_association = Association.create(
    source_arn=ml_workflow_context.context_arn,
    destination_arn=model_build_action.action_arn,
    association_type="AssociatedWith",
)
[ ]:
# now the Action and Context are associated:
incoming_associations_to_action = Association.list(destination_arn=model_build_action.action_arn)
for association in incoming_associations_to_action:
    print(
        f"{model_build_action.action_name} has an incoming association from {association.source_name}"
    )

outgoing_associations_from_context = Association.list(source_arn=ml_workflow_context.context_arn)
for association in outgoing_associations_from_context:
    print(
        f"{ml_workflow_context.context_name} has an outgoing association to {association.destination_name}"
    )
[ ]:
# create an artifact representing inputs to the model building action
input_test_images = Artifact.create(
    artifact_name="mnist-test-images",
    artifact_type="TestData",
    source_types=[{"SourceIdType": "Custom", "Value": unique_id}],
    source_uri=f"https://sagemaker-example-files-prod-{region}.s3.amazonaws.com/datasets/image/MNIST/t10k-images-idx3-ubyte.gz",
)

input_test_labels = Artifact.create(
    artifact_name="mnist-test-labels",
    artifact_type="TestLabels",
    source_types=[{"SourceIdType": "Custom", "Value": unique_id}],
    source_uri=f"https://sagemaker-example-files-prod-{region}.s3.amazonaws.com/datasets/image/MNIST/t10k-labels-idx1-ubyte.gz",
)
[ ]:
# create an artifact representing a trained model
output_model = Artifact.create(
    artifact_name="mnist-model",
    artifact_type="Model",
    source_types=[{"SourceIdType": "Custom", "Value": unique_id}],
    source_uri=f"s3://sagemaker-example-files-prod-{region}/datasets/image/MNIST/model/tensorflow-training-2020-11-20-23-57-13-077/model.tar.gz",
)
[ ]:
# associate each dataset artifact with the example action through an incoming association
Association.create(
    source_arn=input_test_images.artifact_arn, destination_arn=model_build_action.action_arn
)
Association.create(
    source_arn=input_test_labels.artifact_arn, destination_arn=model_build_action.action_arn
)
[ ]:
# associate the example action with an outgoing association to the model artifact
Association.create(
    source_arn=model_build_action.action_arn, destination_arn=output_model.artifact_arn
)
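With the artifacts, action, and context linked, the graph can be walked backwards from the model to find everything that contributed to it. The following is a minimal sketch of such a traversal, not an official SDK helper: `trace_upstream` and `fake_list_incoming` are hypothetical names, and the short ARN-like strings are placeholders. The lister is passed in as a function so the sketch runs offline; it only assumes that each returned summary has a `source_arn` attribute, as the association summaries used in this notebook do.

```python
from collections import deque
from types import SimpleNamespace


def trace_upstream(start_arn, list_incoming):
    """Return ARNs reachable by following incoming associations backwards.

    list_incoming(arn) must return an iterable of objects that expose a
    `source_arn` attribute (hypothetical contract for this sketch).
    """
    seen = {start_arn}
    order = []
    queue = deque([start_arn])
    while queue:
        arn = queue.popleft()
        for summary in list_incoming(arn):
            source = summary.source_arn
            if source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


# Offline stand-in for the lineage service so the sketch runs without AWS.
# Keys are destinations; values are the sources of their incoming associations.
fake_edges = {
    "artifact/model": ["action/build"],
    "action/build": ["artifact/images", "artifact/labels", "context/workflow"],
}


def fake_list_incoming(arn):
    return [SimpleNamespace(source_arn=source) for source in fake_edges.get(arn, [])]


upstream = trace_upstream("artifact/model", fake_list_incoming)
print(upstream)
```

Against the entities created above, a call along the lines of `trace_upstream(output_model.artifact_arn, lambda arn: Association.list(destination_arn=arn))` should walk the real graph, assuming the session has permission to list associations.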

Cleanup

[ ]:
def delete_associations(arn):
    # delete incoming associations
    incoming_associations = Association.list(destination_arn=arn)
    for summary in incoming_associations:
        assct = Association(
            source_arn=summary.source_arn,
            destination_arn=summary.destination_arn,
            sagemaker_session=sagemaker_session,
        )
        assct.delete()

    # delete outgoing associations
    outgoing_associations = Association.list(source_arn=arn)
    for summary in outgoing_associations:
        assct = Association(
            source_arn=summary.source_arn,
            destination_arn=summary.destination_arn,
            sagemaker_session=sagemaker_session,
        )
        assct.delete()


def delete_lineage_data():
    print(f"Deleting context {ml_workflow_context.context_name}")
    delete_associations(ml_workflow_context.context_arn)
    ctx = Context(
        context_name=ml_workflow_context.context_name, sagemaker_session=sagemaker_session
    )
    ctx.delete()

    print(f"Deleting action {model_build_action.action_name}")
    delete_associations(model_build_action.action_arn)
    actn = Action(action_name=model_build_action.action_name, sagemaker_session=sagemaker_session)
    actn.delete()

    for artifact in [input_test_images, input_test_labels, output_model]:
        print(f"Deleting artifact {artifact.artifact_arn} {artifact.artifact_name}")
        delete_associations(artifact.artifact_arn)
        artfct = Artifact(artifact_arn=artifact.artifact_arn, sagemaker_session=sagemaker_session)
        artfct.delete()


delete_lineage_data()

Caveats

  • Associations cannot be created between two experiment entities, for example between an Experiment and a Trial.

  • Associations can only be created between the following resources: Action, Artifact, or Context.

  • The maximum numbers of manually created lineage entities are:

    • Artifacts: 6000

    • Contexts: 500

    • Actions: 3000

    • Associations: 6000

  • There is no limit on the number of lineage entities created automatically by SageMaker.
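If a workflow creates many entities by hand, it can help to check the remaining headroom before creating more. The sketch below simply encodes the limits listed above; `MANUAL_ENTITY_LIMITS` and `remaining_capacity` are hypothetical names, and the count of manually created entities is assumed to be tracked by the caller.

```python
# Limits for manually created lineage entities, copied from the list above.
MANUAL_ENTITY_LIMITS = {
    "Artifacts": 6000,
    "Contexts": 500,
    "Actions": 3000,
    "Associations": 6000,
}


def remaining_capacity(entity_kind, manually_created_count):
    """Return how many more entities of this kind can still be created by hand."""
    limit = MANUAL_ENTITY_LIMITS[entity_kind]
    # Clamp at zero so a count above the limit never reports negative headroom.
    return max(limit - manually_created_count, 0)


contexts_left = remaining_capacity("Contexts", 120)
print(contexts_left)  # 380
```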

Contact

Submit any questions or issues to https://github.com/aws/sagemaker-experiments/issues or mention @aws/sagemakerexperimentsadmin
