Quick Start - Using @step Decorated Steps with ConditionStep
This notebook’s CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.
We’re introducing a low-code experience for data scientists to convert Machine Learning (ML) development code into repeatable and reusable workflow steps of Amazon SageMaker Pipelines. This sample notebook is a quick introduction to this capability, using dummy Python functions wrapped as pipeline steps, and demonstrates how it works with the ConditionStep. The pipeline contains a dummy evaluate model step, which generates a random number as a dummy RMSE (Root Mean Square Error) value. This RMSE value is passed to the ConditionStep and compared against a baseline: a dummy register model step is invoked only if the RMSE is lower than the baseline; otherwise, a FailStep ends the pipeline execution in a failed status.
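Before walking through the SageMaker code, here is a minimal plain-Python sketch of the same control flow. The function names and the baseline of 5 mirror this notebook, but nothing here touches SageMaker; it only illustrates the evaluate-then-branch logic that the pipeline encodes:

```python
import random

BASELINE_RMSE = 5  # the ConditionStep baseline used later in this notebook


def evaluate_model():
    # Dummy evaluation: a random integer in [0, 10) stands in for RMSE
    return {"rmse": random.randrange(0, 10)}


def run_once(rmse):
    # Mirrors the ConditionStep: register only when RMSE beats the baseline
    return "Registered" if rmse < BASELINE_RMSE else "Failed"


print(run_once(3))  # Registered
print(run_once(7))  # Failed
```

In the real pipeline, the branch is evaluated by the ConditionStep on the SageMaker Pipelines service side, not in local Python.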
Note: this notebook can only run on Python 3.8 or Python 3.10. On other Python versions, you will get an error message prompting you to provide an image_uri when defining a step.
Install the dependencies and set up the configuration file path
If you run the notebook from a local IDE outside of SageMaker, please follow the “AWS CLI Prerequisites” section of the Set Up Amazon SageMaker Prerequisites to set up AWS credentials.
[ ]:
!pip install -r ./requirements.txt
[ ]:
import os
# Set path to config file
os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"] = os.getcwd()
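The SAGEMAKER_USER_CONFIG_OVERRIDE environment variable points the SageMaker Python SDK at a config.yaml in the given directory, which supplies defaults for @step decorated functions. A minimal sketch of such a file is shown below; the instance type is purely illustrative and the actual config.yaml shipped alongside this notebook may differ:

```yaml
# Hypothetical config.yaml picked up via SAGEMAKER_USER_CONFIG_OVERRIDE.
# The instance type here is an illustrative placeholder.
SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        InstanceType: ml.m5.xlarge
```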
Define pipeline steps
[ ]:
import random
from sagemaker.workflow.function_step import step
evaluate_func_step_name = "Evaluate"
@step(name=evaluate_func_step_name, keep_alive_period_in_seconds=300)
def my_evaluate_model():
    random_number = random.randrange(0, 10)
    print(f"Generated random number: {random_number}")
    return {"rmse": random_number}
[ ]:
@step(name="Register", keep_alive_period_in_seconds=300)
def my_register_model():
    print("Registered!")
We can easily add conditional checks to @step decorated steps via the already offered ConditionStep. For instance, in the cell below, we create a condition, ConditionLessThan, for the ConditionStep, which refers to the output of the my_evaluate_model function.
[ ]:
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionLessThan
from sagemaker.workflow.fail_step import FailStep
conditionally_register = ConditionStep(
    name="ConditionallyRegister",
    conditions=[
        ConditionLessThan(
            # Output of the evaluate step must be JSON serializable
            # to be consumed in the condition evaluation
            left=my_evaluate_model()["rmse"],
            right=5,
        )
    ],
    if_steps=[my_register_model()],
    else_steps=[FailStep(name="Fail", error_message="Model performance is not good enough")],
)

pipeline_name = "Dummy-ML-Pipeline"
pipeline = Pipeline(
    name=pipeline_name,
    steps=[conditionally_register],
)
Create the pipeline and run pipeline execution
[ ]:
import sagemaker
# Note: sagemaker.get_execution_role() only works inside SageMaker.
# When running locally, set role to an IAM role ARN with SageMaker permissions instead.
role = sagemaker.get_execution_role()
pipeline.upsert(role_arn=role)
[ ]:
execution = pipeline.start(parallelism_config=dict(MaxParallelExecutionSteps=10))
Note: the pipeline execution may enter the FailStep and be marked as failed if the my_evaluate_model function generates a number greater than or equal to 5.
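As a quick sanity check, independent of the pipeline: random.randrange(0, 10) draws uniformly from the ten integers 0 through 9, five of which (5–9) trigger the FailStep, so each execution fails with probability 1/2:

```python
from fractions import Fraction

# random.randrange(0, 10) draws uniformly from 0..9
outcomes = range(0, 10)
failing = sum(1 for n in outcomes if n >= 5)  # values that trigger the FailStep
p_fail = Fraction(failing, len(outcomes))
print(p_fail)  # 1/2
```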
[ ]:
try:
    execution.wait()
except Exception as e:
    print(e)
[ ]:
execution.list_steps()
[ ]:
execution.result(step_name=evaluate_func_step_name)
Clean up resources
[ ]:
pipeline.delete()
Notebook CI Test Results
This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.
[ ]: