Fairness and Explainability with SageMaker Clarify - JSON Lines Format
This notebook’s CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.
Runtime
This notebook takes approximately 30 minutes to run.
Contents
Overview
Amazon SageMaker Clarify helps improve your machine learning models by detecting potential bias and helping explain how these models make predictions. The fairness and explainability functionality provided by SageMaker Clarify takes a step towards enabling AWS customers to build trustworthy and understandable machine learning models. The product comes with the tools to help you with the following tasks.
Measure biases that can occur during each stage of the ML lifecycle (data collection, model training and tuning, and monitoring of ML models deployed for inference).
Generate model governance reports targeting risk and compliance teams and external regulators.
Provide explanations of the data, models, and monitoring used to assess predictions.
In doing so, the notebook first trains a SageMaker Linear Learner model on the training dataset, then uses the Amazon SageMaker Python SDK to launch SageMaker Clarify jobs that analyze an example dataset in SageMaker JSON Lines dense format. SageMaker Clarify also supports analyzing CSV datasets, which is illustrated in another notebook.
Prerequisites and Data
Import Libraries
[2]:
import pandas as pd
import numpy as np
import os
import boto3
from datetime import datetime
from sagemaker import get_execution_role, session
Set Configurations
[3]:
# Initialize sagemaker session
sagemaker_session = session.Session()
region = sagemaker_session.boto_region_name
print(f"Region: {region}")
role = get_execution_role()
print(f"Role: {role}")
bucket = sagemaker_session.default_bucket()
prefix = "sagemaker/DEMO-sagemaker-clarify-jsonlines"
Region: us-west-2
Role: arn:aws:iam::000000000000:role/service-role/AmazonSageMaker-ExecutionRole-20220304T121686
Download data
Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/adult/
Let’s download the data from the UCI repository\(^{[2]}\) and save it locally as adult.data and adult.test.
\(^{[2]}\)Dua Dheeru, and Efi Karra Taniskidou. “UCI Machine Learning Repository”. Irvine, CA: University of California, School of Information and Computer Science (2017).
[4]:
from sagemaker.s3 import S3Downloader
adult_columns = [
    "Age",
    "Workclass",
    "fnlwgt",
    "Education",
    "Education-Num",
    "Marital Status",
    "Occupation",
    "Relationship",
    "Ethnic group",
    "Sex",
    "Capital Gain",
    "Capital Loss",
    "Hours per week",
    "Country",
    "Target",
]

if not os.path.isfile("adult.data"):
    S3Downloader.download(
        s3_uri="s3://{}/{}".format(
            f"sagemaker-example-files-prod-{region}", "datasets/tabular/uci_adult/adult.data"
        ),
        local_path="./",
        sagemaker_session=sagemaker_session,
    )
    print("adult.data saved!")
else:
    print("adult.data already on disk.")

if not os.path.isfile("adult.test"):
    S3Downloader.download(
        s3_uri="s3://{}/{}".format(
            f"sagemaker-example-files-prod-{region}", "datasets/tabular/uci_adult/adult.test"
        ),
        local_path="./",
        sagemaker_session=sagemaker_session,
    )
    print("adult.test saved!")
else:
    print("adult.test already on disk.")
adult.data saved!
adult.test saved!
Loading the data: Adult Dataset
From the UCI repository of machine learning datasets, this dataset contains 14 features describing the demographic characteristics of 45,222 individuals (32,561 rows for training and 12,661 for testing). The task is to predict whether a person has a yearly income of more or less than $50,000.
Here are the features and their possible values:
Age: continuous.
Workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
Fnlwgt: continuous (the number of people the census takers believe that observation represents).
Education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
Education-num: continuous.
Marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
Occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
Relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
Ethnic group: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
Sex: Female, Male.
Note: this data is extracted from the 1994 Census and enforces a binary option on Sex.
Capital-gain: continuous.
Capital-loss: continuous.
Hours-per-week: continuous.
Native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
Next, we specify our binary prediction task:
Target: <=$50,000, >$50,000.
[5]:
training_data = pd.read_csv(
    "adult.data", names=adult_columns, sep=r"\s*,\s*", engine="python", na_values="?"
).dropna()
testing_data = pd.read_csv(
    "adult.test", names=adult_columns, sep=r"\s*,\s*", engine="python", na_values="?", skiprows=1
).dropna()
training_data.head()
[5]:
Age | Workclass | fnlwgt | Education | Education-Num | Marital Status | Occupation | Relationship | Ethnic group | Sex | Capital Gain | Capital Loss | Hours per week | Country | Target | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |
Data inspection
[6]:
%matplotlib inline
training_data["Sex"].value_counts().sort_values().plot(kind="bar", title="Counts of Sex", rot=0)
[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f751b1e0450>

[7]:
training_data["Sex"].where(training_data["Target"] == ">50K").value_counts().sort_values().plot(
kind="bar", title="Counts of Sex earning >$50K", rot=0
)
[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f751b1804d0>

Encode and Upload the Dataset
Here we encode the training and test data. Encoding input data is not necessary for SageMaker Clarify, but is necessary for the model.
[ ]:
from sklearn import preprocessing
def number_encode_features(df):
    result = df.copy()
    encoders = {}
    for column in result.columns:
        # Label-encode all non-numeric (object-dtype) columns
        if result.dtypes[column] == object:
            encoders[column] = preprocessing.LabelEncoder()
            result[column] = encoders[column].fit_transform(result[column].fillna("None"))
    return result, encoders


training_data, _ = number_encode_features(training_data)
testing_data, _ = number_encode_features(testing_data)
Then save the testing dataset to a JSON Lines file. The file conforms to SageMaker JSON Lines dense format, with an additional field to hold the ground truth label.
[9]:
import json
def dump_to_jsonlines_file(df, filename):
    with open(filename, "w") as f:
        for _, row in df.iterrows():
            sample = {"features": row[0:-1].tolist(), "label": int(row[-1])}
            print(json.dumps(sample), file=f)


dump_to_jsonlines_file(testing_data, "test_data.jsonl")
A quick note about our encoding: the “Female” Sex value has been encoded as 0 and “Male” as 1.
[10]:
!head -n 5 test_data.jsonl
{"features": [25, 2, 226802, 1, 7, 4, 6, 3, 2, 1, 0, 0, 40, 37], "label": 0}
{"features": [38, 2, 89814, 11, 9, 2, 4, 0, 4, 1, 0, 0, 50, 37], "label": 0}
{"features": [28, 1, 336951, 7, 12, 2, 10, 0, 4, 1, 0, 0, 40, 37], "label": 1}
{"features": [44, 2, 160323, 15, 10, 2, 6, 0, 2, 1, 7688, 0, 40, 37], "label": 1}
{"features": [34, 2, 198693, 0, 6, 4, 7, 1, 4, 1, 0, 0, 30, 37], "label": 0}
[11]:
testing_data.head()
[11]:
Age | Workclass | fnlwgt | Education | Education-Num | Marital Status | Occupation | Relationship | Ethnic group | Sex | Capital Gain | Capital Loss | Hours per week | Country | Target | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 25 | 2 | 226802 | 1 | 7 | 4 | 6 | 3 | 2 | 1 | 0 | 0 | 40 | 37 | 0 |
1 | 38 | 2 | 89814 | 11 | 9 | 2 | 4 | 0 | 4 | 1 | 0 | 0 | 50 | 37 | 0 |
2 | 28 | 1 | 336951 | 7 | 12 | 2 | 10 | 0 | 4 | 1 | 0 | 0 | 40 | 37 | 1 |
3 | 44 | 2 | 160323 | 15 | 10 | 2 | 6 | 0 | 2 | 1 | 7688 | 0 | 40 | 37 | 1 |
5 | 34 | 2 | 198693 | 0 | 6 | 4 | 7 | 1 | 4 | 1 | 0 | 0 | 30 | 37 | 0 |
Lastly, let’s upload the data to S3.
[12]:
from sagemaker.s3 import S3Uploader
test_data_uri = S3Uploader.upload("test_data.jsonl", "s3://{}/{}".format(bucket, prefix))
Train Linear Learner Model
Train Model
Since our focus is on understanding how to use SageMaker Clarify, we keep it simple by using a standard Linear Learner model.
It takes about 5 minutes for the model to be trained.
[13]:
from sagemaker.image_uris import retrieve
from sagemaker.amazon.linear_learner import LinearLearner
ll = LinearLearner(
    role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    predictor_type="binary_classifier",
    sagemaker_session=sagemaker_session,
)

training_target = training_data["Target"].to_numpy().astype(np.float32)
training_features = training_data.drop(["Target"], axis=1).to_numpy().astype(np.float32)

ll.fit(ll.record_set(training_features, training_target), logs=False)
INFO:sagemaker.image_uris:Same images used for training and inference. Defaulting to image scope: inference.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating training-job with name: linear-learner-2023-02-07-03-39-06-446
2023-02-07 03:39:06 Starting - Starting the training job..
2023-02-07 03:39:21 Starting - Preparing the instances for training..........
2023-02-07 03:40:13 Downloading - Downloading input data....
2023-02-07 03:40:38 Training - Downloading the training image..........
2023-02-07 03:41:34 Training - Training image download completed. Training in progress....
2023-02-07 03:41:54 Uploading - Uploading generated training model.
2023-02-07 03:42:06 Completed - Training job completed
Create Model
Here we create the SageMaker model.
[14]:
model_name = "DEMO-clarify-ll-model-{}".format(datetime.now().strftime("%d-%m-%Y-%H-%M-%S"))
model = ll.create_model(name=model_name)
container_def = model.prepare_container_def()
sagemaker_session.create_model(model_name, role, container_def)
INFO:sagemaker.image_uris:Same images used for training and inference. Defaulting to image scope: inference.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating model with name: DEMO-clarify-ll-model-07-02-2023-03-42-08
[14]:
'DEMO-clarify-ll-model-07-02-2023-03-42-08'
Amazon SageMaker Clarify
With your model set up, it’s time to explore SageMaker Clarify. For a general overview of how SageMaker Clarify processing jobs work, refer to the provided link.
[15]:
from sagemaker import clarify
# Initialize a SageMakerClarifyProcessor to compute bias metrics and model explanations.
clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role, instance_count=1, instance_type="ml.m5.xlarge", sagemaker_session=sagemaker_session
)
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: 1.0.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
Detecting Bias
SageMaker Clarify helps you detect possible pre-training and post-training biases using a variety of metrics.
Writing DataConfig
A DataConfig object communicates some basic information about data I/O to SageMaker Clarify. For our example here we provide the following:

s3_data_input_path: S3 URI of the test dataset we uploaded above.
s3_output_path: S3 URI at which the output report will be uploaded.
headers: the list of column names in the dataset. SageMaker Clarify loads the JSON Lines dataset into a tabular representation for further analysis; the label header should be the last entry in the list, and the order of the feature headers should match the order of the features in a sample.
dataset_type: the format of the dataset; since we are using a JSON Lines dataset, this is application/jsonlines.
label: specifies the ground truth label, also known as the observed label or target attribute.
features: JMESPath expression to locate the feature values for bias metrics when the dataset format is JSON Lines.

Note: the features and label arguments are NOT header strings. Each is a JMESPath expression that locates the features list or the label within a dataset sample. For example, for a sample like the one below, features should be data.features.values and label should be data.label (the snippet after the sample shows how these expressions resolve):
{"data": {"features": {"values": [25, 2, 226802, 1, 7, 4, 6, 3, 2, 1, 0, 0, 40, 37]}, "label": 0}}
[16]:
bias_report_output_path = "s3://{}/{}/clarify-bias".format(bucket, prefix)
bias_data_config = clarify.DataConfig(
    s3_data_input_path=test_data_uri,
    s3_output_path=bias_report_output_path,
    features="features",
    label="label",
    headers=testing_data.columns.to_list(),
    dataset_type="application/jsonlines",
)
Writing ModelConfig
A ModelConfig object communicates information about your trained model. To avoid additional traffic to production models, SageMaker Clarify sets up and tears down a dedicated endpoint during processing. For our example here we provide the following:

model_name: name of the model in question, here the Linear Learner model trained earlier.
instance_type and initial_instance_count: the preferred instance type and instance count used to run the model during SageMaker Clarify’s processing. The example dataset is small, so a single standard instance is good enough to run this example.
accept_type: the endpoint response payload format; content_type: the payload format of requests to the endpoint. For the example model we created above, both are application/jsonlines.
content_template: used by SageMaker Clarify to compose the request payload when the content type is JSON Lines. Specifically, the placeholder $features is replaced by the features list from each sample (see the short illustration below). The request payload of a sample from the testing dataset happens to be similar to the sample itself, like '{"features": [25, 2, 226802, 1, 7, 4, 6, 3, 2, 1, 0, 0, 40, 37]}', because both the dataset and the model input conform to SageMaker JSON Lines dense format.
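As a rough illustration of the substitution described above (this is not Clarify’s internal code, just a sketch of the effect of content_template):

import json

content_template = '{"features":$features}'
features = [25, 2, 226802, 1, 7, 4, 6, 3, 2, 1, 0, 0, 40, 37]

# Replace the $features placeholder with the JSON-encoded features list,
# roughly what SageMaker Clarify does when building each request line.
request_payload = content_template.replace("$features", json.dumps(features))
print(request_payload)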
[17]:
model_config = clarify.ModelConfig(
    model_name=model_name,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    accept_type="application/jsonlines",
    content_type="application/jsonlines",
    content_template='{"features":$features}',
)
Writing ModelPredictedLabelConfig
A ModelPredictedLabelConfig provides information on the format of your predictions. The argument label is a JMESPath expression used to locate the predicted label in the endpoint response. In this case, the response payload for a single-sample request looks like '{"predicted_label": 0, "score": 0.013525663875043}', so SageMaker Clarify can find the predicted label 0 with the JMESPath 'predicted_label'. The response also contains a probability score, so it is possible to use another combination of arguments to derive the predicted label from a custom threshold, for example probability='score' and probability_threshold=0.8.
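For instance, a configuration that derives the predicted label from the probability score with a custom threshold could look like the following sketch (the threshold value and variable name are illustrative; this config is not used elsewhere in the notebook):

# Alternative to label="predicted_label": binarize the probability score at a custom threshold.
threshold_predictions_config = clarify.ModelPredictedLabelConfig(
    probability="score", probability_threshold=0.8
)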
[18]:
predictions_config = clarify.ModelPredictedLabelConfig(label="predicted_label")
If you are building your own model, you may choose a different JSON Lines format, as long as it has the key elements such as the label and the features list, and the request payload built from content_template is supported by the model (you can customize the template, but the placeholder for the features list must be $features). Also, dataset_type, accept_type, and content_type don’t have to be the same; for example, a use case may use a CSV dataset and content type but a JSON Lines accept type, as sketched below.
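A minimal sketch of such a mixed-format configuration, assuming the deployed model accepts CSV request payloads (the variable name is illustrative and this config is not used elsewhere in this notebook):

# Sketch only: CSV request payloads, JSON Lines responses.
csv_in_jsonlines_out_model_config = clarify.ModelConfig(
    model_name=model_name,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    content_type="text/csv",
    accept_type="application/jsonlines",
)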
Writing BiasConfig
BiasConfig contains configuration values for detecting bias using a Clarify container.
[19]:
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1], facet_name="Sex", facet_values_or_threshold=[0], group_name="Age"
)
For our demo we provide the following information in the BiasConfig API:

label_values_or_threshold: list of label value(s) or threshold indicating the positive outcome used for bias metrics. Here the positive outcome is earning >$50,000.
facet_name: the sensitive column of the dataset; "Sex" is the facet here.
facet_values_or_threshold: values of the sensitive group; "Female" respondents are the sensitive group.
group_name: the column used to form subgroups for the measurement of the bias metrics Conditional Demographic Disparity (CDD) and Conditional Demographic Disparity in Predicted Labels (CDDPL); this example uses the "Age" column.

SageMaker Clarify can handle both categorical and continuous data for facet_values_or_threshold and for label_values_or_threshold. In this case we are using categorical data; a continuous-facet variant is sketched below. The results will show whether the model has a preference for records of one sex over the other.
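As a sketch of the continuous case, a numeric threshold can be passed instead of a list of categories; consult the BiasConfig documentation for the exact semantics of which side of the threshold forms the sensitive group. The threshold below is illustrative and this config is not used elsewhere in this notebook:

# Illustrative only: treat "Age" as a continuous facet with a numeric threshold.
continuous_bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],
    facet_name="Age",
    facet_values_or_threshold=[40],
)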
Pre-training Bias
Bias can be present in your data before any model training occurs. Inspecting your data for bias before training begins can help detect any data collection gaps, inform your feature engineering, and help you understand what societal biases the data may reflect.
Computing pre-training bias metrics does not require a trained model.
Post-training Bias
Computing post-training bias metrics does require a trained model.
Unbiased training data (as determined by the notions of fairness measured by the bias metrics) may still result in biased model predictions after training. Whether this occurs depends on several factors, including hyperparameter choices.
You can run these analyses separately with run_pre_training_bias() and run_post_training_bias(), or together with run_bias() as shown below. We pass the following additional parameters to the API call:

pre_training_methods: pre-training bias metrics to be computed. A detailed description of the metrics can be found in Measure Pre-training Bias. This example sets methods to "all" to compute all the pre-training bias metrics.
post_training_methods: post-training bias metrics to be computed. A detailed description of the metrics can be found in Measure Post-training Bias. This example sets methods to "all" to compute all the post-training bias metrics.
[ ]:
# The job takes about 10 minutes to run
clarify_processor.run_bias(
    data_config=bias_data_config,
    bias_config=bias_config,
    model_config=model_config,
    model_predicted_label_config=predictions_config,
    pre_training_methods="all",
    post_training_methods="all",
)
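As mentioned above, the pre- and post-training analyses can also be run as separate jobs. A minimal sketch of the two calls using the configurations already defined; each call launches its own processing job, so you would typically run these instead of run_bias(), not in addition to it:

# Pre-training bias metrics only (no model or endpoint required):
clarify_processor.run_pre_training_bias(
    data_config=bias_data_config,
    data_bias_config=bias_config,
    methods="all",
)

# Post-training bias metrics only (requires the trained model):
clarify_processor.run_post_training_bias(
    data_config=bias_data_config,
    data_bias_config=bias_config,
    model_config=model_config,
    model_predicted_label_config=predictions_config,
    methods="all",
)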
Viewing the Bias Report
In Studio, you can view the results under the experiments tab.
Each bias metric has detailed explanations with examples that you can explore.
You could also summarize the results in a handy table!
If you’re not a Studio user yet, you can access the bias report in PDF, HTML, and ipynb formats at the following S3 location:
[21]:
bias_report_output_path
[21]:
's3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-sagemaker-clarify-jsonlines/clarify-bias'
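If you prefer to fetch the report files rather than browse them in S3, a minimal sketch using S3Downloader (run after the bias job has completed; the local folder name is arbitrary):

from sagemaker.s3 import S3Downloader

# Pull the generated bias report artifacts (report, analysis.json, ...) to a local folder.
S3Downloader.download(
    s3_uri=bias_report_output_path,
    local_path="bias_report",
    sagemaker_session=sagemaker_session,
)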
Explaining Predictions
There are expanding business needs and legislative regulations that require explanations of why a model made the decision it did. SageMaker Clarify uses Kernel SHAP to explain the contribution that each input feature makes to the final decision.
For the run_explainability API call we need DataConfig and ModelConfig objects similar to those defined above. SHAPConfig is the configuration class for the Kernel SHAP algorithm.
For our demo we pass the following information in SHAPConfig:

baseline: the Kernel SHAP algorithm requires a baseline (also known as a background dataset). If not provided, a baseline is calculated automatically by SageMaker Clarify using K-means or K-prototypes on the input dataset. The baseline dataset type must be the same as dataset_type, and baseline samples should only include features. The baseline should either be an S3 URI to a baseline dataset file or an in-place list of samples. In this case we chose the latter and use the mean of our dataset as the baseline (an S3-URI variant is sketched after the configuration cell below). For more details on baseline selection, please refer to this documentation.
num_samples: number of samples to be used in the Kernel SHAP algorithm. This number determines the size of the generated synthetic dataset used to compute the SHAP values.
agg_method: aggregation method for global SHAP values. For our example here we use mean_abs, i.e. the mean of the absolute SHAP values over all instances.
save_local_shap_values: indicates whether to save the local (per-instance) SHAP values in the output location.
[22]:
# Similarly, excluding label header from headers list
headers = testing_data.columns.to_list()
headers.remove("Target")
print(headers)
explainability_output_path = "s3://{}/{}/clarify-explainability".format(bucket, prefix)
explainability_data_config = clarify.DataConfig(
    s3_data_input_path=test_data_uri,
    s3_output_path=explainability_output_path,
    features="features",
    headers=headers,
    dataset_type="application/jsonlines",
)
['Age', 'Workclass', 'fnlwgt', 'Education', 'Education-Num', 'Marital Status', 'Occupation', 'Relationship', 'Ethnic group', 'Sex', 'Capital Gain', 'Capital Loss', 'Hours per week', 'Country']
[23]:
baseline_record = testing_data.mean().iloc[:-1].round().values.tolist()
baseline = {"features": baseline_record}
shap_config = clarify.SHAPConfig(
    baseline=[baseline], num_samples=15, agg_method="mean_abs", save_local_shap_values=False
)
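Alternatively, as noted above, the baseline can be supplied as an S3 URI pointing to a JSON Lines file of baseline samples. A sketch only; the URI below is a placeholder for a file you would need to prepare and upload yourself, and this config is not used elsewhere in this notebook:

# Sketch: point SHAPConfig at a pre-computed baseline file instead of an in-place list of samples.
s3_baseline_shap_config = clarify.SHAPConfig(
    baseline="s3://{}/{}/baseline/baseline.jsonl".format(bucket, prefix),
    num_samples=15,
    agg_method="mean_abs",
    save_local_shap_values=False,
)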
Run the explainability job. Note that the Kernel SHAP algorithm requires probability predictions, so the JMESPath "score" is used to extract the probability.
[ ]:
# The job takes about 10 minutes to run
clarify_processor.run_explainability(
    data_config=explainability_data_config,
    model_config=model_config,
    explainability_config=shap_config,
    model_scores="score",
)
Viewing the Explainability Report
As with the bias report, you can view the explainability report in Studio under the experiments tab.
The Model Insights tab contains direct links to the report and model insights.
If you’re not a Studio user yet, as with the bias report, you can access this report at the following S3 location.
[26]:
explainability_output_path
[26]:
's3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-sagemaker-clarify-jsonlines/clarify-explainability'
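Outside Studio you can also read the analysis results directly. A sketch that loads the job's analysis.json and prints its explanations section; the key layout shown is the typical Clarify output and may differ between versions:

import json

from sagemaker.s3 import S3Downloader

# analysis.json holds the aggregated results of the explainability job, including global SHAP values.
analysis = json.loads(
    S3Downloader.read_file(
        explainability_output_path + "/analysis.json", sagemaker_session=sagemaker_session
    )
)
print(json.dumps(analysis.get("explanations", analysis), indent=2))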
Note: You can run both bias and explainability jobs at the same time with run_bias_and_explainability(); refer to the API Documentation for more details.
Clean Up
Finally, don’t forget to clean up the resources we set up and used for this demo!
[27]:
sagemaker_session.delete_model(model_name)
INFO:sagemaker:Deleting model with name: DEMO-clarify-ll-model-07-02-2023-03-42-08
Notebook CI Test Results
This notebook was tested in multiple regions. The test results are as follows, except for us-west-2, which is shown at the top of the notebook.