Explaining text sentiment analysis using SageMaker Clarify




Runtime

This notebook takes approximately 40 minutes to run.

  1. Overview

  2. Prerequisites and Data

    1. Initialize SageMaker

    2. Loading the data: Women’s E-Commerce clothing reviews dataset

    3. Data preparation for model training

  3. Train and Deploy Hugging Face Model

    1. Train model with Hugging Face estimator

    2. Deploy Model to Endpoint

  4. Model Explainability with SageMaker Clarify for text features

    1. Amazon SageMaker Clarify

    2. Model Explainability for text features

    3. Visualize local explanations

    4. Clean Up

Overview

Amazon SageMaker Clarify helps improve your machine learning models by detecting potential bias and helping explain how these models make predictions. The fairness and explainability functionality provided by SageMaker Clarify takes a step towards enabling AWS customers to build trustworthy and understandable machine learning models. The product comes with the tools to help you with the following tasks.

  • Measure biases that can occur during each stage of the ML lifecycle (data collection, model training and tuning, and monitoring of ML models deployed for inference).

  • Generate model governance reports targeting risk and compliance teams and external regulators.

  • Provide explanations of the data, models, and monitoring used to assess predictions for input containing data of various modalities like numerical data, categorical data, text, and images.

Learn more about SageMaker Clarify in the SageMaker Developer Guide. This sample notebook walks you through:

  1. Key terms and concepts needed to understand SageMaker Clarify

  2. Explaining text features with Kernel SHAP

  3. Visualizing the local SHAP explanations

In doing so, the notebook will first train a Hugging Face model using the Hugging Face Estimator in the SageMaker Python SDK using the training dataset, then use SageMaker Clarify to analyze a testing dataset in CSV format, and then visualize the results.

Prerequisites and Data

We require the following AWS resources to be able to successfully run this notebook:

  1. Kernel: Python 3 (Data Science) kernel on SageMaker Studio or conda_python3 kernel on notebook instances

  2. Instance type: Any GPU instance. Here, we use ml.g4dn.xlarge

  3. SageMaker Python SDK version 2.70.0 or greater

  4. Transformers > 4.6.1

  5. Datasets > 1.6.2

Let’s start by installing the required packages.

[ ]:
!pip install "datasets[s3]==1.6.2" "transformers==4.6.1" --upgrade --quiet
[ ]:
!pip install sagemaker --upgrade --quiet
!pip install boto3 --upgrade --quiet
!pip install botocore --upgrade --quiet
[ ]:
!pip install "torch==1.6" --upgrade --quiet
!pip install captum --upgrade --quiet
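
Optionally, verify that the expected versions are installed before proceeding (a quick sanity check; you may need to restart the kernel after the installs above for the new versions to take effect):

[ ]:
import sagemaker
import transformers
import datasets
import torch

# Expect sagemaker >= 2.70.0, transformers == 4.6.1, datasets == 1.6.2, torch == 1.6.x
print(sagemaker.__version__, transformers.__version__, datasets.__version__, torch.__version__)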

Import libraries

[35]:
import os
import csv
import numpy as np
import pandas as pd
import json
import tarfile
from datetime import datetime
from typing import List, Tuple

import boto3
import botocore
import sagemaker
from sagemaker.huggingface import HuggingFace
from sagemaker.pytorch import PyTorchModel
from sagemaker.s3 import S3Uploader
from sagemaker import get_execution_role, clarify, Session
from captum.attr import visualization
from sklearn.model_selection import train_test_split
from datasets import Dataset
[6]:
sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()
role = sagemaker.get_execution_role()

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sagemaker_session.default_bucket()}")
print(f"sagemaker session region: {sagemaker_session.boto_region_name}")
sagemaker role arn: arn:aws:iam::000000000000:role/service-role/AmazonSageMaker-ExecutionRole-20221010T162799
sagemaker bucket: sagemaker-us-west-2-000000000000
sagemaker session region: us-west-2
[7]:
prefix = "DEMO-sagemaker-clarify-text"

s3_prefix = f"sagemaker/{prefix}"
s3_key = f"s3://{bucket}/{s3_prefix}"

model_name = f"{prefix}-model"
endpoint_config_name = f"{prefix}-endpoint-config"
endpoint_name = f"{prefix}-endpoint"

# SageMaker Clarify model directory name
model_path = "model/"

# Instance type for training and hosting
instance_type = "ml.m5.xlarge"

If you change the value of the model_path variable above, please be sure to update model_path in the code/inference.py script as well.

Loading the data: Women’s E-Commerce clothing reviews dataset

The Women’s Clothing E-Commerce dataset contains reviews written by customers. It has 23,486 rows and 10 columns, where each row corresponds to a customer review.

The columns include:

  • Clothing ID: Integer Categorical variable that refers to the specific piece being reviewed.

  • Age: Positive Integer variable of the reviewer’s age.

  • Title: String variable for the title of the review.

  • Review Text: String variable for the review body.

  • Rating: Positive Ordinal Integer variable for the product score granted by the customer from 1 Worst, to 5 Best.

  • Recommended IND: Binary variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.

  • Positive Feedback Count: Positive Integer documenting the number of other customers who found this review positive.

  • Division Name: Categorical name of the product’s high-level division.

  • Department Name: Categorical name of the product’s department.

  • Class Name: Categorical name of the product’s class.

Because the dataset contains real commercial data, it has been anonymized, and any references to the company in the review text and body have been replaced with “retailer”.

Goal: To predict the sentiment of a review based on the text, and then explain the predictions using SageMaker Clarify.

Data Source: https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews/

The Women’s E-Commerce Clothing Reviews dataset has been made available under a Creative Commons Public Domain license. A copy of the dataset has been saved in a sample data Amazon S3 bucket. Let’s download the dataset.

[ ]:
s3 = boto3.client("s3")
s3.download_file(
    f"sagemaker-example-files-prod-{sagemaker_session.boto_region_name}",
    "datasets/tabular/womens_clothing_ecommerce/Womens_Clothing_E-Commerce_Reviews.csv",
    "womens_clothing_reviews_dataset.csv",
)
[9]:
df = pd.read_csv("womens_clothing_reviews_dataset.csv", index_col=[0])
df.head()
[9]:
Clothing ID Age Title Review Text Rating Recommended IND Positive Feedback Count Division Name Department Name Class Name
0 767 33 NaN Absolutely wonderful - silky and sexy and comf... 4 1 0 Initmates Intimate Intimates
1 1080 34 NaN Love this dress! it's sooo pretty. i happene... 5 1 4 General Dresses Dresses
2 1077 60 Some major design flaws I had such high hopes for this dress and reall... 3 0 0 General Dresses Dresses
3 1049 50 My favorite buy! I love, love, love this jumpsuit. it's fun, fl... 5 1 0 General Petite Bottoms Pants
4 847 47 Flattering shirt This shirt is very flattering to all due to th... 5 1 6 General Tops Blouses

Data preparation for model training

Since the dataset does not contain a column that indicates the sentiment of the customer reviews, let’s create one to specify our binary prediction task. To do this, let’s assume that reviews with a Rating of 4 or higher indicate positive sentiment and reviews with a Rating of 2 or lower indicate negative sentiment. Let’s also assume that a Rating of 3 indicates neutral sentiment and exclude these rows from the dataset. Additionally, to predict the sentiment of a review, we are going to use the Review Text column; therefore let’s remove rows that are empty in the Review Text column of the dataset.

[10]:
pd.options.mode.chained_assignment = None


def create_target_column(df, min_positive_score, max_negative_score):
    neutral_values = [i for i in range(max_negative_score + 1, min_positive_score)]
    for neutral_value in neutral_values:
        df = df[df["Rating"] != neutral_value]
    df["Sentiment"] = df["Rating"] >= min_positive_score
    replace_dict = {True: 1, False: 0}
    df["Sentiment"] = df["Sentiment"].map(replace_dict)
    return df


df = create_target_column(df, 4, 2)
df = df[~df["Review Text"].isna()]

The most common approach for model evaluation is the train/validation/test split. Although this approach can be very effective in general, it can produce misleading results, and potentially fail, when used on classification problems with severe class imbalance. Instead, the sampling must be stratified by the class label, as below. Stratification ensures that all classes are well represented across the train, validation, and test datasets.

[11]:
target = "Sentiment"
cols = "Review Text"

X = df[cols]
y = df[target]

# Data split: 11% (validation) of the remaining 90% (train) is ~10% of the full dataset, giving an 80:10:10 train/validation/test split
test_dataset_size = 0.10
val_dataset_size = 0.11
RANDOM_STATE = 42

# Stratified train-val-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=test_dataset_size, stratify=y, random_state=RANDOM_STATE
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=val_dataset_size, stratify=y_train, random_state=RANDOM_STATE
)

print(
    "Dataset: train ",
    X_train.shape,
    y_train.shape,
    y_train.value_counts(dropna=False, normalize=True).to_dict(),
)
print(
    "Dataset: validation ",
    X_val.shape,
    y_val.shape,
    y_val.value_counts(dropna=False, normalize=True).to_dict(),
)
print(
    "Dataset: test ",
    X_test.shape,
    y_test.shape,
    y_test.value_counts(dropna=False, normalize=True).to_dict(),
)

# Combine the independent columns with the label
df_train = pd.concat([X_train, y_train], axis=1).reset_index(drop=True)
df_test = pd.concat([X_test, y_test], axis=1).reset_index(drop=True)
df_val = pd.concat([X_val, y_val], axis=1).reset_index(drop=True)
Dataset: train  (15874,) (15874,) {1: 0.8804334131283861, 0: 0.11956658687161396}
Dataset: validation  (1962,) (1962,) {1: 0.8802242609582059, 0: 0.11977573904179409}
Dataset: test  (1982,) (1982,) {1: 0.8804238143289607, 0: 0.11957618567103935}
[36]:
# The validation split is saved as test.csv because the training script reads it from the "test" channel
df_train.to_csv("train.csv", index=False, header=False)
df_val.to_csv("test.csv", index=False, header=False)

train_dataset = Dataset.from_pandas(df_train)
val_dataset = Dataset.from_pandas(df_val)

Here, we upload the prepared datasets to S3 buckets so that we can train the model with the Hugging Face Estimator.

[13]:
training_input_path = f"s3://{sagemaker_session.default_bucket()}/{s3_prefix}/train"
val_input_path = f"s3://{sagemaker_session.default_bucket()}/{s3_prefix}/test"

train_uri = S3Uploader.upload("train.csv", training_input_path)
test_uri = S3Uploader.upload("test.csv", val_input_path)

We have split the dataset into train, validation, and test datasets. We use the train and validation datasets during the training process, and run Clarify on the test dataset. Above, we also converted the Pandas DataFrames into Hugging Face Datasets for downstream modeling.

Train and Deploy Hugging Face Model

In this step of the workflow, we use the Hugging Face Estimator to load the pre-trained distilbert-base-uncased model and fine-tune the model on our dataset.

Train model with Hugging Face estimator

The hyperparameters defined below are passed to the custom PyTorch code in the scripts/train.py script. The only required parameter is model_name. The other parameters, like epochs and train_batch_size, have default values which can be overridden by setting their values here.

[ ]:
# Hyperparameters passed into the training job
hyperparameters = {
    "epochs": 1,
    "model_name": "distilbert-base-uncased",
    "train_file": "train.csv",
    "test_file": "test.csv",
}

huggingface_estimator = HuggingFace(
    entry_point="train.py",
    source_dir="scripts",
    instance_type="ml.g4dn.xlarge",
    instance_count=1,
    transformers_version="4.6.1",
    pytorch_version="1.7.1",
    py_version="py36",
    role=role,
    hyperparameters=hyperparameters,
    disable_profiler=True,
    debugger_hook_config=False,
)

# starting the train job with our uploaded datasets as input
huggingface_estimator.fit({"train": training_input_path, "test": val_input_path})

Download the trained model files for model inference

[ ]:
! aws s3 cp {huggingface_estimator.model_data} model.tar.gz
! mkdir -p {model_path}
! tar -xvf model.tar.gz -C  {model_path}/

Deploy Model to Endpoint

We are going to use the trained model files along with the PyTorch Inference container to deploy the model to a SageMaker endpoint.

[16]:
with tarfile.open("hf_model.tar.gz", mode="w:gz") as archive:
    archive.add(model_path, recursive=True)
    archive.add("code/")
prefix = s3_prefix.split("/")[-1]
zipped_model_path = sagemaker_session.upload_data(
    path="hf_model.tar.gz", key_prefix=prefix + "/hf-model-sm"
)
[17]:
model_name = "womens-ecommerce-reviews-model-{}".format(
    datetime.now().strftime("%d-%m-%Y-%H-%M-%S")
)
endpoint_name = "womens-ecommerce-reviews-endpoint-{}".format(
    datetime.now().strftime("%d-%m-%Y-%H-%M-%S")
)
[ ]:
model = PyTorchModel(
    entry_point="inference.py",
    name=model_name,
    model_data=zipped_model_path,
    role=get_execution_role(),
    framework_version="1.7.1",
    py_version="py3",
)
predictor = model.deploy(
    initial_instance_count=1, instance_type="ml.g4dn.xlarge", endpoint_name=endpoint_name
)

Let’s test the model endpoint to ensure that deployment was successful.

[19]:
test_sentence1 = "A very versatile and cozy top. would look great dressed up or down for a casual comfy fall day. what a fun piece for my wardrobe!"
test_sentence2 = "Love the color! very soft. unique look. can't wait to wear it this fall"
test_sentence3 = (
    "These leggings are loose fitting and the quality is just not there.. i am returning the item."
)
test_sentence4 = "Very disappointed the back of this blouse is plain, not as displayed."

predictor = sagemaker.predictor.Predictor(endpoint_name, sagemaker_session)
predictor.serializer = sagemaker.serializers.CSVSerializer()
predictor.deserializer = sagemaker.deserializers.CSVDeserializer()
predictor.predict([[test_sentence1], [test_sentence2], [test_sentence3], [test_sentence4]])
[19]:
[['0.99707377'], ['0.99726886'], ['0.039497007'], ['0.040232953']]
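
The endpoint returns the positive-class probability for each review. As a quick sanity check (a sketch, using the same 0.5 cutoff the visualizations below rely on), we can map these scores to sentiment labels:

[ ]:
# Map each positive-class probability to a sentiment label, thresholding at 0.5
scores = predictor.predict([[test_sentence1], [test_sentence2], [test_sentence3], [test_sentence4]])
labels = ["positive" if float(row[0]) > 0.5 else "negative" for row in scores]
print(labels)  # expected: ['positive', 'positive', 'negative', 'negative']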

Amazon SageMaker Clarify

With your model set up, we are ready to get explanations for text data from a Clarify processing job. See the SageMaker Clarify documentation for a general overview of how Clarify processing jobs work.

[ ]:
clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role, instance_count=1, instance_type="ml.m5.xlarge", sagemaker_session=sagemaker_session
)

Model Explainability for text features

To speed up the analysis, let’s take 10 samples from the test dataset. We create a CSV file to store these samples, keeping only reviews longer than 500 characters, as longer reviews make for better visualizations.

[21]:
def filter_dataset(df, file_path, num_examples):
    # Keep only reviews longer than 500 characters, then sample num_examples of them
    df["len"] = df["Review Text"].apply(len)
    df_clarify = pd.DataFrame(
        df[df["len"] > 500].sample(n=num_examples, random_state=RANDOM_STATE),
        columns=["Review Text"],
    )
    df_clarify.to_csv(file_path, header=True, index=False)
    return df_clarify


data_file_path = "clarify_data.csv"
num_examples = 10

df_test_clarify = filter_dataset(df_test, data_file_path, num_examples)

A DataConfig object communicates some basic information about data I/O to SageMaker Clarify. For our example here we provide the following:

  • s3_data_input_path: the location of the dataset to be explained; here, the CSV file of test samples we created above

  • s3_output_path: S3 URI at which our output report will be uploaded

  • headers: the list of column names in the dataset

  • dataset_type: the format of the dataset; since we are using a CSV dataset, this is text/csv

[22]:
explainability_output_path_sentence = f"{s3_key}/clarify-text-explainability-sentence"
explainability_data_config = clarify.DataConfig(
    s3_data_input_path=data_file_path,
    s3_output_path=explainability_output_path_sentence,
    headers=["Review Text"],
    dataset_type="text/csv",
)

A ModelConfig object communicates information about your trained model. To avoid additional traffic to production models, SageMaker Clarify sets up and tears down a dedicated endpoint during processing. For our example here we provide the following:

  • model_name: name of the model trained above

  • instance_type and instance_count: your preferred instance type and count for running the model during SageMaker Clarify’s processing. The example dataset is small, so a single standard instance is sufficient to run this example.

  • accept_type denotes the endpoint response payload format, and content_type denotes the payload format of requests to the endpoint. For the model we created above, both of these are text/csv.

[23]:
model_config = clarify.ModelConfig(
    model_name=model_name,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    accept_type="text/csv",
    content_type="text/csv",
)

A TextConfig object provides information needed to compute explanations for the text features in your dataset. It includes the below parameters:

  • granularity (required): To explain text features, Clarify further breaks down text into smaller text units, and considers each such text unit as a feature. The parameter granularity informs the level to which Clarify will break down the text: token, sentence, or paragraph are the allowed values for granularity.

  • language (required): the language of the text features. This is required to tokenize the text to break them down to their granular form.

  • max_top_tokens (optional): the number of top token attributions that will be shown in the output (needed because the vocabulary can be very large). This is an optional parameter; here we use the default of 50.

Here we will set the granularity to “sentence”. We will also run the explainability analysis with granularity set to “token” later and compare the outputs.

[24]:
text_config = clarify.TextConfig(
    granularity="sentence",
    language="english",
)

A SHAPConfig object provides information needed for the Kernel SHAP algorithm. It contains the following parameters:

  • baseline: The Kernel SHAP algorithm requires a baseline (also known as a background dataset). For text features, the baseline value is the value used to replace each individual text unit (token, sentence, or paragraph) in the perturbed inputs. For instance, in the example below, we have chosen the baseline value for Review Text as <UNK> with granularity sentence, so every time a sentence has to be replaced in a perturbed input, we replace it with <UNK> (see the illustrative sketch after this list). If no baseline is provided for a text feature, the default replacement value is the string <PAD>. For more details on baseline selection, please refer to the SageMaker Clarify documentation.

  • num_samples: Number of samples to be used in the Kernel SHAP algorithm. This number determines the size of the generated synthetic dataset to compute the SHAP values.

  • agg_method: Aggregation method for global SHAP values. For our example here we are using mean_abs i.e. mean of absolute SHAP values for all instances.

  • save_local_shap_values: Indicates whether to save the local SHAP values in the output location. Default is True.
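
To build intuition for how the baseline is used, the following sketch illustrates the perturbation idea behind Kernel SHAP (an illustration with made-up sentences only, not Clarify’s actual implementation): text units are masked with the baseline value, and the perturbed inputs are scored by the model to estimate each unit’s contribution.

[ ]:
# Illustrative sketch only: Kernel SHAP replaces text units with the baseline
# value and scores the perturbed inputs to estimate each unit's contribution.
review_sentences = [
    "i love this dress!",
    "the dress was a little long on top.",
]
baseline = "<UNK>"

# Mask one sentence at a time with the baseline. (Kernel SHAP samples many
# such coalitions of masked/unmasked units rather than enumerating them all.)
for i in range(len(review_sentences)):
    perturbed = [baseline if j == i else s for j, s in enumerate(review_sentences)]
    print(" ".join(perturbed))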

[25]:
shap_config = clarify.SHAPConfig(
    baseline=[["<UNK>"]],
    num_samples=1000,
    agg_method="mean_abs",
    save_local_shap_values=True,
    text_config=text_config,
)

Now we can run the explainability job with run_explainability. The below cell takes about 15 minutes to run.

[ ]:
clarify_processor.run_explainability(
    data_config=explainability_data_config,
    model_config=model_config,
    explainability_config=shap_config,
)

Visualize local explanations

We use Captum to visualize the feature importances computed by Clarify. First, let’s load the local explanations. Local text explanations can be found in the analysis results folder in a file named out.jsonl in the explanations_shap directory.

[27]:
def load_local_explanations(explainability_output_path):
    # Local explanations are written to out.jsonl in the explanations_shap directory
    local_feature_attributions_file = "out.jsonl"
    file = sagemaker.s3.S3Downloader.read_file(
        explainability_output_path + "/explanations_shap/" + local_feature_attributions_file
    )

    # Each line of the JSON Lines file holds the explanation for one instance
    shap_output = []
    for line in file.split("\n"):
        if line:
            shap_output.append(json.loads(line))
    return shap_output


shap_output = load_local_explanations(explainability_output_path_sentence)

Let’s take a look at the list of local explanations and examine its output format. The local explanations file is a JSON Lines file that contains the explanation of one instance per row, as seen below.

[28]:
print(json.dumps(shap_output[0], indent=2))
{
  "explanations": [
    {
      "attributions": [
        {
          "attribution": [
            0.018564795006070663
          ],
          "description": {
            "partial_text": "I caught a sneak peak of this beautiful dress on a local retailer instagram page...and i was so excited when it arrived at my store.",
            "start_idx": 0
          }
        },
        {
          "attribution": [
            0.025437059774282594
          ],
          "description": {
            "partial_text": "i love this dress!",
            "start_idx": 133
          }
        },
        {
          "attribution": [
            0.006905820849337797
          ],
          "description": {
            "partial_text": "i went with the black because i loved how bold it was.",
            "start_idx": 152
          }
        },
        {
          "attribution": [
            0.023049811892384125
          ],
          "description": {
            "partial_text": "it's ultra feminine and flowy.",
            "start_idx": 207
          }
        },
        {
          "attribution": [
            0.024735707176048522
          ],
          "description": {
            "partial_text": "the slip underneath has the prettiest embroidered print and the overlay is light and airy.",
            "start_idx": 238
          }
        },
        {
          "attribution": [
            0.02565712420143487
          ],
          "description": {
            "partial_text": "the bottom is hemmed with a little lace peekaboo and it is wonderful.",
            "start_idx": 329
          }
        },
        {
          "attribution": [
            0.0012666796363134775
          ],
          "description": {
            "partial_text": "i tried this on in both a l and an xl.",
            "start_idx": 399
          }
        },
        {
          "attribution": [
            0.019228301464128017
          ],
          "description": {
            "partial_text": "the xl fit well but the dress was a little long on top, so i wen",
            "start_idx": 438
          }
        }
      ],
      "data_type": "free_text",
      "feature_name": "Review Text"
    }
  ]
}

At the highest level of this JSON Line, there are two keys: explanations and join_source_value (the latter is not present here, as we have not included a joinsource column in the input dataset). The key explanations contains a list of attributions for each feature in the dataset. In this case, we have a single element because the input dataset also had a single feature. It also contains details like the feature_name and the data_type of the feature (indicating whether Clarify inferred the column as numerical, categorical, or text). Each token attribution also contains a description field that contains the token itself and the starting index of the token in the original input. This allows you to reconstruct the original text from the output, as shown in the sketch below.
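
A minimal sketch (assuming the out.jsonl schema shown above) that rebuilds the original review text from the partial_text fragments and their start_idx values:

[ ]:
def reconstruct_text(explanation: dict) -> str:
    # Rebuild the original text from the fragments and their start indices
    text = ""
    for attr in explanation["explanations"][0]["attributions"]:
        desc = attr["description"]
        # Pad with spaces up to the fragment's recorded starting index
        text = text.ljust(desc["start_idx"]) + desc["partial_text"]
    return text


print(reconstruct_text(shap_output[0]))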

In the following cell, we create a list of attributions and a list of tokens for use in visualizations.

[29]:
def create_visualization_datasets(shap_explanations: List) -> Tuple[List[List], List[List]]:
    """
    This function extracts the individual tokens and corresponding attribution values from the local explanations for each entry in the dataset
    """
    attributions_dataset = [
        np.array([attr["attribution"][0] for attr in expl["explanations"][0]["attributions"]])
        for expl in shap_explanations
    ]

    tokens_dataset = [
        np.array(
            [
                attr["description"]["partial_text"]
                for attr in expl["explanations"][0]["attributions"]
            ]
        )
        for expl in shap_explanations
    ]
    return attributions_dataset, tokens_dataset


attributions_dataset, tokens_dataset = create_visualization_datasets(shap_output)

Let’s take a look at the first instance in the attributions_dataset and tokens_dataset. We see that they are lists of the same length, as each attribution corresponds to one sentence level token.

[30]:
print(
    f"length of attributions dataset: {len(attributions_dataset[0])}, length of tokens dataset: {len(tokens_dataset[0])}\n"
)

print(attributions_dataset[0])
print(tokens_dataset[0])
length of attributions dataset: 8, length of tokens dataset: 8

[0.0185648  0.02543706 0.00690582 0.02304981 0.02473571 0.02565712
 0.00126668 0.0192283 ]
['I caught a sneak peak of this beautiful dress on a local retailer instagram page...and i was so excited when it arrived at my store.'
 'i love this dress!'
 'i went with the black because i loved how bold it was.'
 "it's ultra feminine and flowy."
 'the slip underneath has the prettiest embroidered print and the overlay is light and airy.'
 'the bottom is hemmed with a little lace peekaboo and it is wonderful.'
 'i tried this on in both a l and an xl.'
 'the xl fit well but the dress was a little long on top, so i wen']
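
With the attributions and tokens aligned, we can already rank the sentences of a review by importance before any visualization. A small sketch using the arrays built above:

[ ]:
# Rank the sentence-level tokens of the first review by SHAP attribution
order = np.argsort(-attributions_dataset[0])
for idx in order[:3]:
    print(f"{attributions_dataset[0][idx]:+.4f}  {tokens_dataset[0][idx]}")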

We obtain predictions as well so that they can be displayed alongside the feature attributions.

[31]:
predictions = predictor.predict([t for t in df_test_clarify.values])

The method below produces visualizations for the local explanations. It highlights tokens in red or green for negative and positive attributions, respectively.

[37]:
def visualization_record(
    attributions: List,  # list of attributions for the tokens
    text: List,  # list of tokens
    pred: float,  # the prediction value obtained from the endpoint
    delta: float,
    true_label: int,  # the true label from the dataset
    normalize: bool = True,  # normalizes the attributions so that the max absolute value is 1. Yields stronger colors.
    max_frac_to_show: float = 0.05,  # what fraction of tokens to highlight, set to 1 for all.
    match_to_pred: bool = False,  # whether to limit highlights to red for negative predictions and green for positive ones.
    # By enabling `match_to_pred` you show what tokens contribute to a high/low prediction not those that oppose it.
) -> visualization.VisualizationDataRecord:
    if normalize:
        attributions = attributions / max(max(attributions), max(-attributions))
    if max_frac_to_show is not None and max_frac_to_show < 1:
        num_show = int(max_frac_to_show * attributions.shape[0])
        sal = attributions
        if pred < 0.5:
            sal = -sal
        if not match_to_pred:
            sal = np.abs(sal)
        top_idxs = np.argsort(-sal)[:num_show]
        mask = np.zeros_like(attributions)
        mask[top_idxs] = 1
        attributions = attributions * mask
    return visualization.VisualizationDataRecord(
        attributions,
        pred,
        int(pred > 0.5),
        true_label,
        attributions.sum() > 0,
        attributions.sum(),
        text,
        delta,
    )


def visualization_config(
    attributions_dataset: List[List],
    tokens_dataset: List[List],
    test_dataset: Dataset,
    predictions: List,
) -> List[visualization.VisualizationDataRecord]:
    # You can customize the following display settings
    normalize = True
    max_frac_to_show = 1
    match_to_pred = False
    labels = test_dataset["Sentiment"][:num_examples]
    vis = []

    for attr, token, pred, label in zip(attributions_dataset, tokens_dataset, predictions, labels):
        vis.append(
            visualization_record(
                attr, token, float(pred[0]), 0.0, label, normalize, max_frac_to_show, match_to_pred
            )
        )
    return vis


vis = visualization_config(attributions_dataset, tokens_dataset, val_dataset, predictions)

Now that we have compiled the records, we are ready to render the visualizations.

We see one row per review in the selected dataset. For each row we show the prediction, the true label, and the highlighted text. Additionally, we show the total sum of attributions (as the attribution score) and its label (as the attribution label), which indicates whether the sum is greater than zero.

[38]:
_ = visualization.visualize_text(vis)
Legend: Negative Neutral Positive
True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance
11 (1.00)True5.65 I caught a sneak peak of this beautiful dress on a local retailer instagram page...and i was so excited when it arrived at my store. i love this dress! i went with the black because i loved how bold it was. it's ultra feminine and flowy. the slip underneath has the prettiest embroidered print and the overlay is light and airy. the bottom is hemmed with a little lace peekaboo and it is wonderful. i tried this on in both a l and an xl. the xl fit well but the dress was a little long on top, so i wen
11 (0.66)False-0.36 Was hopeful this might work on my petite pear shape (i am 5', probably 135-140lb; one to two sizes larger on the bottom) when i first saw online so went to store to give the regular size a trial run before ordering the petite size online. you can never tell with these flare dresses - some just make the hips look so much worse and haven't had a lot of luck with casual retailer flare dresses this season. first tried on the medium and fell in love instantly - was looking a little big but thought orde
11 (0.99)True0.73 I went to retailerplpogie today to try this piece on and it was adorable. i really liked the lace part that lightly shows off the legs. this does run true to size. i am 5'5 and 116 pounds and wear a 34b and i was able to button up everything and be comfortable in a size 0. there is no stretch at all with the materials that they have used, so that might be an issue for some. the material was actually a bit thicker than what i would have thought for dress like this, but i really liked it overall and
11 (0.84)False-0.03 Like other reviewers, i tried on a medium and a large as i usually take either size in retailer dresses. the large was huge on me - i had an extra 6 inches or so on each side. the next medium that i tried on could barely fit over my head! i looked at the tag to see if i had picked up an extra small but it stated medium. got another medium and it fit perfectly. however, the cami underneath one medium was again teeny tiny. the other medium was ok. you may have to try on a few dresses to get the righ
11 (0.99)True2.35 This one is a beauty! my store is not carrying the white - they had a lovely blue (periwinkle-ish) that immediately caught my eye. the ivory model shot makes the details of the lace and top layer really pop - more so than the blue variety but both are stunning. the details and layers on this dress are lovely and intricate without feeling overly delicate. quality most definitely in line with the price. fit: 130lbs/34c/5'6" high waist. i'm a 4 or 6 at retailer depending on the brand. in most mou
11 (1.00)True0.82 This is no exception to the rule! i love turtlenecks and this one is going to be a favorite. do know that this is not lightweight-even though this is sleeveless it is definitely a substantial knit. the color is a beautiful nutmeg and can be dressed up or down. i purchased a gorgeous neutral beaded necklace to go with it and it is simply stunning! it does 'bell out' a little at the bottom which, in my opinion-is the only design flaw. i normally wear a medium in most retailer tops and the small is a
11 (0.99)True3.75 I picked this up at my nearest retailer this afternoon after placing my order a few days ago. i purchased the off-white version in size small. i actually tried this on in the dressing room to make sure i was satisfied with the product. when i came out, i asked the retailer customer service rep if she could cut the tag off so that i could wear it out today. she not only helped me out but also commented on how much she liked this sweater while also wondering why she hasn't seen it in store; i informed
01 (0.99)True3.66 I saw this blouse referenced online a few times and have a weakness for white blouses, so off i went to my local retailer. this blouse turned out to be delightful! my thoughts: i would say it fits true to size, as a 0 fits me perfectly in the shoulders. it is a flowy tunic blouse that has a high skit, so it's not possible to wear this as a dress (it is very sheer too). i do think this blouse is designed for the line to hit below the bust, so my incredibly flat chest works in this. if you have even
11 (1.00)True3.35 I love these pants. i vary between 10 & 12, the 10 fit perfectly since they are designed on the fuller size, perfect for my body type of small waist and a bit of bottom. so many colors to coordinate with these pants. teal green, "retailer" yellow, black...i am not fond of ankle pants, prefer longer pants. i took the cuffs down and lengthened the pants. the only problem is that these pants are so exquisitely made, removing the cuff tacks was a challenge! perfect pants for those of us who have shape
11 (0.99)True0.86 I saw this sweater and just about died. i loved the accentuated shoulders and beautiful knit detail on the sleeves. lucky for me, retailer day started a couple of days after this sweater debuted, and i snagged it right away. i just got it and am in love. it fits true to size and the shoulders and sleeves are everything i hoped they would be. the neckline in very flattering as well. it is soft and the color is gorgeous. if there is a negative, it would be that it is a bit boxy in the torso, and doe

Token level explainability

So far we have looked at sentence-level explainability. Now let’s look at token-level explainability by updating the TextConfig to set granularity to “token”, and updating the SHAPConfig accordingly. Let’s also update the DataConfig to save the outputs to a different path.

[39]:
token_text_config = clarify.TextConfig(
    granularity="token",
    language="english",
)

token_shap_config = clarify.SHAPConfig(
    baseline=[["<UNK>"]],
    num_samples=1000,
    agg_method="mean_abs",
    save_local_shap_values=True,
    text_config=token_text_config,
)

explainability_output_path_token = f"{s3_key}/clarify-text-explainability-token"

token_explainability_data_config = clarify.DataConfig(
    s3_data_input_path=data_file_path,
    s3_output_path=explainability_output_path_token,
    headers=["Review Text"],
    dataset_type="text/csv",
)

The analysis below takes around 20 minutes to complete.

[ ]:
clarify_processor.run_explainability(
    data_config=token_explainability_data_config,
    model_config=model_config,
    explainability_config=token_shap_config,
)

Let’s visualize the local explanations as we did for the sentence level explanations.

In the visualizations below, we see how individual tokens are colored as “positive” or “negative” sentiment.

[41]:
token_shap_output = load_local_explanations(explainability_output_path_token)
attributions_dataset, tokens_dataset = create_visualization_datasets(token_shap_output)
token_vis = visualization_config(attributions_dataset, tokens_dataset, val_dataset, predictions)

_ = visualization.visualize_text(token_vis)
Legend: Negative Neutral Positive
True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance
11 (1.00)True5.05 I caught a sneak peak of this beautiful dress on a local retailer instagram page ... and i was so excited when it arrived at my store . i love this dress ! i went with the black because i loved how bold it was . it 's ultra feminine and flowy . the slip underneath has the prettiest embroidered print and the overlay is light and airy . the bottom is hemmed with a little lace peekaboo and it is wonderful . i tried this on in both a l and an xl . the xl fit well but the dress was a little long on top , so i wen
11 (0.66)False-3.33 Was hopeful this might work on my petite pear shape ( i am 5 ' , probably 135 - 140 lb ; one to two sizes larger on the bottom ) when i first saw online so went to store to give the regular size a trial run before ordering the petite size online . you can never tell with these flare dresses - some just make the hips look so much worse and have n't had a lot of luck with casual retailer flare dresses this season . first tried on the medium and fell in love instantly - was looking a little big but thought orde
11 (0.99)True5.99 I went to retailerplpogie today to try this piece on and it was adorable . i really liked the lace part that lightly shows off the legs . this does run true to size . i am 5'5 and 116 pounds and wear a 34b and i was able to button up everything and be comfortable in a size 0 . there is no stretch at all with the materials that they have used , so that might be an issue for some . the material was actually a bit thicker than what i would have thought for dress like this , but i really liked it overall and
11 (0.84)False-0.37 Like other reviewers , i tried on a medium and a large as i usually take either size in retailer dresses . the large was huge on me - i had an extra 6 inches or so on each side . the next medium that i tried on could barely fit over my head ! i looked at the tag to see if i had picked up an extra small but it stated medium . got another medium and it fit perfectly . however , the cami underneath one medium was again teeny tiny . the other medium was ok . you may have to try on a few dresses to get the righ
11 (0.99)True1.50 This one is a beauty ! my store is not carrying the white - they had a lovely blue ( periwinkle - ish ) that immediately caught my eye . the ivory model shot makes the details of the lace and top layer really pop - more so than the blue variety but both are stunning . the details and layers on this dress are lovely and intricate without feeling overly delicate . quality most definitely in line with the price . fit : 130lbs/34c/5'6 " high waist . i 'm a 4 or 6 at retailer depending on the brand . in most mou
11 (1.00)True1.96 This is no exception to the rule ! i love turtlenecks and this one is going to be a favorite . do know that this is not lightweight - even though this is sleeveless it is definitely a substantial knit . the color is a beautiful nutmeg and can be dressed up or down . i purchased a gorgeous neutral beaded necklace to go with it and it is simply stunning ! it does ' bell out ' a little at the bottom which , in my opinion - is the only design flaw . i normally wear a medium in most retailer tops and the small is a
11 (0.99)True9.75 I picked this up at my nearest retailer this afternoon after placing my order a few days ago . i purchased the off - white version in size small . i actually tried this on in the dressing room to make sure i was satisfied with the product . when i came out , i asked the retailer customer service rep if she could cut the tag off so that i could wear it out today . she not only helped me out but also commented on how much she liked this sweater while also wondering why she has n't seen it in store ; i informed
01 (0.99)True1.63 I saw this blouse referenced online a few times and have a weakness for white blouses , so off i went to my local retailer . this blouse turned out to be delightful ! my thoughts : i would say it fits true to size , as a 0 fits me perfectly in the shoulders . it is a flowy tunic blouse that has a high skit , so it 's not possible to wear this as a dress ( it is very sheer too ) . i do think this blouse is designed for the line to hit below the bust , so my incredibly flat chest works in this . if you have even
11 (1.00)True4.73 I love these pants . i vary between 10 & 12 , the 10 fit perfectly since they are designed on the fuller size , perfect for my body type of small waist and a bit of bottom . so many colors to coordinate with these pants . teal green , " retailer " yellow , black ... i am not fond of ankle pants , prefer longer pants . i took the cuffs down and lengthened the pants . the only problem is that these pants are so exquisitely made , removing the cuff tacks was a challenge ! perfect pants for those of us who have shape
11 (0.99)True1.82 I saw this sweater and just about died . i loved the accentuated shoulders and beautiful knit detail on the sleeves . lucky for me , retailer day started a couple of days after this sweater debuted , and i snagged it right away . i just got it and am in love . it fits true to size and the shoulders and sleeves are everything i hoped they would be . the neckline in very flattering as well . it is soft and the color is gorgeous . if there is a negative , it would be that it is a bit boxy in the torso , and doe

Clean Up

Finally, please remember to delete the Amazon SageMaker endpoint to avoid charges:

[ ]:
predictor.delete_endpoint()
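
Optionally, you can also delete the model resource created during deployment (a sketch using the SageMaker Session API; model_name is the variable defined above):

[ ]:
# Optionally, also delete the model resource created during deployment
sagemaker_session.delete_model(model_name)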
