Deploy a pretrained PyTorch BERT model from Hugging Face Hub on Amazon SageMaker for sentiment analysis
This notebook’s CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.
Sentiment analysis is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine.
BERT was trained on BookCorpus and English Wikipedia data, which contain 800 million words and 2,500 million words, respectively. Training BERT from scratch would be prohibitively expensive. By taking advantage of transfer learning, you can quickly fine-tune BERT for another use case with a relatively small amount of training data and achieve state-of-the-art results on common NLP tasks, such as text classification and question answering.
Amazon SageMaker is a fully managed service that provides developers and data scientists with the ability to build, train, and deploy machine learning (ML) models quickly. Amazon SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high-quality models. The SageMaker Python SDK provides open source APIs and containers that make it easy to train and deploy models in Amazon SageMaker with several machine learning and deep learning frameworks.
Our customers often ask for quick fine-tuning and easy deployment of their NLP models.
In this notebook, you will deploy a pretrained PyTorch BERT model from Hugging Face Hub on Amazon SageMaker for sentiment analysis.
You’ll execute the following steps:
- Initiate a Hugging Face pipeline (https://huggingface.co/transformers/main_classes/pipelines.html) and save the model and config on the local file system.
- Tar and gzip the model and config files, and upload model.tar.gz to an S3 bucket.
- Deploy the model to a SageMaker endpoint and make a few inference requests.
- Optionally, clean up.
Install Python packages
If you run this notebook in SageMaker Studio, make sure ipywidgets is installed and then restart the kernel: uncomment the code in the next cell and run it.
[ ]:
# %%capture
# import IPython
# import sys
# !{sys.executable} -m pip install ipywidgets
# IPython.Application.instance().kernel.do_shutdown(True) # has to restart kernel so changes are used
Then you’ll install Transformers, a state-of-the-art natural language processing library for Jax, PyTorch, and TensorFlow.
Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet) for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with more than 32 pretrained models in over 100 languages and deep interoperability between Jax, PyTorch, and TensorFlow.
[ ]:
import sys
!{sys.executable} -m pip install transformers
Let’s start by creating a SageMaker session and specifying:
- The S3 bucket and prefix that you want to use for the model data. The bucket should be in the same region as the notebook instance, training, and hosting.
- The IAM role ARN used to give hosting access to your data. See the documentation for how to create these. Note: if more than one role is required for notebook instances, training, and/or hosting, replace sagemaker.get_execution_role() with the appropriate full IAM role ARN string(s).
[ ]:
import os
import boto3
import sagemaker
role = sagemaker.get_execution_role()
sess = sagemaker.Session()
bucket = sess.default_bucket()
prefix = "sagemaker/pytorch-bert-sentiment-analysis"
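Initiate a Hugging Face pipeline
Pipelines are a simple way to use models for inference. They are objects that abstract away most of the library’s complex code and offer a simple API dedicated to several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction, and Question Answering. See the task summary for examples of use. When called with only a task name, as in the next cell, pipeline() downloads that task’s default checkpoint; for sentiment analysis this is a DistilBERT model fine-tuned on SST-2 (distilbert-base-uncased-finetuned-sst-2-english).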
[ ]:
from transformers import pipeline
sentiment_analysis = pipeline("sentiment-analysis")
Save the pretrained model to the file system
[ ]:
sentiment_analysis.save_pretrained("./model")
Package the pretrained model and upload it to S3
Now list the files in the model directory to confirm that the pretrained model and its config were saved there.
[ ]:
!ls -rtlh ./model/
Now you’ll create a model.tar.gz file for the SageMaker endpoint to use.
[ ]:
!cd model && tar czvf ../model.tar.gz *
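The shell command above can also be expressed with Python’s standard-library tarfile module, which is handy where the `!` shell escape is not available. The sketch below is a minimal illustration, not part of the original notebook, and package_model_dir is a hypothetical helper name. Like `tar czvf ../model.tar.gz *`, it stores files at the archive root rather than under a model/ folder and skips hidden files; it assumes the output path lies outside model_dir.

```python
import os
import tarfile


def package_model_dir(model_dir, output_path):
    """Pack the contents of model_dir into a gzip-compressed tarball.

    Mirrors `cd model && tar czvf ../model.tar.gz *`: members are stored
    relative to model_dir (no leading "model/" folder) and hidden files are
    skipped. Returns the member names found in the finished archive.
    """
    with tarfile.open(output_path, "w:gz") as tar:
        for entry in sorted(os.listdir(model_dir)):
            if entry.startswith("."):
                continue  # the shell glob `*` does not match hidden files
            # arcname drops the model_dir prefix so files sit at the archive root;
            # subdirectories are added recursively
            tar.add(os.path.join(model_dir, entry), arcname=entry)
    # Re-open the archive to report what was actually written
    with tarfile.open(output_path, "r:gz") as tar:
        return tar.getnames()
```

For this notebook, the call would be package_model_dir("./model", "model.tar.gz").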
Upload model.tar.gz to the S3 bucket you set up earlier.
[ ]:
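key = os.path.join(prefix, "model.tar.gz")
# Open the archive in binary mode; the with-block closes the file after the upload
with open("model.tar.gz", "rb") as model_file:
    boto3.Session().resource("s3").Bucket(bucket).Object(key).upload_fileobj(model_file)
print(os.path.join(bucket, key))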
[ ]:
pretrained_model_data = "s3://{}/{}".format(bucket, key)
pretrained_model_data
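S3 object keys always use / as the separator, whereas os.path.join uses the host operating system’s separator, so the key built above would contain backslashes if the notebook ran on Windows. A minimal sketch that builds the same URI with posixpath instead; s3_model_uri is a hypothetical helper, not part of the SageMaker SDK.

```python
import posixpath


def s3_model_uri(bucket, prefix, filename="model.tar.gz"):
    """Return the s3:// URI of an object stored under bucket/prefix/filename.

    S3 keys always use "/" as the separator, so posixpath.join is used here;
    os.path.join would insert "\\" when running on Windows.
    """
    # Strip stray leading/trailing slashes so the key has no empty segments
    key = posixpath.join(prefix.strip("/"), filename)
    return "s3://{}/{}".format(bucket, key)
```

As an alternative, the SageMaker Python SDK’s Session.upload_data uploads a local file and returns its S3 URI, for example sess.upload_data("model.tar.gz", bucket=bucket, key_prefix=prefix).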
Write the Inference Script
To deploy a pretrained PyTorch model, you don’t need a PyTorch estimator: you create a PyTorchModel object directly from the model artifact in S3 and set its entry_point to an inference script.
You’ll then use the PyTorchModel object to deploy a PyTorchPredictor. This creates a SageMaker endpoint: a hosted prediction service that you can use to perform inference.
An implementation of model_fn is required in the inference script. For any of input_fn, predict_fn, and output_fn that the script does not define, the container falls back to the default implementations provided in sagemaker-pytorch-containers.
Here’s an example of the inference script:
[ ]:
!pygmentize code/inference.py
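Because code/inference.py is only printed here, the following is a hypothetical sketch of what such a script can look like, not a copy of the notebook’s file. It assumes the model directory holds the pipeline saved by save_pretrained, that requests and responses are JSON, and that torch and transformers are available in the serving container (for example via a requirements.txt in source_dir). The handler names model_fn, input_fn, predict_fn, and output_fn are the hooks the SageMaker PyTorch container looks for; JSON_CONTENT_TYPE is an illustrative constant.

```python
import json

JSON_CONTENT_TYPE = "application/json"  # illustrative constant


def model_fn(model_dir):
    """Load the saved sentiment-analysis pipeline from the unpacked model.tar.gz."""
    # Imported lazily so the other handlers can be exercised without these packages
    import torch
    from transformers import pipeline

    device = 0 if torch.cuda.is_available() else -1  # first GPU if present, else CPU
    return pipeline("sentiment-analysis", model=model_dir, tokenizer=model_dir, device=device)


def input_fn(request_body, request_content_type):
    """Decode a JSON request body into a string or a list of strings."""
    if request_content_type != JSON_CONTENT_TYPE:
        raise ValueError("Unsupported content type: {}".format(request_content_type))
    return json.loads(request_body)


def predict_fn(input_data, model):
    """Run the pipeline on the decoded input."""
    return model(input_data)


def output_fn(prediction, accept):
    """Encode the pipeline output as a JSON string."""
    if accept != JSON_CONTENT_TYPE:
        raise ValueError("Unsupported accept type: {}".format(accept))
    return json.dumps(prediction)
```

Keeping the heavy imports inside model_fn is a design choice for this sketch only; a real script would typically import them at the top.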
Create a model object
You define the model object by using the SageMaker Python SDK’s PyTorchModel, passing in the S3 location of the model artifact (model_data), the directory containing the inference code (source_dir), and the entry_point. The endpoint’s inference logic is defined in inference.py, printed above: its model_fn loads the model and moves it to a GPU, if one is available.
[ ]:
from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(
    model_data=pretrained_model_data,  # S3 URI of model.tar.gz uploaded above
    role=role,
    framework_version="1.7.1",
    source_dir="code",  # local directory containing inference.py
    py_version="py3",
    entry_point="inference.py",
)
Deploy the model in SageMaker endpoint
The arguments to the deploy function allow you to set the number and type of instances used for the endpoint. Here you will deploy the model to a single ml.m5.large instance.
[ ]:
predictor = pytorch_model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
Since the input_fn declares that incoming requests are JSON-encoded, you need a JSON serializer to encode the request data into a JSON string. Likewise, because the script declares the response content type to be a JSON string, you need a JSON deserializer to parse the response.
[ ]:
predictor.serializer = sagemaker.serializers.JSONSerializer()
predictor.deserializer = sagemaker.deserializers.JSONDeserializer()
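The serializer and deserializer determine the request body, the ContentType and Accept headers, and how the response body is parsed. For reference, here is a minimal sketch of the equivalent raw call with a boto3 SageMaker Runtime client, assuming the endpoint expects and returns JSON as described above. invoke_json is a hypothetical helper, and the client is passed in rather than created inside so the function can be tried without AWS access.

```python
import json


def invoke_json(runtime_client, endpoint_name, payload):
    """Send a JSON payload to a SageMaker endpoint and decode the JSON reply.

    runtime_client is expected to be boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # request body format declared to the endpoint
        Accept="application/json",  # response format requested from the endpoint
        Body=json.dumps(payload),
    )
    # "Body" is a streaming object; read() returns the raw bytes of the reply
    return json.loads(response["Body"].read())
```

With a live endpoint, the call would look like invoke_json(boto3.client("sagemaker-runtime"), predictor.endpoint_name, "Never allow the same bug to bite you twice.").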
Test the model
Using a few samples, you can now invoke the SageMaker endpoint to get predictions.
[ ]:
result = predictor.predict("Never allow the same bug to bite you twice.")
result
[ ]:
result = predictor.predict(
    "The best part of Amazon SageMaker is that it makes machine learning easy."
)
result
You can also invoke the endpoint with a list of sentences:
[ ]:
result = predictor.predict(
    [
        "Never allow the same bug to bite you twice.",
        "The best part of Amazon SageMaker is that it makes machine learning easy.",
    ]
)
result
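When you send a batch, it can help to line each prediction up with its input. A minimal sketch, assuming the endpoint returns the Transformers pipeline output unchanged, that is, one {"label": ..., "score": ...} dict per sentence in input order; label_sentences is a hypothetical helper.

```python
def label_sentences(sentences, predictions):
    """Pair each input sentence with its predicted label and rounded score.

    Assumes predictions holds one {"label": ..., "score": ...} dict per
    sentence, in the same order as sentences.
    """
    if len(sentences) != len(predictions):
        raise ValueError("expected one prediction per sentence")
    return [
        (sentence, pred["label"], round(float(pred["score"]), 4))
        for sentence, pred in zip(sentences, predictions)
    ]
```

For example, label_sentences(sentences, result), where sentences is the list passed to predictor.predict.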
Clean up
Endpoints should be deleted when no longer in use, since (per the SageMaker pricing page) they’re billed by time deployed.
[ ]:
# Deletes the endpoint and, by default, its endpoint configuration
predictor.delete_endpoint()
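predictor.delete_endpoint() removes the endpoint and, by default, its endpoint configuration, but the SageMaker model that deploy() registered remains as a separate resource. The following sketch, assuming a boto3 SageMaker client, looks up the endpoint configuration and model names and then deletes all three; delete_endpoint_resources is a hypothetical helper.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint plus the endpoint config and model(s) behind it.

    sm_client is expected to be boto3.client("sagemaker"). Returns the names
    that were deleted so the caller can log them.
    """
    # Look up names first: describe_endpoint fails once the endpoint is gone
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    # The endpoint is what keeps instances running, so remove it first
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)
    return {"endpoint": endpoint_name, "endpoint_config": config_name, "models": model_names}
```

Run it in place of predictor.delete_endpoint(), for example delete_endpoint_resources(boto3.client("sagemaker"), predictor.endpoint_name), because the lookups fail once the endpoint has been deleted.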
Notebook CI Test Results
This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.