R Serving with FastAPI

Dockerfile

  • The Dockerfile defines the environment in which our server will run.

  • Below, you can see that the entrypoint for our container will be deploy.R

[ ]:
%pycat Dockerfile

Code: deploy.R

deploy.R handles the following steps:

  • Loads the R libraries used by the server.

  • Loads a pretrained xgboost model that was trained on the classic Iris dataset (Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science).

  • Defines an inference function that takes a matrix of iris features and returns predictions for those iris examples.

  • Wraps the inference function to make it thread-safe for passing to Python through reticulate.

  • Finally, loads endpoints.py through Python and launches the FastAPI server app using those endpoint definitions.

[ ]:
%pycat deploy.R

Code: endpoints.py

endpoints.py defines two routes:

  • /ping returns a status of ‘Alive’ to indicate that the application is healthy.

  • /invocations applies the previously defined inference function to the input features from the request body.

Note that FastAPI is typed. The Example class defines the type of the input that we expect to receive in the request body.

For more information about the requirements for building your own inference container, see: Use Your Own Inference Code with Hosting Services

[ ]:
%pycat endpoints.py

Build the Serving Image

[ ]:
!docker build -t r-fastapi .

Launch the Serving Container

[ ]:
!echo "Launching FastAPI"
!docker run -d --rm -p 5000:8080 r-fastapi
!echo "Waiting for the server to start..." && sleep 10
[ ]:
!docker container list

Define Simple Python Client

[ ]:
import requests
from tqdm import tqdm
import pandas as pd

pd.set_option("display.max_rows", 500)
[ ]:
def get_predictions(examples, instance=requests, port=5000):
    payload = {"features": examples}
    return instance.post(f"http://127.0.0.1:{port}/invocations", json=payload)
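For reference, the JSON body that get_predictions posts to /invocations has a single "features" key holding one row of the four iris measurements per example. The values below are hand-built toy inputs, not taken from the dataset:

```python
import json

# Shape of the /invocations request body built by get_predictions:
# {"features": [[row 1 measurements], [row 2 measurements], ...]}
payload = {"features": [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]]}
body = json.dumps(payload)
```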
[ ]:
def get_health(instance=requests, port=5000):
    return instance.get(f"http://127.0.0.1:{port}/ping")

Define Example Inputs

We define example inputs from the Iris dataset.

[ ]:
column_names = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Label"]
iris = pd.read_csv(
    "s3://sagemaker-sample-files/datasets/tabular/iris/iris.data", names=column_names
)
[ ]:
iris_features = iris[["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]]
[ ]:
example_inputs = iris_features.values.tolist()

Get Predictions

[ ]:
predicted = get_predictions(example_inputs).json()["output"]
[ ]:
iris["predicted"] = predicted
[ ]:
iris

Stop All Serving Containers

Finally, we will shut down the serving container we launched for the test.

[ ]:
!docker kill $(docker ps -q)