Iris Training and Prediction with SageMaker Scikit-learn

This tutorial shows you how to use Scikit-learn with SageMaker by using the pre-built container. Scikit-learn is a popular Python machine learning framework. It includes a number of different algorithms for classification, regression, clustering, dimensionality reduction, and data/feature pre-processing.

The sagemaker-python-sdk module makes it easy to take existing scikit-learn code and run it on SageMaker, which we will show by training a model on the Iris dataset and generating a set of predictions. For more information about the Scikit-learn container, see the sagemaker-scikit-learn-containers repository and the sagemaker-python-sdk repository.

For more on Scikit-learn, please visit the Scikit-learn website: http://scikit-learn.org/stable/.

First, let's create our SageMaker session and role, and create an S3 prefix to use for the notebook example.

This notebook has been tested using the Python 3 (Data Science) kernel.

[2]:
# S3 prefix
prefix = "Scikit-iris"

import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

# Get a SageMaker-compatible role used by this Notebook Instance.
role = get_execution_role()

Upload the data for training

When training large models with huge amounts of data, you’ll typically use big data tools, like Amazon Athena, AWS Glue, or Amazon EMR, to create your data in S3. For the purposes of this example, we’re using a sample of the classic Iris dataset, which is included with Scikit-learn. We will load the dataset, write it to a local CSV file, then upload that file to S3 for training.

[3]:
import numpy as np
import os
from sklearn import datasets

# Load Iris dataset, then join labels and features
iris = datasets.load_iris()
joined_iris = np.insert(iris.data, 0, iris.target, axis=1)

# Create directory and write csv
os.makedirs("./data", exist_ok=True)
np.savetxt("./data/iris.csv", joined_iris, delimiter=",", fmt="%1.1f, %1.3f, %1.3f, %1.3f, %1.3f")
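
To make the file layout concrete, here is a minimal sketch that parses one line in the layout written by the savetxt call above: the label comes first, followed by the four features. The parse_row helper and the sample line are illustrative assumptions, not part of the notebook.

```python
def parse_row(line):
    """Split one 'label, f1, f2, f3, f4' line into (label, features)."""
    values = [float(field) for field in line.split(",")]
    return values[0], values[1:]


# A line in the layout produced by the fmt string (label first, then four features).
sample_line = "0.0, 5.100, 3.500, 1.400, 0.200"
label, features = parse_row(sample_line)
print(label, features)
```

Because the label sits in column 0, the training script can later split labels and features by column position alone.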

Once we have the data locally, we can use the tools provided by the SageMaker Python SDK to upload the data to a default bucket.

[4]:
WORK_DIRECTORY = "data"

train_input = sagemaker_session.upload_data(
    WORK_DIRECTORY, key_prefix="{}/{}".format(prefix, WORK_DIRECTORY)
)
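
The value stored in train_input is an S3 URI. As a rough sketch, assuming the SDK's documented behavior that a directory upload returns s3://{bucket}/{key_prefix}, the URI can be pictured as below. The bucket name is hypothetical; the real default bucket depends on your account and region.

```python
def expected_upload_uri(bucket, key_prefix):
    """Return the S3 URI of a directory uploaded under key_prefix."""
    return "s3://{}/{}".format(bucket, key_prefix)


prefix = "Scikit-iris"
work_directory = "data"
# Hypothetical bucket name, used only for illustration.
bucket = "sagemaker-us-west-2-123456789012"

uri = expected_upload_uri(bucket, "{}/{}".format(prefix, work_directory))
print(uri)
```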

Create a Scikit-learn script to train with

SageMaker can now run a scikit-learn script using the SKLearn estimator. When the script is executed on SageMaker, a number of helpful environment variables are available to access properties of the training environment, such as:

  • SM_MODEL_DIR: A string representing the path to the directory to write model artifacts to. Any artifacts saved in this folder are uploaded to S3 for model hosting after the training job completes.

  • SM_OUTPUT_DATA_DIR: A string representing the filesystem path to write output artifacts to. Output artifacts may include checkpoints, graphs, and other files to save, not including model artifacts. These artifacts are compressed and uploaded to S3 to the same S3 prefix as the model artifacts.

Supposing two input channels, ‘train’ and ‘test’, were used in the call to the SKLearn estimator’s fit() method, the following environment variables will be set, following the format SM_CHANNEL_[channel_name]:

  • SM_CHANNEL_TRAIN: A string representing the path to the directory containing data in the ‘train’ channel

  • SM_CHANNEL_TEST: Same as above, but for the ‘test’ channel.
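
As a minimal sketch of the SM_CHANNEL_[channel_name] convention, the helper below looks up a channel directory by name. The channel_dir function, its default argument, and the local fallback path are assumptions made for illustration; only the variable naming scheme comes from the text above.

```python
import os


def channel_dir(channel_name, environ=None, default=None):
    """Look up the data directory for a channel via SM_CHANNEL_<NAME>."""
    environ = os.environ if environ is None else environ
    return environ.get("SM_CHANNEL_" + channel_name.upper(), default)


# Simulated container environment with only a 'train' channel configured.
container_env = {
    "SM_MODEL_DIR": "/opt/ml/model",
    "SM_CHANNEL_TRAIN": "/opt/ml/input/data/train",
}

train_dir = channel_dir("train", container_env)
# The 'test' channel is not set, so the hypothetical local fallback is returned.
test_dir = channel_dir("test", container_env, default="./test")
print(train_dir, test_dir)
```

Passing the environment as a dictionary keeps the lookup easy to exercise outside a training container.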

A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves the model to model_dir so that it can be hosted later. Hyperparameters are passed to your script as arguments and can be retrieved with an argparse.ArgumentParser instance. For example, the script that we will run in this notebook is shown below:

import argparse
import pandas as pd
import os

from sklearn import tree
import joblib


if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # Hyperparameters are described here. In this simple example we are just including one hyperparameter.
    # The default is None (no limit) because scikit-learn rejects a negative max_leaf_nodes.
    parser.add_argument('--max_leaf_nodes', type=int, default=None)

    # Sagemaker specific arguments. Defaults are set in the environment variables.
    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])

    args = parser.parse_args()

    # Take the set of files and read them all into a single pandas dataframe
    input_files = [ os.path.join(args.train, file) for file in os.listdir(args.train) ]
    if len(input_files) == 0:
        raise ValueError(('There are no files in {}.\n' +
                          'This usually indicates that the channel ({}) was incorrectly specified,\n' +
                          'the data specification in S3 was incorrectly specified or the role specified\n' +
                          'does not have permission to access the data.').format(args.train, "train"))
    raw_data = [ pd.read_csv(file, header=None, engine="python") for file in input_files ]
    train_data = pd.concat(raw_data)

    # labels are in the first column
    train_y = train_data.loc[:,0]
    train_X = train_data.loc[:,1:]

    # Here we support a single hyperparameter, 'max_leaf_nodes'. Note that you can add as many
    # as your training may require in the ArgumentParser above.
    max_leaf_nodes = args.max_leaf_nodes

    # Now use scikit-learn's decision tree classifier to train the model.
    clf = tree.DecisionTreeClassifier(max_leaf_nodes=max_leaf_nodes)
    clf = clf.fit(train_X, train_y)

    # Serialize the fitted classifier to model_dir so that it can be loaded for hosting
    joblib.dump(clf, os.path.join(args.model_dir, "model.joblib"))


def model_fn(model_dir):
    """Deserialize and return the fitted model

    Note that the file name must match the one used to serialize the model in the main block
    """
    clf = joblib.load(os.path.join(model_dir, "model.joblib"))
    return clf

Because the Scikit-learn container imports your training script, you should always put your training code in a main guard (if __name__=='__main__':) so that the container does not inadvertently run your training code at the wrong point in execution.
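
One way to keep the script import-safe is to move argument parsing into a function that does nothing until it is called. The sketch below assumes a hypothetical parse_args helper and a default of None for max_leaf_nodes; it only illustrates how hyperparameters arrive as command-line arguments.

```python
import argparse


def parse_args(argv=None):
    """Parse hyperparameters from a list of command-line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--max_leaf_nodes", type=int, default=None)
    return parser.parse_args(argv)


# The arguments a job started with hyperparameters={"max_leaf_nodes": 10} would receive.
args = parse_args(["--max_leaf_nodes", "10"])
print(args.max_leaf_nodes)
```

Because argparse applies type=int, the value is already an integer when it reaches the classifier.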

For more information about training environment variables, please visit https://github.com/aws/sagemaker-containers.

Create SageMaker Scikit Estimator

To run our Scikit-learn training script on SageMaker, we construct a sagemaker.sklearn.estimator.SKLearn estimator, which accepts several constructor arguments:

  • entry_point: The path to the Python script SageMaker runs for training and prediction.

  • role: Role ARN

  • instance_type (optional): The type of SageMaker instances for training. Note: Because Scikit-learn does not natively support GPU training, SageMaker Scikit-learn does not currently support training on GPU instance types.

  • sagemaker_session (optional): The session used to train on SageMaker.

  • hyperparameters (optional): A dictionary passed to the training script as hyperparameters.

To see the code for the SKLearn Estimator, see here: https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/sklearn

[5]:
from sagemaker.sklearn.estimator import SKLearn

script_path = "scikit_learn_iris.py"

sklearn = SKLearn(
    entry_point=script_path,
    instance_type="ml.c5.xlarge",
    role=role,
    framework_version='1.0-1',
    py_version='py3',
    sagemaker_session=sagemaker_session,
    hyperparameters={"max_leaf_nodes": 10},
)

Train SKLearn Estimator on Iris data

Training is very simple: just call fit on the Estimator! This will start a SageMaker Training job that will download the data for us, invoke our scikit-learn code (in the provided script file), and save any model artifacts that the script creates.

[6]:
sklearn.fit({"train": train_input})
2021-06-03 21:55:17 Starting - Starting the training job...
2021-06-03 21:55:41 Starting - Launching requested ML instancesProfilerReport-1622757317: InProgress
......
2021-06-03 21:56:41 Starting - Preparing the instances for training......
2021-06-03 21:57:41 Downloading - Downloading input data...
2021-06-03 21:58:17 Training - Training image download completed. Training in progress.
2021-06-03 21:58:17 Uploading - Uploading generated training model
2021-06-03 21:58:17 Completed - Training job completed
2021-06-03 21:58:04,864 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2021-06-03 21:58:04,869 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-06-03 21:58:04,880 sagemaker_sklearn_container.training INFO     Invoking user training script.
2021-06-03 21:58:05,216 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-06-03 21:58:08,294 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-06-03 21:58:08,304 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-06-03 21:58:08,312 sagemaker-training-toolkit INFO     Invoking user script

Training Env:

{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "train": "/opt/ml/input/data/train"
    },
    "current_host": "algo-1",
    "framework_module": "sagemaker_sklearn_container.training:main",
    "hosts": [
        "algo-1"
    ],
    "hyperparameters": {
        "max_leaf_nodes": 10
    },
    "input_config_dir": "/opt/ml/input/config",
    "input_data_config": {
        "train": {
            "TrainingInputMode": "File",
            "S3DistributionType": "FullyReplicated",
            "RecordWrapperType": "None"
        }
    },
    "input_dir": "/opt/ml/input",
    "is_master": true,
    "job_name": "sagemaker-scikit-learn-2021-06-03-21-55-17-520",
    "log_level": 20,
    "master_hostname": "algo-1",
    "model_dir": "/opt/ml/model",
    "module_dir": "s3://sagemaker-us-west-2-688520471316/sagemaker-scikit-learn-2021-06-03-21-55-17-520/source/sourcedir.tar.gz",
    "module_name": "scikit_learn_iris",
    "network_interface_name": "eth0",
    "num_cpus": 4,
    "num_gpus": 0,
    "output_data_dir": "/opt/ml/output/data",
    "output_dir": "/opt/ml/output",
    "output_intermediate_dir": "/opt/ml/output/intermediate",
    "resource_config": {
        "current_host": "algo-1",
        "hosts": [
            "algo-1"
        ],
        "network_interface_name": "eth0"
    },
    "user_entry_point": "scikit_learn_iris.py"
}

Environment variables:

SM_HOSTS=["algo-1"]
SM_NETWORK_INTERFACE_NAME=eth0
SM_HPS={"max_leaf_nodes":10}
SM_USER_ENTRY_POINT=scikit_learn_iris.py
SM_FRAMEWORK_PARAMS={}
SM_RESOURCE_CONFIG={"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"eth0"}
SM_INPUT_DATA_CONFIG={"train":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}}
SM_OUTPUT_DATA_DIR=/opt/ml/output/data
SM_CHANNELS=["train"]
SM_CURRENT_HOST=algo-1
SM_MODULE_NAME=scikit_learn_iris
SM_LOG_LEVEL=20
SM_FRAMEWORK_MODULE=sagemaker_sklearn_container.training:main
SM_INPUT_DIR=/opt/ml/input
SM_INPUT_CONFIG_DIR=/opt/ml/input/config
SM_OUTPUT_DIR=/opt/ml/output
SM_NUM_CPUS=4
SM_NUM_GPUS=0
SM_MODEL_DIR=/opt/ml/model
SM_MODULE_DIR=s3://sagemaker-us-west-2-688520471316/sagemaker-scikit-learn-2021-06-03-21-55-17-520/source/sourcedir.tar.gz
SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"train":"/opt/ml/input/data/train"},"current_host":"algo-1","framework_module":"sagemaker_sklearn_container.training:main","hosts":["algo-1"],"hyperparameters":{"max_leaf_nodes":10},"input_config_dir":"/opt/ml/input/config","input_data_config":{"train":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"sagemaker-scikit-learn-2021-06-03-21-55-17-520","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-us-west-2-688520471316/sagemaker-scikit-learn-2021-06-03-21-55-17-520/source/sourcedir.tar.gz","module_name":"scikit_learn_iris","network_interface_name":"eth0","num_cpus":4,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"eth0"},"user_entry_point":"scikit_learn_iris.py"}
SM_USER_ARGS=["--max_leaf_nodes","10"]
SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
SM_CHANNEL_TRAIN=/opt/ml/input/data/train
SM_HP_MAX_LEAF_NODES=10
PYTHONPATH=/opt/ml/code:/miniconda3/bin:/miniconda3/lib/python37.zip:/miniconda3/lib/python3.7:/miniconda3/lib/python3.7/lib-dynload:/miniconda3/lib/python3.7/site-packages

Invoking script with the following command:

/miniconda3/bin/python scikit_learn_iris.py --max_leaf_nodes 10


2021-06-03 21:58:09,519 sagemaker-containers INFO     Reporting training SUCCESS
Training seconds: 45
Billable seconds: 45
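
The log above shows the single hyperparameter surfacing in three places: SM_HPS, SM_HP_MAX_LEAF_NODES, and SM_USER_ARGS. The sketch below reproduces those three values from the hyperparameters dictionary. It mirrors what the log prints and is not the training toolkit's actual implementation; hyperparameter_env is a hypothetical helper.

```python
import json


def hyperparameter_env(hyperparameters):
    """Build the three hyperparameter views that appear in the training log."""
    env = {"SM_HPS": json.dumps(hyperparameters, separators=(",", ":"))}
    user_args = []
    for name, value in hyperparameters.items():
        # One variable per hyperparameter, plus a flag/value pair for the script.
        env["SM_HP_" + name.upper()] = str(value)
        user_args.extend(["--" + name, str(value)])
    env["SM_USER_ARGS"] = json.dumps(user_args, separators=(",", ":"))
    return env


env = hyperparameter_env({"max_leaf_nodes": 10})
for key in sorted(env):
    print(key, "=", env[key])
```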

Using the trained model to make inference requests

Deploy the model

Deploying the model to SageMaker hosting just requires a deploy call on the fitted model. This call takes an instance count and instance type.

[7]:
predictor = sklearn.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")
-------!

Choose some data and use it for a prediction

In order to do some predictions, we’ll extract some of the data we used for training and do predictions against it. This is, of course, bad statistical practice, but a good way to see how the mechanism works.

[8]:
import itertools
import pandas as pd

shape = pd.read_csv("data/iris.csv", header=None)

a = [50 * i for i in range(3)]
b = [40 + i for i in range(10)]
indices = [i + j for i, j in itertools.product(a, b)]

test_data = shape.iloc[indices[:-1]]
test_X = test_data.iloc[:, 1:]
test_y = test_data.iloc[:, 0]
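
The index arithmetic above is easier to follow in isolation. The Iris dataset stores its three classes in consecutive blocks of 50 rows, so the sketch below picks rows 40 to 49 of each block and then drops the final index, leaving 29 rows.

```python
import itertools

a = [50 * i for i in range(3)]   # first row of each class block: 0, 50, 100
b = [40 + i for i in range(10)]  # offsets 40..49 within a block
indices = [i + j for i, j in itertools.product(a, b)]

# Dropping the last index leaves ten rows each for classes 0 and 1, nine for class 2.
selected = indices[:-1]
print(len(selected), selected[0], selected[-1])
```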

Prediction is as easy as calling predict with the predictor we got back from deploy and the data we want to do predictions with. The output from the endpoint returns a numerical representation of the classification prediction; in the original dataset, these are flower names, but in this example the labels are numerical. We can compare against the original label that we parsed.

[20]:
print(predictor.predict(test_X.values))
print(test_y.values)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 2. 2.
 2. 2. 2. 2. 2.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 2. 2.
 2. 2. 2. 2. 2.]
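
Rather than comparing the two printed arrays by eye, the match can be summarized as a single number. This is a minimal sketch with a hypothetical accuracy helper; the label lists are typed in from the output above rather than fetched from the endpoint.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    matches = sum(1 for p, e in zip(predicted, expected) if p == e)
    return matches / len(expected)


# Labels as printed above: ten of class 0, ten of class 1, nine of class 2.
expected = [0.0] * 10 + [1.0] * 10 + [2.0] * 9
predicted = list(expected)
print(accuracy(predicted, expected))
```

A perfect score here is expected, since these rows were also used for training.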

Endpoint cleanup

When you’re done with the endpoint, you’ll want to clean it up.

[21]:
sagemaker_session.delete_endpoint(
    endpoint_name=predictor.endpoint_name
)

Batch Transform

We can also use the trained model for asynchronous batch inference on S3 data using SageMaker Batch Transform.

[11]:
# Define a SKLearn Transformer from the trained SKLearn Estimator
transformer = sklearn.transformer(instance_count=1, instance_type="ml.m4.xlarge")

Prepare Input Data

We will extract 10 random samples of 100 rows from the training data, split the features (X) from the labels (Y), and then upload the input data to a given location in S3.

[12]:
%%bash
# Randomly sample the iris dataset 10 times, then split X and Y
mkdir -p batch_data/XY batch_data/X batch_data/Y
for i in {0..9}; do
    cat data/iris.csv | shuf -n 100 > batch_data/XY/iris_sample_${i}.csv
    cat batch_data/XY/iris_sample_${i}.csv | cut -d',' -f2- > batch_data/X/iris_sample_X_${i}.csv
    cat batch_data/XY/iris_sample_${i}.csv | cut -d',' -f1 > batch_data/Y/iris_sample_Y_${i}.csv
done

[13]:
# Upload input data from local filesystem to S3
batch_input_s3 = sagemaker_session.upload_data("batch_data/X", key_prefix=prefix + "/batch_input")
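
For readers who prefer to stay in Python, the shuf and cut steps from the bash cell can be sketched as follows. The rows here are hypothetical in-memory stand-ins for data/iris.csv, and sample_and_split is an illustrative helper rather than part of the notebook.

```python
import random


def sample_and_split(lines, sample_size, rng):
    """Sample rows without replacement, then split labels (Y) from features (X)."""
    sampled = rng.sample(lines, sample_size)
    # Like cut -f1 and cut -f2-: the text before the first comma is the label.
    labels = [row.split(",", 1)[0] for row in sampled]
    features = [row.split(",", 1)[1] for row in sampled]
    return features, labels


# Hypothetical rows in the same layout as data/iris.csv (label first, four features).
rows = ["{:.1f}, {:.3f}, 1.000, 2.000, 3.000".format(i % 3, i / 10) for i in range(150)]
X, Y = sample_and_split(rows, 100, random.Random(0))
print(len(X), len(Y))
```

Seeding the random generator makes the sample reproducible, which the bash version does not guarantee.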

Run Transform Job

Using the Transformer, run a transform job on the S3 input data.

[14]:
# Start a transform job and wait for it to finish
transformer.transform(batch_input_s3, content_type="text/csv")
print("Waiting for transform job: " + transformer.latest_transform_job.job_name)
transformer.wait()
.................................
2021-06-03 22:07:17,576 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
2021-06-03 22:07:17,579 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
2021-06-03 22:07:17,580 INFO - sagemaker-containers - nginx config: 
worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log  /dev/stderr;

worker_rlimit_nofile 4096;

events {
  worker_connections 2048;
}

http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;
  access_log /dev/stdout combined;

  upstream gunicorn {
    server unix:/tmp/gunicorn.sock;
  }

  server {
    listen 8080 deferred;
    client_max_body_size 0;

    keepalive_timeout 3;

    location ~ ^/(ping|invocations|execution-parameters) {
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $http_host;
      proxy_redirect off;
      proxy_read_timeout 60s;
      proxy_pass http://gunicorn;
    }

    location / {
      return 404 "{}";
    }

  }
}


2021-06-03 22:07:17,718 INFO - sagemaker-containers - Module scikit_learn_iris does not provide a setup.py. 
Generating setup.py
2021-06-03 22:07:17,718 INFO - sagemaker-containers - Generating setup.cfg
2021-06-03 22:07:17,719 INFO - sagemaker-containers - Generating MANIFEST.in
2021-06-03 22:07:17,719 INFO - sagemaker-containers - Installing module with the following command:
/miniconda3/bin/python -m pip install . 
Processing /opt/ml/code
Building wheels for collected packages: scikit-learn-iris
  Building wheel for scikit-learn-iris (setup.py): started
  Building wheel for scikit-learn-iris (setup.py): finished with status 'done'
  Created wheel for scikit-learn-iris: filename=scikit_learn_iris-1.0.0-py2.py3-none-any.whl size=5739 sha256=74ab494ae58d82d1408b3806bcff33a1b3e1b2916c3a222c0b1854f5c7ba728d
  Stored in directory: /home/model-server/tmp/pip-ephem-wheel-cache-81gnfsm9/wheels/3e/0f/51/2f1df833dd0412c1bc2f5ee56baac195b5be563353d111dca6
Successfully built scikit-learn-iris
Installing collected packages: scikit-learn-iris
Successfully installed scikit-learn-iris-1.0.0
[2021-06-03 22:07:20 +0000] [36] [INFO] Starting gunicorn 20.0.4
[2021-06-03 22:07:20 +0000] [36] [INFO] Listening at: unix:/tmp/gunicorn.sock (36)
[2021-06-03 22:07:20 +0000] [36] [INFO] Using worker: gevent
[2021-06-03 22:07:20 +0000] [39] [INFO] Booting worker with pid: 39
[2021-06-03 22:07:20 +0000] [40] [INFO] Booting worker with pid: 40
[2021-06-03 22:07:20 +0000] [44] [INFO] Booting worker with pid: 44
[2021-06-03 22:07:20 +0000] [45] [INFO] Booting worker with pid: 45
2021-06-03 22:07:23,506 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
2021-06-03 22:07:23,506 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [03/Jun/2021:22:07:24 +0000] "GET /ping HTTP/1.1" 200 0 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:24 +0000] "GET /execution-parameters HTTP/1.1" 404 232 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:24 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:24 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
2021-06-03 22:07:24,514 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [03/Jun/2021:22:07:25 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:25 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:25 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
2021-06-03 22:07:25,211 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [03/Jun/2021:22:07:25 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:25 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
2021-06-03 22:07:26,012 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [03/Jun/2021:22:07:25 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
2021-06-03 22:07:25,211 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [03/Jun/2021:22:07:25 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:25 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
2021-06-03 22:07:26,012 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
2021-06-03T22:07:24.199:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD
Waiting for transform job: sagemaker-scikit-learn-2021-06-03-22-02-02-816
2021-06-03 22:07:17,576 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
2021-06-03 22:07:17,579 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
2021-06-03 22:07:17,580 INFO - sagemaker-containers - nginx config: 
worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log  /dev/stderr;

worker_rlimit_nofile 4096;

events {
  worker_connections 2048;
}

2021-06-03 22:07:17,576 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
2021-06-03 22:07:17,579 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
2021-06-03 22:07:17,580 INFO - sagemaker-containers - nginx config: 
worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log  /dev/stderr;

worker_rlimit_nofile 4096;

events {
  worker_connections 2048;
}

http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;
  access_log /dev/stdout combined;

  upstream gunicorn {
    server unix:/tmp/gunicorn.sock;
  }

  server {
    listen 8080 deferred;
    client_max_body_size 0;

    keepalive_timeout 3;

    location ~ ^/(ping|invocations|execution-parameters) {
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $http_host;
      proxy_redirect off;
      proxy_read_timeout 60s;
      proxy_pass http://gunicorn;
    }

    location / {
      return 404 "{}";
    }

  }
}


2021-06-03 22:07:17,718 INFO - sagemaker-containers - Module scikit_learn_iris does not provide a setup.py. 
Generating setup.py
2021-06-03 22:07:17,718 INFO - sagemaker-containers - Generating setup.cfg
2021-06-03 22:07:17,719 INFO - sagemaker-containers - Generating MANIFEST.in
2021-06-03 22:07:17,719 INFO - sagemaker-containers - Installing module with the following command:
/miniconda3/bin/python -m pip install . 
Processing /opt/ml/code
Building wheels for collected packages: scikit-learn-iris
  Building wheel for scikit-learn-iris (setup.py): started
  Building wheel for scikit-learn-iris (setup.py): finished with status 'done'
  Created wheel for scikit-learn-iris: filename=scikit_learn_iris-1.0.0-py2.py3-none-any.whl size=5739 sha256=74ab494ae58d82d1408b3806bcff33a1b3e1b2916c3a222c0b1854f5c7ba728d
  Stored in directory: /home/model-server/tmp/pip-ephem-wheel-cache-81gnfsm9/wheels/3e/0f/51/2f1df833dd0412c1bc2f5ee56baac195b5be563353d111dca6
Successfully built scikit-learn-iris
Installing collected packages: scikit-learn-iris
Successfully installed scikit-learn-iris-1.0.0
[2021-06-03 22:07:20 +0000] [36] [INFO] Starting gunicorn 20.0.4
[2021-06-03 22:07:20 +0000] [36] [INFO] Listening at: unix:/tmp/gunicorn.sock (36)
[2021-06-03 22:07:20 +0000] [36] [INFO] Using worker: gevent
[2021-06-03 22:07:20 +0000] [39] [INFO] Booting worker with pid: 39
[2021-06-03 22:07:20 +0000] [40] [INFO] Booting worker with pid: 40
[2021-06-03 22:07:20 +0000] [44] [INFO] Booting worker with pid: 44
[2021-06-03 22:07:20 +0000] [45] [INFO] Booting worker with pid: 45
http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;
  access_log /dev/stdout combined;

  upstream gunicorn {
    server unix:/tmp/gunicorn.sock;
  }

  server {
    listen 8080 deferred;
    client_max_body_size 0;

    keepalive_timeout 3;

    location ~ ^/(ping|invocations|execution-parameters) {
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $http_host;
      proxy_redirect off;
      proxy_read_timeout 60s;
      proxy_pass http://gunicorn;
    }

    location / {
      return 404 "{}";
    }

  }
}


2021-06-03 22:07:17,718 INFO - sagemaker-containers - Module scikit_learn_iris does not provide a setup.py. 
Generating setup.py
2021-06-03 22:07:17,718 INFO - sagemaker-containers - Generating setup.cfg
2021-06-03 22:07:17,719 INFO - sagemaker-containers - Generating MANIFEST.in
2021-06-03 22:07:17,719 INFO - sagemaker-containers - Installing module with the following command:
/miniconda3/bin/python -m pip install . 
Processing /opt/ml/code
Building wheels for collected packages: scikit-learn-iris
  Building wheel for scikit-learn-iris (setup.py): started
  Building wheel for scikit-learn-iris (setup.py): finished with status 'done'
  Created wheel for scikit-learn-iris: filename=scikit_learn_iris-1.0.0-py2.py3-none-any.whl size=5739 sha256=74ab494ae58d82d1408b3806bcff33a1b3e1b2916c3a222c0b1854f5c7ba728d
  Stored in directory: /home/model-server/tmp/pip-ephem-wheel-cache-81gnfsm9/wheels/3e/0f/51/2f1df833dd0412c1bc2f5ee56baac195b5be563353d111dca6
Successfully built scikit-learn-iris
Installing collected packages: scikit-learn-iris
Successfully installed scikit-learn-iris-1.0.0
[2021-06-03 22:07:20 +0000] [36] [INFO] Starting gunicorn 20.0.4
[2021-06-03 22:07:20 +0000] [36] [INFO] Listening at: unix:/tmp/gunicorn.sock (36)
[2021-06-03 22:07:20 +0000] [36] [INFO] Using worker: gevent
[2021-06-03 22:07:20 +0000] [39] [INFO] Booting worker with pid: 39
[2021-06-03 22:07:20 +0000] [40] [INFO] Booting worker with pid: 40
[2021-06-03 22:07:20 +0000] [44] [INFO] Booting worker with pid: 44
[2021-06-03 22:07:20 +0000] [45] [INFO] Booting worker with pid: 45
2021-06-03 22:07:23,506 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
2021-06-03 22:07:23,506 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [03/Jun/2021:22:07:24 +0000] "GET /ping HTTP/1.1" 200 0 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:24 +0000] "GET /execution-parameters HTTP/1.1" 404 232 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:24 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:24 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
2021-06-03 22:07:24,514 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [03/Jun/2021:22:07:24 +0000] "GET /ping HTTP/1.1" 200 0 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:24 +0000] "GET /execution-parameters HTTP/1.1" 404 232 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:24 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:24 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
2021-06-03 22:07:24,514 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [03/Jun/2021:22:07:25 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:25 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:25 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
2021-06-03 22:07:25,211 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [03/Jun/2021:22:07:25 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:25 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
2021-06-03 22:07:26,012 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
169.254.255.130 - - [03/Jun/2021:22:07:26 +0000] "POST /invocations HTTP/1.1" 200 500 "-" "Go-http-client/1.1"
2021-06-03T22:07:24.199:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD

Check Output Data

After the transform job has completed, download the output data from S3. For each file “f” in the input data, the job writes a corresponding file “f.out” containing the predicted label for each input row. We can then compare the predicted labels to the true labels saved earlier.

[15]:
# Download the output data from S3 to local filesystem
batch_output = transformer.output_path
!mkdir -p batch_data/output
!aws s3 cp --recursive $batch_output/ batch_data/output/
# Head to see what the batch output looks like
!head batch_data/output/*
download: s3://sagemaker-us-west-2-688520471316/sagemaker-scikit-learn-2021-06-03-22-02-02-816/iris_sample_X_0.csv.out to batch_data/output/iris_sample_X_0.csv.out
download: s3://sagemaker-us-west-2-688520471316/sagemaker-scikit-learn-2021-06-03-22-02-02-816/iris_sample_X_1.csv.out to batch_data/output/iris_sample_X_1.csv.out
download: s3://sagemaker-us-west-2-688520471316/sagemaker-scikit-learn-2021-06-03-22-02-02-816/iris_sample_X_2.csv.out to batch_data/output/iris_sample_X_2.csv.out
download: s3://sagemaker-us-west-2-688520471316/sagemaker-scikit-learn-2021-06-03-22-02-02-816/iris_sample_X_4.csv.out to batch_data/output/iris_sample_X_4.csv.out
download: s3://sagemaker-us-west-2-688520471316/sagemaker-scikit-learn-2021-06-03-22-02-02-816/iris_sample_X_6.csv.out to batch_data/output/iris_sample_X_6.csv.out
download: s3://sagemaker-us-west-2-688520471316/sagemaker-scikit-learn-2021-06-03-22-02-02-816/iris_sample_X_5.csv.out to batch_data/output/iris_sample_X_5.csv.out
download: s3://sagemaker-us-west-2-688520471316/sagemaker-scikit-learn-2021-06-03-22-02-02-816/iris_sample_X_9.csv.out to batch_data/output/iris_sample_X_9.csv.out
download: s3://sagemaker-us-west-2-688520471316/sagemaker-scikit-learn-2021-06-03-22-02-02-816/iris_sample_X_7.csv.out to batch_data/output/iris_sample_X_7.csv.out
download: s3://sagemaker-us-west-2-688520471316/sagemaker-scikit-learn-2021-06-03-22-02-02-816/iris_sample_X_3.csv.out to batch_data/output/iris_sample_X_3.csv.out
download: s3://sagemaker-us-west-2-688520471316/sagemaker-scikit-learn-2021-06-03-22-02-02-816/iris_sample_X_8.csv.out to batch_data/output/iris_sample_X_8.csv.out
==> batch_data/output/iris_sample_X_0.csv.out <==
[1.0, 1.0, 0.0, 2.0, 1.0, 0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 2.0, 2.0, 1.0, 1.0, 2.0, 0.0, 2.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 2.0, 2.0, 0.0, 1.0, 1.0, 2.0, 2.0, 0.0, 2.0, 1.0, 1.0, 2.0, 0.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 2.0, 2.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 0.0, 2.0, 2.0, 1.0, 2.0, 0.0, 2.0, 1.0, 1.0, 0.0, 1.0, 2.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
==> batch_data/output/iris_sample_X_1.csv.out <==
[1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 1.0, 0.0, 2.0, 2.0, 0.0, 1.0, 2.0, 1.0, 1.0, 0.0, 2.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 2.0, 0.0, 1.0, 0.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 2.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 2.0, 2.0, 1.0, 2.0, 2.0, 0.0, 2.0, 1.0, 2.0, 1.0, 0.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 2.0, 0.0, 2.0, 2.0]
==> batch_data/output/iris_sample_X_2.csv.out <==
[1.0, 1.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 2.0, 1.0, 1.0, 2.0, 0.0, 2.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0, 0.0, 2.0, 2.0, 0.0, 1.0, 0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 1.0, 2.0, 2.0, 0.0, 2.0, 2.0, 2.0, 1.0, 2.0, 2.0, 0.0, 0.0, 2.0, 1.0, 2.0, 1.0, 1.0, 2.0, 2.0, 1.0, 2.0, 2.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0]
==> batch_data/output/iris_sample_X_3.csv.out <==
[1.0, 2.0, 2.0, 0.0, 2.0, 2.0, 0.0, 2.0, 2.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 2.0, 1.0, 2.0, 0.0, 1.0, 1.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 0.0, 0.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 0.0, 2.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 0.0, 2.0, 0.0, 0.0, 1.0, 2.0, 2.0, 0.0, 2.0, 0.0, 1.0, 2.0, 0.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 2.0, 2.0, 0.0, 2.0, 2.0, 0.0, 1.0, 1.0, 1.0, 0.0, 2.0, 2.0, 0.0, 1.0, 2.0, 1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 0.0]
==> batch_data/output/iris_sample_X_4.csv.out <==
[1.0, 2.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0, 2.0, 1.0, 1.0, 0.0, 1.0, 2.0, 0.0, 2.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 2.0, 1.0, 1.0, 1.0, 2.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 2.0, 0.0, 0.0, 1.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 2.0, 0.0, 1.0, 1.0, 0.0, 2.0, 1.0, 2.0, 2.0, 0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 0.0, 2.0, 0.0, 0.0, 2.0, 1.0, 2.0, 0.0, 1.0, 1.0, 2.0, 2.0, 0.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
==> batch_data/output/iris_sample_X_5.csv.out <==
[1.0, 1.0, 0.0, 2.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 2.0, 1.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 2.0, 0.0, 2.0, 1.0, 0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 2.0, 2.0, 2.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 2.0, 1.0, 2.0, 2.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 0.0, 1.0]
==> batch_data/output/iris_sample_X_6.csv.out <==
[1.0, 2.0, 2.0, 1.0, 2.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 2.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 0.0, 1.0, 0.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 1.0, 2.0, 2.0, 2.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 2.0, 1.0, 1.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 1.0, 2.0, 1.0, 0.0, 2.0, 1.0, 1.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 2.0]
==> batch_data/output/iris_sample_X_7.csv.out <==
[0.0, 2.0, 1.0, 1.0, 2.0, 0.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 0.0, 0.0, 0.0, 2.0, 1.0, 2.0, 2.0, 2.0, 1.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 2.0, 2.0, 0.0, 2.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 2.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 0.0, 2.0, 0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 0.0, 2.0, 2.0, 1.0, 1.0]
==> batch_data/output/iris_sample_X_8.csv.out <==
[1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 2.0, 0.0, 2.0, 0.0, 2.0, 1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 0.0, 2.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 2.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 2.0, 0.0, 2.0, 2.0, 2.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 2.0, 2.0, 0.0, 0.0, 2.0, 0.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 0.0]
==> batch_data/output/iris_sample_X_9.csv.out <==
[0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 2.0, 0.0, 2.0, 1.0, 1.0, 0.0, 2.0, 1.0, 2.0, 0.0, 2.0, 0.0, 1.0, 0.0, 2.0, 2.0, 0.0, 0.0, 2.0, 1.0, 1.0, 0.0, 1.0, 1.0, 2.0, 1.0, 2.0, 2.0, 2.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 2.0, 2.0, 2.0, 1.0, 2.0, 1.0, 2.0, 2.0, 0.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 1.0, 2.0, 2.0, 2.0]
[16]:
%%bash
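# For each sample file, compare the predicted labels from batch output to the true labels.
# The sed commands strip the opening bracket and any quotes, then put one label per line
# so the prediction list lines up with the label file.
# Note: this loop covers samples 1 through 9; sample 0 is not compared.
for i in {1..9}; do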
    diff -s batch_data/Y/iris_sample_Y_${i}.csv \
        <(cat batch_data/output/iris_sample_X_${i}.csv.out | sed 's/[["]//g' | sed 's/, \|]/\n/g') \
        | sed "s/\/dev\/fd\/63/batch_data\/output\/iris_sample_X_${i}.csv.out/"
done
Files batch_data/Y/iris_sample_Y_1.csv and batch_data/output/iris_sample_X_1.csv.out are identical
Files batch_data/Y/iris_sample_Y_2.csv and batch_data/output/iris_sample_X_2.csv.out are identical
Files batch_data/Y/iris_sample_Y_3.csv and batch_data/output/iris_sample_X_3.csv.out are identical
Files batch_data/Y/iris_sample_Y_4.csv and batch_data/output/iris_sample_X_4.csv.out are identical
Files batch_data/Y/iris_sample_Y_5.csv and batch_data/output/iris_sample_X_5.csv.out are identical
Files batch_data/Y/iris_sample_Y_6.csv and batch_data/output/iris_sample_X_6.csv.out are identical
Files batch_data/Y/iris_sample_Y_7.csv and batch_data/output/iris_sample_X_7.csv.out are identical
Files batch_data/Y/iris_sample_Y_8.csv and batch_data/output/iris_sample_X_8.csv.out are identical
Files batch_data/Y/iris_sample_Y_9.csv and batch_data/output/iris_sample_X_9.csv.out are identical
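
The same check can also be sketched in Python, which reports an accuracy figure instead of a byte-for-byte diff. This is a minimal sketch under some assumptions: each “.out” file holds a single JSON-style list of labels (as in the head output above), each label file holds one numeric label per line, and the helper names load_predictions, load_labels, and accuracy are hypothetical, not part of the SageMaker SDK. The loop covers all ten samples, including sample 0, and skips any pair of files that is not present locally.

```python
import json
from pathlib import Path


def load_predictions(path):
    """Parse a batch output file, assumed to hold one JSON list of labels."""
    return [float(value) for value in json.loads(Path(path).read_text())]


def load_labels(path):
    """Read true labels, assuming one numeric label per non-empty line."""
    lines = Path(path).read_text().splitlines()
    return [float(line) for line in lines if line.strip()]


def accuracy(predicted, actual):
    """Return the fraction of predictions that equal the true label."""
    if len(predicted) != len(actual):
        raise ValueError(
            f"length mismatch: {len(predicted)} predictions vs {len(actual)} labels"
        )
    if not actual:
        raise ValueError("no labels to compare")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Compare every sample (0 through 9) whose files exist locally.
for i in range(10):
    label_path = Path(f"batch_data/Y/iris_sample_Y_{i}.csv")
    output_path = Path(f"batch_data/output/iris_sample_X_{i}.csv.out")
    if not (label_path.exists() and output_path.exists()):
        continue
    score = accuracy(load_predictions(output_path), load_labels(label_path))
    print(f"sample {i}: accuracy = {score:.3f}")
```

Comparing parsed floats rather than raw text means small formatting differences (for example “1” versus “1.0”) do not register as mismatches, which a plain diff would flag.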