data prep

Image data guide

An image data pipeline for machine learning is critical for performance during training and inference. You also need to know the formats and “shapes” of the images that your framework of choice requires. Additionally, you can further encode images in optimized formats that will speed up your ML processes. The following guide covers how you can preprocess images using SageMaker’s built-in image processing and for PyTorch or TensorFlow training.

To get started, run the following notebooks in order. There are four phases:
  1. Download data

  2. Structure data

  3. Preprocess (choose one of SageMaker built-in, PyTorch, or TensorFlow)

  4. Train (choose one of SageMaker built-in, PyTorch, or TensorFlow)

Download your image data

First, download the data.

Structure your image data

Now you structure the data before the next phase which is framework-specific.

Preprocessing

For preprocessing, you have several options. This guide covers SageMaker’s built-in option and options for PyTorch or TensorFlow. Choose one of the following notebooks and run it prior to going to the training step for the preprocessing option you chose.

with SageMaker built-in

with PyTorch

with TensorFlow

Training on image data

Now that you preprocessed your image data, choose the corresponding notebook to train with.

with SageMaker built-in

with PyTorch

with TensorFlow