Debugger

Examples of how to use Amazon SageMaker Debugger.

Get started with SageMaker Debugger

Debugging

Profiling

Debugging Model Parameters

You can track and debug the model parameters of your training job, such as weights, gradients, biases, and scalar values. Supported frameworks are Apache MXNet, TensorFlow, PyTorch, and XGBoost.
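To make the idea of tracking model parameters more concrete, the following is a minimal sketch that builds the `DebugHookConfig` fragment of a low-level `CreateTrainingJob` request using only the Python standard library. The helper name `build_debug_hook_config`, the S3 path, and the save interval are assumptions chosen for illustration; the collection names (`weights`, `gradients`, `biases`, `losses`) are Debugger's built-in collections.

```python
import json


def build_debug_hook_config(s3_output_path, collections, save_interval=100):
    """Build a DebugHookConfig dictionary.

    Hypothetical helper for illustration; it is not part of any SDK.
    """
    if save_interval <= 0:
        raise ValueError("save_interval must be a positive number of steps")
    return {
        "S3OutputPath": s3_output_path,
        "CollectionConfigurations": [
            {
                "CollectionName": name,
                # Collection parameters are passed as string-to-string pairs.
                "CollectionParameters": {"save_interval": str(save_interval)},
            }
            for name in collections
        ],
    }


hook_config = build_debug_hook_config(
    "s3://example-bucket/debugger-output",  # placeholder path (assumption)
    ["weights", "gradients", "biases", "losses"],
)
print(json.dumps(hook_config, indent=2))
```

The resulting dictionary could be passed as the `DebugHookConfig` field of a training job request. The example notebooks listed below typically configure the same settings through the SageMaker Python SDK instead.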

Real-time analysis of deep learning models

Apache MXNet

TensorFlow 2.x

TensorFlow 1.x

PyTorch

XGBoost

Explainability with Amazon SageMaker Debugger

Bring your own container

Build a Custom Training Container and Debug Training Jobs with Amazon SageMaker Debugger

Profiling System Bottlenecks and Framework Operators

Debugger provides the following profiling features:

Monitoring system bottlenecks – Monitor system resource utilization, such as CPU, GPU, memory, network, and data I/O metrics. This feature is framework- and model-agnostic and is available for any training job in SageMaker.
Profiling deep learning framework operations – Profile deep learning operations of the TensorFlow and PyTorch frameworks, such as step durations, data loaders, forward and backward operations, Python profiling metrics, and framework-specific metrics.
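The two profiling features above can be sketched as a `ProfilerConfig` fragment of a low-level `CreateTrainingJob` request. This is a rough illustration under stated assumptions: the helper `build_profiler_config` and the S3 path are hypothetical, and the `DetailedProfilingConfig` key with its `StartStep` and `NumSteps` fields reflects one reading of the API documentation, so verify it against the current reference before relying on it.

```python
import json

# Sampling intervals accepted for system monitoring, in milliseconds.
ALLOWED_INTERVALS_MS = (100, 200, 500, 1000, 5000, 60000)


def build_profiler_config(s3_output_path, interval_ms=500,
                          start_step=None, num_steps=None):
    """Build a ProfilerConfig dictionary.

    Hypothetical helper for illustration; it is not part of any SDK.
    """
    if interval_ms not in ALLOWED_INTERVALS_MS:
        raise ValueError(f"interval_ms must be one of {ALLOWED_INTERVALS_MS}")

    # System monitoring: framework-agnostic resource utilization metrics.
    config = {
        "S3OutputPath": s3_output_path,
        "ProfilingIntervalInMilliseconds": interval_ms,
    }

    # Framework profiling (TensorFlow or PyTorch): only added when a step
    # range is given. Parameter values are JSON-encoded strings (assumption).
    if start_step is not None and num_steps is not None:
        config["ProfilingParameters"] = {
            "DetailedProfilingConfig": json.dumps(
                {"StartStep": start_step, "NumSteps": num_steps}
            )
        }
    return config


profiler_config = build_profiler_config(
    "s3://example-bucket/profiler-output",  # placeholder path (assumption)
    interval_ms=500,
    start_step=5,
    num_steps=3,
)
print(json.dumps(profiler_config, indent=2))
```

Leaving out `start_step` and `num_steps` yields a configuration for system monitoring only, which applies to any training job regardless of framework.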

TensorFlow

PyTorch