Debugger

Examples of how to use SageMaker Debugger.

Get started with SageMaker Debugger

Debugging

Profiling


Debugging Model Parameters

You can track and debug the model parameters of your training job, such as weights, gradients, biases, and scalar values. Supported frameworks are Apache MXNet, TensorFlow, PyTorch, and XGBoost.
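As a rough illustration of what tracking these parameters involves, the sketch below builds the DebugHookConfig portion of a low-level CreateTrainingJob request as a plain Python dictionary. The helper name build_debug_hook_config, the bucket path, and the save interval are hypothetical placeholders chosen for this example. The collection names are assumed to match Debugger's built-in collections, so check them against the Debugger documentation for your framework.

```python
def build_debug_hook_config(s3_output_path, collections, save_interval=100):
    """Build a DebugHookConfig-style dictionary (hypothetical helper).

    The key names follow the CreateTrainingJob request shape;
    CollectionParameters values are passed as strings.
    """
    if save_interval <= 0:
        raise ValueError("save_interval must be a positive number of steps")
    return {
        "S3OutputPath": s3_output_path,
        "CollectionConfigurations": [
            {
                "CollectionName": name,
                # Save tensors from this collection every `save_interval` steps.
                "CollectionParameters": {"save_interval": str(save_interval)},
            }
            for name in collections
        ],
    }


# Placeholder bucket and collection names, for illustration only.
config = build_debug_hook_config(
    "s3://example-bucket/debugger-output",
    ["weights", "gradients", "biases"],
)
```

A dictionary like this would be passed as the DebugHookConfig field of the training job request. The SageMaker Python SDK offers higher-level classes for the same purpose, which the framework examples below use.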

Real-time analysis of deep learning models

Apache MXNet

TensorFlow 2.x

TensorFlow 1.x

PyTorch

XGBoost

Bring your own container


Profiling System Bottlenecks and Framework Operators

Debugger provides the following profiling features:

  • Monitoring system bottlenecks – Monitor system resource utilization, such as CPU, GPU, memory, network, and data I/O metrics. This feature is framework- and model-agnostic and is available for any training job in SageMaker.

  • Profiling deep learning framework operations – Profile deep learning operations of the TensorFlow and PyTorch frameworks, such as step durations, data loaders, forward and backward operations, Python profiling metrics, and framework-specific metrics.
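To make the system monitoring feature above more concrete, here is a minimal sketch that builds the ProfilerConfig portion of a low-level CreateTrainingJob request. The helper name build_profiler_config and the bucket path are hypothetical, and the list of allowed sampling intervals reflects our reading of the API reference, so verify it before relying on it. Framework operation profiling takes additional parameters that are not shown here.

```python
# Sampling intervals (in milliseconds) that the API is assumed to accept.
ALLOWED_INTERVALS_MS = (100, 200, 500, 1000, 5000, 60000)


def build_profiler_config(s3_output_path, interval_ms=500):
    """Build a ProfilerConfig-style dictionary (hypothetical helper)."""
    if interval_ms not in ALLOWED_INTERVALS_MS:
        raise ValueError(
            f"interval_ms must be one of {ALLOWED_INTERVALS_MS}, got {interval_ms}"
        )
    return {
        "S3OutputPath": s3_output_path,
        # How often system metrics (CPU, GPU, memory, network, I/O) are sampled.
        "ProfilingIntervalInMillis": interval_ms,
    }


# Placeholder bucket, for illustration only.
profiler_config = build_profiler_config(
    "s3://example-bucket/profiler-output", interval_ms=1000
)
```

A shorter interval gives finer-grained utilization data at the cost of more metric files written to Amazon S3.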

TensorFlow

PyTorch