Reinforcement Learning

Get started with RL

Cart pole

Balancing a cart pole is like keeping a broom upright on the palm of your hand. The broom is the “pole”, and your hand is replaced by a “cart” that moves back and forth on a linear track. The example is simplified to two dimensions: the cart moves only along a line, and the pole can fall only forward or backward, not sideways. These examples use PyTorch or TensorFlow with SageMaker RL to solve the cart pole problem.
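To make the setup concrete, here is a minimal pure-Python sketch of the cart pole dynamics described above. It is not taken from the linked examples and does not use SageMaker RL, PyTorch, or TensorFlow. The physical constants, the failure thresholds, and the names step, fallen, run_episode, lean_policy, and always_right are assumptions chosen for illustration, following values commonly used for this benchmark. The hand-written lean_policy is a baseline heuristic, not a trained RL policy.

```python
import math

# Assumed constants, following values commonly used for this benchmark.
GRAVITY = 9.8            # m/s^2
CART_MASS = 1.0          # kg
POLE_MASS = 0.1          # kg
POLE_HALF_LENGTH = 0.5   # m
FORCE = 10.0             # N, magnitude of each push
DT = 0.02                # s, simulation time step


def step(state, action):
    """Advance the system by one Euler step. action: 0 = push left, 1 = push right."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass

    return (
        x + DT * x_dot,
        x_dot + DT * x_acc,
        theta + DT * theta_dot,
        theta_dot + DT * theta_acc,
    )


def fallen(state):
    """The episode ends when the cart leaves the track or the pole tilts too far."""
    x, _, theta, _ = state
    return abs(x) > 2.4 or abs(theta) > math.radians(12)


def run_episode(policy, state=(0.0, 0.0, 0.05, 0.0), max_steps=200):
    """Return the number of steps the policy keeps the pole up (capped at max_steps)."""
    steps = 0
    while steps < max_steps and not fallen(state):
        state = step(state, policy(state))
        steps += 1
    return steps


def lean_policy(state):
    """Heuristic baseline: push toward the side the pole is leaning or rotating to."""
    _, _, theta, theta_dot = state
    return 1 if theta + theta_dot > 0 else 0


def always_right(state):
    """Naive baseline: ignore the state and always push right."""
    return 1


print("lean_policy steps:", run_episode(lean_policy))
print("always_right steps:", run_episode(always_right))
```

An RL agent would replace lean_policy with a policy learned from the reward signal, for example one reward unit per step the pole stays up.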

Contextual bandits

Explore a number of actions with contextual bandit algorithms in SageMaker.
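As a rough illustration of the idea, the sketch below trains an epsilon-greedy contextual bandit on a toy problem. It is not the SageMaker implementation. The reward table REWARD_PROB, the function names, and the choice of epsilon-greedy exploration are all assumptions made for this example. Each round, the learner sees a context, picks an action, observes a reward only for that action, and updates a running average.

```python
import random


def train_epsilon_greedy(reward_prob, rounds=5000, epsilon=0.1, seed=0):
    """Estimate the mean reward of every (context, action) pair.

    reward_prob[c][a] is the (hypothetical) probability that action a
    pays a reward of 1 in context c. The learner never reads this table
    directly; it only sees sampled rewards.
    """
    rng = random.Random(seed)
    n_contexts = len(reward_prob)
    n_actions = len(reward_prob[0])
    value = [[0.0] * n_actions for _ in range(n_contexts)]
    count = [[0] * n_actions for _ in range(n_contexts)]

    for _ in range(rounds):
        context = rng.randrange(n_contexts)
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: take the action with the best current estimate.
            action = max(range(n_actions), key=lambda a: value[context][a])

        reward = 1.0 if rng.random() < reward_prob[context][action] else 0.0

        # Incremental update of the running mean for this pair.
        count[context][action] += 1
        value[context][action] += (reward - value[context][action]) / count[context][action]

    return value


def greedy_policy(value):
    """Pick the highest-valued action for each context."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in value]


# Toy problem: action 0 is best in context 0, action 2 is best in context 1.
REWARD_PROB = [[0.9, 0.1, 0.1], [0.1, 0.1, 0.9]]
policy = greedy_policy(train_epsilon_greedy(REWARD_PROB))
print("learned action per context:", policy)
```

The tabular estimates here only work for a handful of discrete contexts; practical contextual bandit algorithms typically replace the table with a model that generalizes across context features.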


Roboschool is a physics simulator that is commonly used to train RL policies for robotic systems.

Use cases


This example demonstrates how to use RL to scale a production service by adding and removing resources (e.g., servers or EC2 instances) in response to a dynamic load.


Training an RL algorithm on a real HVAC system can take a long time to converge, and it can lead to hazardous settings while the agent explores its state space. This example uses the EnergyPlus simulator to show how you can train an HVAC optimization RL model with Amazon SageMaker.

Game play

Use RL to train an agent to play in a Unity3D environment.

Game server

A reinforcement learning-based system using SageMaker Autopilot and SageMaker RL that learns to allocate resources in response to player usage patterns.

Knapsack problem

Use SageMaker RL to address a canonical operations research problem: the knapsack problem.
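For readers unfamiliar with the problem, the sketch below solves a small 0/1 knapsack instance with classical dynamic programming. This is deliberately not an RL method and is not taken from the linked example. It is offered only as an exact baseline that a learned policy could be compared against on small instances. The function name and the sample weights and values are assumptions for illustration.

```python
def knapsack_max_value(weights, values, capacity):
    """Return the best total value for a 0/1 knapsack with integer weights.

    best[c] holds the best value achievable with total weight at most c
    using the items processed so far.
    """
    best = [0] * (capacity + 1)
    for weight, value in zip(weights, values):
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


# Hypothetical instance: four items and a capacity of 7.
print(knapsack_max_value([1, 3, 4, 5], [1, 4, 5, 7], 7))
```

Dynamic programming is exact but scales with the capacity, which is one reason heuristic and learned approaches are studied for larger or online variants of the problem.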

Object tracker

Use RL to train a TurtleBot object tracker using Amazon SageMaker Reinforcement Learning and AWS RoboMaker.

Network compression

Network-to-network compression via policy gradient reinforcement learning.

Portfolio management

Use SageMaker RL to manage a stock portfolio by continuously reallocating several stocks.

Resource allocation

Solve resource allocation problems with SageMaker RL.


Play global thermonuclear war with a computer.

Traveling salesman problem

Use SageMaker RL to solve this classic problem with a twist: a restaurant delivery service on a 2D gridworld.