Reinforcement Learning

Cart pole

The cart pole problem is like balancing a broom upright on your hand. The broom is the “pole”, and your hand is replaced by a “cart” that moves back and forth on a linear track. This simplified example works in two dimensions: the cart can only move back and forth along a line, and the pole can only fall forwards or backwards, not to the sides. These examples use PyTorch or TensorFlow with SageMaker RL to solve a cart pole problem.
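To make the setup concrete, here is a minimal, dependency-free sketch of two-dimensional cart pole dynamics. It is not the SageMaker RL example itself: the physical constants, the failure thresholds, and the names `step`, `run_episode`, and `lean_policy` are assumptions chosen for illustration, following the commonly used textbook formulation of the problem.

```python
import math
import random

# Illustrative constants (assumptions, not taken from the SageMaker examples).
GRAVITY = 9.8             # m/s^2
CART_MASS = 1.0           # kg
POLE_MASS = 0.1           # kg
POLE_HALF_LENGTH = 0.5    # m
PUSH_FORCE = 10.0         # N, applied left or right each step
DT = 0.02                 # s, integration time step
ANGLE_LIMIT = math.radians(12)  # episode ends if the pole tilts further
TRACK_LIMIT = 2.4         # m, episode ends if the cart leaves the track


def step(state, action):
    """Advance the simulation one time step.

    state  = (x, x_dot, theta, theta_dot); theta is 0 when the pole is upright.
    action = 1 pushes the cart right, 0 pushes it left.
    Returns (next_state, done).
    """
    x, x_dot, theta, theta_dot = state
    force = PUSH_FORCE if action == 1 else -PUSH_FORCE
    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass

    # Simple Euler integration.
    x += DT * x_dot
    x_dot += DT * x_acc
    theta += DT * theta_dot
    theta_dot += DT * theta_acc

    done = abs(x) > TRACK_LIMIT or abs(theta) > ANGLE_LIMIT
    return (x, x_dot, theta, theta_dot), done


def run_episode(policy, max_steps=500, seed=0):
    """Run one episode and return how many steps the pole stayed up."""
    rng = random.Random(seed)
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    for t in range(max_steps):
        state, done = step(state, policy(state))
        if done:
            return t + 1
    return max_steps


def lean_policy(state):
    """Hand-written baseline: push toward the side the pole is falling."""
    _, _, theta, theta_dot = state
    return 1 if theta + 0.5 * theta_dot > 0 else 0
```

An RL agent would replace `lean_policy` with a learned mapping from state to action, with the number of steps survived serving as the reward signal.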

Contextual bandits

Explore a number of actions with contextual bandit algorithms in SageMaker.
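As a rough illustration of what a contextual bandit does, the sketch below keeps a running average reward for each (context, action) pair and picks actions with an epsilon-greedy rule. The class name `EpsilonGreedyBandit` and its methods are hypothetical, and this tabular approach is only one simple variant; it is not the algorithm used by the SageMaker example.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Tabular epsilon-greedy contextual bandit (illustrative sketch)."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Per-context pull counts and running mean rewards, one slot per action.
        self.counts = defaultdict(lambda: [0] * n_actions)
        self.values = defaultdict(lambda: [0.0] * n_actions)

    def greedy_action(self, context):
        """Return the action with the highest estimated reward for this context."""
        values = self.values[context]
        return max(range(self.n_actions), key=values.__getitem__)

    def choose(self, context):
        """Explore with probability epsilon, otherwise exploit the best estimate."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return self.greedy_action(context)

    def update(self, context, action, reward):
        """Fold the observed reward into the running mean for (context, action)."""
        self.counts[context][action] += 1
        n = self.counts[context][action]
        self.values[context][action] += (reward - self.values[context][action]) / n
```

A typical loop observes a context, calls `choose`, receives a reward from the environment, and then calls `update` with that reward. Contexts must be hashable (for example, strings or tuples) because they are used as dictionary keys.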

Roboschool

Roboschool is a physics simulator that is commonly used to train RL policies for robotic systems.

Use cases

Autoscaling

This example demonstrates how to use RL to address scaling a production service by adding and removing resources (e.g. servers or EC2 instances) in reaction to a dynamic load.
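One way to picture the autoscaling problem is as an environment in which each step the agent adds, removes, or keeps a server, and the reward trades off running cost against unmet load. The sketch below is a toy version under assumed numbers; `ScalingEnv`, its fields, and the reward shape are all hypothetical and are not taken from the SageMaker example.

```python
from dataclasses import dataclass


@dataclass
class ScalingEnv:
    """Toy autoscaling environment (all parameters are illustrative assumptions)."""

    capacity_per_server: float = 100.0  # requests/s one server can handle
    server_cost: float = 1.0            # cost per server per step
    overload_penalty: float = 5.0       # penalty per server's worth of unmet load
    min_servers: int = 1
    max_servers: int = 20
    servers: int = 1

    def step(self, action, load):
        """Apply action (-1 remove, 0 keep, +1 add) and score it against the load.

        Returns (servers, reward). Reward is negative: fewer servers cost less,
        but load beyond total capacity is penalized.
        """
        self.servers = min(self.max_servers,
                           max(self.min_servers, self.servers + action))
        capacity = self.servers * self.capacity_per_server
        unmet = max(0.0, load - capacity)
        reward = (-self.server_cost * self.servers
                  - self.overload_penalty * unmet / self.capacity_per_server)
        return self.servers, reward
```

With these assumed numbers, serving a load of 150 with two servers costs 2.0, while dropping to one server costs 1.0 plus a 2.5 overload penalty, so the agent is rewarded for keeping enough capacity without over-provisioning.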

Energy

Training an RL algorithm on a real HVAC system can take a long time to converge, and the agent may drive the system into hazardous settings as it explores its state space. This example uses the EnergyPlus simulator to show how you can train an HVAC optimization RL model with Amazon SageMaker.

Game server

A reinforcement learning-based system using SageMaker Autopilot and SageMaker RL that learns to allocate resources in response to player usage patterns.

Knapsack problem

Use SageMaker RL to address a canonical operations research problem: the knapsack problem.
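For reference, small instances of the classic 0/1 knapsack problem can be solved exactly with dynamic programming, which gives a baseline against which a learned policy can be compared. The sketch below assumes integer weights and a single knapsack; the function name `knapsack` is illustrative, and the SageMaker example may use a different variant of the problem.

```python
def knapsack(values, weights, capacity):
    """Return the best total value for a 0/1 knapsack with integer weights.

    best[c] holds the highest value achievable with total weight at most c.
    """
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


# Items worth 60, 100, 120 with weights 10, 20, 30 and a capacity of 50:
# the best choice is the second and third items.
print(knapsack([60, 100, 120], [10, 20, 30], 50))  # prints 220
```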

Object tracker

Use RL to train a TurtleBot object tracker using Amazon SageMaker Reinforcement Learning and AWS RoboMaker.

Network compression

Network-to-network compression via policy gradient reinforcement learning.

Portfolio management

Use SageMaker RL to manage a stock portfolio by continuously reallocating funds across several stocks.

Resource allocation

Solve resource allocation problems with SageMaker RL.

Tic-tac-toe

Play global thermonuclear war with a computer.

Traveling salesman problem

Use SageMaker RL to solve this classic problem with a twist: a restaurant delivery service on a 2D gridworld.
