Game servers autopilot

Multiplayer game publishers often need to either over-provision resources or manually manage compute resource allocation when launching a large-scale worldwide game, to avoid long player waits in the game lobby. They also need to develop, configure, and deploy tools that help them monitor and control compute allocation.

This notebook demonstrates Game server autopilot, a new machine learning-based example tool that makes it easy for game publishers to reduce the time players wait for compute to spawn, while still avoiding compute over-provisioning. It also removes the manual configuration decisions and changes that publishers would otherwise need to make, and reduces the opportunity for human error.

We heard from customers that optimizing compute resource allocation is not trivial. This is because it often takes substantial time to allocate and prepare EC2 instances. The time needed to spin up an EC2 instance and install game binaries and other assets must be learned and accounted for in the allocation algorithm. Ever-changing usage patterns require a model that is adaptive to emerging player habits. Finally, the system also performs scale down in concert with new server allocation as needed.

We describe a reinforcement learning-based system that learns to allocate resources in response to player usage patterns. The hosted model directly predicts the required number of game servers, which gives EKS time to allocate instances ahead of demand and so reduces player wait time. The training process integrates with the game ecosystem and requires minimal manual configuration.

Pre-requisites

Imports

To get started, we’ll import the Python libraries we need and set up the environment with a few prerequisites for permissions and configurations.

[ ]:
import sagemaker
import boto3
import sys
import os
import glob
import re
import subprocess
import numpy as np
from IPython.display import HTML
import time
from time import gmtime, strftime

sys.path.append("common")
from misc import get_execution_role, wait_for_s3_object
from docker_utils import build_and_push_docker_image
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

Setup S3 bucket

Set up the linkage and authentication to the S3 bucket that you want to use for checkpoints and metadata.

[ ]:
sage_session = sagemaker.session.Session()
s3_bucket = sage_session.default_bucket()
s3_output_path = "s3://{}/".format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

Define Variables

We define variables such as the job prefix, instance type, and framework for the training jobs. These are used to fetch the latest Docker container for training your RL agent. You can also provide an image path for a custom container (only when bringing your own container, BYOC).

Set framework to 'tf' as this notebook only supports TensorFlow.

[ ]:
# create a descriptive job name
job_name_prefix = "rl-game-server-autopilot"

framework = "tf"
[ ]:
# Pick the instance type
# instance_type = "ml.c5.xlarge" # 4 cpus
# instance_type = "ml.c5.9xlarge" # 36 cpus
instance_type = "ml.c5.2xlarge" # 8 cpus

num_cpus_per_instance = 8

Parameters

Adding new parameters for the job requires updating the training section that invokes the RLEstimator.

[ ]:
job_duration_in_seconds = 60 * 60 * 8
train_instance_count = 1
cloudwatch_namespace = "rl-game-server-autopilot"
min_servers = 10
max_servers = 100
# over-provisioning factor; use 5 for optimal results
over_prov_factor = 5
# gamma is the discount factor
gamma = 0.9
# for local inference, set gs_inventory_url = 'local' and populate learning_freq
gs_inventory_url = "https://4bfiebw6ui.execute-api.us-west-2.amazonaws.com/api/currsine1h/"
# gs_inventory_url = 'local'
# sleep time in seconds between step() executions
learning_freq = 65
# actions are normalized between 0 and 1; action_factor scales them to a number of game servers,
# e.g. with 100 the count is 100 * action, clipped to the min_servers and max_servers parameters above
action_factor = 100
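
As a rough illustration of the comment on action_factor above, the sketch below scales a normalized action and clips it to the server bounds. The helper name action_to_servers is hypothetical, and truncating to an integer is an assumption; the actual mapping lives in src/gameserver_env.py and may differ.

```python
# Values copied from the parameters cell above.
MIN_SERVERS = 10
MAX_SERVERS = 100
ACTION_FACTOR = 100


def action_to_servers(action, action_factor=ACTION_FACTOR,
                      min_servers=MIN_SERVERS, max_servers=MAX_SERVERS):
    """Scale a normalized action in [0, 1] to a clipped game-server count.

    Hypothetical helper for illustration only.
    """
    raw = float(action) * action_factor
    # Clip to the configured bounds, then truncate to a whole server count.
    return int(max(min_servers, min(max_servers, raw)))


print(action_to_servers(0.05))  # 5 is below min_servers, so it is clipped up
print(action_to_servers(0.5))
print(action_to_servers(2.0))   # anything above max_servers is clipped down
```

With these bounds, any action below 0.1 maps to the same count as 0.1, so the lower tenth of the action range carries no extra information for the agent.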

Create an IAM role

When running from a SageMaker notebook instance, get the execution role with role = sagemaker.get_execution_role(). When running from a local notebook, use the utility method role = get_execution_role() to create an execution role. In this example, the environment publishes custom CloudWatch metrics and writes values to a DynamoDB table during the training job, so the role ARN below must carry the permissions for both.

[ ]:
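try:
    # Works when running on a SageMaker notebook instance
    role = sagemaker.get_execution_role()
except Exception:
    # Fall back to the helper from common/misc.py when running locally
    role = get_execution_role()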

print("Using IAM role arn: {}".format(role))

Set up the environment

The environment is defined in a Python file called gameserver_env.py, located in the src directory. The environment implements the __init__(), step() and reset() functions that describe how the environment behaves, consistent with the OpenAI Gym interface for defining an environment. It also implements helper functions for custom CloudWatch metrics (populate_cloudwatch_metric()) and a simple sine demand simulator (get_curr_sine1h()).

  1. __init__() - initialize the environment in a pre-defined state

  2. step() - take an action on the environment

  3. reset() - restart the environment on a new episode

  4. get_curr_sine1h() - return the sine value based on the current second.

  5. populate_cloudwatch_metric(namespace,metric_value,metric_name) - populate the metric_name with metric_value in namespace.

[ ]:
!pygmentize src/gameserver_env.py
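
To make the interface described above concrete, here is a minimal toy environment that follows the same method names without importing gym. Everything in it is an assumption for illustration: the class ToyGameServerEnv, the sine_demand function, the 10-value sliding-window observation, and the reward (negative absolute gap between servers and demand) are not taken from gameserver_env.py.

```python
import math


def sine_demand(second, period=3600, low=10.0, high=100.0):
    """Hypothetical demand curve: one sine cycle per hour, scaled to [low, high]."""
    phase = 2.0 * math.pi * (second % period) / period
    return low + (high - low) * (math.sin(phase) + 1.0) / 2.0


class ToyGameServerEnv:
    """Toy stand-in that mirrors the Gym method names (no gym dependency)."""

    def __init__(self, min_servers=10, max_servers=100, action_factor=100,
                 window=10, step_seconds=60, episode_steps=60):
        self.min_servers = min_servers
        self.max_servers = max_servers
        self.action_factor = action_factor
        self.window = window
        self.step_seconds = step_seconds
        self.episode_steps = episode_steps
        self.reset()

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.second = 0
        self.steps = 0
        self.history = [sine_demand(0)] * self.window
        return list(self.history)

    def step(self, action):
        """Apply a normalized action and return (observation, reward, done, info)."""
        servers = max(self.min_servers,
                      min(self.max_servers, action * self.action_factor))
        self.second += self.step_seconds
        self.steps += 1
        demand = sine_demand(self.second)
        # Slide the observation window forward by one demand sample.
        self.history = self.history[1:] + [demand]
        # Assumed reward: penalize any gap between allocation and demand.
        reward = -abs(servers - demand)
        done = self.steps >= self.episode_steps
        return list(self.history), reward, done, {"servers": servers, "demand": demand}


env = ToyGameServerEnv()
observation = env.reset()
observation, reward, done, info = env.step(0.5)
print(len(observation), round(reward, 2), done)
```

A symmetric penalty like this treats idle servers and waiting players as equally costly; a real environment would likely weight the two differently.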

Configure the presets for RL algorithm

The presets that configure the RL training jobs are defined in the train_gameserver_ppo.py file, which is also located in the src directory. Using the preset file, you can define agent parameters to select the specific agent algorithm. You can also set the environment parameters, define the schedule and visualization parameters, and define the graph manager. The schedule presets define the number of heat-up steps, periodic evaluation steps, and training steps between evaluations. The preset file can also be used to define custom hyperparameters.

[ ]:
!pygmentize src/train_gameserver_ppo.py

Train the RL model using the Python SDK Script mode

The RLEstimator is used for training RL jobs.

  1. The entry_point value indicates the script that invokes the GameServer RL environment.

  2. source_dir indicates the location of the environment code, which currently includes train_gameserver_ppo.py and gameserver_env.py.

  3. Specify the choice of RL toolkit and framework. This automatically resolves to the ECR path for the RL Container.

  4. Define the training parameters such as the instance count, job name, and S3 path for output.

  5. Specify the hyperparameters for the RL agent algorithm. The RLCOACH_PRESET or the RLRAY_PRESET can be used to specify the RL agent algorithm you want to use.

  6. Define the metric definitions that you are interested in capturing in your logs. These can also be visualized in CloudWatch and SageMaker Notebooks.
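
For step 5, it may help to see how hyperparameters can reach an entry-point script. In SageMaker script mode they are typically handed to the script as command-line arguments. Whether train_gameserver_ppo.py reads them this way or through the Ray launcher in common/sagemaker_rl is not shown in this notebook, so treat the following as a sketch under that assumption. The function name parse_hyperparameters and the default values are hypothetical.

```python
import argparse


def parse_hyperparameters(argv):
    """Parse a few of this notebook's hyperparameters from CLI-style arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--min_servers", type=int, default=10)
    parser.add_argument("--max_servers", type=int, default=100)
    parser.add_argument("--gamma", type=float, default=0.9)
    parser.add_argument("--learning_freq", type=int, default=65)
    parser.add_argument("--action_factor", type=int, default=100)
    parser.add_argument("--gs_inventory_url", type=str, default="local")
    # Ignore any arguments this sketch does not declare (e.g. --save_model).
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_hyperparameters(["--gamma", "0.95", "--max_servers", "50", "--save_model", "1"])
print(args.gamma, args.max_servers, args.min_servers)
```

Using parse_known_args rather than parse_args keeps the script from failing when a new hyperparameter is added to the estimator before the script declares it.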

[ ]:
metric_definitions = [
    {
        "Name": "episode_reward_mean",
        "Regex": "episode_reward_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)",
    },
    {
        "Name": "episode_reward_max",
        "Regex": "episode_reward_max: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)",
    },
    {
        "Name": "episode_len_mean",
        "Regex": "episode_len_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)",
    },
    {"Name": "entropy", "Regex": "entropy: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)"},
    {
        "Name": "episode_reward_min",
        "Regex": "episode_reward_min: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)",
    },
    {"Name": "vf_loss", "Regex": "vf_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)"},
    {"Name": "policy_loss", "Regex": "policy_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)"},
]

metric_definitions
[ ]:
%%time

estimator = RLEstimator(
    entry_point="train_gameserver_ppo.py",
    source_dir="src",
    dependencies=["common/sagemaker_rl"],
    toolkit=RLToolkit.RAY,
    toolkit_version="1.6.0",
    framework=RLFramework.TENSORFLOW,
    role=role,
    instance_type=instance_type,
    instance_count=train_instance_count,
    output_path=s3_output_path,
    base_job_name=job_name_prefix,
    metric_definitions=metric_definitions,
    max_run=job_duration_in_seconds,
    hyperparameters={
        "cloudwatch_namespace": cloudwatch_namespace,
        "gs_inventory_url": gs_inventory_url,
        "learning_freq": learning_freq,
        "time_total_s": job_duration_in_seconds,
        "min_servers": min_servers,
        "max_servers": max_servers,
        "gamma": gamma,
        "action_factor": action_factor,
        "over_prov_factor": over_prov_factor,
        "save_model": 1,
    },
)

estimator.fit(wait=False)
job_name = estimator.latest_training_job.job_name
print("Training job: %s" % job_name)
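
The regular expressions in metric_definitions can be checked locally before launching a job. The sketch below applies the same numeric pattern with Python's re module. The helper extract_metric and the sample log lines are made up for illustration; real training logs may format these values differently.

```python
import re

# Same numeric pattern as in metric_definitions, written as a raw string.
NUMBER = r"([-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)"


def extract_metric(name, log_line):
    """Return the metric value found in log_line as a float, or None if absent."""
    match = re.search("{}: {}".format(name, NUMBER), log_line)
    return float(match.group(1)) if match else None


# Hypothetical log lines, not captured from a real training job.
print(extract_metric("episode_reward_mean", "episode_reward_mean: -123.45"))
print(extract_metric("vf_loss", "vf_loss: 1.5e-03"))
print(extract_metric("entropy", "no metrics here"))
```

The first capture group holds the full number including any exponent, which is why group(1) is used here.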

Store intermediate training output and model checkpoints

The output from the training job above is stored in S3.

[ ]:
%%time

# Use the public attribute rather than the private _current_job_name
job_name = estimator.latest_training_job.job_name
print("Job name: {}".format(job_name))

s3_url = "s3://{}/{}".format(s3_bucket, job_name)

output_tar_key = "{}/output/model.tar.gz".format(job_name)

intermediate_folder_key = "{}/output/intermediate/".format(job_name)
output_url = "s3://{}/{}".format(s3_bucket, output_tar_key)
intermediate_url = "s3://{}/{}".format(s3_bucket, intermediate_folder_key)

print("S3 job path: {}".format(s3_url))
print("Output.tar.gz location: {}".format(output_url))
print("Intermediate folder path: {}".format(intermediate_url))

tmp_dir = "/tmp/{}".format(job_name)
# os.makedirs does not fail if the folder already exists from an earlier run
os.makedirs(tmp_dir, exist_ok=True)
print("Created local folder {}".format(tmp_dir))

Evaluation of RL models

We use the latest checkpointed model to run evaluation for the RL Agent.

Load checkpointed model

Checkpointed data from the previously trained models will be passed on for evaluation / inference in the checkpoint channel. Since the checkpoint file that TensorFlow stores contains absolute paths from when it was generated (see issue), we need to replace the absolute paths with relative paths. This is implemented within evaluate-game-server.py.

[ ]:
%%time
local_mode = False
if local_mode:
    model_tar_key = "{}/model.tar.gz".format(job_name)
else:
    model_tar_key = "{}/output/model.tar.gz".format(job_name)

local_checkpoint_dir = "{}/model".format(tmp_dir)

wait_for_s3_object(s3_bucket, model_tar_key, tmp_dir, training_job_name=job_name)

if not os.path.isfile("{}/model.tar.gz".format(tmp_dir)):
    raise FileNotFoundError("File model.tar.gz not found")

os.system("mkdir -p {}".format(local_checkpoint_dir))
os.system("tar -xvzf {}/model.tar.gz -C {}".format(tmp_dir, local_checkpoint_dir))

print("Checkpoint directory {}".format(local_checkpoint_dir))
[ ]:
if local_mode:
    checkpoint_path = "file://{}".format(local_checkpoint_dir)
    print("Local checkpoint file path: {}".format(local_checkpoint_dir))
else:
    checkpoint_path = "s3://{}/{}/checkpoint/".format(s3_bucket, job_name)
    if not os.listdir(local_checkpoint_dir):
        raise FileNotFoundError("Checkpoint files not found under the path")
    os.system("aws s3 cp --recursive {} {}".format(local_checkpoint_dir, checkpoint_path))
    print("S3 checkpoint file path: {}".format(checkpoint_path))
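
The path replacement mentioned above can be sketched as follows, assuming the file in question is TensorFlow's text-format checkpoint state file with model_checkpoint_path and all_model_checkpoint_paths entries. The function name relativize_checkpoint_paths and the sample paths are hypothetical, and this is not the code from the evaluation script.

```python
import posixpath
import re


def relativize_checkpoint_paths(text):
    """Replace absolute paths in a TensorFlow checkpoint state file with base names."""
    pattern = r'(model_checkpoint_path|all_model_checkpoint_paths): "([^"]*)"'

    def _strip_dir(match):
        key, path = match.group(1), match.group(2)
        # Keep only the final path component so the entry becomes relative.
        return '{}: "{}"'.format(key, posixpath.basename(path))

    return re.sub(pattern, _strip_dir, text)


# Hypothetical file content for illustration.
original = (
    'model_checkpoint_path: "/opt/ml/output/intermediate/model.ckpt-200"\n'
    'all_model_checkpoint_paths: "/opt/ml/output/intermediate/model.ckpt-100"\n'
    'all_model_checkpoint_paths: "/opt/ml/output/intermediate/model.ckpt-200"\n'
)
print(relativize_checkpoint_paths(original))
```

Relative entries let TensorFlow resolve the checkpoint against whatever directory the files were copied to, which is what makes the model portable between the training container and the evaluation job.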

Run the evaluation step

Use the checkpointed model to run the evaluation step.

[ ]:
%%time

estimator_eval = RLEstimator(
    entry_point="evaluate_gameserver_ppo.py",
    source_dir="src",
    dependencies=["common/sagemaker_rl"],
    role=role,
    toolkit=RLToolkit.RAY,
    toolkit_version="1.6.0",
    framework=RLFramework.TENSORFLOW,
    instance_type=instance_type,
    instance_count=1,
    base_job_name=job_name_prefix + "-evaluation",
    hyperparameters={
        "evaluate_episodes": 1,
        "algorithm": "PPO",
        "env": "GameServers-v0",
    },
)
estimator_eval.fit({"model": checkpoint_path})
job_name = estimator_eval.latest_training_job.job_name
print("Evaluation job: %s" % job_name)

Hosting

Once the training is done, we can deploy the trained model as an Amazon SageMaker real-time hosted endpoint. This will allow us to make predictions (or inference) from the model. Note that we don’t have to host on the same instance (or type of instance) that we used to train. The endpoint deployment can be accomplished as follows:

Model deployment

Now let us deploy the RL policy so that we can get the optimal action, given an environment observation.

[ ]:
from sagemaker.tensorflow.model import TensorFlowModel

model_data = estimator.model_data
model = TensorFlowModel(model_data=model_data, framework_version="2.5.1", role=role)

predictor = model.deploy(initial_instance_count=1, instance_type=instance_type)

Inference

Now that the trained model is deployed at an endpoint that is up and running, we can use this endpoint for inference. The format of the input should match that of observation_space in the defined environment. In this example, the observation space is a 10-dimensional vector formulated from previous and current observations. For the sake of space, this demo doesn’t include the non-trivial construction process. Instead, we provide a dummy input below. For more details, please check src/gameserver_env.py.

[ ]:
example = [np.arange(10).tolist()]
example
[ ]:
# Ray 1.6.0 requires all of the following inputs; for Ray 0.8.5 or below, remove "timestep".
# "prev_action", "is_training", "prev_reward", "seq_lens" and "timestep" are placeholders
# in this example and do not affect the prediction results.

# Named "request" to avoid shadowing the built-in input()
request = {
    "inputs": {
        "observations": example,
        "prev_action": 0.5,
        "is_training": False,
        "prev_reward": -1,
        "seq_lens": -1,
        "timestep": 1,
    }
}
[ ]:
result = predictor.predict(request)

result["outputs"]
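
Because the endpoint expects exactly the fields shown above, a small helper can validate the observation length and assemble the payload. This is a sketch only: build_inference_request is a hypothetical name, and the placeholder values simply repeat the ones used in this example.

```python
OBSERVATION_SIZE = 10  # the observation space described above is 10-dimensional


def build_inference_request(observation, include_timestep=True):
    """Wrap one observation vector in the payload layout used in this notebook."""
    values = list(observation)
    if len(values) != OBSERVATION_SIZE:
        raise ValueError(
            "expected {} values, got {}".format(OBSERVATION_SIZE, len(values))
        )
    inputs = {
        "observations": [values],  # batch containing a single observation
        "prev_action": 0.5,        # placeholder, as in the example above
        "is_training": False,
        "prev_reward": -1,
        "seq_lens": -1,
    }
    if include_timestep:
        # Per the comment above, drop this field for Ray 0.8.5 or below.
        inputs["timestep"] = 1
    return {"inputs": inputs}


payload = build_inference_request(range(10))
print(sorted(payload["inputs"].keys()))
```

Failing early on a wrong-sized observation gives a clearer error than the one the endpoint would return for a malformed tensor.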

Delete the Endpoint

Having an endpoint running will incur some costs. Therefore, as a clean-up step, we should delete the endpoint.

[ ]:
predictor.delete_endpoint()