Amazon SageMaker Feature Store: Encrypt Data in your Online or Offline Feature Store using KMS key

This notebook demonstrates how to enable encryption for your data in your online or offline Feature Store using a KMS key. We start by showing how to programmatically create a KMS key and how to apply it during feature store creation for data encryption. The last portion of this notebook demonstrates how to verify that your KMS key is being used to encrypt the data in your feature store.

Overview

  1. Create a KMS key.

    • How to create a KMS key programmatically using the KMS client from boto3?

  2. Attach role to your KMS key.

    • Attach the required entries to your policy for data encryption in your feature store.

  3. Create an online or offline feature store and apply your KMS key during the feature store creation process.

    • How to enable encryption for your online store?

    • How to enable encryption for your offline store?

  4. How to verify that your data is encrypted in your online or offline store?

Prerequisites

This notebook uses both the boto3 and SageMaker Python SDK libraries, and the Python 3 (Data Science) kernel. This notebook also works with Studio, Jupyter, and JupyterLab.

Library Dependencies:

  • sagemaker>=2.0.0

  • numpy

  • pandas

[ ]:
import sagemaker
import sys
import boto3
import pandas as pd
import numpy as np
import json

original_version = sagemaker.__version__
%pip install 'sagemaker>=2.0.0'

Set up

[ ]:
sagemaker_session = sagemaker.Session()
s3_bucket_name = sagemaker_session.default_bucket()
prefix = "sagemaker-featurestore-kms-demo"
role = sagemaker.get_execution_role()
region = sagemaker_session.boto_region_name

Create a KMS client using boto3. Note that you can access your boto session through your SageMaker session, e.g., sagemaker_session.boto_session.

[ ]:
kms = sagemaker_session.boto_session.client("kms")

KMS Policy Template

Below is the policy template you will use for creating a KMS key. You will specify your role to grant it access to various KMS operations that will be used in the back-end for encrypting your data in your Online or Offline Feature Store.

Note: You will need to substitute your Account number in for 123456789012 in the policy below for these lines: arn:aws:cloudtrail:*:123456789012:trail/*.
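The substitution described in the note can also be done in code. The following is a minimal sketch, assuming the execution role ARN belongs to the same account as the CloudTrail trails. The helper names account_id_from_arn and substitute_account_id are hypothetical and are not part of boto3 or the SageMaker SDK.

```python
import json

PLACEHOLDER_ACCOUNT_ID = "123456789012"  # placeholder used in the policy template


def account_id_from_arn(arn):
    """Return the account ID, which is the fifth colon-separated field of an ARN."""
    parts = arn.split(":")
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    return parts[4]


def substitute_account_id(policy, account_id, placeholder=PLACEHOLDER_ACCOUNT_ID):
    """Return a copy of the policy with every placeholder account ID replaced.

    Hypothetical helper: it round-trips the policy through JSON, so the
    original dictionary is left unchanged.
    """
    return json.loads(json.dumps(policy).replace(placeholder, account_id))
```

Once the policy dictionary is defined, you could call policy = substitute_account_id(policy, account_id_from_arn(role)) before passing it to the KMS client. Calling get_caller_identity on a boto3 STS client is another way to look up the account ID.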

It is important to understand that the policy below grants admin privileges for Customer Managed Keys (CMK) around viewing and revoking grants, decrypt and encrypt permissions on CloudTrail, and full access permissions through Feature Store. Also, note that the Feature Store service creates additional grants that are used for encryption purposes for your online store.

[ ]:
policy = {
    "Version": "2012-10-17",
    "Id": "key-policy-feature-store",
    "Statement": [
        {
            "Sid": "Allow access through Amazon SageMaker Feature Store for all principals in the account that are authorized to use Amazon SageMaker Feature Store",
            "Effect": "Allow",
            "Principal": {"AWS": role},
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:DescribeKey",
                "kms:CreateGrant",
                "kms:RetireGrant",
                "kms:ReEncryptFrom",
                "kms:ReEncryptTo",
                "kms:GenerateDataKey",
                "kms:ListAliases",
                "kms:ListGrants",
            ],
            "Resource": ["*"],
            "Condition": {"StringLike": {"kms:ViaService": "sagemaker.*.amazonaws.com"}},
        },
        {
            "Sid": "Allow administrators to view the CMK and revoke grants",
            "Effect": "Allow",
            "Principal": {"AWS": [role]},
            "Action": ["kms:Describe*", "kms:Get*", "kms:List*", "kms:RevokeGrant"],
            "Resource": ["*"],
        },
        {
            "Sid": "Enable CloudTrail Encrypt Permissions",
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com", "AWS": [role]},
            "Action": "kms:GenerateDataKey*",
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "kms:EncryptionContext:aws:cloudtrail:arn": [
                        "arn:aws:cloudtrail:*:123456789012:trail/*",
                        "arn:aws:cloudtrail:*:123456789012:trail/*",
                    ]
                }
            },
        },
        {
            "Sid": "Enable CloudTrail log decrypt permissions",
            "Effect": "Allow",
            "Principal": {"AWS": [role]},
            "Action": "kms:Decrypt",
            "Resource": ["*"],
            "Condition": {"Null": {"kms:EncryptionContext:aws:cloudtrail:arn": "false"}},
        },
    ],
}

Create your new KMS key using the policy above and your KMS client.

[ ]:
try:
    new_kms_key = kms.create_key(
        Policy=json.dumps(policy),
        Description="KMS key for encrypting SageMaker Feature Store data",
        KeyUsage="ENCRYPT_DECRYPT",
        CustomerMasterKeySpec="SYMMETRIC_DEFAULT",
        Origin="AWS_KMS",
    )
    AliasName = "my-new-kms-key"  ## provide a unique alias name
    kms.create_alias(
        AliasName="alias/" + AliasName, TargetKeyId=new_kms_key["KeyMetadata"]["KeyId"]
    )
    print(new_kms_key)
except Exception as e:
    print("Error {}".format(e))
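The create_key response reports the full key ARN under KeyMetadata, so the ARN does not have to be assembled by hand from a Region and account number. The sketch below assumes a response shaped like the one printed above. The helper name kms_key_arn is hypothetical, and the fallback branch assumes the standard "aws" partition.

```python
def kms_key_arn(create_key_response, region=None, account_id=None):
    """Return the key ARN for a kms.create_key response.

    Prefers the ARN reported by KMS. If it is absent, falls back to building
    the ARN from the Region and account ID (assumes the "aws" partition).
    """
    metadata = create_key_response["KeyMetadata"]
    if "Arn" in metadata:
        return metadata["Arn"]
    if region is None or account_id is None:
        raise ValueError("region and account_id are required when the response has no Arn")
    return f"arn:aws:kms:{region}:{account_id}:key/{metadata['KeyId']}"
```

For example, kms_key_arn(new_kms_key) could replace the hard-coded "arn:aws:kms:us-east-1:123456789012:key/" prefix used later in this notebook.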

Now that the KMS key is created and its key policy grants our role the necessary operations, we load our data.

[ ]:
customer_data = pd.read_csv("data/feature_store_introduction_customer.csv")
orders_data = pd.read_csv("data/feature_store_introduction_orders.csv")
[ ]:
customer_data.head()
[ ]:
orders_data.head()
[ ]:
customer_data.dtypes
[ ]:
orders_data.dtypes

Creating Feature Groups

We start by creating feature group names for customer_data and orders_data. Following this, we create two Feature Groups, one for customer_data and another for orders_data.

[ ]:
from time import gmtime, strftime, sleep

customers_feature_group_name = "customers-feature-group-" + strftime("%d-%H-%M-%S", gmtime())
orders_feature_group_name = "orders-feature-group-" + strftime("%d-%H-%M-%S", gmtime())

Instantiate a FeatureGroup object for customer_data and orders_data.

[ ]:
from sagemaker.feature_store.feature_group import FeatureGroup

customers_feature_group = FeatureGroup(
    name=customers_feature_group_name, sagemaker_session=sagemaker_session
)
orders_feature_group = FeatureGroup(
    name=orders_feature_group_name, sagemaker_session=sagemaker_session
)
[ ]:
import time

current_time_sec = int(round(time.time()))

record_identifier_feature_name = "customer_id"

Append an EventTime feature to your data frame. This feature is required, and it timestamps each data point.

[ ]:
customer_data["EventTime"] = pd.Series([current_time_sec] * len(customer_data), dtype="float64")
orders_data["EventTime"] = pd.Series([current_time_sec] * len(orders_data), dtype="float64")
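Feature Store accepts event times either as Unix seconds (a Fractional feature, as in the cell above) or as an ISO-8601 string. The sketch below shows both representations using only the standard library. The function names are hypothetical, not SDK calls.

```python
import time
from datetime import datetime, timezone


def event_time_seconds(now=None):
    """Return the event time as whole Unix seconds stored in a float."""
    current = time.time() if now is None else now
    return float(int(round(current)))


def event_time_iso8601(epoch_seconds):
    """Format Unix seconds as a UTC ISO-8601 string (yyyy-MM-dd'T'HH:mm:ssZ)."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
```

If you choose the string form, the EventTime column must hold strings so that it is registered as a String feature.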
[ ]:
customer_data.head()
[ ]:
orders_data.head()

Load feature definitions to your feature group.

[ ]:
customers_feature_group.load_feature_definitions(data_frame=customer_data)
orders_feature_group.load_feature_definitions(data_frame=orders_data)
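To make the inference step less opaque: load_feature_definitions derives one feature type per column from the data frame dtypes. The sketch below approximates that mapping for illustration. It is not the SDK's implementation, and the function name is hypothetical.

```python
def feature_type_for_dtype(dtype_name):
    """Approximate the Feature Store type inferred for a pandas dtype name."""
    name = str(dtype_name).lower()
    if name.startswith(("int", "uint")):
        return "Integral"
    if name.startswith("float"):
        return "Fractional"
    if name in ("object", "string"):
        return "String"
    # Other dtypes (for example datetime64) need converting before loading.
    raise ValueError(f"No Feature Store type for dtype {dtype_name!r}")
```

For instance, {column: feature_type_for_dtype(dtype) for column, dtype in customer_data.dtypes.items()} previews the types you should expect to see in the loaded feature definitions.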

How to create an Online or Offline Feature Store that uses your KMS key for encryption?

Below we create two feature groups, customers_feature_group and orders_feature_group, and explain how to use your KMS key to securely encrypt your data in your online or offline feature store.

How to create an Online Feature store with your KMS key?

To encrypt data in your online feature store, set enable_online_store to True and pass your KMS key as the online_store_kms_key_id parameter. You will need to replace 123456789012 in arn:aws:kms:us-east-1:123456789012:key/ with your account number, and us-east-1 with your Region if it differs.

customers_feature_group.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True,
    online_store_kms_key_id="arn:aws:kms:us-east-1:123456789012:key/"
    + new_kms_key["KeyMetadata"]["KeyId"],
)

orders_feature_group.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True,
    online_store_kms_key_id="arn:aws:kms:us-east-1:123456789012:key/"
    + new_kms_key["KeyMetadata"]["KeyId"],
)

How to create an Offline Feature store with your KMS key?

Similarly, set enable_online_store to False and pass your KMS key as the offline_store_kms_key_id parameter. You will need to replace 123456789012 in arn:aws:kms:us-east-1:123456789012:key/ with your account number, and us-east-1 with your Region if it differs.

customers_feature_group.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=False,
    offline_store_kms_key_id="arn:aws:kms:us-east-1:123456789012:key/"
    + new_kms_key["KeyMetadata"]["KeyId"],
)

orders_feature_group.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=False,
    offline_store_kms_key_id="arn:aws:kms:us-east-1:123456789012:key/"
    + new_kms_key["KeyMetadata"]["KeyId"],
)

For this example we create an online feature store that encrypts your data using your KMS key.

Note: You will need to substitute your Account number in arn:aws:kms:us-east-1:123456789012:key/ replacing 123456789012 with your Account number.

[ ]:
# Create both feature groups with an online store encrypted by the KMS key.
customers_feature_group.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True,
    online_store_kms_key_id="arn:aws:kms:us-east-1:123456789012:key/"
    + new_kms_key["KeyMetadata"]["KeyId"],
)

orders_feature_group.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True,
    online_store_kms_key_id="arn:aws:kms:us-east-1:123456789012:key/"
    + new_kms_key["KeyMetadata"]["KeyId"],
)
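Feature group creation is asynchronous, so it can help to wait until the group reports the Created status before verifying encryption. The following is a minimal polling sketch, assuming the object exposes describe() as FeatureGroup does. The function name and the timeout values are illustrative choices.

```python
import time


def wait_for_feature_group_creation(feature_group, poll_seconds=5, timeout_seconds=600):
    """Poll describe() until the feature group leaves the Creating status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = feature_group.describe().get("FeatureGroupStatus")
        if status == "Created":
            return status
        if status != "Creating":
            # For example CreateFailed: stop instead of polling forever.
            raise RuntimeError(f"Feature group creation ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("Timed out waiting for feature group creation")
        time.sleep(poll_seconds)
```

In this notebook you could call wait_for_feature_group_creation(customers_feature_group) and then do the same for orders_feature_group.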

How to verify that your KMS key is being used to encrypt your data in your Online or Offline Feature Store?

Online Store Verification

To demonstrate that your data is being encrypted in your online store, use your KMS client from boto3 to list the grants under your KMS key. The output should show 'SageMakerFeatureStore-' followed by the name of the feature group you created, and should list these operations under Operations: ['Decrypt', 'Encrypt', 'GenerateDataKey', 'ReEncryptFrom', 'ReEncryptTo', 'CreateGrant', 'RetireGrant', 'DescribeKey'].

An alternative way to check that your data is encrypted in your online store is to open CloudTrail and navigate to your account name. There, under General details, you should see that SSE-KMS encryption is enabled, with your AWS KMS key shown below it. Below is a screenshot showing this:

[Screenshot: CloudTrail]
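A programmatic complement to the console check is to read the KMS key IDs recorded in the feature group description. The sketch below assumes a dictionary shaped like the DescribeFeatureGroup response. The helper name is hypothetical. It confirms which key the store is configured with, not that any individual record has been encrypted.

```python
def configured_kms_keys(description):
    """Return the KMS key IDs configured for the online and offline stores."""
    online = (
        description.get("OnlineStoreConfig", {})
        .get("SecurityConfig", {})
        .get("KmsKeyId")
    )
    offline = (
        description.get("OfflineStoreConfig", {})
        .get("S3StorageConfig", {})
        .get("KmsKeyId")
    )
    # A value of None means no customer managed key is configured for that store.
    return {"online": online, "offline": offline}
```

For example, configured_kms_keys(customers_feature_group.describe()) should report your key for whichever store you encrypted.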

Offline Store Verification

To verify that your data is being encrypted in your offline store, navigate to your S3 bucket through the Console, then to your prefix, offline store, feature group name, and into the /data/ folder. Once there, select a parquet file, which is the file containing your feature group data. For this example, the directory path in S3 was this:

Amazon S3/MYBUCKET/PREFIX/123456789012/sagemaker/region/offline-store/customers-feature-group-23-22-44-47/data/year=2021/month=03/day=23/hour=22/20210323T224448Z_IdfObJjhpqLQ5rmG.parquet.

After selecting the parquet file, navigate to Server-side encryption settings. It should mention that Default encryption is enabled and reference (SSE-KMS) under server-side encryption. If this is shown, then your data is being encrypted in the offline store. Below is a screenshot of how this should look in the console:

[Screenshot: Feature Store Policy]
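The same offline store check can be sketched in code by inspecting the metadata that the S3 head_object call returns for a parquet file. The helper below only interprets that response dictionary. Its name is hypothetical, and matching the key by suffix is an assumption made so that either a key ID or a full key ARN can be supplied.

```python
def is_sse_kms_encrypted(head_object_response, expected_key_id=None):
    """Return True if the S3 object metadata reports SSE-KMS encryption.

    If expected_key_id is given, also require that the reported key ends
    with it, so a bare key ID or a full ARN both match.
    """
    if head_object_response.get("ServerSideEncryption") != "aws:kms":
        return False
    if expected_key_id is None:
        return True
    return head_object_response.get("SSEKMSKeyId", "").endswith(expected_key_id)
```

To use it, create an S3 client with sagemaker_session.boto_session.client("s3"), call head_object with your bucket and the key of a parquet file, and pass the response together with new_kms_key["KeyMetadata"]["KeyId"].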

For this example, since we created a secure online store using our KMS key, we use list_grants below to check that our feature group and the required grants are present under Operations.

[ ]:
kms.list_grants(
    KeyId="arn:aws:kms:us-east-1:123456789012:key/" + new_kms_key["KeyMetadata"]["KeyId"]
)
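Rather than reading the raw output, the grant check can be automated. The sketch below filters a list_grants response for grants whose name starts with 'SageMakerFeatureStore-' and contains the feature group name, then reports any expected operation that is missing. Both function names are hypothetical, and the name matching follows the naming described above rather than a documented contract.

```python
EXPECTED_OPERATIONS = {
    "Decrypt",
    "Encrypt",
    "GenerateDataKey",
    "ReEncryptFrom",
    "ReEncryptTo",
    "CreateGrant",
    "RetireGrant",
    "DescribeKey",
}


def feature_store_grants(list_grants_response, feature_group_name):
    """Return the grants that Feature Store created for the given feature group."""
    prefix = "SageMakerFeatureStore-"
    return [
        grant
        for grant in list_grants_response.get("Grants", [])
        if grant.get("Name", "").startswith(prefix)
        and feature_group_name in grant.get("Name", "")
    ]


def missing_operations(grant):
    """Return the expected operations that the grant does not include."""
    return EXPECTED_OPERATIONS - set(grant.get("Operations", []))
```

Pass the list_grants output and customers_feature_group_name to feature_store_grants. An empty result from missing_operations for each returned grant indicates that the expected operations are present. Note that list_grants results can be paginated, so a key with many grants may need more than one call.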

Clean Up Resources

Remove the Feature Groups we created.

[ ]:
customers_feature_group.delete()
orders_feature_group.delete()
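The cell above leaves the KMS key and its alias in place. If the key was created only for this demo, an optional extra step is to delete the alias and schedule the key for deletion. The sketch below is a hedged example: the helper name is hypothetical, and it assumes a boto3 KMS client. Any data still encrypted with the key becomes unrecoverable once the key is deleted, so skip this step if the key is used elsewhere.

```python
def schedule_key_cleanup(kms_client, key_id, alias_name, pending_window_days=7):
    """Delete the alias and schedule the KMS key for deletion.

    KMS requires a waiting period of 7 to 30 days before a key is deleted.
    Returns the deletion date reported by KMS.
    """
    if not 7 <= pending_window_days <= 30:
        raise ValueError("pending_window_days must be between 7 and 30")
    kms_client.delete_alias(AliasName=f"alias/{alias_name}")
    response = kms_client.schedule_key_deletion(
        KeyId=key_id, PendingWindowInDays=pending_window_days
    )
    return response.get("DeletionDate")
```

In this notebook the call would be schedule_key_cleanup(kms, new_kms_key["KeyMetadata"]["KeyId"], AliasName). A scheduled deletion can still be cancelled during the waiting period.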
[ ]:
# preserve original sagemaker version
%pip install "sagemaker=={original_version}"

Next Steps

For more information on how to use KMS to encrypt your data in your Feature Store, see Feature Store Security. For general information on KMS keys and CMK, see Customer Managed Keys.