In this repository, we present a deployment-ready AWS stack which uses AWS Step Functions to orchestrate AutoML workflows using AutoGluon on Amazon SageMaker.
A complete description can be found in the corresponding blog post.
Main State Machine | Training State Machine | Deployment State Machine |
---|---|---|
- Node.js 16.13.1
- Python 3.7.10
- Clone this repository to your cloud environment of choice (Cloud9, EC2 instance, local AWS environment, ...).
- Create the IAM role needed to deploy the stack (skip ahead to creating the Python virtual environment if you already have a role with sufficient permissions and trust relationship).
  - Using the AWS CLI:
    - Configure the AWS CLI profile that you would like to use, if not configured yet, with `aws configure` and follow the instructions.
    - Create a new IAM role which can be used by CloudFormation with `aws iam create-role --role-name {YOUR_ROLE_NAME} --assume-role-policy-document file://trust_policy.json`
    - Attach the permissions policy to the new role with `aws iam put-role-policy --role-name {YOUR_ROLE_NAME} --policy-name {YOUR_POLICY_NAME} --policy-document file://permissions_policy.json`
  - Alternatively, you can create the role using the AWS IAM Management Console. Once created, make sure to update the Trust Relationship with `trust_policy.json` and attach a customer Permissions Policy based on `permissions_policy.json`.
- Create a new Python virtual environment with `python3 -m venv .venv`
- Activate the environment with `source .venv/bin/activate`
- Install AWS CDK with `npm install -g aws-cdk@2.8.0`
- Install requirements with `pip install -r requirements.txt`
- Bootstrap AWS CDK for your AWS account with `cdk bootstrap aws://{AWS_ACCOUNT_ID}/{REGION}`. If your account has already been bootstrapped with `cdk@1.X`, you may need to manually delete the `CDKToolkit` stack from the AWS CloudFormation console to avoid compatibility issues with `cdk@2.X`. Once de-bootstrapped, proceed by re-bootstrapping.
- Deploy the stack with `cdk deploy -r {NEW_ROLE_ARN}`

Once the stack is deployed, you can familiarize yourself with the resources using the tutorial `notebooks/AutoML Walkthrough.ipynb`.
Action flows defined using AWS Step Functions are called State Machines.
Each machine has parameters that can be defined at runtime (i.e. execution-specific), which are specified through an input JSON object. Some examples of input parameters are provided in `notebooks/input/`. Although they are meant to be used during the notebook tutorial, you can also copy and paste them directly into the AWS Console.
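Instead of pasting an input into the console, an execution can also be started programmatically. The following is a minimal sketch, assuming `boto3` is installed and AWS credentials are configured. The helper names, the state machine ARN and the input file path are placeholders for illustration, not names taken from this repository.

```python
import json
from pathlib import Path


def load_execution_input(path):
    """Read an input JSON file and return it as a JSON string."""
    payload = json.loads(Path(path).read_text())
    # Step Functions expects the execution input as a JSON string.
    return json.dumps(payload)


def start_execution(state_machine_arn, input_path):
    """Start a state machine execution using the given input file."""
    import boto3  # imported lazily so load_execution_input works without boto3

    client = boto3.client("stepfunctions")
    response = client.start_execution(
        stateMachineArn=state_machine_arn,
        input=load_execution_input(input_path),
    )
    return response["executionArn"]
```

A call would look like `start_execution("{STATE_MACHINE_ARN}", "notebooks/input/{EXAMPLE_FILE}.json")`, where both arguments must be replaced with values from your own deployment.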
Request Syntax
{
  "Parameters": {
    "Flow": {
      "Train": true|false,
      "Evaluate": true|false,
      "Deploy": true|false
    },
    "PretrainedModel": {
      "Name": "string"
    },
    "Train": {
      "TrainDataPath": "string",
      "TestDataPath": "string",
      "TrainingOutput": "string",
      "InstanceCount": int,
      "InstanceType": "string",
      "FitArgs": "string",
      "InitArgs": "string"
    },
    "Evaluation": {
      "Threshold": float,
      "Metric": "string"
    },
    "Deploy": {
      "InstanceCount": int,
      "InstanceType": "string",
      "Mode": "endpoint"|"batch",
      "BatchInputDataPath": "string",
      "BatchOutputDataPath": "string"
    }
  }
}
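Before submitting an input, it can help to check it locally against the conditional requirements listed in the Parameters section. The function below is only a sketch of such a client-side check, written from those stated requirements. `missing_fields` is a hypothetical helper, not part of this repository, and the state machines may apply their own validation.

```python
def missing_fields(execution_input):
    """Return the names of required fields that are absent from the input."""
    params = execution_input.get("Parameters", {})
    flow = params.get("Flow", {})
    missing = []

    # Flow.Train and Flow.Deploy are always required.
    for key in ("Train", "Deploy"):
        if key not in flow:
            missing.append(f"Flow.{key}")

    # A pretrained model name is needed only when no training job is run.
    if flow.get("Train") is False and not params.get("PretrainedModel", {}).get("Name"):
        missing.append("PretrainedModel.Name")

    # Each section is required only when its flow flag is enabled.
    for flag, section in (("Train", "Train"), ("Evaluate", "Evaluation"), ("Deploy", "Deploy")):
        if flow.get(flag) and section not in params:
            missing.append(section)

    # Batch deployments additionally need input and output S3 URIs.
    deploy = params.get("Deploy", {})
    if flow.get("Deploy") and deploy.get("Mode") == "batch":
        for key in ("BatchInputDataPath", "BatchOutputDataPath"):
            if not deploy.get(key):
                missing.append(f"Deploy.{key}")

    return missing


# Illustrative input: deploy a pretrained model in batch mode, with gaps left in.
example = {
    "Parameters": {
        "Flow": {"Train": False, "Evaluate": False, "Deploy": True},
        "Deploy": {"InstanceCount": 1, "InstanceType": "ml.m4.2xlarge", "Mode": "batch"},
    }
}
print(missing_fields(example))
```

An empty list means that every field required by the selected flow is present. It says nothing about whether the values themselves are valid.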
Parameters
- Flow
  - Train (bool) - (REQUIRED) indicates if a new AutoGluon SageMaker Training Job is required. Set to `false` to deploy a pretrained model.
  - Evaluate (bool) - set to `true` if evaluation is required. If selected, an AWS Lambda function will retrieve the model performance on the test set and evaluate it against the user-defined threshold. If the model performance is not satisfactory, deployment is skipped.
  - Deploy (bool) - (REQUIRED) indicates if the model has to be deployed.
- PretrainedModel
  - Name (string) - indicates which pre-trained model is to be used for deployment. Models are referenced through their SageMaker Model Name. If `Flow.Train = true` this field is ignored, otherwise it's required.
- Train (REQUIRED if `Flow.Train = true`)
  - TrainDataPath (string) - S3 URI where the train `csv` is stored. Header and target variable are required. AutoGluon will perform a holdout split for validation automatically.
  - TestDataPath (string) - S3 URI where the test `csv` is stored. Header and target variable are required. This dataset is used to evaluate model performance on samples not seen during training.
  - TrainingOutput (string) - S3 URI where model artifacts are stored at the end of the training job.
  - InstanceCount (int) - Number of instances to be used for training.
  - InstanceType (string) - AWS instance type to be used for training (e.g. `ml.m4.2xlarge`). See full list here.
  - FitArgs (string) - double JSON-encoded dictionary containing parameters to be used during model `.fit()`. List of available parameters here. The dictionary needs to be encoded twice because it will be decoded both by the State Machine and by the SageMaker Training Job.
  - InitArgs (string) - double JSON-encoded dictionary containing parameters to be used when the model is initialized with `TabularPredictor()`. List of available parameters here. The dictionary needs to be encoded twice because it will be decoded both by the State Machine and by the SageMaker Training Job. Common parameters are `label`, `problem_type` and `eval_metric`.
- Evaluation (REQUIRED if `Flow.Evaluate = true`)
  - Threshold (float) - Metric threshold above which model performance is considered satisfactory. All metrics are maximized (e.g. losses are represented as negative losses).
  - Metric (string) - Metric name used for evaluation. Accepted metrics correspond to the available `eval_metric` values from AutoGluon.
- Deploy (REQUIRED if `Flow.Deploy = true`)
  - InstanceCount (int) - Number of instances to be used for deployment.
  - InstanceType (string) - AWS instance type to be used for deployment (e.g. `ml.m4.2xlarge`). See full list here.
  - Mode (string) - Model deployment mode. Supported modes are `batch` for SageMaker Batch Transform Job and `endpoint` for SageMaker Endpoint.
  - BatchInputDataPath (string) - (REQUIRED if `Mode = batch`) S3 URI of the dataset against which predictions are generated. Data must be stored in `csv` format, without header and with the same column order as the training dataset.
  - BatchOutputDataPath (string) - (REQUIRED if `Mode = batch`) S3 URI where batch predictions are stored.
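The double JSON encoding required for `FitArgs` and `InitArgs` can be sketched as follows. The argument values and the `target` label column are assumptions chosen for illustration, and `double_encode` is a hypothetical helper. Whether serializing the whole execution input counts as one of the two encodings is not stated here, so compare the result with the examples in `notebooks/input/` before relying on it.

```python
import json


def double_encode(args):
    """Apply JSON encoding twice: dict -> JSON text -> JSON string literal."""
    return json.dumps(json.dumps(args))


# Illustrative values only: the label column name and argument values are assumptions.
init_args = {"label": "target", "problem_type": "binary", "eval_metric": "roc_auc"}
fit_args = {"presets": "medium_quality", "time_limit": 600}

encoded_init_args = double_encode(init_args)
encoded_fit_args = double_encode(fit_args)

# Each consumer strips one layer: the first decode yields a JSON string,
# the second decode yields the original dictionary.
once = json.loads(encoded_init_args)
twice = json.loads(once)

print(encoded_fit_args)
# prints "{\"presets\": \"medium_quality\", \"time_limit\": 600}"
```

After one decode the value is still a string (`once`), and only the second decode restores the dictionary (`twice`), which mirrors the two decoding steps described for these parameters.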
- `app.py` entrypoint
- `stepfunctions_automl_workflow/lambdas/` AWS Lambda source scripts
- `stepfunctions_automl_workflow/utils/` utility functions used for stack generation
- `stepfunctions_automl_workflow/stack.py` CDK stack definition
- `notebooks/` Jupyter Notebooks to familiarise yourself with the artifacts
- `notebooks/input/` input examples to be fed into the State Machines
WARNING: While you'll still be able to keep SageMaker artifacts, the AWS Step Functions State Machines will be deleted along with their execution history.
Clean up all resources with `cdk destroy`.
- `cdk ls` list all stacks in the app
- `cdk synth` emits the synthesized CloudFormation template
- `cdk deploy` deploy this stack to your default AWS account/region
- `cdk diff` compare deployed stack with current state
- `cdk docs` open CDK documentation