Rebill Optimization ML Pipeline

Step Functions · SageMaker · AWS Glue · Machine Learning · ETL

Architecture Overview

The pipeline runs as an AWS Step Functions workflow with the following stages, from Start to End:

  1. Job: Retrieve raw data (AWS Glue)
       1) Get script from the Scripts Bucket
       2) Get raw data from Amazon Redshift (Table 1 ... Table N, each with columns Col A, Col B, Col C, ...)
       3) Save raw data to the Data Bucket
  2. Job: Process raw data (Feature Engineering) (AWS Glue)
       1) Get script from the Scripts Bucket
       2) Get raw data from the Data Bucket
       3) Save cleaned up data to the Data Bucket
  3. Training Job (Classification) (AWS SageMaker)
     This implements a Random Forest algorithm. The data is split into 80% train and 20% test.
       1) Get training script
       2) Spin up an instance & start training
       3) Get data
       4) Get previous trained model from the Models Bucket
       5) Save new model to the Models Bucket
     The specs of the training instance (vRAM, vCPUs, storage) are configured in SageMaker. The training script trains the previous model on the new data.
  4. Evaluate model performance (AWS Lambda)
     This function evaluates the performance of the model on the test data.
  5. Pass?
       Yes: Deploy to SageMaker Inference Endpoint
       No: Trigger warning message

Amazon S3 buckets

Each bucket blocks public access, and uses OAC to allow CloudFront access.

Data Bucket
raw/
    |_ 2025-01-01/
        |_ job-123-456/
temp/
preprocessed/
    |_ 2025-01-01/
        |_ job-123-456/

The temp/ directory is used by AWS Glue jobs while scripts are running.

Scripts Bucket
glue/
    |_ retrieveRawData.py
    |_ processRawData.py
sageMaker/
    |_ trainingJob.py

Models Bucket
v1.0
v2.0
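
The workflow in the diagram (two Glue jobs, a SageMaker training job, a Lambda evaluation, and a pass/fail branch) could be expressed in Amazon States Language roughly as follows. This is a minimal sketch, not the project's actual definition: the state names, job names, function name, endpoint names, and topic ARN are all hypothetical, the training-job parameters are deliberately incomplete, and sending the warning through SNS is an assumption, since the diagram does not say which service delivers it.

```python
import json

# Illustrative Amazon States Language definition built as a Python dict.
# All names and ARNs below are placeholders, not values from the project.
definition = {
    "Comment": "Rebill optimization ML pipeline (illustrative sketch)",
    "StartAt": "RetrieveRawData",
    "States": {
        "RetrieveRawData": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "retrieve-raw-data"},
            "Next": "ProcessRawData",
        },
        "ProcessRawData": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "process-raw-data"},
            "Next": "TrainModel",
        },
        "TrainModel": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
            # Incomplete on purpose: a real training job also needs the
            # algorithm specification, role, data config, and resources.
            "Parameters": {"TrainingJobName.$": "$$.Execution.Name"},
            "Next": "EvaluateModel",
        },
        "EvaluateModel": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "evaluate-model", "Payload.$": "$"},
            "ResultSelector": {"passed.$": "$.Payload.passed"},
            "ResultPath": "$.evaluation",
            "Next": "ModelPassed",
        },
        "ModelPassed": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.evaluation.passed",
                    "BooleanEquals": True,
                    "Next": "DeployEndpoint",
                }
            ],
            "Default": "TriggerWarning",
        },
        "DeployEndpoint": {
            "Type": "Task",
            # Simplified: a real deployment also creates a model and an
            # endpoint configuration, or updates an existing endpoint.
            "Resource": "arn:aws:states:::sagemaker:createEndpoint",
            "Parameters": {
                "EndpointName": "rebill-timing",
                "EndpointConfigName": "rebill-timing-config",
            },
            "End": True,
        },
        "TriggerWarning": {
            "Type": "Task",
            # Assumption: the warning is published to an SNS topic.
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:ml-pipeline-warnings",
                "Message": "Model evaluation failed; endpoint not updated.",
            },
            "End": True,
        },
    },
}


def undefined_targets(defn: dict) -> list:
    """Return transition targets that do not name a state in the definition."""
    states = defn["States"]
    targets = [defn["StartAt"]]
    for state in states.values():
        if "Next" in state:
            targets.append(state["Next"])
        if "Default" in state:
            targets.append(state["Default"])
        targets.extend(choice["Next"] for choice in state.get("Choices", []))
    return [t for t in targets if t not in states]


if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```

The `undefined_targets` helper is only a local sanity check that every transition points at a defined state; it does not replace validation by Step Functions itself.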

Project Details

This machine learning system predicts the optimal timing for rebill attempts in order to maximize payment success rates. The model classifies each failed payment into one of 192 distinct time-slot categories, ranging from "retry in 1 hour" to "retry next week at hour X".

Core ML Approach

  • 192-class classification model for optimal rebill timing prediction
  • Time slots ranging from same-day retries (1-12 hours) to week-long schedules
  • Transfer learning: each training run builds on the previously trained model
  • Feature engineering from historical transaction patterns and user behavior
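
A classifier over time slots needs a reversible mapping between a class ID and the retry time it stands for. The page does not say how the 192 slots are laid out, so the sketch below assumes, purely for illustration, a grid of 8 days by 24 hours (which happens to give 192). The layout and the function names are hypothetical and may not match the real label scheme.

```python
HOURS_PER_DAY = 24
ASSUMED_DAYS = 8  # assumption: 8 x 24 = 192; the source does not give the layout
NUM_CLASSES = ASSUMED_DAYS * HOURS_PER_DAY


def encode_slot(day_offset: int, hour: int) -> int:
    """Map (days from now, hour of day) to a class ID in [0, NUM_CLASSES)."""
    if not 0 <= day_offset < ASSUMED_DAYS:
        raise ValueError(f"day_offset out of range: {day_offset}")
    if not 0 <= hour < HOURS_PER_DAY:
        raise ValueError(f"hour out of range: {hour}")
    return day_offset * HOURS_PER_DAY + hour


def decode_slot(class_id: int) -> tuple:
    """Inverse of encode_slot: class ID back to (day_offset, hour)."""
    if not 0 <= class_id < NUM_CLASSES:
        raise ValueError(f"class_id out of range: {class_id}")
    return divmod(class_id, HOURS_PER_DAY)
```

Keeping the encoding in one pair of functions means the training script and whatever consumes the predictions can share a single definition of what each class means.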

Technical Architecture

  • AWS Step Functions orchestrating the entire ML workflow
  • AWS Glue for scalable ETL data processing
  • Amazon SageMaker for model training and hosting
  • Amazon S3 for data storage and model artifacts
  • Amazon Redshift as the primary data warehouse
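
The data bucket in the architecture diagram is organized by stage, run date, and job ID (for example `raw/2025-01-01/job-123-456/`). A small helper like the one below could keep the Glue jobs and the training script agreeing on that layout. The function name and the validation rule are hypothetical; only the path shape comes from the diagram.

```python
from datetime import date

# Stage folders shown in the diagram that hold per-run output.
STAGES = ("raw", "preprocessed")


def stage_prefix(stage: str, run_date: date, job_id: str) -> str:
    """Build the S3 key prefix for one pipeline run, e.g. raw/2025-01-01/job-123-456/."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if not job_id or "/" in job_id:
        raise ValueError(f"invalid job id: {job_id!r}")
    return f"{stage}/{run_date.isoformat()}/{job_id}/"
```

For instance, `stage_prefix("raw", date(2025, 1, 1), "job-123-456")` yields the `raw/` path shown in the diagram, and swapping the stage to `"preprocessed"` gives the matching output location for the feature-engineering job.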

Data & Training Pipeline

  • Monthly batch processing of transactional data from Redshift
  • Incremental model training with automated evaluation
  • Model versioning and A/B testing capabilities
  • Automated data preprocessing and feature engineering
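
The automated evaluation step can be pictured as an 80/20 split followed by a pass/fail gate on a test-set metric. The sketch below uses only the standard library. Accuracy as the metric, the `min_accuracy` threshold, and all function names are assumptions, since the page does not state which metric or cutoff the Lambda function applies.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows deterministically and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class IDs."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluation_gate(y_true, y_pred, min_accuracy=0.5):
    """Return the metric and whether the model may be deployed.

    min_accuracy is a hypothetical threshold, not the project's value.
    """
    score = accuracy(y_true, y_pred)
    return {"accuracy": score, "passed": score >= min_accuracy}
```

A workflow branch can then read the `passed` flag to choose between deploying the new model and sending a warning.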