MLOps pipeline project using Amazon SageMaker Pipelines

Overview

Welcome to MLOps pipeline project using Amazon SageMaker Pipelines

This project utilizes SageMaker Pipelines that offers machine learning (ML) application developers and operations engineers the ability to orchestrate SageMaker jobs and author reproducible ML pipelines. It enables users to deploy custom-build models for batch and real-time inference with low latency and track lineage of artifacts.

Key Hightlights:
--Visual map to monitor end to end data and ML pipeline progress
--Model Registry to main different model versions and associated metadata
--Access to SageMaker processing jobs to scale/distribute workloads across multiple instances
--Inbuilt workflow orchestration without the need to leverage Step Functions etc
--Human review component
--Model drift detection

Code Layout

|-- data/        --> data file for inference purpose
|-- infra/       --> This folder contains helper function to create iam roles, policies
|-- README.md    --> The summary file of this project
|-- img/         --> images
|-- RegMLNB/     --> This folder contains files for data prep, model training, deployment and inference, model monitoring etc   
|-- pipeline.py  --> This file contain orchestration pipeline for data prep, model training,inference
|-- lambda_deployer.py --> Lambda function to create an endpoint
|-- requirements.txt --> This file contains project dependencies

Architecture Diagram

arch-diag

Data

fake_train_data.csv - This file has a randomly generated dataset, using Pythons random package. All labels and probability percentages are from a random number generator. It's used as a proof of concept for setting train set baseline statistics.

Get Started

This project is templatized with Amazon CDK. The cdk.json file tells the CDK Toolkit how to execute your app.

This project is set up like a standard Python project. The initialization process also creates a virtualenv within this project, stored under the .venv directory. To create the virtualenv it assumes that there is a python3 executable in your path with access to the venv package. If for any reason the automatic creation of the virtualenv fails, you can create the virtualenv manually once the init process completes.

To manually create a virtualenv on MacOS and Linux:

python3 -m venv .venv

After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv.

$ source .venv/bin/activate

Once the virtualenv is activated, you can install the required dependencies.

pip install -r requirements.txt

At this point you can now synthesize the CloudFormation template for this code.

cdk synth
cdk deploy --all --outputs-file ./cdk-outputs.json

or you can also deploy the stack by running : cdk deploy regml-stack --outputs-file ./cdk-outputs.json

Note: The output file parameter will automate the transfer of your created IAM role ARN to pipeline.py.

Once the stack is created, run the following command:

python pipeline.py

To add additional dependencies, for example other CDK libraries, just add to your requirements.txt file and rerun the pip install -r requirements.txt command.

Useful commands

`cdk ls` list all stacks in the app
`cdk synth` emits the synthesized CloudFormation template
`cdk deploy` deploy this stack to your default AWS account/region
`cdk diff` compare deployed stack with current state
`cdk docs` open CDK documentation

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Owner
AWS Samples
AWS Samples
Time-series momentum for momentum investing strategy

Time-series-momentum Time-series momentum strategy. You can use the data_analysis.py file to find out the best trigger and window for a given asset an

Victor Caldeira 3 Jun 18, 2022
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Microsoft 14.5k Jan 07, 2023
The easy way to combine mlflow, hydra and optuna into one machine learning pipeline.

mlflow_hydra_optuna_the_easy_way The easy way to combine mlflow, hydra and optuna into one machine learning pipeline. Objective TODO Usage 1. build do

shibuiwilliam 9 Sep 09, 2022
Combines MLflow with a database (PostgreSQL) and a reverse proxy (NGINX) into a multi-container Docker application

Combines MLflow with a database (PostgreSQL) and a reverse proxy (NGINX) into a multi-container Docker application (with docker-compose).

Philip May 2 Dec 03, 2021
MasTrade is a trading bot in baselines3,pytorch,gym

mastrade MasTrade is a trading bot in baselines3,pytorch,gym idea we have for example 1 btc and we buy a crypto with it with market option to trade in

Masoud Azizi 18 May 24, 2022
Kaggle Tweet Sentiment Extraction Competition: 1st place solution (Dark of the Moon team)

Kaggle Tweet Sentiment Extraction Competition: 1st place solution (Dark of the Moon team)

Artsem Zhyvalkouski 64 Nov 30, 2022
Temporal Alignment Prediction for Supervised Representation Learning and Few-Shot Sequence Classification

Temporal Alignment Prediction for Supervised Representation Learning and Few-Shot Sequence Classification Introduction. This package includes the pyth

5 Dec 06, 2022
Pandas DataFrames and Series as Interactive Tables in Jupyter

Pandas DataFrames and Series as Interactive Tables in Jupyter Star Turn pandas DataFrames and Series into interactive datatables in both your notebook

Marc Wouts 364 Jan 04, 2023
LibRerank is a toolkit for re-ranking algorithms. There are a number of re-ranking algorithms, such as PRM, DLCM, GSF, miDNN, SetRank, EGRerank, Seq2Slate.

LibRerank LibRerank is a toolkit for re-ranking algorithms. There are a number of re-ranking algorithms, such as PRM, DLCM, GSF, miDNN, SetRank, EGRer

126 Dec 28, 2022
A collection of machine learning examples and tutorials.

machine_learning_examples A collection of machine learning examples and tutorials.

LazyProgrammer.me 7.1k Jan 01, 2023
Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc)

Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc). Structured a custom ensemble model and a neural network. Found a outperformed

Chris Yuan 1 Feb 06, 2022
Production Grade Machine Learning Service

This project is made to help you scale from a basic Machine Learning project for research purposes to a production grade Machine Learning web service

Abdullah Zaiter 10 Apr 04, 2022
DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning.

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported ha

Microsoft 1.1k Jan 04, 2023
moDel Agnostic Language for Exploration and eXplanation

moDel Agnostic Language for Exploration and eXplanation Overview Unverified black box model is the path to the failure. Opaqueness leads to distrust.

Model Oriented 1.2k Jan 04, 2023
Predict the output which should give a fair idea about the chances of admission for a student for a particular university

Predict the output which should give a fair idea about the chances of admission for a student for a particular university.

ArvindSandhu 1 Jan 11, 2022
Spark development environment for k8s

Local Spark Dev Env with Docker Development environment for k8s. Using the spark-operator image to ensure it will be the same environment. Start conta

Otacilio Filho 18 Jan 04, 2022
Azure Cloud Advocates at Microsoft are pleased to offer a 12-week, 24-lesson curriculum all about Machine Learning

Azure Cloud Advocates at Microsoft are pleased to offer a 12-week, 24-lesson curriculum all about Machine Learning

Microsoft 43.4k Jan 04, 2023
Predicting India’s COVID-19 Third Wave with LSTM

Predicting India’s COVID-19 Third Wave with LSTM Complete project of predicting new COVID-19 cases in the next 90 days with LSTM India is seeing a ste

Samrat Dutta 4 Jan 27, 2022
Getting Profit and Loss Make Easy From Binance

Getting Profit and Loss Make Easy From Binance I have been in Binance Automated Trading for some time and have generated a lot of transaction records,

17 Dec 21, 2022
A machine learning model for Covid case prediction

CovidcasePrediction A machine learning model for Covid case prediction Problem Statement Using regression algorithms we can able to track the active c

VijayAadhithya2019rit 1 Feb 02, 2022