A project-based example of data pipelines, ML workflow management, API endpoints, and monitoring.

Last update: Dec 03, 2022

Overview

MLOps

Tools used:

Data Pipeline: Dagster
ML workflow: MLflow
API Deployment: FastAPI
Monitoring: ElasticAPM

Requirements

Poetry (dependency management)

# The legacy get-poetry.py script is deprecated; use the official installer instead.
$ curl -sSL https://install.python-poetry.org | python3 -
$ poetry --version
# Poetry version 1.1.10

pre-commit (static code analysis)

$ pip install pre-commit
$ pre-commit --version
# pre-commit 2.15.0

MinIO (S3-compatible object storage)

Follow the installation instructions at https://min.io/download

Setup

Environment setup

$ poetry install

MLflow

$ poetry shell
$ export MLFLOW_S3_ENDPOINT_URL=http://127.0.0.1:9000
$ export AWS_ACCESS_KEY_ID=minioadmin
$ export AWS_SECRET_ACCESS_KEY=minioadmin

# Make sure the backend store URI and artifact root below match the values in the .env file.
$ mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root s3://mlflow \
    --host 0.0.0.0
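
The consistency check mentioned in the comment above can be scripted. The following is a minimal sketch, not part of the project: the key names `MLFLOW_BACKEND_STORE_URI` and `MLFLOW_ARTIFACT_ROOT` are hypothetical, since the contents of the project's .env file are not shown here, so rename them to whatever the file actually uses.

```python
# Expected values, copied from the export lines and mlflow server flags above.
# The last two key names are hypothetical; adjust them to the real .env file.
EXPECTED = {
    "MLFLOW_S3_ENDPOINT_URL": "http://127.0.0.1:9000",
    "MLFLOW_BACKEND_STORE_URI": "sqlite:///mlflow.db",  # hypothetical key name
    "MLFLOW_ARTIFACT_ROOT": "s3://mlflow",              # hypothetical key name
}


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines, comments, and lines without '='."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # Drop surrounding whitespace and simple quoting around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_mismatches(env, expected=EXPECTED):
    """Return {key: (expected, actual)} for every key that differs or is missing."""
    return {
        key: (want, env.get(key))
        for key, want in expected.items()
        if env.get(key) != want
    }


def main(path=".env"):
    """Print one line per setting that disagrees with the server command."""
    with open(path, encoding="utf-8") as fh:
        env = parse_env(fh.read())
    for key, (want, got) in find_mismatches(env).items():
        print(f"{key}: expected {want!r}, found {got!r}")
```

Calling `main()` from the project root prints nothing when the .env file agrees with the command above.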

Minio

$ export MINIO_ROOT_USER=minioadmin
$ export MINIO_ROOT_PASSWORD=minioadmin

$ mkdir minio_data
$ minio server minio_data --console-address ":9001"

# API: http://192.168.29.103:9000  http://10.119.80.13:9000  http://127.0.0.1:9000
# RootUser: minioadmin
# RootPass: minioadmin

# Console: http://192.168.29.103:9001 http://10.119.80.13:9001 http://127.0.0.1:9001
# RootUser: minioadmin
# RootPass: minioadmin

# Command-line: https://docs.min.io/docs/minio-client-quickstart-guide
#    $ mc alias set myminio http://192.168.29.103:9000 minioadmin minioadmin

# Documentation: https://docs.min.io

Go to http://127.0.0.1:9001/buckets/ and create a bucket called mlflow.
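
As an alternative to the console, the bucket can probably be created from Python. This is a sketch under assumptions: it relies on `boto3` being installed in the Poetry environment (MLflow needs it for S3 artifact storage, but check `pyproject.toml`), and it reuses the endpoint and credentials exported in the MLflow step. The helper names `client_settings`, `ensure_bucket`, and `main` are illustrative only.

```python
import os


def client_settings(env=None):
    """Build boto3 S3 client keyword arguments from the variables exported above."""
    env = os.environ if env is None else env
    return {
        "endpoint_url": env.get("MLFLOW_S3_ENDPOINT_URL", "http://127.0.0.1:9000"),
        "aws_access_key_id": env.get("AWS_ACCESS_KEY_ID", "minioadmin"),
        "aws_secret_access_key": env.get("AWS_SECRET_ACCESS_KEY", "minioadmin"),
    }


def ensure_bucket(client, name="mlflow"):
    """Create the bucket if it is missing; return True only when it was created."""
    existing = {b["Name"] for b in client.list_buckets().get("Buckets", [])}
    if name in existing:
        return False
    client.create_bucket(Bucket=name)
    return True


def main():
    """Connect to the local MinIO server and make sure the mlflow bucket exists."""
    import boto3  # assumption: available in the Poetry environment

    client = boto3.client("s3", **client_settings())
    created = ensure_bucket(client, "mlflow")
    print("created bucket" if created else "bucket already exists")
```

Run `main()` inside `poetry shell` while the MinIO server is up. Calling it a second time leaves the existing bucket untouched.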

Dagster

$ poetry shell
$ dagit -f mlops/pipeline.py

ElasticAPM

$ docker-compose -f docker-compose-monitoring.yaml up

FastAPI

$ poetry shell
$ export PYTHONPATH=.
$ python mlops/app/application.py
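
Once everything is started, a quick reachability check can confirm that the services are listening. The sketch below uses only the standard library. The MinIO ports come from the commands above. The MLflow (5000) and Dagit (3000) ports are the tools' defaults and are an assumption here, since no port flag is passed. The FastAPI and Elastic APM ports are not stated in this README, so they are left out.

```python
import http.client
import urllib.error
import urllib.request

SERVICES = {
    "MinIO API": "http://127.0.0.1:9000/minio/health/live",
    "MinIO console": "http://127.0.0.1:9001",
    "MLflow": "http://127.0.0.1:5000",  # assumption: default mlflow server port
    "Dagit": "http://127.0.0.1:3000",   # assumption: default dagit port
}


def is_up(url, timeout=2.0):
    """Return True if something answers HTTP at the URL, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example 403 or 404), so it is reachable.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def check_all(services=SERVICES, probe=is_up):
    """Map each service name to True/False using the given probe function."""
    return {name: probe(url) for name, url in services.items()}


if __name__ == "__main__":
    for name, ok in check_all().items():
        print(f"{name:15} {'up' if ok else 'DOWN'}")
```

A service reported as DOWN usually means its command is not running or is bound to a different port than assumed here.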

TODO

Setup with docker-compose.
Load testing.
Test cases.
CI/CD pipeline.
Drift detection.

Owner

Utsav
