Apache Liminal is an end-to-end platform for data engineers & scientists, allowing them to build, train and deploy machine learning models in a robust and agile way.

Overview

Apache Liminal

The platform provides the abstractions and declarative capabilities for data extraction & feature engineering followed by model training and serving. Liminal's goal is to operationalize the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production, freeing them from engineering and non-functional tasks, and allowing them to focus on machine learning code and artifacts.

Basics

Using simple YAML configuration, create your own scheduled data pipelines (a sequence of tasks to perform), application servers, and more.

Getting Started

A simple getting started guide for Liminal can be found here

Apache Liminal Documentation

Full documentation of Apache Liminal can be found here

High Level Architecture

High level architecture documentation can be found here

Example YAML config file

---
name: MyLiminalStack
owner: Bosco Albert Baracus
volumes:
  - volume: myvol1
    local:
      path: /Users/me/myvol1
pipelines:
  - pipeline: my_pipeline
    start_date: 1970-01-01
    timeout_minutes: 45
    schedule: 0 * 1 * *
    metrics:
      namespace: TestNamespace
      backends: [ 'cloudwatch' ]
    tasks:
      - task: my_python_task
        type: python
        description: static input task
        image: my_python_task_img
        source: write_inputs
        env_vars:
          NUM_FILES: 10
          NUM_SPLITS: 3
        mounts:
          - mount: mymount
            volume: myvol1
            path: /mnt/vol1
        cmd: python -u write_inputs.py
      - task: my_parallelized_python_task
        type: python
        description: parallelized python task
        image: my_parallelized_python_task_img
        source: write_outputs
        env_vars:
          FOO: BAR
        executors: 3
        mounts:
          - mount: mymount
            volume: myvol1
            path: /mnt/vol1
        cmd: python -u write_inputs.py
services:
  - service:
    name: my_python_server
    type: python_server
    description: my python server
    image: my_server_image
    source: myserver
    endpoints:
      - endpoint: /myendpoint1
        module: my_server
        function: myendpoint1func
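
The endpoints entry above maps the URL path /myendpoint1 to the function myendpoint1func in the module my_server. A minimal sketch of what such a module might contain is shown below. The single input_json parameter, the JSON string input and the string return value are assumptions made for illustration, not a confirmed Liminal contract.

```python
# my_server.py (hypothetical contents; the handler signature is an assumption)
import json


def myendpoint1func(input_json):
    """Handle a request to /myendpoint1 and return a text response."""
    # Assumption: the request body arrives as a JSON string, possibly empty.
    payload = json.loads(input_json) if input_json else {}
    # Echo the parsed payload back with keys in a stable order.
    return 'Input was: {}'.format(json.dumps(payload, sort_keys=True))
```

Keeping the handler a plain function makes it easy to call directly in a unit test before any image is built.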

Installation

  1. Install this repository (HEAD)
   pip install git+https://github.com/apache/incubator-liminal.git
  2. Optional: set LIMINAL_HOME to a path of your choice (if not set, it defaults to ~/liminal_home)
   echo 'export LIMINAL_HOME=' >> ~/.bash_profile && source ~/.bash_profile

Authoring pipelines

Authoring a pipeline involves, at minimum, creating a single file called liminal.yml, as in the example above.

If your pipeline requires custom Python code to implement tasks, it should be organized like this

If your pipeline imports external packages that are not already part of the Liminal framework (i.e. you had to pip install them yourself), you also need to provide a requirements.txt in the root of your project.
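
As an illustration of custom task code, here is a hedged sketch of what a script such as the write_inputs.py referenced in the example's cmd field could look like. The function name write_inputs, the file naming and the round-robin layout are assumptions made for this sketch. Only the NUM_FILES and NUM_SPLITS variables and the /mnt/vol1 mount path come from the example configuration.

```python
# write_inputs.py (hypothetical sketch of a task script)
import json
import os


def write_inputs(inputs_dir, num_files, num_splits):
    """Write num_files small JSON files, spread round-robin over num_splits folders."""
    written = []
    for i in range(num_files):
        # Round-robin assignment: file i goes to folder i % num_splits.
        split_dir = os.path.join(inputs_dir, str(i % num_splits))
        os.makedirs(split_dir, exist_ok=True)
        path = os.path.join(split_dir, 'input{}.json'.format(i))
        with open(path, 'w') as f:
            json.dump({'mykey': 'myval{}'.format(i)}, f)
        written.append(path)
    return written


# Inside the container the script would typically end with a call like this
# (left commented out so that the sketch has no side effects when imported):
# write_inputs('/mnt/vol1',
#              int(os.environ['NUM_FILES']),
#              int(os.environ['NUM_SPLITS']))
```

Splitting the inputs into one folder per split is one simple way to let a later task with executors: 3 hand each executor its own share of the files.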

Testing the pipeline locally

When your pipeline code is ready, you can test it by running it locally on your machine.

  1. Ensure the Docker engine is running locally, and enable a local Kubernetes cluster. Allocate it at least 3 CPUs (under "Resources" in the Docker preferences UI).

If you want to execute your pipeline on a remote Kubernetes cluster, make sure the cluster is configured using:

kubectl config set-context <your remote kubernetes cluster>
  2. Build the Docker images used by your pipeline.

In the example pipeline above, you can see that tasks and services have an "image" field, such as "my_python_task_img". This means that the task is executed inside a Docker container, and the container is created from a Docker image in which the required code and libraries are installed.

You can take a look at what the build process looks like, e.g. here

In order for the images to be available for your pipeline, you'll need to build them locally:

cd </path/to/your/liminal/code>
liminal build

You'll see a number of outputs indicating the various Docker images that were built.

  3. Create a Kubernetes local volume
    If your YAML includes volumes, first run the following command:
cd </path/to/your/liminal/code>
liminal create
  4. Deploy the pipeline:
cd </path/to/your/liminal/code>
liminal deploy

Note: after upgrading liminal, it's recommended to issue the command

liminal deploy --clean

This will rebuild the Airflow Docker containers from scratch with a fresh version of Liminal, ensuring consistency.

  5. Start the server
liminal start
  6. Stop the server
liminal stop
  7. Display the server logs
liminal logs --follow/--tail

Number of lines to show from the end of the log:
liminal logs --tail=10

Follow log output:
liminal logs --follow
  8. Navigate to http://localhost:8080/admin

  9. You should see your pipeline. It is scheduled to run according to the cron expression in the schedule: 0 * 1 * * field of the .yml file you provided.

  10. To manually activate your pipeline: click your pipeline and then click "trigger DAG". Click "Graph view". You should see the steps in your pipeline getting executed in "real time" by clicking "Refresh" periodically.
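
The schedule value 0 * 1 * * reads, in standard five-field cron order (minute, hour, day of month, month, day of week), as minute 0 of every hour on the first day of every month. A simplified Python sketch of that matching logic is below. The helper names cron_matches and field_matches are hypothetical and are not part of Liminal or Airflow, and only * and plain integers are handled.

```python
from datetime import datetime


def field_matches(field, value):
    """Return True if a cron field ('*' or a plain integer) matches value."""
    return field == '*' or int(field) == value


def cron_matches(expression, moment):
    """Check a five-field cron expression against a datetime.

    Simplified sketch: no ranges, lists or steps, Sunday must be written
    as 0, and all five fields are combined with AND.
    """
    minute, hour, day, month, weekday = expression.split()
    # Cron counts weekdays from Sunday = 0; isoweekday() gives Sunday = 7.
    cron_weekday = moment.isoweekday() % 7
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )


# Example: 05:00 on the first of a month matches, 05:30 does not.
on_schedule = cron_matches('0 * 1 * *', datetime(2023, 1, 1, 5, 0))
off_schedule = cron_matches('0 * 1 * *', datetime(2023, 1, 1, 5, 30))
```

A real scheduler supports the full cron syntax; this sketch is only meant to make the example expression easier to read.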

Contributing

More information on contributing can be found here

Running Tests (for contributors)

When doing local development and running Liminal unit-tests, make sure to set LIMINAL_STAND_ALONE_MODE=True

Owner
The Apache Software Foundation