CorrProxies - Optimizing Machine Learning Inference Queries with Correlative Proxy Models

Overview

CorrProxies

Declaration

This repository accompanies the paper "Optimizing Machine Learning Inference Queries with Correlative Proxy Models".

Setup ENV

Quick Start

  1. We provide a Docker image that is ready to use out of the box.
  2. Alternatively, you can follow the steps below to build your own testing environment.

The Provided Docker Environment

Steps to run the Docker Environment

  • Get the docker image from this link.
  • Load the Docker image: docker load -i corrproxies-image.tar
  • Run the image in a container: docker run --name=CorrProxies -i -t -d corrproxies-image
    • The command prints the container ID, for example d979af9a17f23345cb2894b22dc8527680acdfd7a7e1aaed6a7a28ea134e66e6.
  • Open a shell in the container using that ID: docker exec -it d979af9a17f23345cb2894b22dc8527680acdfd7a7e1aaed6a7a28ea134e66e6 /bin/zsh
    • Because the container was started with --name=CorrProxies, the name can be used in place of the ID: docker exec -it CorrProxies /bin/zsh

ENV Spec

File structure:

  • The home directory for CorrProxies is located at /home/CorrProxies.
  • The Python executable is located at /home/anaconda3/envs/condaenv/bin/python3.
  • The models are located at /home/CorrProxies/model.
  • The datasets are located at /home/CorrProxies/data.
  • The starting scripts are located at /home/CorrProxies/scripts.
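
As a quick sanity check inside the container, the layout above can be verified with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name find_missing and the labels in EXPECTED_PATHS are illustrative, and only the paths themselves come from the list above.

```python
import os

# Paths copied from the file-structure list above (Docker image layout).
# The dictionary keys are illustrative labels, not names used by CorrProxies.
EXPECTED_PATHS = {
    "home": "/home/CorrProxies",
    "python": "/home/anaconda3/envs/condaenv/bin/python3",
    "models": "/home/CorrProxies/model",
    "datasets": "/home/CorrProxies/data",
    "scripts": "/home/CorrProxies/scripts",
}


def find_missing(paths=None):
    """Return the sorted labels whose path does not exist on this machine."""
    if paths is None:
        paths = EXPECTED_PATHS
    return sorted(label for label, path in paths.items()
                  if not os.path.exists(path))


if __name__ == "__main__":
    missing = find_missing()
    print("missing:", ", ".join(missing) if missing else "none")
```

In a self-built environment the paths may differ, so pass your own dictionary to find_missing instead of relying on the defaults.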

Build Your Own Environment

These instructions assume a clean, apt-based Linux distribution (the commands below use apt-get).

  1. Install pre-requisites.

    apt-get update && apt-get install -y build-essential

  2. Install Anaconda.

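    • wget https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh && bash Anaconda3-5.3.1-Linux-x86_64.sh -b -p /home/anaconda3
    • export PATH="/home/anaconda3/bin:$PATH"
    • /home/anaconda3 is an assumed install prefix, chosen to match the Docker image layout described above; any writable directory works as long as PATH uses the same prefix.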
  3. Install Python 3.6.6 with Anaconda3.

    conda create -n condaenv python=3.6.6

  4. Activate the newly installed Python ENV.

    conda activate condaenv

  5. Install dependencies with pip.

    pip3 install -r requirements.txt

  6. Install Java (openjdk-8), which is needed for stanford-nlp.

    apt-get install -y openjdk-8-jdk

Queries & Datasets

  • We use the Twitter text dataset, the COCO image dataset, and the UCF101 video dataset as our benchmarks. Please see this page for detailed examples of the queries and datasets used in our experiments.

  • After you set up the environment, either manually or with the provided Docker image, the next step is to download the datasets.

    • To get the COCO dataset: cd /home/CorrProxies/data/image/coco && ./get_coco_dataset.sh
    • To get the UCF101 dataset: cd /home/CorrProxies/data/video/ucf101 && wget -c https://www.crcv.ucf.edu/data/UCF101/UCF101.rar && unrar x UCF101.rar

Execution

Pull the latest code before running anything: cd /home/CorrProxies && git pull

Run Operators Individually

To run an individual operator used in our experiments, execute its script with python3. For example: python3 operators/ml_operators/image_video_operators/video_activity_recognition.py

Run Experiments

We use scripts/run.sh to start experiments. The script takes command-line arguments.

  • Text(Twitter)

    • Because we do not provide the text dataset, this experiment is skipped.
  • Image(COCO)

    Example: ./scripts/run.sh -w 2 -t 1 -i '1' -a 0.9 -s 3 -o 2 -e 1

  • Video(UCF101)

    Example: ./scripts/run.sh -w 2 -t 2 -i '1' -a 0.9 -s 3 -o 2 -e 1

  • Argument details:

    • w int: experiment type in [1, 2, 3, 4] referring to /home/CorrProxies/ml_workflow/exps/WorkflowExp*.py;
    • t int: query type in [0, 1, 2], where 0, 1, and 2 select queries on the Twitter, COCO, and UCF101 datasets, respectively;
    • i int: query index in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    • a float: query accuracy;
    • s int: scheme in [0, 1, 2, 3, 4, 5, 6], where 0 through 6 select the 'ORIG', 'NS', 'PP', 'CORE', 'COREa', 'COREh', and 'REORDER' schemes, respectively;
    • o int: number of threads used in optimization phase;
    • e int: number of threads used in execution phase after generating an optimized plan.
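
The argument ranges above can be checked before launching an experiment. The following is a minimal sketch of a hypothetical wrapper, not part of the repository: build_run_command and describe are illustrative names, and the checks on the accuracy value (a fraction in (0, 1]) and on positive thread counts are assumptions, since this README does not state those ranges.

```python
import shlex

# Scheme names indexed by the integer passed to -s, as listed above.
SCHEMES = ["ORIG", "NS", "PP", "CORE", "COREa", "COREh", "REORDER"]
# Dataset queried for each value of -t, as listed above.
DATASETS = {0: "Twitter", 1: "COCO", 2: "UCF101"}


def build_run_command(w, t, i, a, s, o, e, script="./scripts/run.sh"):
    """Validate the arguments and return the run.sh invocation as a list."""
    if w not in (1, 2, 3, 4):
        raise ValueError("w (experiment type) must be 1, 2, 3, or 4")
    if t not in DATASETS:
        raise ValueError("t (query type) must be 0, 1, or 2")
    if i not in range(1, 11):
        raise ValueError("i (query index) must be between 1 and 10")
    # Assumption: accuracy is a fraction in (0, 1]; the README gives no range.
    if not 0.0 < a <= 1.0:
        raise ValueError("a (query accuracy) is assumed to lie in (0, 1]")
    if s not in range(len(SCHEMES)):
        raise ValueError("s (scheme) must be between 0 and 6")
    # Assumption: both thread counts must be at least 1.
    if o < 1 or e < 1:
        raise ValueError("o and e (thread counts) are assumed to be >= 1")
    flags = [("-w", w), ("-t", t), ("-i", i), ("-a", a),
             ("-s", s), ("-o", o), ("-e", e)]
    cmd = [script]
    for flag, value in flags:
        cmd += [flag, str(value)]
    return cmd


def describe(t, s):
    """Return a readable label for a query-type and scheme pair."""
    return "{} scheme on {}".format(SCHEMES[s], DATASETS[t])


if __name__ == "__main__":
    # Mirrors the COCO example above.
    cmd = build_run_command(w=2, t=1, i=1, a=0.9, s=3, o=2, e=1)
    print(" ".join(shlex.quote(part) for part in cmd))
    print(describe(t=1, s=3))
```

The returned list can be passed directly to subprocess.run from /home/CorrProxies; building a list rather than a shell string avoids quoting issues.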

Owner

ZhihuiYangCS