CorrProxies - Optimizing Machine Learning Inference Queries with Correlative Proxy Models

Overview

CorrProxies

Declaration

This repo is for paper: Optimizing Machine Learning Inference Queries with Correlative Proxy Models.

Setup ENV

Quick Start

  1. We provide a fully ready Docker Image ready to use out-of-box.
  2. Optionally, you can also follow the steps to build your own testing environment.

The Provided Docker Environment

Steps to run the Docker Environment

  • Get the docker image from this link.
  • Load the docker image. docker load -i corrproxies-image.tar
  • Run the docker image in a container. docker run --name=CorrProxies -i -t -d corrproxies-image
    • it will return you the docker container ID, for example d979af9a17f23345cb2894b22dc8527680acdfd7a7e1aaed6a7a28ea134e66e6.
  • Use CLI to control the container with the specific ID generated. docker exec -it d979af9a17f23345cb2894b22dc8527680acdfd7a7e1aaed6a7a28ea134e66e6 /bin/zsh

ENV Spec

File structure:

  • The home directory for CorrProxies locates at /home/CorrProxies.
  • The Python executable locates at /home/anaconda3/envs/condaenv/bin/python3.
  • The models locate at /home/CorrProxies/model.
  • The datasets locate at /home/CorrProxies/data.
  • The starting scripts locate at /home/CorrProxies/scripts.

Build Your Own Environment

This instruction is based on a clean distribution of [email protected]

  1. Install pre-requisites.

    apt-get update && apt-get install -y build-essential

  2. Install Anaconda.

    • wget https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh && bash Anaconda3-5.3.1-Linux-x86_64.sh -b -p
    • export PATH=" /bin/:$PATH"
  3. Install [email protected] with Anaconda3.

    conda create -n condaenv python=3.6.6

  4. Activate the newly installed Python ENV.

    conda activate condaenv

  5. Install dependencies with pip.

    pip3 install -r requirements.txt

  6. Install Java (openjdk-8) (for standford-nlp usage).

    apt-get install -y openjdk-8-jdk

Queries & Datasets

  • We use Twitter text dataset, COCO image dataset and UCF101 video dataset as our benchmark datasets. Please see this page for examples of detailed Queries and Datasets examples we use in our experiments.

  • After you setup the environment, either manually or using the docker image provided by us, the next step is to download the datasets.

    • To get the COCO dataset: cd /home/CorrProxies/data/image/coco && ./get_coco_dataset.sh
    • To get the UCF101 dataset: cd /home/CorrProxies/data/video/ucf101 && wget -c https://www.crcv.ucf.edu/data/UCF101/UCF101.rar && unrar x UCF101.rar.

Execution

Please pull the latest code before executing the code. Command cd /home/CorrProxies && git pull

Run Operators Individually

To run and see each operator we used in our experiment, simply execute python3 . For example: python3 operators/ml_operators/image_video_operators/video_activity_recognition.py.

Run Experiments

We use scripts/run.sh to start experiments. The script will take in command line arguments.

  • Text(Twitter)

    • Since we do not provide text dataset, we will skip the experiment.
  • Image(COCO)

    Example: ./scripts/run.sh -w 2 -t 1 -i '1' -a 0.9 -s 3 -o 2 -e 1

  • Video(UCF101)

    Example: ./scripts/run.sh -w 2 -t 2 -i '1' -a 0.9 -s 3 -o 2 -e 1

  • arguments detail.

    • w int: experiment type in [1, 2, 3, 4] referring to /home/CorrProxies/ml_workflow/exps/WorkflowExp*.py;
    • t int: query type in [0, 1, 2]. Int 0, 1, 2 means queries on the Twitter, COCO, and UCF101 datasets, respectively;
    • i int: query index in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    • a float: query accuracy;
    • s int: scheme in [0, 1, 2, 3, 4, 5, 6]. Int 0, 1, 2, 3, 4, 5, 6 means 'ORIG', 'NS', 'PP', 'CORE', 'COREa', 'COREh' and 'REORDER' schemes, respectively;
    • o int: number of threads used in optimization phase;
    • e int: number of threads used in execution phase after generating an optimized plan.
Owner
ZhihuiYangCS
ZhihuiYangCS
Python Research Framework

Python Research Framework

EleutherAI 106 Dec 13, 2022
Simple Machine Learning Tool Kit

Getting started smltk (Simple Machine Learning Tool Kit) package is implemented for helping your work during data preparation testing your model The g

Alessandra Bilardi 1 Dec 30, 2021
Machine Learning from Scratch

Machine Learning from Scratch Author: Shengxuan Wang From: Oregon State University Content: Building Machine Learning model from Scratch, without usin

ShawnWang 0 Jul 05, 2022
Machine learning that just works, for effortless production applications

Machine learning that just works, for effortless production applications

Elisha Yadgaran 16 Sep 02, 2022
Python module for machine learning time series:

seglearn Seglearn is a python package for machine learning time series or sequences. It provides an integrated pipeline for segmentation, feature extr

David Burns 536 Dec 29, 2022
Kaggler is a Python package for lightweight online machine learning algorithms and utility functions for ETL and data analysis.

Kaggler is a Python package for lightweight online machine learning algorithms and utility functions for ETL and data analysis. It is distributed under the MIT License.

Jeong-Yoon Lee 720 Dec 25, 2022
Toolss - Automatic installer of hacking tools (ONLY FOR TERMUKS!)

Tools Автоматический установщик хакерских утилит (ТОЛЬКО ДЛЯ ТЕРМУКС!) Оригиналь

14 Jan 05, 2023
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l

Distributed (Deep) Machine Learning Community 23.6k Jan 03, 2023
Neural Machine Translation (NMT) tutorial with OpenNMT-py

Neural Machine Translation (NMT) tutorial with OpenNMT-py. Data preprocessing, model training, evaluation, and deployment.

Yasmin Moslem 29 Jan 09, 2023
Distributed Computing for AI Made Simple

Project Home Blog Documents Paper Media Coverage Join Fiber users email list Uber Open Source 997 Dec 30, 2022

Machine Learning Study 혼자 해보기

Machine Learning Study 혼자 해보기 기여자 (Contributors) ✨ Teddy Lee 🏠 HongJaeKwon 🏠 Seungwoo Han 🏠 Tae Heon Kim 🏠 Steve Kwon 🏠 SW Song 🏠 K1A2 🏠 Wooil

Teddy Lee 1.7k Jan 01, 2023
[HELP REQUESTED] Generalized Additive Models in Python

pyGAM Generalized Additive Models in Python. Documentation Official pyGAM Documentation: Read the Docs Building interpretable models with Generalized

daniel servén 747 Jan 05, 2023
A high-performance topological machine learning toolbox in Python

giotto-tda is a high-performance topological machine learning toolbox in Python built on top of scikit-learn and is distributed under the G

giotto.ai 632 Dec 29, 2022
Crunchdao - Python API for the Crunchdao machine learning tournament

Python API for the Crunchdao machine learning tournament Interact with the Crunc

3 Jan 19, 2022
A Python Module That Uses ANN To Predict A Stocks Price And Also Provides Accurate Technical Analysis With Many High Potential Implementations!

Stox A Module to predict the "close price" for the next day and give "technical analysis". It uses a Neural Network and the LSTM algorithm to predict

Stox 31 Dec 16, 2022
MLR - Machine Learning Research

Machine Learning Research 1. Project Topic 1.1. Exsiting research Benmark: https://paperswithcode.com/sota ACL anthology for NLP papers: http://www.ac

Charles 69 Oct 20, 2022
Both social media sentiment and stock market data are crucial for stock price prediction

Relating-Social-Media-to-Stock-Movement-Public - We explore the application of Machine Learning for predicting the return of the stock by using the information of stock returns. A trading strategy ba

Vishal Singh Parmar 15 Oct 29, 2022
MasTrade is a trading bot in baselines3,pytorch,gym

mastrade MasTrade is a trading bot in baselines3,pytorch,gym idea we have for example 1 btc and we buy a crypto with it with market option to trade in

Masoud Azizi 18 May 24, 2022
Painless Machine Learning for python based on scikit-learn

PlainML Painless Machine Learning Library for python based on scikit-learn. Install pip install plainml Example from plainml import KnnModel, load_ir

1 Aug 06, 2022
Climin is a Python package for optimization, heavily biased to machine learning scenarios

climin climin is a Python package for optimization, heavily biased to machine learning scenarios distributed under the BSD 3-clause license. It works

Biomimetic Robotics and Machine Learning at Technische Universität München 177 Sep 02, 2022