Simple but powerful Automated Machine Learning library for tabular data. It uses efficient in-memory SAP HANA algorithms to automate routine Data Science tasks.
About the project
Disclaimer
This library is an open-source research project and is not part of any official SAP products.
What's this?
This is a simple but accurate Automated Machine Learning library. Built on SAP HANA's powerful in-memory algorithms, it delivers high accuracy across multiple machine learning tasks. The library also applies numerous data preprocessing functions to automate routine data cleaning. In short, hana_automl runs through all the AutoML steps and makes Data Science work easier.
What is SAP HANA?
From www.sap.com: SAP HANA is a high-performance in-memory database that speeds data-driven, real-time decisions and actions.
Web app
https://share.streamlit.io/dan0nchik/sap-hana-automl/main/web.py
Documentation
https://sap-hana-automl.readthedocs.io/en/latest/index.html
Benchmarks
https://github.com/dan0nchik/SAP-HANA-AutoML/blob/main/comparison_openml.ipynb
ML tasks:
- Binary classification
- Regression
- Multiclass classification
- Forecasting
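For the classification and regression tasks listed above, an AutoML tool has to decide which kind of problem a target column describes. The sketch below shows one common heuristic for that decision. It is an assumption for illustration, not hana_automl's documented logic, and infer_task is a hypothetical name. Forecasting is left out because it depends on time ordering rather than on the target values alone.

```python
def infer_task(target_values, max_classes=20):
    """Guess the task type from the values of a target column (hypothetical heuristic)."""
    distinct = {v for v in target_values if v is not None}
    if len(distinct) < 2:
        raise ValueError("target column needs at least two distinct values")
    if len(distinct) == 2:
        return "binary classification"
    # bool is a subclass of int, so exclude it explicitly from "numeric"
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct)
    has_fractions = any(isinstance(v, float) and not v.is_integer() for v in distinct)
    # Numeric targets with fractional values or many distinct values are treated as regression.
    if numeric and (has_fractions or len(distinct) > max_classes):
        return "regression"
    return "multiclass classification"
```

The max_classes threshold is arbitrary; an integer-coded target with more than 20 levels would be treated as regression here, which may not match a given dataset.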
Steps automated:
- Data exploration
- Data preparation
- Feature engineering
- Model selection
- Model training
- Hyperparameter tuning
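As an illustration of the kind of routine cleaning the data preparation step automates, here is a minimal sketch of missing-value imputation: the mean for numeric columns and the most frequent value otherwise. It is a generic technique shown under that assumption, not hana_automl's own preprocessing code, and impute_column is a hypothetical name.

```python
from collections import Counter
from statistics import mean


def impute_column(values):
    """Replace None entries with the column mean (numeric) or most frequent value (other)."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to learn a fill value from
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present)
    if numeric:
        fill = mean(present)
    else:
        # most_common keeps first-seen order for ties, so the result is deterministic
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]
```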
Clients
- GUI (Streamlit app)
- Python library
- CLI (coming soon)
Getting Started
To get the package up and running, follow these steps.
Prerequisites
Make sure you have the following:
- ✅ Set up SAP HANA (skip this step if you already have an instance with PAL enabled). There are 2 ways to do that:
  In HANA Cloud:
  - Create a free trial account
  - Set up an instance
  - Enable PAL (Predictive Analysis Library). This is essential, because the library relies on PAL algorithms.
  In Virtual Machine:
- ✅ Installed software
  - Python > 3.6 (skip this step if python --version returns > 3.6)
  - Cython:
    pip3 install Cython
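Both software prerequisites can also be checked from Python itself. A minimal sketch using only the standard library; check_prerequisites is a hypothetical helper, and it reads "Python > 3.6" as "a (major, minor) version newer than 3.6".

```python
import importlib.util
import sys


def check_prerequisites(min_exclusive=(3, 6)):
    """Return a list of problems with the local Python setup (an empty list means all good)."""
    problems = []
    # Compare (major, minor) only; "> 3.6" is interpreted as 3.7 or newer.
    if sys.version_info[:2] <= min_exclusive:
        problems.append(
            "Python %d.%d found; a version newer than %d.%d is required"
            % (sys.version_info[0], sys.version_info[1], *min_exclusive)
        )
    # find_spec returns None when the Cython package cannot be imported.
    if importlib.util.find_spec("Cython") is None:
        problems.append("Cython is not installed (run: pip3 install Cython)")
    return problems


for problem in check_prerequisites():
    print(problem)
```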
Installation
There are 2 ways to install the library:
- Stable: from pypi
pip3 install hana_automl
- Latest: from the repository
pip3 install https://github.com/dan0nchik/SAP-HANA-AutoML/archive/dev.zip
After installation
Check that PAL (Predictive Analysis Library) is installed and the required roles are granted:
- Read the docs section on this.
- If you don't want to read the docs, run this code:
  from hana_automl.utils.scripts import setup_user
  from hana_ml.dataframe import ConnectionContext

  cc = ConnectionContext(address='address', user='user', password='password', port=39015)
  # replace with the credentials of the user that will be created or granted a role to run PAL
  setup_user(connection_context=cc, username='user', password="password")
Usage
From code
Our library in a few lines of code
Connect to the database.
from hana_ml.dataframe import ConnectionContext

cc = ConnectionContext(address='address',
                       user='username',
                       password='password',
                       port=1234)
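Hard-coding a password in the connection call is fine for a quick try but easy to leak. A minimal sketch that reads the connection settings from environment variables instead; the variable names (HANA_ADDRESS, HANA_PORT, HANA_USER, HANA_PASSWORD) and the helper connection_kwargs_from_env are hypothetical, chosen for this example only.

```python
import os

# Hypothetical variable suffixes; neither hana_automl nor hana_ml reads them automatically.
_FIELDS = ("ADDRESS", "PORT", "USER", "PASSWORD")


def connection_kwargs_from_env(prefix="HANA_"):
    """Build ConnectionContext keyword arguments from environment variables."""
    missing = [prefix + name for name in _FIELDS if prefix + name not in os.environ]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "address": os.environ[prefix + "ADDRESS"],
        "port": int(os.environ[prefix + "PORT"]),  # the examples in this README pass an int
        "user": os.environ[prefix + "USER"],
        "password": os.environ[prefix + "PASSWORD"],
    }
```

Assuming those variables are set, the result can be passed straight to hana_ml: cc = ConnectionContext(**connection_kwargs_from_env()).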
Create an AutoML model and fit it.
from hana_automl.automl import AutoML

model = AutoML(cc)
model.fit(
    file_path='path to training dataset',  # it may be a HANA table/view or a pandas DataFrame
    steps=10,  # number of iterations
    target='target',  # column to predict
    time_limit=120,  # time limit in seconds
)
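The steps and time_limit arguments in the fit call bound the model search. A minimal sketch of such a budgeted search loop, assuming the two limits act as independent caps (whichever is reached first ends the search). It uses plain random search and a toy objective; hana_automl's actual optimizer and stopping rule may differ, and budgeted_search is a hypothetical name.

```python
import random
import time


def budgeted_search(objective, sample_params, steps, time_limit, seed=0):
    """Random search that stops after `steps` trials or `time_limit` seconds, whichever comes first."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_limit
    best_params, best_score, trials = None, float("-inf"), 0
    while trials < steps and time.monotonic() < deadline:
        params = sample_params(rng)  # draw one candidate configuration
        score = objective(params)    # higher is better
        trials += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, trials


# Toy usage: maximise -(x - 3)^2 over x in [0, 6] with the same budget as the fit call.
best, score, n_trials = budgeted_search(
    objective=lambda p: -(p["x"] - 3) ** 2,
    sample_params=lambda rng: {"x": rng.uniform(0, 6)},
    steps=10,
    time_limit=120,
)
```

With a cheap objective the 10-step cap is hit long before 120 seconds; with slow model training the time limit would usually end the search first.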
Predict.
model.predict(
    file_path='path to test dataset',
    id_column='ID',
    verbose=1,
)
For more examples, please refer to the Documentation.
How to run the Streamlit client
- Clone the repository and move into it:
git clone https://github.com/dan0nchik/SAP-HANA-AutoML.git
cd SAP-HANA-AutoML
- Install dependencies:
pip3 install -r requirements.txt
- Run GUI:
streamlit run ./web.py
Roadmap
See the open issues for a list of proposed features (and known issues). Feel free to report any bugs :)
Contributing
Any contributions you make are greatly appreciated.
- Fork the project
- Create your feature branch (git checkout -b feature/NewFeature)
- Install dependencies:
  pip3 install Cython
  pip3 install -r requirements.txt
- Create a credentials.py file in the tests directory. Your files should look like this:
  SAP-HANA-AutoML
  │   README.md
  │   all other files
  │   .....
  │
  └───tests
      │   test files...
      │   credentials.py
  Copy and paste this code there, replacing the placeholder values with your credentials:
  host = "host"
  user = "username"
  password = "password"
  port = 39015  # or any port you need
  schema = "your schema"
  Don't worry, this file is in .gitignore, so your credentials won't be seen by anyone.
- Make some changes
- Write tests that cover your code in the tests directory
- Run the tests (from the SAP-HANA-AutoML directory):
  pytest
- Commit your changes (git commit -m 'Add some amazing features')
- Push to the branch (git push origin feature/NewFeature)
- Open a Pull Request
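The credentials.py file and the pytest step in the contribution workflow can be combined so that tests needing a live database are skipped when no credentials are present. A minimal sketch of such a test module, assuming credentials.py is importable as credentials from the test files and defines host, user, password and port as in the template; connection_kwargs and the test class names are hypothetical. pytest collects unittest.TestCase classes, so this runs under the plain pytest command.

```python
import importlib.util
import unittest
from types import SimpleNamespace

# True only if a module named "credentials" (e.g. tests/credentials.py) is importable.
HAVE_CREDENTIALS = importlib.util.find_spec("credentials") is not None


def connection_kwargs(creds):
    """Map the fields defined in credentials.py onto ConnectionContext keyword arguments."""
    return {
        "address": creds.host,
        "port": int(creds.port),
        "user": creds.user,
        "password": creds.password,
    }


class ConnectionKwargsTest(unittest.TestCase):
    """Runs everywhere: no database needed."""

    def test_fields_are_mapped(self):
        creds = SimpleNamespace(host="h", user="u", password="p", port="39015", schema="S")
        self.assertEqual(
            connection_kwargs(creds),
            {"address": "h", "port": 39015, "user": "u", "password": "p"},
        )


@unittest.skipUnless(HAVE_CREDENTIALS, "tests/credentials.py not found")
class LiveConnectionTest(unittest.TestCase):
    """Runs only when credentials exist; also needs a reachable HANA instance."""

    def test_can_connect(self):
        import credentials
        from hana_ml.dataframe import ConnectionContext

        cc = ConnectionContext(**connection_kwargs(credentials))
        try:
            result = cc.sql("SELECT 1 AS ONE FROM DUMMY").collect()
            self.assertEqual(int(result.iloc[0, 0]), 1)
        finally:
            cc.close()
```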
License
Distributed under the MIT License. See LICENSE for more information.
Not sure what the license means? Check out the MIT license summary.
Contact
Authors: @While-true-codeanything, @DbusAI, @dan0nchik
Project Link: https://github.com/dan0nchik/SAP-HANA-AutoML