FLAML is a lightweight Python library that finds accurate machine learning models automatically, efficiently and economically

Overview

Join the chat at https://gitter.im/FLAMLer/community

FLAML - Fast and Lightweight AutoML


FLAML is a lightweight Python library that finds accurate machine learning models automatically, efficiently and economically. It frees users from selecting learners and hyperparameters for each learner. Its simple and lightweight design makes it easy to extend, for example by adding customized learners or metrics. FLAML is powered by a new, cost-effective hyperparameter optimization and learner selection method invented by Microsoft Research. It leverages the structure of the search space to choose a search order optimized for both cost and error. For example, the system tends to propose cheap configurations at the beginning of the search, but quickly moves to configurations with high model complexity and large sample size when they are needed later in the search. Similarly, it favors cheap learners in the beginning but penalizes them later if their error improvement is slow. The cost-bounded search and cost-based prioritization make a big difference in search efficiency under budget constraints.
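
The search order described above can be illustrated with a toy sketch. This is not FLAML's actual algorithm: `toy_error`, `toy_cost` and `cost_frugal_search` are hypothetical names, and the doubling schedule is an assumption chosen only to show how starting from cheap configurations keeps the total cost within a budget.

```python
def toy_error(n_estimators: int) -> float:
    """Hypothetical validation error that shrinks as the model grows."""
    return 1.0 / (1.0 + n_estimators) ** 0.5


def toy_cost(n_estimators: int) -> float:
    """Hypothetical training cost, proportional to model size."""
    return float(n_estimators)


def cost_frugal_search(budget: float, start: int = 4, upper: int = 32768):
    """Try the cheapest config first and grow only while the budget allows."""
    spent = 0.0
    best_cfg, best_err = None, float("inf")
    history = []
    cfg = start
    while cfg <= upper and spent + toy_cost(cfg) <= budget:
        spent += toy_cost(cfg)          # pay for this trial
        err = toy_error(cfg)
        history.append((cfg, err))
        if err < best_err:
            best_cfg, best_err = cfg, err
        cfg *= 2                        # move to a more complex config
    return best_cfg, best_err, spent, history


best_cfg, best_err, spent, history = cost_frugal_search(budget=100.0)
print(best_cfg, round(best_err, 3), spent)
```

With a budget of 100 cost units, the sketch evaluates the small configurations first and stops before a trial that would exceed the budget.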

FLAML also has a .NET implementation in ML.NET Model Builder. This ML.NET blog describes the improvement brought by FLAML.

Installation

FLAML requires Python version >= 3.6. It can be installed from pip:

pip install flaml

To run the notebook example, install flaml with the [notebook] option:

pip install flaml[notebook]

Quickstart

  • With three lines of code, you can start using this economical and fast AutoML engine as a scikit-learn style estimator.
from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification")
  • You can restrict the learners and use FLAML as a fast hyperparameter tuning tool for XGBoost, LightGBM, Random Forest etc. or a customized learner.
automl.fit(X_train, y_train, task="classification", estimator_list=["lgbm"])
  • You can also run generic ray-tune style hyperparameter tuning for a custom function.
from flaml import tune
tune.run(train_with_config, config={…}, low_cost_partial_config={…}, time_budget_s=3600)
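
As a hedged sketch of what the `tune.run` call above might look like in full: the objective below is a toy function, and the search space, the `loss` metric name and the `run_tuning` helper are assumptions made for illustration. The `flaml` import is placed inside `run_tuning` so that the objective can be exercised without the library installed.

```python
import time


def train_with_config(config: dict) -> dict:
    """Toy objective: pretend that a larger `x` takes longer to evaluate."""
    x, y = config["x"], config["y"]
    time.sleep(x / 100000)  # simulated training cost grows with x
    loss = (round(x) - 85000) ** 2 - x / y
    return {"loss": loss}


def run_tuning(time_budget_s: float = 10):
    """Run the tuning loop; requires `pip install flaml`."""
    from flaml import tune  # imported lazily on purpose

    return tune.run(
        train_with_config,
        config={
            "x": tune.lograndint(lower=1, upper=100000),
            "y": tune.randint(lower=1, upper=100000),
        },
        low_cost_partial_config={"x": 1},  # a known cheap starting point
        metric="loss",
        mode="min",
        num_samples=-1,  # no trial limit; stop on the time budget
        time_budget_s=time_budget_s,
    )
```

Calling `run_tuning()` returns an analysis object whose `best_config` attribute holds the best configuration found within the time budget.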

Advantages

  • For common machine learning tasks like classification and regression, FLAML finds quality models with small computational resources.
  • Users can choose their desired level of customization: minimal (computational resource budget), medium (e.g., scikit-style learner, search space and metric), or full (arbitrary training and evaluation code).
  • It allows human guidance in hyperparameter tuning: the search respects priors on certain subspaces while remaining able to explore other subspaces. Read more about the hyperparameter optimization methods in FLAML here. They can be used beyond the AutoML context, including in distributed HPO frameworks such as ray tune or nni.
  • It supports online AutoML: automatic hyperparameter tuning for online learning algorithms. Read more about the online AutoML method in FLAML here.

Examples

  • A basic classification example.
from flaml import AutoML
from sklearn.datasets import load_iris
# Initialize an AutoML instance
automl = AutoML()
# Specify automl goal and constraint
automl_settings = {
    "time_budget": 10,  # in seconds
    "metric": 'accuracy',
    "task": 'classification',
    "log_file_name": "test/iris.log",
}
X_train, y_train = load_iris(return_X_y=True)
# Train with labeled input data
automl.fit(X_train=X_train, y_train=y_train,
           **automl_settings)
# Predict
print(automl.predict_proba(X_train))
# Export the best model
print(automl.model)
  • A basic regression example.
from flaml import AutoML
from sklearn.datasets import fetch_california_housing  # load_boston was removed in scikit-learn 1.2
# Initialize an AutoML instance
automl = AutoML()
# Specify automl goal and constraint
automl_settings = {
    "time_budget": 10,  # in seconds
    "metric": 'r2',
    "task": 'regression',
    "log_file_name": "test/california.log",
}
X_train, y_train = fetch_california_housing(return_X_y=True)
# Train with labeled input data
automl.fit(X_train=X_train, y_train=y_train,
           **automl_settings)
# Predict
print(automl.predict(X_train))
# Export the best model
print(automl.model)
  • Time series forecasting.
# pip install flaml[forecast]
import numpy as np
from flaml import AutoML
X_train = np.arange('2014-01', '2021-01', dtype='datetime64[M]')
y_train = np.random.random(size=72)
automl = AutoML()
automl.fit(X_train=X_train[:72],  # a single column of timestamp
           y_train=y_train,  # value for each timestamp
           period=12,  # time horizon to forecast, e.g., 12 months
           task='forecast', time_budget=15,  # time budget in seconds
           log_file_name="test/forecast.log",
          )
print(automl.predict(X_train[72:]))
  • Learning to rank.
from sklearn.datasets import fetch_openml
from flaml import AutoML
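# credit-g is not a real learning-to-rank dataset; it is used only for illustration
X_train, y_train = fetch_openml(name="credit-g", return_X_y=True)
# Group counts: the first 200 rows form one query group, the next 200 another, and so on
groups = [200] * 4 + [100] * 2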
automl = AutoML()
automl.fit(
    X_train, y_train, groups=groups,
    task='rank', time_budget=10,    # in seconds
)
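
The group counts in the ranking example partition the training rows in order, so their sum must equal the number of rows (1000 for credit-g). The sketch below expands counts into one group id per row, which makes that constraint easy to check; `group_counts_to_ids` is a hypothetical helper, not part of FLAML, and it assumes the rows of each group are contiguous.

```python
from itertools import chain, repeat


def group_counts_to_ids(counts):
    """Expand [2, 1] into [0, 0, 1]: one group id per row, in row order."""
    return list(chain.from_iterable(repeat(i, c) for i, c in enumerate(counts)))


groups = [200] * 4 + [100] * 2        # same counts as in the ranking example
group_ids = group_counts_to_ids(groups)

# The counts must cover every training row exactly once.
print(len(group_ids), sum(groups))
```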

More examples can be found in notebooks.

Documentation

Please find the API documentation here.

Please find demo and tutorials of FLAML here.

For more technical details, please check our papers.

@inproceedings{wang2021flaml,
    title={FLAML: A Fast and Lightweight AutoML Library},
    author={Chi Wang and Qingyun Wu and Markus Weimer and Erkang Zhu},
    year={2021},
    booktitle={MLSys},
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

If you are new to GitHub, here is a detailed help source on getting involved with development on GitHub.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Developing

Setup

git clone https://github.com/microsoft/FLAML.git
pip install -e .[test,notebook]

Coverage

Any code you commit should generally not significantly impact coverage. To run all unit tests:

coverage run -m pytest test

You can then view the coverage report by running coverage report -m or coverage html. If all the tests pass, please also test-run notebook/flaml_automl to make sure your commit does not break the notebook example.

Authors

  • Chi Wang
  • Qingyun Wu

Contributors (alphabetical order): Amir Aghaei, Vijay Aski, Sebastien Bubeck, Surajit Chaudhuri, Nadiia Chepurko, Ofer Dekel, Alex Deng, Anshuman Dutt, Nicolo Fusi, Jianfeng Gao, Johannes Gehrke, Niklas Gustafsson, Silu Huang, Dongwoo Kim, Christian Konig, John Langford, Menghao Li, Mingqin Li, Zhe Liu, Naveen Gaur, Paul Mineiro, Vivek Narasayya, Jake Radzikowski, Marco Rossi, Amin Saied, Neil Tenenholtz, Olga Vrousgou, Markus Weimer, Yue Wang, Qingyun Wu, Qiufeng Yin, Haozhe Zhang, Minjia Zhang, XiaoYun Zhang, Eric Zhu, and open-source contributors.

License

MIT License

Comments
  • Feature Request : Make FLAML installable with Conda

    Basically the title.

    Gherkin style :

    • AS AN ML developer
    • WHEN I run conda install flaml (or conda install -c conda-forge flaml)
    • THEN it installs FLAML in my current Conda environment
      • AND I can execute the following code :
    from flaml import AutoML
    from sklearn.datasets import load_iris
    # Initialize an AutoML instance
    automl = AutoML()
    # Specify automl goal and constraint
    automl_settings = {
        "time_budget": 10,  # in seconds
        "metric": 'accuracy',
        "task": 'classification',
        "log_file_name": "test/iris.log",
    }
    X_train, y_train = load_iris(return_X_y=True)
    # Train with labeled input data
    automl.fit(X_train=X_train, y_train=y_train,
               **automl_settings)
    # Predict
    print(automl.predict_proba(X_train))
    # Export the best model
    print(automl.model)
    
    
    help wanted 
    opened by fleuryc 35
  • Kernel panic / Segmentation fault when trying to run example

    Hi! I tried to install and run a simple example, but ran into a Segmentation Fault: 11 error when using Python 3.9.6:

    
    from sklearn import datasets
    from flaml import AutoML
    
    def X_y():
        X, y = datasets.make_classification(n_samples=100, n_features=20,
                                            n_informative=2, n_redundant=2, random_state=0)
    
        return X, y
    
    # Build the synthetic dataset before fitting
    X, y = X_y()
    automl = AutoML()
    automl.fit(X, y, task="classification")
    

    Running this in both my Jupyter notebook and via bash gives me:

    $ python3 innodays_flaml.py
    [flaml.automl: 09-24 10:45:44] {1431} INFO - Evaluation method: cv
    [flaml.automl: 09-24 10:45:44] {1477} INFO - Minimizing error metric: 1-accuracy
    [flaml.automl: 09-24 10:45:44] {1514} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'catboost', 'xgboost', 'extra_tree', 'lrl1']
    [flaml.automl: 09-24 10:45:44] {1746} INFO - iteration 0, current learner lgbm
    Segmentation fault: 11
    
    opened by angela97lin 26
  • A few questions on running FLAML distributedly via Ray on compute clusters in AzureML

    Hi @sonichi:

    My team is running FLAML in a distributed fashion using Ray on compute clusters in AzureML. We have a few questions, since we have never used FLAML in this environment before, and hope you can provide some insights.

    1. With this setting, what is the best way to log and register the optimal model for each learner in AzureML using mlflow? Shall we simply do mlflow.sklearn.log_model(automl.best_model_for_estimator('LearnerA'), "BestModelLearnerA") and then mlflow.register_model(model_uri=f"{run.info.artifact_uri}/LearnerA", name='flaml-LearnerA') ?

    2. Where is the log file, and how can we change its directory?

    Thank you!

    opened by flippercy 19
  • Why not use early_stop_rounds?

    For LGBM and XGBoost, num_estimators is sampled between 4 and min(32768, int(data_size)).

    Instead, have you considered setting num_estimators (alias num_boost_round) to a large value (say 32768) and using evals and early_stop_rounds during training? This would allow the learning algorithm (rather than the tuning algorithm) to directly find the number of boosting rounds that is optimal, given the data and all other hyper parameter values. (The algorithms will output the best num_estimators/num_boost_round via best_iteration.)

    And then, when you refit on the full dataset, you can use an aggregate (average, mode, etc.) of the best_iteration values found during CV.
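
The proposal can be sketched in plain Python without calling LightGBM or XGBoost. The per-fold validation losses below are synthetic, `best_iteration` is a hypothetical helper that mimics patience-based early stopping, and taking the median across folds is just one of the aggregation choices mentioned above.

```python
from statistics import median


def best_iteration(val_losses, patience):
    """Return the number of rounds at the best validation loss.

    Stops scanning once `patience` rounds pass without improvement.
    """
    best_idx, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_idx, best_loss = i, loss
        elif i - best_idx >= patience:
            break  # early stop: later rounds are never evaluated
    return best_idx + 1  # rounds are counted from 1


# Synthetic validation losses per boosting round, one list per CV fold.
folds = [
    [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50],
    [0.80, 0.75, 0.70, 0.65, 0.66, 0.67, 0.68],
    [0.85, 0.60, 0.61, 0.62, 0.63],
]
per_fold = [best_iteration(losses, patience=3) for losses in folds]
refit_rounds = int(median(per_fold))  # rounds to use when refitting on all data
print(per_fold, refit_rounds)
```

Note that in the first fold the late improvement at round 7 is never seen, because the scan stops after three rounds without progress; this is the usual trade-off of patience-based stopping.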

    opened by stepthom 14
  • The result of demo case between ray tune and flaml is not same.

    I ran the case studies on the page https://github.com/microsoft/FLAML/tree/main/flaml/tune, which use CFO through both flaml.tune and ray.tune. The results differ between the two, so is there something wrong with embedding flaml into ray.tune's framework?

    opened by zuoxiaojiang 13
  • reproducibility and random state for AutoML.fit()

    Hello, I wonder if it is possible to reproduce the results of "flaml.AutoML.fit()"? If possible, could you please kindly let me know how to set up the random_state (or seed) for the "flaml.AutoML.fit()"? Thanks!

    opened by zzheng93 13
  • Using Scikit-learn APIs directly

    Just yesterday, I had a short conversation with @sonichi about this; in general, it is better to make such features more easily available to users... Maybe you (FLAML maintainers) don't have any contest, but you should have features so that more developers will use your product...

    Anyway, the files I imported from them:

    • Pre-processing: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/preprocessing/__init__.py
    • Model selection: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/model_selection/__init__.py
    • Metrics: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/metrics/__init__.py

    I don't think that there is a need to write a test, but we should be careful that a method (class and/or function) is not deprecated...

    A quick example that shows how to use these APIs:

    import pandas as pd
    from flaml.utils import preprocessing
    from flaml.utils import model_selection
    from flaml.utils import metrics
    from flaml import AutoML
    
    # Loading data (nothing changed)
    df = pd.read_csv('<a_random_dataset_that_needs_preprocessing.csv>')
    X = df[['field_no1', 'field_no2', 'field_no3', 'field_no4']]
    y = df['field_no5']
    
    # Preprocessing
    X = X.copy()  # work on a copy rather than a view of df
    le = preprocessing.LabelEncoder()
    X['field_no3'] = le.fit_transform(X['field_no3'])
    y = preprocessing.LabelEncoder().fit_transform(y)  # encode the target column
    
    # Separating the train and test data
    X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=.2)
    
    # Training phase (nothing changed)
    automl = AutoML()
    automl.fit(X_train, y_train, task='classification')
    
    # Measuring accuracy
    y_pred = automl.predict(X_test)
    print(metrics.classification_report(y_test, y_pred))
    

    Or:

    from flaml.utils import (
        LabelEncoder,
        train_test_split,
        classification_report,
    )
    from flaml import AutoML
    

    Or even:

    from flaml import (
        LabelEncoder,
        train_test_split,
        classification_report,
        AutoML,
    )
    
    opened by sheikhartin 11
  • Crash with ValueError when ensemble=True

    When I set ensemble=True, and my data has categorical features, I get the following error at the end of the FLAML run:

    [flaml.automl: 07-08 09:40:44] {1141} INFO -  at 9373.5s,       best extra_tree's error=0.2056, best rf's error=0.1950
    [flaml.automl: 07-08 09:40:44] {993} INFO - iteration 52, current learner rf
    [flaml.automl: 07-08 09:41:42] {1141} INFO -  at 9431.7s,       best rf's error=0.1950, best rf's error=0.1950
    [flaml.automl: 07-08 09:41:42] {993} INFO - iteration 53, current learner rf
    [flaml.automl: 07-08 09:42:11] {1141} INFO -  at 9460.7s,       best rf's error=0.1950, best rf's error=0.1950
    [flaml.automl: 07-08 09:42:11] {993} INFO - iteration 54, current learner rf
    [flaml.automl: 07-08 09:50:15] {1141} INFO -  at 9944.4s,       best rf's error=0.1949, best rf's error=0.1949
    [flaml.automl: 07-08 09:50:15] {1187} INFO - selected model: RandomForestClassifier(criterion='entropy', max_features=0.7294599478674504,
                           n_estimators=347, n_jobs=10)
    [flaml.automl: 07-08 09:50:15] {1197} INFO - [('rf', <flaml.model.RandomForestEstimator object at 0x7fca69effaf0>), ('extra_tree', <flaml.model.ExtraTreeEstimator object at 0x7fca8cc1f8e0>), ('lgbm', <flaml.model.LGBMEstimator object at 0x7fc799985190>), ('catboost', <flaml.model.CatBoostEstimator object at 0x7fca8cc884f0>), ('xgboost', <flaml.model.XGBoostSklearnEstimator object at 0x7fca8cd0e610>)]
    /global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/xgboost/sklearn.py:888: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
      warnings.warn(label_encoder_deprecation_msg, UserWarning)
    /global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/xgboost/sklearn.py:888: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
      warnings.warn(label_encoder_deprecation_msg, UserWarning)
    Traceback (most recent call last):
      File "search.py", line 212, in <module>
        dump_json(data_sheet_file, data_sheet)
      File "search.py", line 208, in main
        with open(data_sheet_file) as f:
      File "search.py", line 163, in run_data_sheet
        run['flaml_settings'] = jsonpickle.encode(automl_settings, unpicklable=False, keys=True)
      File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/flaml/automl.py", line 943, in fit
        self._search()
      File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/flaml/automl.py", line 1212, in _search
        stacker.fit(self._X_train_all, self._y_train_all,
      File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_stacking.py", line 441, in fit
        return super().fit(X, self._le.transform(y), sample_weight)
      File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_stacking.py", line 196, in fit
        _fit_single_estimator(self.final_estimator_, X_meta, y,
      File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_base.py", line 39, in _fit_single_estimator
        estimator.fit(X, y)
      File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/flaml/model.py", line 296, in fit
        self._fit(X_train, y_train, **kwargs)
      File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/flaml/model.py", line 78, in _fit
        model.fit(X_train, y_train, **kwargs)
      File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 304, in fit
        X, y = self._validate_data(X, y, multi_output=True,
      File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/base.py", line 433, in _validate_data
        X, y = check_X_y(X, y, **check_params)
      File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
        return f(*args, **kwargs)
      File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 871, in check_X_y
        X = check_array(X, accept_sparse=accept_sparse,
      File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
        return f(*args, **kwargs)
      File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 673, in check_array
        array = np.asarray(array, order=order, dtype=dtype)
      File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/numpy/core/_asarray.py", line 83, in asarray
        return array(a, dtype, copy=False, order=order)
    ValueError: could not convert string to float: '__OTHER__'
    

    This error does not occur if ensemble=False or if I remove (or encode) the categorical features from my dataset

    My guess is that FLAML properly encodes categorical features when training the base estimators (LGBM, RF, etc), but not when training the stacking classifier.

    opened by stepthom 11
  • Catboost not respecting custom_hp

    Hi,

    I have configured a custom_hp for catboost:

    custom_hp = {
        "catboost": {
            'n_estimators': {
                "domain": tune.randint(lower=1000, upper=3000),
                "init_value": 2001,
                "low_cost_init_value": 2000,
            },
            'learning_rate': {
                "domain": tune.uniform(lower=0.1, upper=1.0),
                "init_value": 0.01,
                "low_cost_init_value": .1,
            },
            'colsample_bylevel': {
                "domain": tune.uniform(lower=0.1, upper=1.0),
                "init_value": 0.1,
            },
            'depth': {
                "domain": tune.randint(lower=1, upper=12),
                "init_value": 2,
                "low_cost_init_value": 2,
            },
            'l2_leaf_reg': {
                "domain": tune.uniform(lower=0.1, upper=20),
                "init_value": 3.0,
                "low_cost_init_value": 1.0,
            },
            'bootstrap_type': {
                "domain": tune.choice(['Bayesian', 'Bernoulli', 'MVS']),
                "init_value": "Bayesian",
                "low_cost_init_value": "Bayesian",
            },
            'grow_policy': {
                "domain": tune.choice(['Lossguide', 'Depthwise']),
                "init_value": "Lossguide",
                "low_cost_init_value": "Lossguide",
            },
        },
    }
    

    In the first iteration of the model fit, self.params has n_estimators set to 2001, as defined by init_value. However, in subsequent iterations n_estimators takes a very low value such as 3. Initially I thought this was because my low_cost_init_value was set low, at 2, but as you can see I set it to the minimum for the domain, 2000.

    Here is screen shot of first iteration where self.params are consistent with custom_hp: image

    On second iteration onward, the n_estimators are very low and outside of the custom_hp domain: image

    And here is the log after first iteration:

    {"record_id": 0, "iter_per_learner": 1, "logged_metric": {"sharpe": -0.00794463007228285, "correlation": -0.00015975141980052537}, "trial_time": 95.14761304855347, "wall_clock_time": 95.15061020851135, "validation_loss": 0.00794463007228285, "config": {"early_stopping_rounds": 10, "n_estimators": 1, "colsample_bylevel": 0.1, "depth": 2, "l2_leaf_reg": 2.9999999999999996, "bootstrap_type": "Bayesian", "grow_policy": "Lossguide", "learning_rate": 0.42154280932622956}, "learner": "catboost", "sample_size": 70417}
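
A quick way to see where values leave the declared domains is to compare them against the bounds directly. The sketch below copies the numeric bounds and init values from the custom_hp above and the config from the log record; `out_of_bounds` is a hypothetical helper, and it treats both ends of each range as inclusive for simplicity.

```python
# (lower, upper) bounds copied from the numeric domains in custom_hp
bounds = {
    "n_estimators": (1000, 3000),
    "learning_rate": (0.1, 1.0),
    "colsample_bylevel": (0.1, 1.0),
    "depth": (1, 12),
    "l2_leaf_reg": (0.1, 20),
}

# init_value entries from custom_hp
init_values = {
    "n_estimators": 2001,
    "learning_rate": 0.01,
    "colsample_bylevel": 0.1,
    "depth": 2,
    "l2_leaf_reg": 3.0,
}

# numeric entries of "config" in the logged record
logged_config = {
    "n_estimators": 1,
    "colsample_bylevel": 0.1,
    "depth": 2,
    "l2_leaf_reg": 2.9999999999999996,
    "learning_rate": 0.42154280932622956,
}


def out_of_bounds(values, bounds):
    """Return the sorted names whose value lies outside its (lower, upper) range."""
    return sorted(
        name
        for name, value in values.items()
        if name in bounds and not (bounds[name][0] <= value <= bounds[name][1])
    )


print("init values outside domain:", out_of_bounds(init_values, bounds))
print("logged values outside domain:", out_of_bounds(logged_config, bounds))
```

Besides the logged n_estimators, the check also flags the learning_rate init_value of 0.01, which is below the declared lower bound of 0.1.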

    opened by jmrichardson 10
  • Unable to work with FLAML in Kaggle

    Error:

    ImportError                               Traceback (most recent call last)
    /tmp/ipykernel_33/3768597154.py in <module>
    ----> 1 from flaml import AutoML
          2 automl = AutoML()

    /opt/conda/lib/python3.7/site-packages/flaml/__init__.py in <module>
          1 from flaml.searcher import CFO, BlendSearch, FLOW2, BlendSearchTuner
    ----> 2 from flaml.automl import AutoML, logger_formatter
          3 from flaml.onlineml.autovw import AutoVW
          4 from flaml.version import __version__
          5 import logging

    /opt/conda/lib/python3.7/site-packages/flaml/automl.py in <module>
         22 import logging
         23 import json
    ---> 24 from .ml import (
         25     compute_estimator,
         26     train_estimator,

    /opt/conda/lib/python3.7/site-packages/flaml/ml.py in <module>
          6 import numpy as np
          7 import pandas as pd
    ----> 8 from sklearn.metrics import (
          9     mean_squared_error,
         10     r2_score,

    ImportError: cannot import name 'mean_absolute_percentage_error' from 'sklearn.metrics' (/opt/conda/lib/python3.7/site-packages/sklearn/metrics/__init__.py)

    opened by GDGauravDutta 10
  • Do you expect FLAML to run across multiple nodes in AML using RAY

    On your page https://microsoft.github.io/FLAML/docs/Examples/Integrate%20-%20AzureML/#use-ray-to-distribute-across-a-cluster you describe the process of configuring an AML cluster with Ray and using FLAML against it.

    Is it your expectation that this configuration, as you have it, will be distributed Ray or parallel Ray on a single node?

    I was under the impression you needed a compute cluster with a VNET to allow a Ray cluster?

    Also I ran your sample and modified it slightly to log the node id of the machine executing (I did this via a custom metric so it ran for each iteration) and it always only logged the same node that my flaml script is executing on even though I have 2 nodes available.

    opened by camer314 10
  • New tuning API in ray 2

    There seems to be a big change in ray tune API: https://docs.ray.io/en/latest/tune/api_docs/execution.html#tuner How does it affect flaml when using ray 2 as the backend?

    @Yard1 your insight would be appreciated.

    opened by sonichi 1
  • fix #871: call check_spark only when necessary

    Why are these changes needed?

    Currently check_spark is called even when use_spark=False and n_concurrent_trials=1, which creates an unnecessary Spark session. With this PR, check_spark is called only when use_spark=True or (n_concurrent_trials>1 and ray is not available).
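
The condition stated in this PR description can be written as a small predicate. `should_check_spark` and its `ray_available` argument are hypothetical names used for illustration; this is not the code from the PR.

```python
def should_check_spark(use_spark: bool, n_concurrent_trials: int, ray_available: bool) -> bool:
    """Return True only when a Spark check is actually needed."""
    return use_spark or (n_concurrent_trials > 1 and not ray_available)


# The default sequential case no longer triggers a Spark session.
print(should_check_spark(use_spark=False, n_concurrent_trials=1, ray_available=False))
```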

    Related issue number

    Closes #871

    Checks

    • [x] I've used pre-commit to lint the changes in this PR, or I've made sure lint with flake8 output is two 0s.
    • [ ] I've included any doc changes needed for https://microsoft.github.io/FLAML/. See https://microsoft.github.io/FLAML/docs/Contribute#documentation to build and test documentation locally.
    • [ ] I've added tests (if relevant) corresponding to the changes introduced in this PR.
    • [x] I've made sure all auto checks have passed.
    opened by thinkall 4
  • `AutoML.fit` always creates a spark session even when it doesn't need to

    Currently, AutoML.fit always creates a spark session even when it doesn't need to in the following line: https://github.com/microsoft/FLAML/blob/90aea9c28b6100faf86f2c204e53a68e87f10c66/flaml/automl/automl.py#L2607

    check_spark calls SparkSession.builder.getOrCreate and creates a spark session if it doesn't exist:

    https://github.com/microsoft/FLAML/blob/90aea9c28b6100faf86f2c204e53a68e87f10c66/flaml/tune/spark/utils.py#L48

    I think we can skip check_spark in the following cases:

    1. When use_ray = True
    2. When n_concurrent_trials = 1

    Code to reproduce:

    from flaml import AutoML
    from sklearn.datasets import load_iris
    
    automl = AutoML()
    X, y = load_iris(as_frame=True, return_X_y=True)
    automl.fit(X, y, task="classification")
    

    Output:

    check Spark installation...This line should appear only once.
    
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    23/01/05 14:41:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [flaml.automl.automl: 01-05 14:41:48] {2712} INFO - task = classification
    [flaml.automl.automl: 01-05 14:41:48] {2714} INFO - Data split method: stratified
    [flaml.automl.automl: 01-05 14:41:48] {2717} INFO - Evaluation method: cv
    [flaml.automl.automl: 01-05 14:41:48] {2844} INFO - Minimizing error metric: log_loss
    ...
    
    opened by harupy 2
  • notebook test

    Why are these changes needed?

    Add tests for some notebooks. Removed warning message about Spark which is unnecessary for non-Spark users.

    Related issue number

    #851

    Checks

    • [x] I've used pre-commit to lint the changes in this PR, or I've made sure lint with flake8 output is two 0s.
    • [x] I've included any doc changes needed for https://microsoft.github.io/FLAML/. See https://microsoft.github.io/FLAML/docs/Contribute#documentation to build and test documentation locally.
    • [x] I've added tests (if relevant) corresponding to the changes introduced in this PR.
    • [ ] I've made sure all auto checks have passed.
    opened by sonichi 6
  • HistGradientBoosting support

    Right now it can only be added by writing a custom class.

    Question: will support for HistGradientBoostingClassifier and HistGradientBoostingRegressor be added to flamlize_estimator?

    opened by glevv 1
Releases(v1.1.0)
  • v1.1.0(Dec 30, 2022)

    Highlights

    • Spark is now supported as a new parallel tuning backend.
    • New tuning capability: targeted tuning with multiple lexicographic objectives. Check out documentation and an example for this new tuning capability.
    • New metrics: roc_auc_weighted, roc_auc_ovr_weighted, roc_auc_ovo_weighted.
    • New reproducible learner selection method when time_budget is not specified.
    • AutoML-related functionality is moved into a new automl subpackage.
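
To give a feel for tuning with multiple lexicographic objectives, here is a generic sketch of a lexicographic comparison with per-objective tolerances. It is not FLAML's implementation: `lexico_better` and the shape of its arguments are assumptions made for illustration.

```python
def lexico_better(a, b, metrics, modes, tolerances):
    """Return True if result `a` beats `b` under a lexicographic order.

    Objectives are compared in priority order; a difference within the
    tolerance counts as a tie and defers to the next objective.
    """
    for name in metrics:
        sign = 1.0 if modes[name] == "min" else -1.0
        diff = sign * (a[name] - b[name])  # negative means `a` is better
        if abs(diff) > tolerances[name]:
            return diff < 0
    return False  # tied within tolerance on every objective


metrics = ["error", "latency"]                  # priority order
modes = {"error": "min", "latency": "min"}
tolerances = {"error": 0.01, "latency": 0.0}

a = {"error": 0.100, "latency": 30.0}
b = {"error": 0.105, "latency": 20.0}           # ties on error, faster
c = {"error": 0.200, "latency": 1.0}            # much faster, clearly worse error

print(lexico_better(b, a, metrics, modes, tolerances))
print(lexico_better(c, a, metrics, modes, tolerances))
```

Here `b` is preferred over `a` because their errors are within tolerance and `b` has lower latency, while `c` loses to `a` on the first objective regardless of its latency.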

    Thanks to all contributors who contributed to this release!

    What's Changed

    • Bump actions/checkout from 2 to 3 by @dependabot in https://github.com/microsoft/FLAML/pull/699
    • fix dependably alert by @skzhang1 in https://github.com/microsoft/FLAML/pull/818
    • fix typo by @skzhang1 in https://github.com/microsoft/FLAML/pull/823
    • install editable package in codespace by @sonichi in https://github.com/microsoft/FLAML/pull/826
    • skip test_hf_data in py 3.6 by @sonichi in https://github.com/microsoft/FLAML/pull/832
    • fix typo of output directory by @thinkall in https://github.com/microsoft/FLAML/pull/828
    • catch TFT logger bugs by @int-chaos in https://github.com/microsoft/FLAML/pull/833
    • roc_auc_weighted metric addition by @shreyas36 in https://github.com/microsoft/FLAML/pull/827
    • make performance test reproducible by @sonichi in https://github.com/microsoft/FLAML/pull/837
    • Refactor into automl subpackage by @markharley in https://github.com/microsoft/FLAML/pull/809
    • Edit the announcement of AAAI-23 tutorial and the KDD tutorial announcement. by @HangHouCheong in https://github.com/microsoft/FLAML/pull/820
    • Use get to avoid KeyError by @sonichi in https://github.com/microsoft/FLAML/pull/824
    • Update doc by @skzhang1 in https://github.com/microsoft/FLAML/pull/843
    • fix bug related to choice by @sonichi in https://github.com/microsoft/FLAML/pull/848
    • FAQ about OOM by @sonichi in https://github.com/microsoft/FLAML/pull/849
    • Update .NET documentation links by @luisquintanilla in https://github.com/microsoft/FLAML/pull/847
    • Added an info reminding user that if no time_budget and no max_iter is specified, then effectively zero-shot AutoML is used by @jingdong00 in https://github.com/microsoft/FLAML/pull/850
    • Fix example tune-pytorch where the checkpoint path may be named differently by @jingdong00 in https://github.com/microsoft/FLAML/pull/853
    • Format errors on the web. by @skzhang1 in https://github.com/microsoft/FLAML/pull/855
    • Add supporting using Spark as the backend of parallel training by @thinkall in https://github.com/microsoft/FLAML/pull/846
    • Info and naming by @sonichi in https://github.com/microsoft/FLAML/pull/864

    New Contributors

    • @thinkall made their first contribution in https://github.com/microsoft/FLAML/pull/828
    • @markharley made their first contribution in https://github.com/microsoft/FLAML/pull/809
    • @HangHouCheong made their first contribution in https://github.com/microsoft/FLAML/pull/820

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.14...v1.1.0

  • v1.0.14(Nov 16, 2022)

    Highlights

    • Preparing alpha release of multi-objective hyperparameter tuning with lexicographic preference.
    • Fixed issues related to zero-shot automl.
    • Multiple improvements to documentation.

    What's Changed

    • Discord Badge Added by @royninja in https://github.com/microsoft/FLAML/pull/760
    • fix bug in current nlp documentation by @liususan091219 in https://github.com/microsoft/FLAML/pull/763
    • Multiple objectives hyperparameter tuning with lexicographic preference by @Anonymous-submission-repo in https://github.com/microsoft/FLAML/pull/752
    • Indentation corrected by @Kirito-Excalibur in https://github.com/microsoft/FLAML/pull/778
    • Included hint to escape brackets for pip setup by @evensure in https://github.com/microsoft/FLAML/pull/786
    • Docs by @velezbeltran in https://github.com/microsoft/FLAML/pull/765
    • Bump actions/setup-python from 2 to 4 by @dependabot in https://github.com/microsoft/FLAML/pull/700
    • Bump codecov/codecov-action from 1 to 3 by @dependabot in https://github.com/microsoft/FLAML/pull/697
    • Removed extra | in documentation by @satya-vinay in https://github.com/microsoft/FLAML/pull/790
    • fix_alert by @skzhang1 in https://github.com/microsoft/FLAML/pull/793
    • Fixed typo by @ElinaAndreeva in https://github.com/microsoft/FLAML/pull/797
    • fix_alerts by @skzhang1 in https://github.com/microsoft/FLAML/pull/799
    • Documentation about classification/regression task #753 by @royninja in https://github.com/microsoft/FLAML/pull/802
    • Added a link to documentation webpage in notebook time_series_forcast by @jingdong00 in https://github.com/microsoft/FLAML/pull/791
    • Fix issues related to zero-shot automl by @sonichi in https://github.com/microsoft/FLAML/pull/783
    • added the models used for forecasting in documentation by @shreyas36 in https://github.com/microsoft/FLAML/pull/811
    • Add performance test for LexiFlow by @Anonymous-submission-repo in https://github.com/microsoft/FLAML/pull/812

    New Contributors

    • @royninja made their first contribution in https://github.com/microsoft/FLAML/pull/760
    • @Anonymous-submission-repo made their first contribution in https://github.com/microsoft/FLAML/pull/752
    • @Kirito-Excalibur made their first contribution in https://github.com/microsoft/FLAML/pull/778
    • @evensure made their first contribution in https://github.com/microsoft/FLAML/pull/786
    • @velezbeltran made their first contribution in https://github.com/microsoft/FLAML/pull/765
    • @satya-vinay made their first contribution in https://github.com/microsoft/FLAML/pull/790
    • @ElinaAndreeva made their first contribution in https://github.com/microsoft/FLAML/pull/797
    • @jingdong00 made their first contribution in https://github.com/microsoft/FLAML/pull/791
    • @shreyas36 made their first contribution in https://github.com/microsoft/FLAML/pull/811

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.13...v1.0.14

  • v1.0.13(Oct 13, 2022)

    Highlights

    • Log search_state.config directly to MLflow instead of as a key-dictionary pair
    • Move searcher and scheduler into tune
    • Move import location for Ray 2
    • Fix NLP dimension mismatch bug

    What's Changed

    • Dockerfile building problem by @skzhang1 in https://github.com/microsoft/FLAML/pull/719
    • Update Contribute.md by @vijaya-lakshmi-venkatraman in https://github.com/microsoft/FLAML/pull/716
    • Move import location for Ray 2 by @sonichi in https://github.com/microsoft/FLAML/pull/721
    • Fix issue 728 add hyperlink to GitHub location by @Libens-bufo in https://github.com/microsoft/FLAML/pull/731
    • Update model.py by @vijaya-lakshmi-venkatraman in https://github.com/microsoft/FLAML/pull/739
    • Issue724 by @liususan091219 in https://github.com/microsoft/FLAML/pull/745
    • log search_state.config directly instead of under tag config by @prithvikannan in https://github.com/microsoft/FLAML/pull/747
    • move searcher and scheduler into tune by @sonichi in https://github.com/microsoft/FLAML/pull/746
    • updating the data collator for seq-regression to handle the dim mismatch problem by @liususan091219 in https://github.com/microsoft/FLAML/pull/751
    • Update Contribute by @sonichi in https://github.com/microsoft/FLAML/pull/741
    • Remove NLP classification head by @liususan091219 in https://github.com/microsoft/FLAML/pull/756

    New Contributors

    • @vijaya-lakshmi-venkatraman made their first contribution in https://github.com/microsoft/FLAML/pull/716
    • @Libens-bufo made their first contribution in https://github.com/microsoft/FLAML/pull/731
    • @prithvikannan made their first contribution in https://github.com/microsoft/FLAML/pull/747

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.12...v1.0.13

  • v1.0.12(Sep 6, 2022)

    Highlights

    • Fix an MLflow bug to support the case where search.state.metric_for_logging is None
    • Support customized cross-validation strategy
    • Fix SARIMAX seasonal_order parameter name in the wrapper

    Thanks to all the contributors for this release!

    What's Changed

    • chore: Auto update github actions with dependabot by @iemejia in https://github.com/microsoft/FLAML/pull/688
    • talks and tutorials by @qingyun-wu in https://github.com/microsoft/FLAML/pull/694
    • updating nlp notebook by @liususan091219 in https://github.com/microsoft/FLAML/pull/693
    • "intermediate_results" TypeError: argument of type 'NoneType' is not iterable by @liususan091219 in https://github.com/microsoft/FLAML/pull/695
    • Update Research.md by @sonichi in https://github.com/microsoft/FLAML/pull/701
    • Bump actions/setup-node from 2 to 3 by @dependabot in https://github.com/microsoft/FLAML/pull/698
    • Bump actions/cache from 1 to 3 by @dependabot in https://github.com/microsoft/FLAML/pull/696
    • Support customized cross-validation strategy by @skzhang1 in https://github.com/microsoft/FLAML/pull/669
    • Add $schema to cgmanifest.json by @JamieMagee in https://github.com/microsoft/FLAML/pull/708
    • Fix SARIMAX seasonal_order parameter name in the wrapper by @EgorKraevTransferwise in https://github.com/microsoft/FLAML/pull/711

    New Contributors

    • @iemejia made their first contribution in https://github.com/microsoft/FLAML/pull/688
    • @JamieMagee made their first contribution in https://github.com/microsoft/FLAML/pull/708
    • @EgorKraevTransferwise made their first contribution in https://github.com/microsoft/FLAML/pull/711

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.11...v1.0.12

  • v1.0.11(Aug 21, 2022)

    Highlights

    • Preserve the checkpoint when deleting AutoML objects.
    • Do not create an eval set when use_best_model is set to False for catboost.

    What's Changed

    • add guideline collection by @qingyun-wu in https://github.com/microsoft/FLAML/pull/687
    • LightGBM notebook update by @sonichi in https://github.com/microsoft/FLAML/pull/690
    • Add preserve_checkpoint to preserve the checkpoint after del by @liususan091219 in https://github.com/microsoft/FLAML/pull/692
    • use_best_model for catboost by @sonichi in https://github.com/microsoft/FLAML/pull/679

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.10...v1.0.11

  • v1.0.10(Aug 16, 2022)

    This release contains several new features to highlight:

    • A major new feature is support for multiple time series in one dataset, with a new task named "ts_forecast_panel" and a neural network estimator from pytorch-forecasting.
    • Allow disabling shuffle for custom splitter.
    • Allow explicit specification of whether the choices of a hyperparameter have an inherent order.
    • Allow skipping data transformation to avoid overhead.
    • Support AzureML pipeline tuning.
    • Allow log file name to be specified in tune.run and perform logging when ray is used.

    There are other improvements for the transformer estimator and bug fixes for config constraints.

    What's Changed

    • Fixing the issue that FLAML trial number is significantly smaller than Transformers.hyperparameter_search by @liususan091219 in https://github.com/microsoft/FLAML/pull/657
    • make test result more stable by @sonichi in https://github.com/microsoft/FLAML/pull/646
    • Add pipeline tuner component and dependencies. by @ruizhuanguw in https://github.com/microsoft/FLAML/pull/671
    • Skip transform by @jmrichardson in https://github.com/microsoft/FLAML/pull/665
    • pull request template by @sonichi in https://github.com/microsoft/FLAML/pull/668
    • Update Research.md by @liususan091219 in https://github.com/microsoft/FLAML/pull/672
    • Documentation on search space and parallel/sequential tuning by @qingyun-wu in https://github.com/microsoft/FLAML/pull/675
    • time series forecasting with panel datasets by @int-chaos in https://github.com/microsoft/FLAML/pull/541
    • categorical choice can be ordered or unordered by @sonichi in https://github.com/microsoft/FLAML/pull/677
    • Disable shuffle for custom CV by @jmrichardson in https://github.com/microsoft/FLAML/pull/659
    • update time series forecast notebook by @int-chaos in https://github.com/microsoft/FLAML/pull/682
    • check config constraints for the initial config by @sonichi in https://github.com/microsoft/FLAML/pull/685
    • log_file_name in tune.run() by @sonichi in https://github.com/microsoft/FLAML/pull/681
    • updating nlp notebook by @liususan091219 in https://github.com/microsoft/FLAML/pull/683
    • VW version requirement and documentation on config_constraints vs metric_constraints by @qingyun-wu in https://github.com/microsoft/FLAML/pull/686

    New Contributors

    • @jmrichardson made their first contribution in https://github.com/microsoft/FLAML/pull/665

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.9...v1.0.10

  • v1.0.9(Jul 31, 2022)

    Highlights

    • Add the feature names and importance in AutoML
    • Update NLP search space and fix several bugs in NLP tasks
    • Respect kwargs in AutoML.predict()

    What's Changed

    • Feature names and importances by @sonichi in https://github.com/microsoft/FLAML/pull/621
    • fix NER roberta bug by @liususan091219 in https://github.com/microsoft/FLAML/pull/632
    • updating search space by @liususan091219 in https://github.com/microsoft/FLAML/pull/633
    • Bump terser from 5.10.0 to 5.14.2 in /website by @dependabot in https://github.com/microsoft/FLAML/pull/642
    • This PR fixes the frequent NLP bugs in the other PRs by @liususan091219 in https://github.com/microsoft/FLAML/pull/647
    • added "**kwargs" to "predict" by @zzheng93 in https://github.com/microsoft/FLAML/pull/641
    • Fix alerts by @skzhang1 in https://github.com/microsoft/FLAML/pull/644
    • Update .NET documentation by @luisquintanilla in https://github.com/microsoft/FLAML/pull/643
    • Fix HPO evaluation bug by @liususan091219 in https://github.com/microsoft/FLAML/pull/645

    New Contributors

    • @dependabot made their first contribution in https://github.com/microsoft/FLAML/pull/642
    • @zzheng93 made their first contribution in https://github.com/microsoft/FLAML/pull/641
    • @luisquintanilla made their first contribution in https://github.com/microsoft/FLAML/pull/643

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.8...v1.0.9

  • v1.0.8(Jul 10, 2022)

    • Support latest xgboost version
    • Reproducibility improvement for blendsearch
    • Allow custom GroupKFold object as split_type
    • Bug fix in token classification tasks such as NER
    • Allow FLAML_sample_size in starting_points

    What's Changed

    • log msg about ensemble by @sonichi in https://github.com/microsoft/FLAML/pull/597
    • support latest xgboost version by @sonichi in https://github.com/microsoft/FLAML/pull/599
    • Fix automl settings in scikit-learn pipeline integration example by @ZviBaratz in https://github.com/microsoft/FLAML/pull/602
    • update got version by @sonichi in https://github.com/microsoft/FLAML/pull/607
    • min eci depends on cost_attr; cost_attr in ls by @sonichi in https://github.com/microsoft/FLAML/pull/612
    • Replaced !pip calls with %pip magic command by @ZviBaratz in https://github.com/microsoft/FLAML/pull/604
    • catch URLError by @sonichi in https://github.com/microsoft/FLAML/pull/613
    • Updated pre-commit hooks by @ZviBaratz in https://github.com/microsoft/FLAML/pull/609
    • Py36 by @sonichi in https://github.com/microsoft/FLAML/pull/614
    • Allow custom GroupKFold object as split_type by @sonichi in https://github.com/microsoft/FLAML/pull/616
    • Typo fix by @ZviBaratz in https://github.com/microsoft/FLAML/pull/618
    • use relative url in doc by @sonichi in https://github.com/microsoft/FLAML/pull/620
    • This PR will solve issue, code example format in the doc #622 by @31Sanskrati in https://github.com/microsoft/FLAML/pull/623
    • fix ner bug; refactor post processing of TransformersEstimator prediction by @liususan091219 in https://github.com/microsoft/FLAML/pull/615
    • isinstance(x, int) -> isinstance(x, (int, np.integer)) by @liususan091219 in https://github.com/microsoft/FLAML/pull/627
    • Allow FLAML_sample_size in starting_points by @qingyun-wu in https://github.com/microsoft/FLAML/pull/619
    • disable max_len for ner by @liususan091219 in https://github.com/microsoft/FLAML/pull/629
    • fix #630 by @adi611 in https://github.com/microsoft/FLAML/pull/631

    New Contributors

    • @ZviBaratz made their first contribution in https://github.com/microsoft/FLAML/pull/602
    • @31Sanskrati made their first contribution in https://github.com/microsoft/FLAML/pull/623
    • @adi611 made their first contribution in https://github.com/microsoft/FLAML/pull/631

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.7...v1.0.8

  • v1.0.7(Jun 17, 2022)

    • Add support for Python 3.10.
    • Enable ensemble when using ray.
    • Enable nested tuning runs.
    • Make BlendSearch reproducible when constructed outside tune.run().
    • Fix a resource limit issue on some macOS versions.
    • Fix a bug in NLP.
    • Make set_search_properties() compatible with ray tune.

    What's Changed

    • enable ensemble when using ray by @sonichi in https://github.com/microsoft/FLAML/pull/583
    • update time from start when using ray by @sonichi in https://github.com/microsoft/FLAML/pull/586
    • Class variables, cost_attr, and reproducibility by @qingyun-wu in https://github.com/microsoft/FLAML/pull/587
    • backup & recover global vars for nested tune.run by @sonichi in https://github.com/microsoft/FLAML/pull/584
    • fixing a bug in nlp/utils.py by @liususan091219 in https://github.com/microsoft/FLAML/pull/590
    • fix resource limit issue by @sonichi in https://github.com/microsoft/FLAML/pull/589
    • Modified setup instructions by @daniel-555 in https://github.com/microsoft/FLAML/pull/593
    • Add python 3.10 in the CI by @sonichi in https://github.com/microsoft/FLAML/pull/591
    • trying to fix the indexerror for ner by @liususan091219 in https://github.com/microsoft/FLAML/pull/596
    • Update documentation for NLP by @liususan091219 in https://github.com/microsoft/FLAML/pull/594
    • set_search_properties by @sonichi in https://github.com/microsoft/FLAML/pull/595

    New Contributors

    • @daniel-555 made their first contribution in https://github.com/microsoft/FLAML/pull/593

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.6...v1.0.7

  • v1.0.6(Jun 9, 2022)

    What's Changed

    • init value type match by @sonichi in https://github.com/microsoft/FLAML/pull/575

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.5...v1.0.6

  • v1.0.5(Jun 7, 2022)

    What's Changed

    • fixing trainable and update function, completing NOTE by @liususan091219 in https://github.com/microsoft/FLAML/pull/566
    • Update fit_kwargs_by_estimator example in Task-Oriented-AutoML.md by @liususan091219 in https://github.com/microsoft/FLAML/pull/561
    • add zeroshot notebook by @sonichi in https://github.com/microsoft/FLAML/pull/569
    • set holiday version <0.14 for prophet by @sonichi in https://github.com/microsoft/FLAML/pull/573
    • Updated doc by @PrajwalBorkar in https://github.com/microsoft/FLAML/pull/572
    • install openml for notebook example by @sonichi in https://github.com/microsoft/FLAML/pull/574

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.4...v1.0.5

  • v1.0.4(Jun 2, 2022)

    What's Changed

    • Update documentation for FAQ about how to handle imbalanced data by @liususan091219 in https://github.com/microsoft/FLAML/pull/560
    • update doc about scheduler exception by @sonichi in https://github.com/microsoft/FLAML/pull/564
    • version update by @sonichi in https://github.com/microsoft/FLAML/pull/567

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.3...v1.0.4

  • v1.0.3(May 31, 2022)

    Data files needed for zero-shot AutoML are included in this release. When no search budget is given via time_budget or max_iter, zero-shot AutoML is used automatically.
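The rule stated above can be sketched as a small decision function. This is only an illustration of the described behavior, not FLAML's code: the helper name `search_mode` and the use of `None` to mean "not given" are assumptions.

```python
from typing import Optional


def search_mode(
    time_budget: Optional[float] = None,
    max_iter: Optional[int] = None,
) -> str:
    """Return which mode the described rule would select.

    Hypothetical helper: `None` stands for "the user did not give this budget".
    """
    if time_budget is None and max_iter is None:
        # No search budget at all: fall back to zero-shot AutoML.
        return "zero-shot"
    # At least one budget was given: run a regular search.
    return "search"


print(search_mode())                # no budget given
print(search_mode(time_budget=60))  # a time budget in seconds
print(search_mode(max_iter=10))     # an iteration budget
```

Only the first call, with neither budget given, selects the zero-shot mode.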

    What's Changed

    • align indent and add missing quotation by @sonichi in https://github.com/microsoft/FLAML/pull/555
    • solve issue #542. fix pickle.UnpickingError while blendsearch warm start by @LinWencong in https://github.com/microsoft/FLAML/pull/554
    • Documentation, test and bugfix by @qingyun-wu in https://github.com/microsoft/FLAML/pull/556
    • Removed cat_hp_cost by @PrajwalBorkar in https://github.com/microsoft/FLAML/pull/559
    • Update Tune-User-Defined-Function.md by @sonichi in https://github.com/microsoft/FLAML/pull/562
    • use zeroshot when no budget is given; custom_hp by @sonichi in https://github.com/microsoft/FLAML/pull/563
    • simplify warmstart in blendsearch by @sonichi in https://github.com/microsoft/FLAML/pull/558
    • include .json file in flaml.default package by @sonichi in https://github.com/microsoft/FLAML/pull/565

    New Contributors

    • @LinWencong made their first contribution in https://github.com/microsoft/FLAML/pull/554
    • @PrajwalBorkar made their first contribution in https://github.com/microsoft/FLAML/pull/559

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.2...v1.0.3

  • v1.0.2(May 20, 2022)

    What's Changed

    • docstr cleanup #523: removed lines 259 to 260 in a1c49ca by @elbowgreasel in https://github.com/microsoft/FLAML/pull/524
    • refactoring TransformersEstimator to support default and custom_hp by @liususan091219 in https://github.com/microsoft/FLAML/pull/511
    • Bump cross-fetch from 3.1.4 to 3.1.5 in /website by @sonichi in https://github.com/microsoft/FLAML/pull/529
    • fixing use_ray in automl.py by @liususan091219 in https://github.com/microsoft/FLAML/pull/531
    • handle non-flaml scheduler in flaml.tune by @qingyun-wu in https://github.com/microsoft/FLAML/pull/532
    • test reproducibility from retrain by @sonichi in https://github.com/microsoft/FLAML/pull/533
    • fix the post-processing bug in NER by @liususan091219 in https://github.com/microsoft/FLAML/pull/534
    • fixing roberta add_prefix_space bug by @liususan091219 in https://github.com/microsoft/FLAML/pull/546
    • choose n_jobs for ensemble according to n_jobs per learner by @sonichi in https://github.com/microsoft/FLAML/pull/551
    • Quick-fix by @Qiaochu-Song in https://github.com/microsoft/FLAML/pull/539
    • fix indentation in automl.py by @harish445 in https://github.com/microsoft/FLAML/pull/553

    New Contributors

    • @elbowgreasel made their first contribution in https://github.com/microsoft/FLAML/pull/524
    • @Qiaochu-Song made their first contribution in https://github.com/microsoft/FLAML/pull/539
    • @harish445 made their first contribution in https://github.com/microsoft/FLAML/pull/553

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.1...v1.0.2

  • v1.0.1(Apr 24, 2022)

    What's Changed

    • use ffill in forecasting example by @sonichi in https://github.com/microsoft/FLAML/pull/508
    • Handling fractional gpu_per_trial for NLP by @liususan091219 in https://github.com/microsoft/FLAML/pull/513
    • Fix AttributeError: readonly attribute for Python 3.10.4 by @jayshanker2000 in https://github.com/microsoft/FLAML/pull/518
    • max choice is n-1 by @sonichi in https://github.com/microsoft/FLAML/pull/521
    • allow evaluated_rewards shorter than points_to_evaluate by @sonichi in https://github.com/microsoft/FLAML/pull/522

    New Contributors

    • @jayshanker2000 made their first contribution in https://github.com/microsoft/FLAML/pull/518

    Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.0...v1.0.1

  • v1.0.0(Mar 31, 2022)

    What's Changed

    • zero-shot AutoML in readme by @sonichi in https://github.com/microsoft/FLAML/pull/474
    • update documentation for time series forecasting by @int-chaos in https://github.com/microsoft/FLAML/pull/472
    • metric constraints in flaml.automl by @qingyun-wu in https://github.com/microsoft/FLAML/pull/479
    • import from lightgbm by @sonichi in https://github.com/microsoft/FLAML/pull/489
    • fixing bug for ner by @liususan091219 in https://github.com/microsoft/FLAML/pull/463
    • doc update (#490) by @sonichi in https://github.com/microsoft/FLAML/pull/492
    • adding evaluation by @liususan091219 in https://github.com/microsoft/FLAML/pull/495
    • version number and doc by @sonichi in https://github.com/microsoft/FLAML/pull/497
    • fixing a few bugs in nlp by @liususan091219 in https://github.com/microsoft/FLAML/pull/503
    • Bug fix and add documentation for metric_constraints by @qingyun-wu in https://github.com/microsoft/FLAML/pull/498
    • fixing some bug in NLP by @liususan091219 in https://github.com/microsoft/FLAML/pull/506
    • handle failing trials by @sonichi in https://github.com/microsoft/FLAML/pull/505
    • Update notebook and test by @qingyun-wu in https://github.com/microsoft/FLAML/pull/507
    • Bump minimist from 1.2.5 to 1.2.6 in /website by @sonichi in https://github.com/microsoft/FLAML/pull/502

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.10.0...v1.0.0

  • v0.10.0(Mar 2, 2022)

    This release contains an important new feature: zero-shot AutoML and meta learning. It provides a new way of doing AutoML without tuning. You can now use the existing training API from lightgbm, xgboost, etc. while getting the benefit of AutoML in choosing high-performance hyperparameter configurations per task. It is recommended for everyone currently using lightgbm, xgboost or random forest, regardless of previous experience in AutoML. This feature also enables continuous improvement of AutoML from historical AutoML experiments.

    Other changes can be found below.

    What's Changed

    • Typo on the webpage's Getting Started section by @cammarb in https://github.com/microsoft/FLAML/pull/457
    • Bump follow-redirects from 1.14.7 to 1.14.8 in /website by @sonichi in https://github.com/microsoft/FLAML/pull/459
    • Docstr update by @qingyun-wu in https://github.com/microsoft/FLAML/pull/460
    • update regression metrics in notebooks by @sonichi in https://github.com/microsoft/FLAML/pull/454
    • make AutoML.classes_ an array by @sonichi in https://github.com/microsoft/FLAML/pull/467
    • Bump prismjs from 1.25.0 to 1.27.0 in /website by @sonichi in https://github.com/microsoft/FLAML/pull/471
    • Zero-shot AutoML by @sonichi in https://github.com/microsoft/FLAML/pull/468
    • don't init global search with points_to_evaluate unless evaluated_rewards is provided; handle callbacks in fit kwargs by @sonichi in https://github.com/microsoft/FLAML/pull/469

    New Contributors

    • @cammarb made their first contribution in https://github.com/microsoft/FLAML/pull/457

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.7...v0.10.0

  • v0.9.7(Feb 12, 2022)

    What's Changed

    • Update Task-Oriented-AutoML.md by @vvijayalakshmi21 in https://github.com/microsoft/FLAML/pull/446
    • Update Task-Oriented-AutoML.md by @vvijayalakshmi21 in https://github.com/microsoft/FLAML/pull/447
    • Update Tune-User-Defined-Function.md by @vvijayalakshmi21 in https://github.com/microsoft/FLAML/pull/448
    • corrected typo in example xgboost documentation by @MichaelMarien in https://github.com/microsoft/FLAML/pull/449
    • bump ray version to 1.10 by @sonichi in https://github.com/microsoft/FLAML/pull/450
    • fix a bug when using ray & update ray on aml by @sonichi in https://github.com/microsoft/FLAML/pull/455

    New Contributors

    • @vvijayalakshmi21 made their first contribution in https://github.com/microsoft/FLAML/pull/446

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.6...v0.9.7

  • v0.9.6(Jan 31, 2022)

    What's Changed

    • reducing AutoConfig.from_pretrained by @liususan091219 in https://github.com/microsoft/FLAML/pull/411
    • Set use_ray to True for logging to databricks by @liususan091219 in https://github.com/microsoft/FLAML/pull/414
    • Bump nanoid from 3.1.30 to 3.2.0 in /website by @sonichi in https://github.com/microsoft/FLAML/pull/420
    • bump version of node-fetch to 3.1.1 in website/ by @sonichi in https://github.com/microsoft/FLAML/pull/423
    • Use Ray _BackwardsCompatibleNumpyRng if possible by @Yard1 in https://github.com/microsoft/FLAML/pull/421
    • remove FLAML sample size from config by @sonichi in https://github.com/microsoft/FLAML/pull/418
    • max_iter < 2 -> no search; sign in metric constraints; test and example for forecasting by @sonichi in https://github.com/microsoft/FLAML/pull/415
    • remove redundant imports by @liususan091219 in https://github.com/microsoft/FLAML/pull/426
    • Support time series forecasting for discrete target variable by @int-chaos in https://github.com/microsoft/FLAML/pull/416
    • homepage update by @sonichi in https://github.com/microsoft/FLAML/pull/425
    • fix a broken link in README.md by @m13uz in https://github.com/microsoft/FLAML/pull/439
    • adding catch for HTTP error by @liususan091219 in https://github.com/microsoft/FLAML/pull/432
    • Change the upper bound for "lags" hyperparameter for sklearn forecast models by @int-chaos in https://github.com/microsoft/FLAML/pull/437
    • Gpu support for xgboost by @sonichi in https://github.com/microsoft/FLAML/pull/442
    • data in csv by @sonichi in https://github.com/microsoft/FLAML/pull/430
    • note about preview feature by @sonichi in https://github.com/microsoft/FLAML/pull/431

    New Contributors

    • @m13uz made their first contribution in https://github.com/microsoft/FLAML/pull/439

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.5...v0.9.6

  • v0.9.5(Jan 17, 2022)

    What's Changed

    • fixing load best model at the end by @liususan091219 in https://github.com/microsoft/FLAML/pull/389
    • Regression forecast debug by @int-chaos in https://github.com/microsoft/FLAML/pull/391
    • set verbose for transformers by @liususan091219 in https://github.com/microsoft/FLAML/pull/392
    • Logging multiple checkpoints by @liususan091219 in https://github.com/microsoft/FLAML/pull/394
    • postcss version update by @sonichi in https://github.com/microsoft/FLAML/pull/385
    • fixing default metric for regression + change verbosity for transformers by @liususan091219 in https://github.com/microsoft/FLAML/pull/397
    • fix issues in logging, bug in space.py, constraint sign, and improve code coverage by @sonichi in https://github.com/microsoft/FLAML/pull/388
    • moving intermediate_results logging from model.py to huggingface/trainer.py by @liususan091219 in https://github.com/microsoft/FLAML/pull/403
    • Update flaml/nlp/README.md by @liususan091219 in https://github.com/microsoft/FLAML/pull/404
    • Logo by @qingyun-wu in https://github.com/microsoft/FLAML/pull/399
    • update browser icon by @qingyun-wu in https://github.com/microsoft/FLAML/pull/407
    • adding logging of training loss by @liususan091219 in https://github.com/microsoft/FLAML/pull/406
    • Bump shelljs from 0.8.4 to 0.8.5 in /website by @sonichi in https://github.com/microsoft/FLAML/pull/402
    • Sklearn api x by @MichaelMarien in https://github.com/microsoft/FLAML/pull/405

    New Contributors

    • @MichaelMarien made their first contribution in https://github.com/microsoft/FLAML/pull/405

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.4...v0.9.5

  • v0.9.4(Jan 8, 2022)

    This release enables regression models for time series forecasting. It also fixes bugs in NLP tasks, such as serialization of transformer models and automatic metrics.

    What's Changed

    • citation file by @sonichi in https://github.com/microsoft/FLAML/pull/364
    • Fix several issues for nlp tasks by @sonichi in https://github.com/microsoft/FLAML/pull/380
    • serialize TransformerEstimator by @sonichi in https://github.com/microsoft/FLAML/pull/381
    • Time series forecasting with sklearn regressors by @int-chaos in https://github.com/microsoft/FLAML/pull/362
    • fixing auto metric bug by @liususan091219 in https://github.com/microsoft/FLAML/pull/387

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.3...v0.9.4

  • v0.9.3(Jan 3, 2022)

    What's Changed

    • Finish the Multiple Choice Classification by @oberonbot in https://github.com/microsoft/FLAML/pull/367
    • logging by @sonichi in https://github.com/microsoft/FLAML/pull/371
    • adding token classification by @liususan091219 and @siddheshshaji in https://github.com/microsoft/FLAML/pull/376

    New Contributors

    • @oberonbot and @siddheshshaji made their first contribution in https://github.com/microsoft/FLAML/pull/367

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.2...v0.9.3

  • v0.9.2(Dec 26, 2021)

    New Features:

    • New task: text summarization
    • Reproducibility of hyperparameter search sequence
    • Run flaml in azureml + ray

    What's Changed

    • url update for doc edit by @sonichi in https://github.com/microsoft/FLAML/pull/345
    • Adding the NLP task summarization by @liususan091219 @XinZofStevens @GideonWu0105 in https://github.com/microsoft/FLAML/pull/346
    • reproducibility for random sampling by @sonichi in https://github.com/microsoft/FLAML/pull/349
    • doc update by @sonichi in https://github.com/microsoft/FLAML/pull/352
    • azureml + ray by @sonichi in https://github.com/microsoft/FLAML/pull/344
    • Fixing the bug in custom metric by @liususan091219 in https://github.com/microsoft/FLAML/pull/356
    • Simplify lgbm example by @ruizhuanguw in https://github.com/microsoft/FLAML/pull/358
    • fixing custom metric by @liususan091219 in https://github.com/microsoft/FLAML/pull/357
    • Example by @sonichi in https://github.com/microsoft/FLAML/pull/359

    New Contributors

    • @ruizhuanguw @XinZofStevens @GideonWu0105 made their first contribution in https://github.com/microsoft/FLAML/pull/358

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.1...v0.9.2

  • v0.9.1(Dec 17, 2021)

    This release contains several feature improvements and bug fixes. For example,

    • support for custom data splitter.
    • evaluation_function can receive the incumbent result during local search and use it for domain-specific early stopping: as soon as it is known whether the current trial is better or worse than the incumbent, the evaluation can be stopped.
    • support and automate huggingface metrics.
    • use cfo in tune.run if bs is not installed.
    • fixed a bug in modifying n_estimators to satisfy constraints.
    • new documentation website.
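
The incumbent-based early stopping mentioned in the list above can be illustrated with a minimal, library-independent sketch. It does not use FLAML's actual API: the function name, its arguments, and the returned keys are hypothetical, and it assumes non-negative per-fold losses, so a partial sum that already exceeds the incumbent's total can never recover.

```python
from typing import Iterable, Optional


def evaluate_with_early_stop(
    fold_losses: Iterable[float],
    incumbent_total: Optional[float] = None,
) -> dict:
    """Sum per-fold losses, stopping once the trial is known to be worse.

    Hypothetical helper: `fold_losses` stands in for an expensive per-fold
    evaluation, and `incumbent_total` is the total loss of the best trial
    found so far (None if there is no incumbent yet).
    """
    total = 0.0
    folds_run = 0
    for loss in fold_losses:
        total += loss
        folds_run += 1
        if incumbent_total is not None and total > incumbent_total:
            # The comparison outcome is already known (worse), so stop here.
            return {"loss": total, "folds_run": folds_run, "stopped_early": True}
    return {"loss": total, "folds_run": folds_run, "stopped_early": False}


# The first trial has no incumbent to compare against and runs every fold.
first = evaluate_with_early_stop([0.2, 0.3, 0.1])
# The second trial is cut short once its partial loss exceeds the first total.
second = evaluate_with_early_stop([0.5, 0.4, 0.3], incumbent_total=first["loss"])
print(first)
print(second)
```

The saving comes from the folds that are never evaluated: the second trial stops as soon as the comparison with the incumbent is decided.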

    What's Changed

    • Update flaml_pytorch_cifar10.ipynb by @sonichi in https://github.com/microsoft/FLAML/pull/328
    • adding HF metrics by @liususan091219 in https://github.com/microsoft/FLAML/pull/335
    • train at least one iter when not trained by @sonichi in https://github.com/microsoft/FLAML/pull/336
    • use cfo in tune.run if bs is not installed by @sonichi in https://github.com/microsoft/FLAML/pull/334
    • Make the evaluation_function able to receive the incumbent best result as input in Tune by @Shao-kun-Zhang in https://github.com/microsoft/FLAML/pull/339
    • support for customized splitters by @wuchihsu in https://github.com/microsoft/FLAML/pull/333
    • Deploy a new doc website by @sonichi, @qingyun-wu and @Shao-kun-Zhang in https://github.com/microsoft/FLAML/pull/338
    • version update by @sonichi in https://github.com/microsoft/FLAML/pull/341

    New Contributors

    • @Shao-kun-Zhang made their first contribution in https://github.com/microsoft/FLAML/pull/339

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.0...v0.9.1

  • v0.9.0(Dec 7, 2021)

    1. Revise flaml.tune API
    • Add a "scheduler" argument (a user can choose from "flaml", "asha" or a customized scheduler)
    • Rename "prune_attr" to "resource_attr"
    • Rename "training_function" to "evaluation_function"
    • Remove the "report_intermediate_result" argument (covered by "scheduler" instead)
    • Add tests for the supported schedulers
    • Re-run notebooks that use schedulers
    2. Add save_best_config() to save best config in a json file
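
For code written against the earlier flaml.tune argument names, the renames listed above can be applied mechanically. The sketch below is a hypothetical migration helper, not part of FLAML: `migrate_tune_kwargs`, `RENAMED`, `REMOVED`, and the `train` placeholder are illustrative names. It only rewrites a dictionary of keyword arguments and does not call the library, and it assumes the old names were passed as keyword arguments.

```python
# Renames listed in the v0.9.0 notes (old name -> new name).
RENAMED = {
    "prune_attr": "resource_attr",
    "training_function": "evaluation_function",
}
# Arguments removed in v0.9.0; the notes say "scheduler" covers this instead.
REMOVED = {"report_intermediate_result"}


def migrate_tune_kwargs(old_kwargs: dict) -> tuple:
    """Return (new_kwargs, dropped_names) using the v0.9.0 argument names."""
    new_kwargs = {}
    dropped = []
    for name, value in old_kwargs.items():
        if name in REMOVED:
            # No automatic replacement is attempted: the caller has to
            # choose a "scheduler" value explicitly.
            dropped.append(name)
            continue
        new_kwargs[RENAMED.get(name, name)] = value
    return new_kwargs, dropped


def train(config):
    """Placeholder evaluation function used only for this example."""
    return {"loss": config["x"] ** 2}


old = {
    "training_function": train,
    "prune_attr": "epochs",
    "report_intermediate_result": True,
    "num_samples": 10,
}
new, dropped = migrate_tune_kwargs(old)
print(sorted(new))
print(dropped)
```

The helper deliberately reports removed arguments instead of guessing a scheduler, since the notes do not say which scheduler value corresponds to the old flag.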

    What's Changed

    • add save_best_config() by @sonichi in https://github.com/microsoft/FLAML/pull/324
    • tune api for schedulers by @qingyun-wu in https://github.com/microsoft/FLAML/pull/322
    • add __init__.py in nlp by @sonichi in https://github.com/microsoft/FLAML/pull/325
    • rename training_function by @qingyun-wu in https://github.com/microsoft/FLAML/pull/327

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.8.2...v0.9.0

  • v0.8.2(Dec 4, 2021)

    What's Changed

    • include default value in rf search space by @sonichi in https://github.com/microsoft/FLAML/pull/317
    • adding TODOs for NLP module, so students can implement other tasks more easily by @liususan091219 in https://github.com/microsoft/FLAML/pull/321
    • pred_time_limit clarification and logging by @sonichi in https://github.com/microsoft/FLAML/pull/319
    • bug fix in config2params by @sonichi in https://github.com/microsoft/FLAML/pull/323

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.8.1...v0.8.2

  • v0.8.1(Nov 28, 2021)

    What's Changed

    • Update test_regression.py by @fengsxy in https://github.com/microsoft/FLAML/pull/306
    • Add conda forge minimal test by @MichalChromcak in https://github.com/microsoft/FLAML/pull/309
    • fixing config2params for transformersestimator by @liususan091219 in https://github.com/microsoft/FLAML/pull/316
    • Code quality improvement based on #275 by @abnsy and @sonichi in https://github.com/microsoft/FLAML/pull/313
    • skip cv preparation if eval_method is holdout by @sonichi in https://github.com/microsoft/FLAML/pull/314

    New Contributors

    • @fengsxy made their first contribution in https://github.com/microsoft/FLAML/pull/306
    • @abnsy made their first contribution in https://github.com/microsoft/FLAML/pull/313

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.8.0...v0.8.1

  • v0.8.0(Nov 23, 2021)

    In this release, we add two NLP tasks to flaml.AutoML: sequence classification and sequence regression, both using transformer-based neural networks. Previously the NLP module was detached from flaml.AutoML and had a separate API. We redesigned the API so that NLP tasks are accessed through the same API as other tasks, which should make it easy to add more NLP tasks in the future. Thanks for the hard work, @liususan091219!

    We've also continued to make more performance & feature improvements. Examples:

    • We added a variation of the XGBoost search space that uses a limited max_depth. It includes the default configuration from the XGBoost library. The new search space leads to significantly better performance for some regression datasets.
    • We allow arguments for flaml.AutoML to be passed to the constructor. This enables multioutput regression by combining sklearn's MultiOutputRegressor and flaml's AutoML.
    • We made further memory optimizations, while allowing users to keep the best model per estimator in memory through the "model_history" option.
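
To see why constructor arguments matter for multioutput regression, consider how a per-target wrapper works: it copies the base estimator once per target and calls fit with only the features and one target column, so every setting has to travel with the estimator object. The sketch below uses stand-in classes written for this illustration (`MeanRegressor` and `PerTargetRegressor` are hypothetical, not sklearn or FLAML code) and needs only the standard library.

```python
import copy
from statistics import mean


class MeanRegressor:
    """Stand-in single-output estimator; all settings go in the constructor."""

    def __init__(self, shift=0.0):
        self.shift = shift

    def fit(self, X, y):
        # Predict the (shifted) mean of the training targets.
        self.mean_ = mean(y) + self.shift
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class PerTargetRegressor:
    """Minimal wrapper: one independent copy of the base estimator per target."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, Y):
        n_targets = len(Y[0])
        self.estimators_ = []
        for j in range(n_targets):
            est = copy.deepcopy(self.estimator)
            # fit receives only X and one target column, no extra settings.
            est.fit(X, [row[j] for row in Y])
            self.estimators_.append(est)
        return self

    def predict(self, X):
        columns = [est.predict(X) for est in self.estimators_]
        return [list(row) for row in zip(*columns)]


X = [[0.0], [1.0], [2.0]]
Y = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
model = PerTargetRegressor(MeanRegressor(shift=0.5)).fit(X, Y)
predictions = model.predict([[5.0]])
print(predictions)
```

With scikit-learn and FLAML installed, the real counterpart should look roughly like `MultiOutputRegressor(AutoML(task="regression", time_budget=60))`, where the task and time budget are given to the AutoML constructor because the wrapper does not forward them to fit.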

    What's Changed

    • Unify regression and classification for XGBoost by @sonichi in https://github.com/microsoft/FLAML/pull/276
    • when max_iter=1, skip search only if retrain_final by @sonichi in https://github.com/microsoft/FLAML/pull/280
    • example update by @sonichi in https://github.com/microsoft/FLAML/pull/281
    • Merge exp into flaml by @liususan091219 in https://github.com/microsoft/FLAML/pull/210
    • add best_loss_per_estimator by @qingyun-wu in https://github.com/microsoft/FLAML/pull/286
    • model_history -> save_best_model_per_estimator by @sonichi in https://github.com/microsoft/FLAML/pull/283
    • datetime feature engineering by @sonichi in https://github.com/microsoft/FLAML/pull/285
    • add warmstart test by @qingyun-wu in https://github.com/microsoft/FLAML/pull/298
    • empty search space by @sonichi in https://github.com/microsoft/FLAML/pull/295
    • multioutput regression by @sonichi in https://github.com/microsoft/FLAML/pull/292
    • add max_depth to xgboost search space by @sonichi in https://github.com/microsoft/FLAML/pull/282
    • custom metric function clarification by @sonichi in https://github.com/microsoft/FLAML/pull/300
    • checkpoint naming in nonray mode, fix ray mode, delete checkpoints in nonray mode by @liususan091219 in https://github.com/microsoft/FLAML/pull/293

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.7.1...v0.8.0

  • v0.7.1(Nov 8, 2021)

    What's Changed

    • make default verbose level > 0 when using ray by @sonichi in https://github.com/microsoft/FLAML/pull/272
    • default to cfo for single estimator by @sonichi in https://github.com/microsoft/FLAML/pull/273
    • update docstr by @sonichi and @qingyun-wu in https://github.com/microsoft/FLAML/pull/274
    • fixed a bug in #278 by @sonichi in https://github.com/microsoft/FLAML/pull/274

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.7.0...v0.7.1

  • v0.7.0(Nov 4, 2021)

    New feature: multivariate time series forecasting.

    What's Changed

    • Fix exception in CFO's _create_condition if all candidate start points didn't return yet by @Yard1 in https://github.com/microsoft/FLAML/pull/263
    • Integrate multivariate time series forecasting by @int-chaos in https://github.com/microsoft/FLAML/pull/254
    • Update Dockerfile by @wuchihsu in https://github.com/microsoft/FLAML/pull/269
    • limit time and memory consumption by @sonichi in https://github.com/microsoft/FLAML/pull/264

    New Contributors

    • @wuchihsu made their first contribution in https://github.com/microsoft/FLAML/pull/269

    Full Changelog: https://github.com/microsoft/FLAML/compare/v0.6.9...v0.7.0

Owner
Microsoft
Open source projects and samples from Microsoft