Fast solver for L1-type problems: Lasso, sparse Logistic regression, Group Lasso, weighted Lasso, Multitask Lasso, etc.

Overview

celer

Fast algorithm to solve Lasso-like problems with dual extrapolation. Currently, the package handles the following problems:

  • Lasso
  • weighted Lasso
  • Sparse Logistic regression
  • Group Lasso
  • Multitask Lasso.

The estimators follow the scikit-learn API, come with automated parallel cross-validation, and support both sparse and dense data, with optional feature centering, normalization, and unpenalized intercept fitting. The solvers scale to large problems with millions of features, and can be up to 100 times faster than scikit-learn.

Documentation

Please visit https://mathurinm.github.io/celer/ for the latest version of the documentation.

Install the released version

Assuming you have a working Python environment (e.g., with Anaconda), install celer with pip from a console or terminal:

pip install -U celer

Install and work with the development version

From a console or terminal clone the repository and install Celer:

git clone https://github.com/mathurinm/celer.git
cd celer/
pip install -e .

To build the documentation you will need to run:

pip install -U sphinx_gallery sphinx_bootstrap_theme
cd doc/
make html

Demos & Examples

In the examples section of the documentation, you will find numerous examples on real-life datasets, timing comparisons with other estimators, easy and fast ways to perform cross-validation, etc.

Dependencies

All dependencies are in the ./requirements.txt file. They are installed automatically when pip install -e . is run.

Cite

If you use this code, please cite:

@InProceedings{pmlr-v80-massias18a,
  title =    {Celer: a Fast Solver for the Lasso with Dual Extrapolation},
  author =   {Massias, Mathurin and Gramfort, Alexandre and Salmon, Joseph},
  booktitle =        {Proceedings of the 35th International Conference on Machine Learning},
  pages =    {3321--3330},
  year =     {2018},
  volume =   {80},
}


@article{massias2020dual,
  author  = {Mathurin Massias and Samuel Vaiter and Alexandre Gramfort and Joseph Salmon},
  title   = {Dual Extrapolation for Sparse GLMs},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {234},
  pages   = {1-33},
  url     = {http://jmlr.org/papers/v21/19-587.html}
}

ArXiv links:

Comments
  • n_jobs (multi-core CPU) for LassoCV function?

    Thanks for these tools! Do any of the celer sklearn functions (LassoCV, etc.) have multi-cpu support like the original sklearn functions (n_jobs = 1, 2, 3, etc.)? Currently, the n_jobs argument is unrecognized as: __init__() got an unexpected keyword argument 'n_jobs'

    opened by yjkimnada 14
  • Why are the three numbers in 'yhat' the same?

    Hello! I don't understand why the three numbers in 'yhat' are identical and 'w_hat' is all zeros. I would like 'w_hat' to have a few non-zero entries so that I can tell which groups of 'X' are used. Can you help me with that? Thank you very much!

    ###############################################################################
    # Setup
    # -----
    
    import matplotlib.pyplot as plt
    import numpy as np
    from numpy import linalg
    from skimage import io
    from skimage.color.colorconv import _prepare_colorarray
    from sklearn.metrics import r2_score
    from sklearn.metrics import mean_squared_error
    
    from group_lasso import GroupLasso
    
    np.random.seed(0)     
    GroupLasso.LOG_LOSSES = True
    
    ###############################################################################
    # Set dataset parameters
    # ----------------------
    group_sizes = [3, 3, 2, 2, 2]
    active_groups = np.ones(5)
    active_groups[:3] = 2
    np.random.shuffle(active_groups)
    np.random.shuffle(active_groups)
    groups = np.concatenate(
        [size * [i] for i, size in enumerate(group_sizes)]
    ).reshape(-1, 1)
    num_coeffs = sum(group_sizes)
    num_datapoints = 3
    
    print("group_sizes:", group_sizes)
    print("active_groups:", active_groups)
    print("groups:", groups.shape)
    print("num_coeffs:", num_coeffs)
    print("____________________________________________")
    
    ###############################################################################
    # Generate data matrix
    # --------------------
    X = np.array([[0.571, 0.095, 0.767, 0.095, 0.105, 0.767, 0.571, 0.767, 0.095, 0.767, 0.105, 0.767],
                  [0.584, 0.258, 0.576, 0.258, 0.758, 0.576, 0.584, 0.576, 0.258, 0.576, 0.758, 0.576],
                  [0.577, 0.961, 0.284, 0.961, 0.644, 0.284, 0.577, 0.284, 0.961, 0.284, 0.644, 0.284]])
    
    print("X:", X.shape)
    print("____________________________________________")
    
    ###############################################################################
    # Generate coefficients
    # ---------------------
    w = np.concatenate(
        [
            np.random.standard_normal(group_size) * is_active
            for group_size, is_active in zip(group_sizes, active_groups)
        ]
    )
    w = w.reshape(-1, 1)
    true_coefficient_mask = w != 0
    intercept = 2
    
    print("w:", w.shape)
    print("true_coefficient_mask:", true_coefficient_mask.sum())
    print("____________________________________________")
    
    ###############################################################################
    # Generate regression targets
    # ---------------------------
    
    y_true = X @ w
    y = np.array([[-0.17997138],
                  [-0.15219182],
                  [-0.17062552]])
    y_true = X @ w
    print("y:", y)
    MSE1 = mean_squared_error(y, y_true)
    print("MSE_yt_y:", MSE1)
    print("____________________________________________")
    
    ###############################################################################
    # Generate estimator and train it
    # -------------------------------
    gl = GroupLasso(
        groups=groups,
        group_reg=5,
        l1_reg=2,
        frobenius_lipschitz=True,
        scale_reg="inverse_group_size",
        subsampling_scheme=1,
        supress_warning=True,
        n_iter=1000,
        tol=1e-3,
    )
    gl.fit(X, y)
    
    ###############################################################################
    # Extract results and compute performance metrics
    # -----------------------------------------------
    
    # Extract info from estimator
    yhat = gl.predict(X)
    sparsity_mask = gl.sparsity_mask_
    w_hat = gl.coef_
    
    print("yhat:", yhat)
    print("w_hat:", w_hat.sum())
    print("sparsity_mask:", sparsity_mask)
    print("____________________________________________")
    
    # Compute performance metrics
    R2 = r2_score(y, yhat)
    MSE_y_yh = mean_squared_error(y, yhat)
    print("MSE_y_yh:", MSE_y_yh)
    print("____________________________________________")
    

    And the result of the program after running is as follows.

    group_sizes: [3, 3, 2, 2, 2]
    active_groups: [2. 2. 2. 1. 1.]
    groups: (12, 1)
    num_coeffs: 12
    ____________________________________________
    X: (3, 12)
    ____________________________________________
    w: (12, 1)
    true_coefficient_mask: 12
    ____________________________________________
    y: [[-0.17997138]
     [-0.15219182]
     [-0.17062552]]
    MSE_yt_y: 55.67436677644974
    ____________________________________________
    yhat: [-0.16644355 -0.16644355 -0.16644355]
    w_hat: 0.0
    sparsity_mask: [False False False False False False False False False False False False]
    ____________________________________________
    MSE_y_yh: 0.0001345342966391801
    ____________________________________________
    
    opened by SikangSHU 13
  • add MultiTaskLassoCV and MultiTaskLasso

    @mathurinm I tried to add MultiTaskLassoCV and MultiTaskLasso to help @ja-che with his experiments but the test does not end. It seems like it's not converging.

    @mathurinm can you have a look?

    also you'll see that mtl_path is not consistent in API with sklearn lasso_path

    wdyt?

    opened by agramfort 9
  • ENH: reflexion about tol

    1. behaviour disagrees with sklearn because the latter scales tol by norm(y) ** 2 (or norm(y) ** 2 / n_samples ?)

    2. using tol < 1e-7 with float32 caused precision issues (found out in check_estimator). MCVE:

                      [1.9376824, 1.3127615, 2.675319, 2.8909883, 1.1503246],
                      [2.375175, 1.5866847, 1.7041336, 2.77679, 0.21310817],
                      [0.2613879, 0.06065519, 2.4978595, 2.3344703, 2.6100364],
                      [2.935855, 2.3974757, 1.384438, 2.3415875, 0.3548233],
                      [1.9197631, 0.43005985, 2.8340068, 1.565545, 1.2439859],
                      [0.79366684, 2.322701, 1.368451, 1.7053018, 0.0563694],
                      [1.8529065, 1.8362871, 1.850802, 2.8312442, 2.0454607],
                      [1.0785236, 1.3110958, 2.0928936, 0.18067642, 2.0003002],
                      [2.0119135, 0.6311477, 0.3867789, 0.946285, 1.0911323]],
                     dtype=np.float32)
    
        y = np.array([[1.],
                      [1.],
                      [2.],
                      [0.],
                      [2.],
                      [1.],
                      [0.],
                      [1.],
                      [1.],
                      [2.]], dtype=np.float32)
    
        params = dict(eps=1e-2, n_alphas=10, tol=1e-10, cv=2, n_jobs=1,
                      fit_intercept=False, verbose=2)
    
        clf = MultiTaskLassoCV(**params)
        clf.fit(X, y)
    

    (casting X to float64 fixes it)

    so maybe we can raise a warning if tol is low and X.dtype == np.float32
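The warning suggested here could look like the minimal sketch below. The helper name check_tol_for_dtype and the use of the dtype's machine epsilon as the threshold are assumptions made for illustration; they are not part of celer.

```python
import warnings

import numpy as np


def check_tol_for_dtype(X, tol):
    """Hypothetical helper (not part of celer): return False and warn when
    tol is smaller than the machine epsilon of X's floating-point dtype."""
    if not np.issubdtype(X.dtype, np.floating):
        return True  # nothing to check for non-float data
    eps = np.finfo(X.dtype).eps  # about 1.19e-07 for float32
    if tol < eps:
        warnings.warn(
            f"tol={tol:g} is below the machine epsilon of {X.dtype} "
            f"({eps:g}); consider casting X to float64.",
            UserWarning,
        )
        return False
    return True


X32 = np.ones((10, 5), dtype=np.float32)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok32 = check_tol_for_dtype(X32, tol=1e-10)  # float32: emits the warning
ok64 = check_tol_for_dtype(X32.astype(np.float64), tol=1e-10)  # float64: silent
print(ok32, ok64, len(caught))
```

Machine epsilon is only one plausible threshold; it happens to sit close to the 1e-7 value reported above for float32.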

    opened by mathurinm 7
  • Segmentation fault when fitting `GroupLasso`

    I just updated to Celer 0.7dev to benefit from the weights argument in GroupLasso. After updating, I now ran into a segmentation fault error when fitting GroupLasso.

    Here is a minimal reproducible example:

    import numpy as np
    from celer import GroupLasso
    
    X = np.random.normal(0, 1, (30, 60))
    y = np.random.normal(0, 1, (30,))
    
    clf = GroupLasso(3, 0.001, tol=1e-10, fit_intercept=False)
    clf.fit(X, y)
    

    @Badr-MOUFAD Are you able to reproduce this error on your machine?

    Note that Lasso is working perfectly, and that the error occurs only with GroupLasso with or without the weights argument.

    opened by PABannier 6
  • Warm starts and CPUs in use for the GroupLasso

    Hi! I was wondering how you leverage warm starts in GroupLassoCV. Perhaps between folds for a given alpha? By default, it seems that many cores are used by GroupLasso and GroupLassoCV. In the case of GroupLassoCV, performance seems to decrease when n_jobs is set to -1.

    Thanks for your work in any case !

    opened by georgestod 6
  • [READY] FIX - more than one iteration done when fitting with alpha > alpha_max

    closes #252

    We get such behavior because of how we construct a dual feasible vector.

    Indeed, when alpha > alpha_max (let's say alpha = m * alpha_max where m > 1), we start the solver at theta = m * y / n_samples, which is feasible yet not optimal.
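The effect described here can be checked numerically. The sketch below is one possible reading of the explanation, not celer's code: it assumes the objective 1 / (2 * n_samples) * ||y - X w||^2 + alpha * ||w||_1, whose dual is alpha * y^T theta - n_samples * alpha^2 / 2 * ||theta||^2 under the constraint ||X^T theta||_inf <= 1. The function and variable names are chosen for illustration.

```python
import numpy as np


def primal(X, y, w, alpha):
    # Lasso objective, assuming the 1 / (2 * n_samples) scaling of the datafit.
    r = y - X @ w
    return r @ r / (2 * len(y)) + alpha * np.abs(w).sum()


def dual(y, theta, alpha):
    # Dual objective, valid for theta such that ||X.T @ theta||_inf <= 1.
    return alpha * (y @ theta) - len(y) * alpha ** 2 / 2 * (theta @ theta)


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
n_samples = len(y)

alpha_max = np.max(np.abs(X.T @ y)) / n_samples
alpha = 2 * alpha_max  # m = 2, so w = 0 is already optimal
w = np.zeros(X.shape[1])

# Rescaling y by ||X.T @ y||_inf only: feasible, but not optimal when m > 1.
theta_rescaled = y / np.max(np.abs(X.T @ y))
# Rescaling by n_samples * alpha: feasible because alpha >= alpha_max, and optimal.
theta_opt = y / (n_samples * alpha)

gap_rescaled = primal(X, y, w, alpha) - dual(y, theta_rescaled, alpha)
gap_opt = primal(X, y, w, alpha) - dual(y, theta_opt, alpha)
print(gap_rescaled, gap_opt)
```

With the first dual point the duality gap stays strictly positive even though w = 0 is the solution, so a gap-based stopping criterion cannot be met at the first iteration; with the second one the gap vanishes up to rounding.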

    ...

    This proceeds as follows:

    • [x] fix bug
    • [x] add unit testing
    opened by Badr-MOUFAD 5
  • ENH add weights to GroupLasso to provide support for an Adaptive GroupLasso

    Hi, thanks for this great package! Does the group-lasso method allow for different penalizations for different groups, such that one could fit an adaptive group lasso, i.e., applying different weights in the second call to the method? (reference for adaptive group lasso: https://www.sciencedirect.com/science/article/abs/pii/S0167947308002582)

    If not, would it be possible to extend the method (easily)?
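As a rough sketch of what "different weights in the second call" could mean, the snippet below derives one weight per group from a first-pass coefficient vector, using the common adaptive choice weight_g = 1 / ||coef_g||_2 ** gamma. The helper name, the gamma and eps parameters, and the restriction to contiguous groups of equal size are assumptions made for illustration; how such weights would be passed to an estimator is left open.

```python
import numpy as np


def adaptive_group_weights(coef, group_size, gamma=1.0, eps=1e-10):
    """Hypothetical helper: one weight per contiguous group of `group_size`
    coefficients, equal to 1 / ||coef_g||_2 ** gamma (norms floored at eps)."""
    norms = np.linalg.norm(coef.reshape(-1, group_size), axis=1)
    return 1.0 / np.maximum(norms, eps) ** gamma


# First-pass coefficients for 3 groups of 2 features; the second group is zero.
coef_first_pass = np.array([3.0, 4.0, 0.0, 0.0, 0.6, 0.8])
weights = adaptive_group_weights(coef_first_pass, group_size=2)
print(weights)
```

Groups with a large first-pass norm receive a small penalty weight, while groups estimated at zero receive a very large one, which effectively keeps them out of the second fit.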

    Thank you.

    opened by sehoff 5
  • Different output than scikit-learn's LASSO on a weird example

    Hi !

    I am not sure this is the best place to report this, but I noticed a difference in the output produced by your solver and scikit-learn's. I had a really hard time coming up with a minimal example, sorry if the one below is not really informative. I am using Python 3.7.4 and the celer development version on Ubuntu.

    import numpy as np
    import sklearn.linear_model
    import celer
    
    
    X = np.array([[0, 0, 0, 0, 0, 0, 0, 0.001, 0, 0, 0.015, 0, 0, 0.046, 0, 0, 0.061, 0, 0, 0.062]]).T
    y = np.array([0.008, 0, 0.001, 0.02, 0, 0.001, 0.024, 0.001, 0.001, 0.023,
                  0.006, 0, 0.011, 0.032, 0, 0.002, 0.056, 0.001, 0.001, 0.062])
    
    lasso_sklearn = sklearn.linear_model.Lasso(alpha=1e-4)
    lasso_celer = celer.Lasso(alpha=1e-4)
    
    lasso_sklearn.fit(X, y)
    lasso_celer.fit(X, y)
    

    So the coefficient I get from scikit-learn's solver is approximately 0.55 (dual gap is 0) and the one I get from your solver is 0 (dual gap is approximately 1e-4).
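With a single feature, the Lasso solution has a closed form, which gives an independent check of the two values. The sketch below assumes the objective 1 / (2 * n_samples) * ||y - X w - b||^2 + alpha * |w| with an unpenalized intercept b (handled here by centering); the variable names are chosen for illustration.

```python
import numpy as np

x = np.array([0, 0, 0, 0, 0, 0, 0, 0.001, 0, 0, 0.015, 0, 0, 0.046, 0, 0,
              0.061, 0, 0, 0.062])
y = np.array([0.008, 0, 0.001, 0.02, 0, 0.001, 0.024, 0.001, 0.001, 0.023,
              0.006, 0, 0.011, 0.032, 0, 0.002, 0.056, 0.001, 0.001, 0.062])
alpha = 1e-4
n_samples = len(y)

# Centering both x and y accounts for the unpenalized intercept.
xc = x - x.mean()
yc = y - y.mean()

rho = xc @ yc / n_samples
alpha_max = abs(rho)  # smallest alpha for which the coefficient is exactly 0

# One-dimensional Lasso: soft-threshold rho at alpha, then rescale.
coef = np.sign(rho) * max(abs(rho) - alpha, 0.0) / (xc @ xc / n_samples)
print(alpha_max, coef)
```

Under these assumptions alpha = 1e-4 lies below alpha_max, so the exact coefficient is non-zero and close to 0.55.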

    I know this is a really degenerate use case, so maybe there is no need to worry about it, but I wanted to report it just in case, and to ask whether there is any reason celer should not be used in such a situation.

    Thanks in advance for your help !

    opened by rpetit 5
  • BUG - unable to install ``celer`` in an empty python virtual environment

    Minimal steps to reproduce:

    1. cd to a new directory
    2. create a Python virtual environment: python -m venv venv
    3. install celer: pip install -U celer

    Error logs

      import numpy.distutils.command.sdist
      Traceback (most recent call last):
        File "C:\Users\HP\AppData\Local\Temp\easy_install-unrz350b\numpy-1.23.0rc3\setup.py", line 251, in generate_cython
      ModuleNotFoundError: No module named 'Cython'
     
      The above exception was the direct cause of the following exception:
     
      Traceback (most recent call last):
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\sandbox.py", line 154, in save_modules
          yield saved
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\sandbox.py", line 195, in setup_context
          yield
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\sandbox.py", line 250, in run_setup
          _execfile(setup_script, ns)
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\sandbox.py", line 45, in _execfile
          exec(code, globals, locals)
        File "C:\Users\HP\AppData\Local\Temp\easy_install-unrz350b\numpy-1.23.0rc3\setup.py", line 493, in <module>
        File "C:\Users\HP\AppData\Local\Temp\easy_install-unrz350b\numpy-1.23.0rc3\setup.py", line 475, in setup_package
        File "C:\Users\HP\AppData\Local\Temp\easy_install-unrz350b\numpy-1.23.0rc3\setup.py", line 258, in generate_cython
      OSError: Cython needs to be installed in Python as a module
     
      During handling of the above exception, another exception occurred:
     
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\HP\AppData\Local\Temp\pip-req-build-87kkcnrt\setup.py", line 4, in <module>
          dist.Distribution().fetch_build_eggs(['numpy>=1.12'])
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\dist.py", line 716, in fetch_build_eggs
          resolved_dists = pkg_resources.working_set.resolve(
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\pkg_resources\__init__.py", line 780, in resolve
          dist = best[req.key] = env.best_match(
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\pkg_resources\__init__.py", line 1065, in best_match
          return self.obtain(req, installer)
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\pkg_resources\__init__.py", line 1077, in obtain
          return installer(requirement)
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\dist.py", line 786, in fetch_build_egg
          return cmd.easy_install(req)
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\command\easy_install.py", line 679, in easy_install      
          return self.install_item(spec, dist.location, tmpdir, deps)
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\command\easy_install.py", line 705, in install_item      
          dists = self.install_eggs(spec, download, tmpdir)
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\command\easy_install.py", line 890, in install_eggs      
          return self.build_and_install(setup_script, setup_base)
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\command\easy_install.py", line 1158, in build_and_install
          self.run_setup(setup_script, setup_base, args)
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\command\easy_install.py", line 1144, in run_setup        
          run_setup(setup_script, args)
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\sandbox.py", line 253, in run_setup
          raise
        File "C:\Users\HP\AppData\Local\Programs\Python\Python38-32\lib\contextlib.py", line 131, in __exit__
          self.gen.throw(type, value, traceback)
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\sandbox.py", line 195, in setup_context
          yield
        File "C:\Users\HP\AppData\Local\Programs\Python\Python38-32\lib\contextlib.py", line 131, in __exit__
          self.gen.throw(type, value, traceback)
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\sandbox.py", line 166, in save_modules
          saved_exc.resume()
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\sandbox.py", line 141, in resume
          six.reraise(type, exc, self._tb)
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\_vendor\six.py", line 685, in reraise
          raise value.with_traceback(tb)
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\sandbox.py", line 154, in save_modules
          yield saved
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\sandbox.py", line 195, in setup_context
          yield
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\sandbox.py", line 250, in run_setup
          _execfile(setup_script, ns)
        File "C:\Users\HP\Desktop\install-celer\venv\lib\site-packages\setuptools\sandbox.py", line 45, in _execfile
          exec(code, globals, locals)
        File "C:\Users\HP\AppData\Local\Temp\easy_install-unrz350b\numpy-1.23.0rc3\setup.py", line 493, in <module>
        File "C:\Users\HP\AppData\Local\Temp\easy_install-unrz350b\numpy-1.23.0rc3\setup.py", line 475, in setup_package
        File "C:\Users\HP\AppData\Local\Temp\easy_install-unrz350b\numpy-1.23.0rc3\setup.py", line 258, in generate_cython
      OSError: Cython needs to be installed in Python as a module
      [end of output]
    
     note: This error originates from a subprocess, and is likely not a problem with pip.
     error: metadata-generation-failed
    
      × Encountered error while generating package metadata.
      ╰─> See above for output.
    

    Additional comments

    It seems that the error is related to fetching NumPy builds

    https://github.com/mathurinm/celer/blob/bd11a44471ff5b688eea30bb43030ec547dd3a4d/setup.py#L4

    Also, a portion of the error, namely

    OSError: Cython needs to be installed in Python as a module
          [end of output]
    

    is related to https://github.com/numpy/numpy/blob/705244444526804860e9f8d52459e2ca5c255366/setup.py#L257-L258

    opened by Badr-MOUFAD 4
  • FIX - Compatibility with ``scikit-learn`` 1.2.dev

    scikit-learn 1.2.dev breaks the current code. debug_script.py provides a small snippet to reproduce the problem.

    investigation

    scikit-learn seems to have added a validation step for the class parameters at fit time. Our Lasso estimator doesn't have the same signature as scikit-learn's (e.g., it lacks copy_X and random_state), though we inherit from it.

    Therefore we get an error when comparing the constructor arguments with the parent class

    raise ValueError(
    ValueError: The parameter constraints ['alpha', 'fit_intercept', 'precompute', 'max_iter', 'copy_X', 'tol', 'warm_start', 'positive', 
    'random_state', 'selection'] contain unexpected parameters {'copy_X', 'precompute', 'random_state', 'selection'}
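The mechanism can be imitated with a self-contained toy. This is not scikit-learn's implementation: the classes, the reduced parameter list, and the exact check are assumptions that only mimic the comparison between declared constraints and constructor arguments suggested by the error message.

```python
import inspect


class ToyParent:
    # Toy stand-in for a parent estimator that declares constraints
    # for every parameter of its own constructor.
    _parameter_constraints = {
        "alpha": float, "tol": float, "copy_X": bool, "random_state": object,
    }

    def __init__(self, alpha=1.0, tol=1e-4, copy_X=True, random_state=None):
        self.alpha, self.tol = alpha, tol
        self.copy_X, self.random_state = copy_X, random_state

    def _validate_params(self):
        # Compare the declared constraints with the constructor of the
        # concrete class: constraints for unknown parameters are an error.
        init_params = set(inspect.signature(type(self).__init__).parameters)
        unexpected = set(self._parameter_constraints) - (init_params - {"self"})
        if unexpected:
            raise ValueError(
                f"The parameter constraints {list(self._parameter_constraints)}"
                f" contain unexpected parameters {sorted(unexpected)}"
            )


class ToyChild(ToyParent):
    # Narrower signature, but the parent's constraints are inherited as-is.
    def __init__(self, alpha=1.0, tol=1e-4):
        self.alpha, self.tol = alpha, tol


class ToyChildRestricted(ToyChild):
    # Hypothetical alternative: keep only the constraints the child exposes.
    _parameter_constraints = {
        name: ToyParent._parameter_constraints[name] for name in ("alpha", "tol")
    }


try:
    ToyChild()._validate_params()
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
ToyChildRestricted()._validate_params()  # passes silently
```

In this toy, restricting the declared constraints to the parameters the subclass actually exposes keeps the validation step active instead of disabling it.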
    

    potential fix

    A straightforward fix would be to override _validate_params. But I don't think it's a reliable way to do it.

    def _validate_params(self):
            pass
    
    opened by Badr-MOUFAD 0
  • `climate._target_region` incorrectly extracts misaligned column

    There seems to be a problem with the climate.target_region function which incorrectly extracts one column to the right of the intended one. https://github.com/mathurinm/celer/blob/main/celer/datasets/climate.py#L60-L72

    The pos_Lx value range is supposed to be 0~143, but it is 0~144. For example, if Lx is 359, pos_Lx will be 144.

    Shouldn't we use np.floor instead of np.ceil?
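The off-by-one can be reproduced in isolation. The sketch below assumes a longitude grid with 2.5 degree spacing (144 columns, indexed 0 to 143), which is inferred from the ranges quoted above rather than taken from the linked code; the variable names are chosen for illustration.

```python
import numpy as np

resolution = 2.5  # assumed grid spacing in degrees: 144 columns * 2.5 = 360
longitudes = np.arange(360)  # integer longitudes 0..359

pos_ceil = np.ceil(longitudes / resolution).astype(int)
pos_floor = np.floor(longitudes / resolution).astype(int)

# Largest column index produced by each rounding rule.
print(pos_ceil.max(), pos_floor.max())
# Index obtained for Lx = 359 with each rule.
print(pos_ceil[359], pos_floor[359])
```

Under this assumption np.ceil maps Lx = 359 to column 144, one past the last valid index, while np.floor keeps every longitude within 0 to 143.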

    opened by shimon-sato 2
  • MAINT prune default value different between celer_path and Lasso

    Thank you again for this remarkably fast library.

    In the celer_path the default value of prune is 0 while in Lasso, MultiTaskLasso, and GroupLasso it's set to 1 (True).

    opened by cruyffturn 1
  • Feature Request: MultiTask GroupLasso

    I was wondering whether you could (easily) implement the MultiTask GroupLasso as in, or similar to, equation 1 here: http://psb.stanford.edu/psb-online/proceedings/psb22/nouira.pdf

    Thanks !

    opened by sehoff 0
  • MAINT - Use ``create_dual_point`` in group and multitask lasso

    https://github.com/mathurinm/celer/blob/4230db117a916e5158cfae85b2cd1a7249cf5475/celer/group_fast.pyx#L237-L238

    https://github.com/mathurinm/celer/blob/4230db117a916e5158cfae85b2cd1a7249cf5475/celer/multitask_fast.pyx#L332-L333

    opened by Badr-MOUFAD 0
  • DOC - Add examples for ``ElasticNet``

    Example(s) should exhibit the advantages of the ElasticNet estimator over the Lasso and OLS estimators, namely

    • [x] Feature selection (case p >> n)
    • [ ] Grouping effects
    • [ ] Generalization to unseen data.

    (refer to H. Zou, 2005)

    opened by Badr-MOUFAD 0
Releases(v0.5.1)