Machine Learning toolbox for Humans

Reproducible Experiment Platform (REP)

Join the chat at https://gitter.im/yandex/rep Build Status PyPI version Documentation CircleCI

REP is an IPython-based environment for conducting data-driven research in a consistent and reproducible way.

Main features:

  • unified python wrapper for different ML libraries (wrappers follow extended scikit-learn interface)
    • Sklearn
    • TMVA
    • XGBoost
    • uBoost
    • Theanets
    • Pybrain
    • Neurolab
    • MatrixNet service (available to CERN)
  • parallel training of classifiers on cluster
  • classification/regression reports with plots
  • interactive plots supported
  • smart grid-search algorithms with parallel execution
  • research versioning using git
  • pluggable quality metrics for classification
  • meta-algorithm design (aka 'rep-lego')

REP does not try to replace scikit-learn; it extends it and provides a better user experience.

Howto examples

To get started, look at the notebooks in /howto/

Notebooks can be viewed (but not executed) online at nbviewer.
There are basic introductory notebooks (about Python and IPython) and more advanced ones (about REP itself).

The example code is written in Python 2, but the library is compatible with both Python 2 and Python 3.

Installation with Docker

We provide a Docker image with REP and all its dependencies. This is the recommended way to install REP, especially if you are not experienced in Python.

Installation with bare hands

However, if you want to install REP and all of its dependencies on your machine yourself, follow this manual: installing manually and running manually.

License

Apache 2.0, library is open-source.

Minimal examples

REP wrappers are sklearn compatible:

from rep.estimators import XGBoostClassifier, SklearnClassifier, TheanetsClassifier
clf = XGBoostClassifier(n_estimators=300, eta=0.1).fit(trainX, trainY)
probabilities = clf.predict_proba(testX)

A favorite trick of Kagglers is to run bagging over complex algorithms. This is how it is done in REP:

from sklearn.ensemble import BaggingClassifier
clf = BaggingClassifier(base_estimator=XGBoostClassifier(), n_estimators=10)
# wrapping sklearn to REP wrapper
clf = SklearnClassifier(clf)

Another useful trick is to use folding instead of splitting data into train/test. This is especially useful when you are using some kind of complex stacking:

from rep.metaml import FoldingClassifier
clf = FoldingClassifier(TheanetsClassifier(), n_folds=3)
probabilities = clf.fit(X, y).predict_proba(X)

In the example above, all data are split into 3 folds, and each fold is predicted by a classifier that was trained on the other 2 folds.
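
To make the folding idea concrete, here is a minimal pure-Python sketch. It is not REP's implementation: the toy `FrequencyClassifier`, the helper `out_of_fold_predictions`, and the round-robin fold assignment are all assumptions chosen for illustration, and FoldingClassifier may assign samples to folds differently.

```python
class FrequencyClassifier:
    """Toy model (hypothetical): predicts the positive-class frequency seen in training."""

    def fit(self, X, y):
        self.p_ = sum(y) / float(len(y))
        return self

    def predict_proba(self, X):
        return [self.p_ for _ in X]


def out_of_fold_predictions(make_model, X, y, n_folds=3):
    """Predict every sample with a model that never saw it during training."""
    # Round-robin fold assignment (an assumption made for this sketch).
    fold_of = [i % n_folds for i in range(len(X))]
    predictions = [None] * len(X)
    for fold in range(n_folds):
        train_idx = [i for i, f in enumerate(fold_of) if f != fold]
        test_idx = [i for i, f in enumerate(fold_of) if f == fold]
        # Train on the other folds only.
        model = make_model().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        # Fill in predictions for the held-out fold.
        fold_predictions = model.predict_proba([X[i] for i in test_idx])
        for i, p in zip(test_idx, fold_predictions):
            predictions[i] = p
    return predictions


X = [[0], [1], [2], [3], [4], [5]]
y = [0, 1, 0, 1, 1, 1]
oof = out_of_fold_predictions(FrequencyClassifier, X, y, n_folds=3)
print(oof)
```

Because every prediction comes from a model that did not train on that sample, the resulting probabilities can be fed to a second-level model without leaking the training labels.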

REP classifiers also provide a report:

report = clf.test_on(testX, testY)
report.roc().plot() # plot ROC curve
from rep.report.metrics import RocAuc
# learning curves are useful when training GBDT!
report.learning_curve(RocAuc(), steps=10)  

You can read about other REP tools (such as smart distributed grid search, folding and factory) in the documentation and the howto examples.

Comments
  • Problem with TMVAClassifier

    After installing REP from here, I ran into the following problem when fitting TMVAClassifier. An IOError is raised by these lines:

    baseline = TMVAClassifier(method='kBDT', features=variables, BoostType='Grad', NTrees=40, Shrinkage=0.01, MaxDepth=7, UseNvars=6, nCuts=-1)
    baseline.fit(train, train['signal'])

    The stack trace is:

    IOError                                   Traceback (most recent call last)
    in ()
          3 UseNvars=6, nCuts=-1)
          4 # baseline = TMVAClassifier(method='kBDT', NTrees=50, Shrinkage=0.05, features=variables)
    ----> 5 baseline.fit(train, train['signal'])

    /usr/local/lib/python2.7/dist-packages/rep-0.6.3-py2.7.egg/rep/estimators/tmva.pyc in fit(self, X, y, sample_weight)
        288 self.factory_options = '{}:AnalysisType=Multiclass'.format(self.factory_options)
        289
    --> 290 return self._fit(X, y, sample_weight=sample_weight)
        291
        292 def predict_proba(self, X):

    /usr/local/lib/python2.7/dist-packages/rep-0.6.3-py2.7.egg/rep/estimators/tmva.pyc in _fit(self, X, y, sample_weight, model_type)
        104 add_info = _AdditionalInformation(directory, model_type=model_type)
        105 try:
    --> 106 self._run_tmva_training(add_info, X, y, sample_weight)
        107 finally:
        108 self._remove_tmp_directory(directory)

    /usr/local/lib/python2.7/dist-packages/rep-0.6.3-py2.7.egg/rep/estimators/tmva.pyc in _run_tmva_training(self, info, X, y, sample_weight)
        134 xml_filename = os.path.join(info.directory, 'weights',
        135 '{job}{name}.weights.xml'.format(job=info.tmva_job, name=self._method_name))
    --> 136 with open(xml_filename, 'r') as xml_file:
        137 self.formula_xml = xml_file.read()
        138

    IOError: [Errno 2] No such file or directory: '/home/artem/Documents/IPython Notebooks/CERN + Yandex/Original Baseline/flavours-of-physics-start/tmp0Fhtqe/weights/TMVAEstimation_REP_Estimator.weights.xml'

    As far as I can tell, the weights/ folder was created outside the temporary folder instead of inside it, which causes the error above.

    ROOT 5.34, Python 2.7, GCC 4.8, Ubuntu 14.04 LTS (x64). All requirements for REP were installed successfully (from requirements.txt).

    bug 
    opened by HolyBayes 9
  • FoldingClassifier: KFold vs StratifiedKFold

    Hey,

    first of all a compliment: I really like your repo and I build a lot of code on it, it's so useful! About the FoldingClassifier: there was already a request to implement StratifiedKFold in addition to the "normal" KFold. I would be very glad to see this, but I'd even go a step further: why don't you completely replace the KFold with a StratifiedKFold?

    I think, from an ML point of view, it is always better (or, in the best case, equally good) to use a stratified one. A normal KFold only introduces different class balances across folds, which (usually) results in "shifted" probabilities among the different classifiers, whereas a stratified one does not and therefore makes each trained classifier's predictions "comparable".

    Or in other words: I cannot think of any case where you want to have a non-stratified KFolding instead of a stratified one.
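
The class-balance argument can be illustrated with a small pure-Python sketch. The helpers `plain_folds`, `stratified_folds` and `positive_fraction` are hypothetical names written for this example; they mimic the idea of an unshuffled K-fold split and a stratified split, and are not the scikit-learn or REP implementations. The data are deliberately sorted by class, which is the worst case for an unshuffled split.

```python
from collections import defaultdict


def plain_folds(labels, n_folds):
    """Contiguous blocks of indices, like an unshuffled K-fold split."""
    folds = [[] for _ in range(n_folds)]
    for i in range(len(labels)):
        folds[i * n_folds // len(labels)].append(i)
    return folds


def stratified_folds(labels, n_folds):
    """Deal the indices of each class round-robin, so every fold keeps the class ratio."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for position, i in enumerate(indices):
            folds[position % n_folds].append(i)
    return folds


def positive_fraction(labels, fold):
    """Share of positive labels inside one fold."""
    return sum(labels[i] for i in fold) / float(len(fold))


# 12 samples sorted by class, 25% positives overall.
labels = [0] * 9 + [1] * 3
plain = [positive_fraction(labels, f) for f in plain_folds(labels, 3)]
strat = [positive_fraction(labels, f) for f in stratified_folds(labels, 3)]
print("plain:     ", plain)
print("stratified:", strat)
```

In this toy case two of the plain folds contain no positives at all, while every stratified fold keeps the overall 25% positive rate.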

    What do you think?

    Best, Mayou

    enhancement 
    opened by jonas-eschle 5
  • Support for a build hosted on (ana)conda

    I see that some of the continuous integration scripts support conda builds, although not all the dependencies are installed this way. Is there any hope of seeing a build on conda soon for Linux x86_64 systems?

    The reason I ask is that I have accounts on numerous batch systems, on none of which I have root access or any way to use Docker. They're all Linux-based though, as is the norm. As far as I know, this is the case for many researchers.

    It'd be great to see a way to quickly install REP on these systems. This would:

    • Cut down on the time needed to introduce people to REP
    • Hook into the environment management and environment logging provided by conda
    • Easily and quickly deploy REP on supercomputing nodes while requiring little of their filesystem

    This is especially useful for ensuring the ROOT install is sane. I know there has already been a lot of work in the direction of making REP easy to access and install. Perhaps this could be a healthy addition?

    question 
    opened by ewengillies 5
  • Add ability to initialise FoldingBase objects with external parser

    If you would like to run REP with, e.g., a StratifiedKFold instead of a normal KFold, this will be possible after this pull request. If no external folding object is passed, the default KFold algorithm is used.

    opened by mschlupp 5
  • test_xgboost file is not running on windows 10

    The test_xgboost file does not run on Windows 10:

    File "c:\Sander\my_code\rep-master\tests\test_xgboost.py", line 4, in <module>
    from rep.estimators import XGBoostClassifier, XGBoostRegressor
    ImportError: cannot import name XGBoostClassifier

    This happens when the rep installation succeeds but the xgboost installation fails:

    Microsoft Windows Version 10.0.10586 2015 Microsoft Corporation. All rights reserved.

    c:\Sander>pip install rep --no-dependencies
    Collecting rep
    Downloading rep-0.6.5.tar.gz (72kB)
    100% |################################| 81kB 511kB/s
    Building wheels for collected packages: rep
    Running setup.py bdist_wheel for rep ... done
    Stored in directory: C:\Users\Sander\AppData\Local\pip\Cache\wheels\db\ee\06\ac6e3f3ec208edaee29654f0b55ffaf2719a51de799c396b91
    Successfully built rep
    Installing collected packages: rep
    Successfully installed rep-0.6.5
    You are using pip version 8.1.0, however version 8.1.2 is available.
    You should consider upgrading via the 'python -m pip install --upgrade pip' command.

    c:\Sander>pip install xgboost==0.4a30
    Collecting xgboost==0.4a30
    Downloading xgboost-0.4a30.tar.gz (753kB)
    100% |################################| 757kB 553kB/s
    No files/directories in c:\users\sander\appdata\local\temp\pip-build-exobfm\xgboost\pip-egg-info (from PKG-INFO)
    You are using pip version 8.1.0, however version 8.1.2 is available.
    You should consider upgrading via the 'python -m pip install --upgrade pip' command.

    c:\Sander>

    opened by Sandy4321 5
  • Manual Install on Windows

    Hi! Is there a way to install REP manually in a Windows environment? When installing the dependencies, I get an error while installing gnureadline:

    Error: this module is not meant to work on Windows (try pyreadline instead)

    Is there a way to use pyreadline for Windows users?

    wontfix 
    opened by funkindy 4
  • Mac OS installation with Docker

    It seems the latest Docker release deprecates boot2docker (http://docs.docker.com/installation/mac/): "This release of Docker deprecates the Boot2Docker command line in favor of Docker Machine"

    How can REP be installed with the latest Docker release?

    opened by pupadupa 4
  • test failed

    After python setup.py install, I run cd tests ; nosetests . It runs for a long time and ends with errors:

    ..Info in <TCanvas::Print>: png file /tmp/tmpBg1dar.png has been created
    Error in <TFile::TFile>: file toy_datasets/toyMC_bck_mass.root does not exist
    E..E.
    ======================================================================
    ERROR: tests.z_test_notebook.test_notebooks_in_folder('/root/rep/howto/00-intro-ROOT.ipynb',)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
        self.test(*self.arg)
      File "/root/rep/rep/test/test_notebooks.py", line 43, in check_single_notebook
        raise RuntimeError(description)
    RuntimeError: Cell failed: 'T.Draw("min_DOCA")
    c1'
    
     Traceback:
    ---------------------------------------------------------------------------
    ReferenceError                            Traceback (most recent call last)
    <ipython-input-5-aa6c7320180d> in <module>()
    ----> 1 T.Draw("min_DOCA")
          2 c1
    
    ReferenceError: attempt to access a null-pointer
    

    What am I missing?

    opened by anaderi 3
  • Updating numpy in 0.6.6 docker breaks matplotlib

    % docker run -ti yandex/rep:0.6.6 bash -lc 'pip install -U numpy; python -c "from matplotlib import pyplot as plt; plt.figure()"'
    Activate: ROOT has been sourced. Environment settings are ready.
    ROOTSYS=/root/miniconda/envs/rep_py2
    Deactivate:Unsetting ROOT environment variables..
    Activate: ROOT has been sourced. Environment settings are ready.
    ROOTSYS=/root/miniconda/envs/rep_py2
    Collecting numpy
      Downloading numpy-1.11.2-cp27-cp27mu-manylinux1_x86_64.whl (15.3MB)
        100% |################################| 15.3MB 46kB/s
    Installing collected packages: numpy
      Found existing installation: numpy 1.10.4
        DEPRECATION: Uninstalling a distutils installed project (numpy) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
        Uninstalling numpy-1.10.4:
          Successfully uninstalled numpy-1.10.4
    Successfully installed numpy-1.11.2
    /root/miniconda/envs/rep_py2/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
      warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
    bash: line 1:   222 Illegal instruction     python -c "from matplotlib import pyplot as plt; plt.figure()"
    
    opened by sashabaranov 2
  • do we need to measure fit/predict time without %time?

    It is useful if the Jupyter frontend disconnects during fit/predict execution.

    The following snippet might be handy for such cases:

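    import datetime
    import time


    class Stopwatch(object):
        """Context manager that records wall-clock time spent inside the with block."""

        def __enter__(self):
            self.t0 = datetime.datetime.now()
            return self

        def __exit__(self, type, value, traceback):
            self.t1 = datetime.datetime.now()

        def __repr__(self):
            return "delta: (%s)" % (self.t1 - self.t0)


    with Stopwatch() as sfit:
        time.sleep(1)  # stands in for the fit call
    with Stopwatch() as spredict:
        time.sleep(1)  # stands in for the predict call

    # This form of print works in both Python 2 and Python 3.
    print("fit: %s predict: %s" % (sfit, spredict))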
    
    opened by anaderi 2
  • New REP docker version running in /var/lib/docker/volumes/ instead of ~/rep_container

    Hi.

    I had old REP docker version in ~/rep_container which started with run.sh script on 8080 port. I updated REP and it broke: sudo $REPDIR/run.sh worked, but I couldn't connect to localhost:8080 (connection refused). I've decided to update docker and REP according to new instructions: https://github.com/yandex/rep/wiki/Install-REP-with-Docker-(Linux).

    1. I installed Docker, according to instructions.
    2. netstat -anl | grep 8888 gave empty result
    3. git checkout https://github.com/yandex/rep.git didn't work (pathspec did not match any file(s) known to git), so I used git clone instead.
    4. First run of sudo make run was successful and installed container.
    5. I rebooted and second sudo make run gave the following

    docker run -ti --rm -p 8888:8888 --name rep yandex/rep:0.6.4
    Error response from daemon: Conflict. The name "rep" is already in use by container 3af0884aeedb. You have to remove (or rename) that container to be able to reuse that name.
    make: *** [run] Error 1

    6. I ran sudo docker images

    REPOSITORY    TAG      IMAGE ID       CREATED        VIRTUAL SIZE
    yandex/rep    0.6.4    18a48bc5a3b6   8 hours ago    2.635 GB
    anaderi/rep   latest   63c3db2850b6   4 months ago   1.649 GB
                           91c95931e552   7 months ago   910 B

    7. I tried sudo docker start rep. It worked and I opened REP on localhost:8888. But its working folder changed. Now it is /var/lib/docker/volumes/dbcc7ff99538007d9c6b244fb6b8f03bdcfd564f6076b36d79fa3330d2041107/_data/. This is quite inconvenient, because it requires superuser rights to access and is not conveniently located at all.

    Question: Is this the new setup, or did I do something wrong? If the latter, how do I fix it and run the REP container in a convenient folder?

    opened by lodurality 2
  • Bump notebook from 4.2.1 to 6.4.12

    Bumps notebook from 4.2.1 to 6.4.12.

    dependencies 
    opened by dependabot[bot] 0
  • Update lib

    Issue:

    ModuleNotFoundError                       Traceback (most recent call last)
    in
          5 from sklearn.ensemble import HistGradientBoostingClassifier
          6 from rep.report.metrics import RocAuc
    ----> 7 from rep.metaml import GridOptimalSearchCV, FoldingScorer, RandomParameterOptimizer
          8 from rep.estimators import SklearnClassifier

    ~/.local/lib/python3.8/site-packages/rep/metaml/__init__.py in
          2
          3 from .factory import ClassifiersFactory, RegressorsFactory
    ----> 4 from .folding import FoldingClassifier, FoldingRegressor
          5 from .gridsearch import GridOptimalSearchCV
          6 from .stacking import FeatureSplitter

    ~/.local/lib/python3.8/site-packages/rep/metaml/folding.py in
         11
         12 from sklearn import clone
    ---> 13 from sklearn.cross_validation import KFold
         14 from sklearn.utils import check_random_state
         15 from . import utils

    ModuleNotFoundError: No module named 'sklearn.cross_validation'

    Correction suggested based on https://stackoverflow.com/questions/30667525/importerror-no-module-named-sklearn-cross-validation
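
A common way to handle a module that moved between library versions is a fallback import. The sketch below is only an illustration of that pattern, not the fix applied in this pull request: `import_first` is a hypothetical helper, and it is demonstrated on standard-library modules so that it runs without scikit-learn installed.

```python
import importlib


def import_first(attribute, module_names):
    """Return `attribute` from the first module in `module_names` that provides it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module missing in this environment: try the next candidate.
            continue
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise ImportError("%s not found in any of %s" % (attribute, module_names))


# Demonstration: the first candidate does not exist, so the second one is used.
OrderedDict = import_first("OrderedDict", ["no_such_module_hypothetical", "collections"])
print(OrderedDict)

# For the failure above, the analogous call would be (requires scikit-learn):
# KFold = import_first("KFold", ["sklearn.model_selection", "sklearn.cross_validation"])
```

Note that swapping the import alone is not enough for KFold: the old class in sklearn.cross_validation was constructed as KFold(n, n_folds=...), while the one in sklearn.model_selection takes n_splits and produces the indices through its split(X) method, so the calling code has to change as well.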

    opened by RobsonRocha 1
  • Bump requests from 2.9.1 to 2.20.0

    Bumps requests from 2.9.1 to 2.20.0.

    Changelog

    Sourced from requests's changelog.

    2.20.0 (2018-10-18)

    Bugfixes

    • Content-Type header parsing is now case-insensitive (e.g. charset=utf8 v Charset=utf8).
    • Fixed exception leak where certain redirect urls would raise uncaught urllib3 exceptions.
    • Requests removes Authorization header from requests redirected from https to http on the same hostname. (CVE-2018-18074)
    • should_bypass_proxies now handles URIs without hostnames (e.g. files).

    Dependencies

    • Requests now supports urllib3 v1.24.

    Deprecations

    • Requests has officially stopped support for Python 2.6.

    2.19.1 (2018-06-14)

    Bugfixes

    • Fixed issue where status_codes.py's init function failed trying to append to a __doc__ value of None.

    2.19.0 (2018-06-12)

    Improvements

    • Warn user about possible slowdown when using cryptography version < 1.3.4
    • Check for invalid host in proxy URL, before forwarding request to adapter.
    • Fragments are now properly maintained across redirects. (RFC7231 7.1.2)
    • Removed use of cgi module to expedite library load time.
    • Added support for SHA-256 and SHA-512 digest auth algorithms.
    • Minor performance improvement to Request.content.
    • Migrate to using collections.abc for 3.7 compatibility.

    Bugfixes

    • Parsing empty Link headers with parse_header_links() no longer return one bogus entry.
    ... (truncated)
    Commits
    • bd84045 v2.20.0
    • 7fd9267 remove final remnants from 2.6
    • 6ae8a21 Add myself to AUTHORS
    • 89ab030 Use comprehensions whenever possible
    • 2c6a842 Merge pull request #4827 from webmaven/patch-1
    • 30be889 CVE URLs update: www sub-subdomain no longer valid
    • a6cd380 Merge pull request #4765 from requests/encapsulate_urllib3_exc
    • bbdbcc8 wrap url parsing exceptions from urllib3's PoolManager
    • ff0c325 Merge pull request #4805 from jdufresne/https
    • b0ad249 Prefer https:// for URLs throughout project
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Changes to TMVA API in new ROOT versions break TMVAClassifier

    Hi all,

    first of all, I wanted to thank and compliment the developers for this brilliant library. I finally had the chance to start playing with it today, but I was stopped in my tracks when trying to use a TMVAClassifier:

    AssertionError: ERROR: TMVA process is incorrect finished 
     LOG: None 
     Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/ludo/miniconda3/envs/pyroot/lib/python2.7/site-packages/rep/estimators/_tmvaFactory.py", line 86, in main
        tmva_process(classifier, info, data, labels, sample_weight)
      File "/home/ludo/miniconda3/envs/pyroot/lib/python2.7/site-packages/rep/estimators/_tmvaFactory.py", line 40, in tmva_process
        factory.AddVariable(var)
    AttributeError: 'Factory' object has no attribute 'AddVariable'
    

    My ROOT/TMVA versions are:

    You are running ROOT Version: 6.08/00, Nov 4, 2016
    TMVA Version 4.2.1, Feb 5, 2015
    

    Searching the web for this error message led me to this post on the ROOT forum: https://root-forum.cern.ch/t/25090, where the cause of problem is indicated as being due to a breaking change in the TMVA API:

    In recent ROOT versions (6.06 or 6.08, don't remember exactly), the TMVA interface has changed. You need to create a TMVA::DataLoader and call AddVariable on the dataloader object.

    As I understand, this is related to what was mentioned by @gandreassi in a comment to #104. Any idea on how complicated it would be to adapt tmva_process to the new interface?

    opened by fndari 1
Releases(0.6.6)
  • 0.6.6(Aug 9, 2016)

    • python2 and python3 dockers
    • updated libraries
    • added CacheClassifier
    • minimized size of docker image, simplified building process
    • some fixes for ML libraries
    • some documentation updates
    • deleted plot.ly
    • solved theanets reproducibility
  • 0.6.5(Feb 3, 2016)

    Fixes:

    • TMVA process correct termination
    • TMVA fix for Mac OS El Capitan (problems with dynamic libraries paths)
    • fix travis (show not passed tests, create docker on dockerhub)
    • fix wget in notebooks
    • fix errors calculation in efficiencies (for flatness property)
    • added Makefile
    • fix normalization in the multidimensional metric
  • 0.6.4(Nov 21, 2015)

    • Add continuous integration
    • Python 3 support
    • Conda installation in docker and travis
    • Kitematic-friendly docker
    • Update all libraries versions
    • added Folding Regressor, added feature importances for folding
    • added minimization to gridsearch, added random gridsearch from distributions
    • added folding scorer for regressor to gridsearch
    • faster tests
    • updated notebooks
    • Fixes:
      • tmva termination
      • documentation for grid search
      • Gridsearch bugs with metrics (metric fit)
      • learning curve with mask for folding
  • 0.6.3(Jul 30, 2015)

  • 0.6.2(Jul 6, 2015)

    • Support of neural networks in common interface:

      • theanets
      • neurolab
      • pybrain

      Now all the REP stuff is available for classifiers and regressors from these libraries:

      • usage inside sklearn pipeline
      • grid_search for hyper parameter optimization
      • reports, parallel training on cluster
    • New lovely documentation, check it out!

    • Fixes in metaclassifiers connected with usage of expressions-as-features

    • Rewritten FeatureSplitter

    • Switched to sklearn 0.16

    • New method train_test_split_group - splitting into train and test by the value of special column. Samples with same values are either both in train or both in test.

    • Update howto/notebooks with new open physical datasets
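
The group-wise splitting described in the list above can be sketched in plain Python. This is not the code of train_test_split_group: the function `split_by_group`, its parameters, and the way test groups are chosen are assumptions made for illustration only.

```python
import random


def split_by_group(groups, test_fraction=0.25, seed=0):
    """Split sample indices so that all samples sharing a group value land on the same side."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Choose whole groups for the test side (at least one group).
    n_test = max(1, int(round(test_fraction * len(unique))))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


# The group column could be, for example, an event or run identifier.
groups = ["a", "a", "b", "c", "c", "d"]
train_idx, test_idx = split_by_group(groups, test_fraction=0.5)
print("train:", train_idx, "test:", test_idx)
```

Splitting by group rather than by row keeps correlated samples (for example, several candidates from the same event) from appearing in both train and test, which would otherwise inflate the measured quality.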

  • 0.6.1(May 22, 2015)

    • Tmva implementation enhancement with root_numpy https://github.com/yandex/rep/issues/2.
    • Add FPRatTPR (return fpr value at fixed tpr) and TPRatFPR (return tpr value at fixed fpr) metrics, which are required, e.g. for tuning online triggering system. Moreover learning curves are available for these metrics now.
    • Many improvements in documentation.
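
As an illustration of what "FPR at a fixed TPR" means, here is a minimal pure-Python sketch. It shows one common way to compute the quantity and is not the code behind the FPRatTPR metric; the function name `fpr_at_tpr` and the tie handling (a sample passes if its score is at or above the threshold) are assumptions.

```python
import math


def fpr_at_tpr(labels, scores, target_tpr):
    """False positive rate at the tightest threshold that still reaches target_tpr."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    # Number of positives that must pass the cut; the small tolerance guards
    # against floating-point noise in target_tpr * len(positives).
    needed = max(1, int(math.ceil(target_tpr * len(positives) - 1e-9)))
    threshold = positives[needed - 1]
    false_positives = sum(1 for s in negatives if s >= threshold)
    return false_positives / float(len(negatives))


labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.5, 0.2, 0.1]
for tpr in (0.5, 0.75, 1.0):
    print("TPR >= %.2f -> FPR = %.2f" % (tpr, fpr_at_tpr(labels, scores, tpr)))
```

TPRatFPR is the mirror image: fix the fraction of negatives allowed through and report the fraction of positives that pass the same threshold.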
  • 0.6.0(May 12, 2015)

    • unified classifiers wrapper for variety of implementations: TMVA, Sklearn, XGBoost, uBoost
    • parallel training of classifiers on cluster
    • classification/regression reports with plots
    • support of interactive plots (bokeh, plotly)
    • grid-search with parallelized execution on a cluster
    • git, versioning of research
    • computation of different classification metrics
    • partial support of python 3.
Owner
Yandex
Yandex open source projects and technologies