Transform ML models into native code with zero dependencies

Overview

m2cgen


m2cgen (Model 2 Code Generator) is a lightweight library which provides an easy way to transpile trained statistical models into native code (Python, C, Java, Go, JavaScript, Visual Basic, C#, PowerShell, R, PHP, Dart, Haskell, Ruby, F#, Rust).

Installation

Supported Python version is >= 3.6.

pip install m2cgen

Supported Languages

  • C
  • C#
  • Dart
  • Elixir
  • F#
  • Go
  • Haskell
  • Java
  • JavaScript
  • PHP
  • PowerShell
  • Python
  • R
  • Ruby
  • Rust
  • Visual Basic (VBA-compatible)

Supported Models

Linear

Classification:

  • scikit-learn
    • LogisticRegression
    • LogisticRegressionCV
    • PassiveAggressiveClassifier
    • Perceptron
    • RidgeClassifier
    • RidgeClassifierCV
    • SGDClassifier
  • lightning
    • AdaGradClassifier
    • CDClassifier
    • FistaClassifier
    • SAGAClassifier
    • SAGClassifier
    • SDCAClassifier
    • SGDClassifier

Regression:

  • scikit-learn
    • ARDRegression
    • BayesianRidge
    • ElasticNet
    • ElasticNetCV
    • GammaRegressor
    • HuberRegressor
    • Lars
    • LarsCV
    • Lasso
    • LassoCV
    • LassoLars
    • LassoLarsCV
    • LassoLarsIC
    • LinearRegression
    • OrthogonalMatchingPursuit
    • OrthogonalMatchingPursuitCV
    • PassiveAggressiveRegressor
    • PoissonRegressor
    • RANSACRegressor (only supported regression estimators can be used as a base estimator)
    • Ridge
    • RidgeCV
    • SGDRegressor
    • TheilSenRegressor
    • TweedieRegressor
  • StatsModels
    • Generalized Least Squares (GLS)
    • Generalized Least Squares with AR Errors (GLSAR)
    • Generalized Linear Models (GLM)
    • Ordinary Least Squares (OLS)
    • [Gaussian] Process Regression Using Maximum Likelihood-based Estimation (ProcessMLE)
    • Quantile Regression (QuantReg)
    • Weighted Least Squares (WLS)
  • lightning
    • AdaGradRegressor
    • CDRegressor
    • FistaRegressor
    • SAGARegressor
    • SAGRegressor
    • SDCARegressor
    • SGDRegressor

SVM

Classification:

  • scikit-learn
    • LinearSVC
    • NuSVC
    • OneClassSVM
    • SVC
  • lightning
    • KernelSVC
    • LinearSVC

Regression:

  • scikit-learn
    • LinearSVR
    • NuSVR
    • SVR
  • lightning
    • LinearSVR

Tree

Classification:

  • DecisionTreeClassifier
  • ExtraTreeClassifier

Regression:

  • DecisionTreeRegressor
  • ExtraTreeRegressor

Random Forest

Classification:

  • ExtraTreesClassifier
  • LGBMClassifier (rf booster only)
  • RandomForestClassifier
  • XGBRFClassifier

Regression:

  • ExtraTreesRegressor
  • LGBMRegressor (rf booster only)
  • RandomForestRegressor
  • XGBRFRegressor

Boosting

Classification:

  • LGBMClassifier (gbdt/dart/goss booster only)
  • XGBClassifier (gbtree (including boosted forests)/gblinear booster only)

Regression:

  • LGBMRegressor (gbdt/dart/goss booster only)
  • XGBRegressor (gbtree (including boosted forests)/gblinear booster only)

    You can find versions of packages with which compatibility is guaranteed by CI tests here. Other versions may also work, but they are untested.

    Classification Output

    Linear / Linear SVM / Kernel SVM

    Binary

    Scalar value; signed distance of the sample to the hyperplane for the second class.

    Multiclass

    Vector value; signed distance of the sample to the hyperplane per each class.

    Comment

    The output is consistent with the output of LinearClassifierMixin.decision_function.

    SVM

    Outlier detection

    Scalar value; signed distance of the sample to the separating hyperplane: positive for an inlier and negative for an outlier.

    Binary

    Scalar value; signed distance of the sample to the hyperplane for the second class.

    Multiclass

    Vector value; one-vs-one score for each class, shape (n_samples, n_classes * (n_classes-1) / 2).

    Comment

    The output is consistent with the output of BaseSVC.decision_function when the decision_function_shape is set to ovo.

    Tree / Random Forest / Boosting

    Binary

    Vector value; class probabilities.

    Multiclass

    Vector value; class probabilities.

    Comment

    The output is consistent with the output of the predict_proba method of DecisionTreeClassifier / ExtraTreeClassifier / ExtraTreesClassifier / RandomForestClassifier / XGBRFClassifier / XGBClassifier / LGBMClassifier.
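    For example, here is a minimal sketch (not from the original README; it assumes the generated module is saved as model.py) that checks this consistency for a random forest:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    import m2cgen as m2c

    X, y = load_iris(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

    # Export the classifier to plain Python and save it as a module.
    with open("model.py", "w") as f:
        f.write(m2c.export_to_python(clf))

    import model  # the generated module exposes a score(input) function

    print(clf.predict_proba(X[:1])[0])  # class probabilities from the original model
    print(model.score(list(X[0])))      # probability vector from the generated code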

    Usage

    Here's a simple example of how a linear model trained in a Python environment can be represented in Java code:

    from sklearn.datasets import load_boston
    from sklearn import linear_model
    import m2cgen as m2c
    
    boston = load_boston()
    X, y = boston.data, boston.target
    
    estimator = linear_model.LinearRegression()
    estimator.fit(X, y)
    
    code = m2c.export_to_java(estimator)

    Generated Java code:

    public class Model {
    
        public static double score(double[] input) {
            return (((((((((((((36.45948838508965) + ((input[0]) * (-0.10801135783679647))) + ((input[1]) * (0.04642045836688297))) + ((input[2]) * (0.020558626367073608))) + ((input[3]) * (2.6867338193449406))) + ((input[4]) * (-17.76661122830004))) + ((input[5]) * (3.8098652068092163))) + ((input[6]) * (0.0006922246403454562))) + ((input[7]) * (-1.475566845600257))) + ((input[8]) * (0.30604947898516943))) + ((input[9]) * (-0.012334593916574394))) + ((input[10]) * (-0.9527472317072884))) + ((input[11]) * (0.009311683273794044))) + ((input[12]) * (-0.5247583778554867));
        }
    }

    You can find more examples of generated code for different models/languages here.
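    The same estimator can be exported to any other supported language with the corresponding export_to_* function; a short sketch:

    code_c = m2c.export_to_c(estimator)            # C source as a string
    code_js = m2c.export_to_javascript(estimator)  # JavaScript source as a string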

    CLI

    m2cgen can be used as a CLI tool to generate code using serialized model objects (pickle protocol):

    $ m2cgen <path_to_file> --language <language> [--indent <indent>] [--function_name <function_name>]
             [--class_name <class_name>] [--module_name <module_name>] [--package_name <package_name>]
             [--namespace <namespace>] [--recursion-limit <recursion_limit>]

    Don't forget that for unpickling serialized model objects their classes must be defined in the top level of an importable module in the unpickling environment.
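    For instance, here is a minimal sketch (file names are illustrative, not from the original docs) of preparing such a pickle file and transpiling it from the shell:

    import pickle

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LinearRegression

    X, y = load_diabetes(return_X_y=True)
    estimator = LinearRegression().fit(X, y)

    # Serialize the trained model so the CLI can pick it up.
    with open("model.pickle", "wb") as f:
        pickle.dump(estimator, f)

    # Then, from the shell:
    #   $ m2cgen model.pickle --language java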

    Piping is also supported:

    $ cat <path_to_file> | m2cgen --language <language>

    FAQ

    Q: Generation fails with RecursionError: maximum recursion depth exceeded error.

    A: If this error occurs while generating code using an ensemble model, try to reduce the number of trained estimators within that model. Alternatively, you can increase the maximum recursion depth with sys.setrecursionlimit().
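    A minimal sketch of the second option (the limit value below is only an example; pick one large enough for your model):

    import sys

    import m2cgen as m2c

    sys.setrecursionlimit(10000)  # raise the limit before exporting a large ensemble
    code = m2c.export_to_python(estimator)  # "estimator" is your trained ensemble model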

    Q: Generation fails with ImportError: No module named error while transpiling model from a serialized model object.

    A: This error indicates that the pickle protocol cannot deserialize the model object. For unpickling serialized model objects, their classes must be defined in the top level of an importable module in the unpickling environment. Installing the package that provides the model's class definition should solve the problem.

    Q: Code generated by m2cgen produces different results for some inputs compared to the original Python model from which the code was obtained.

    A: Some models force input data to be a particular type during the prediction phase in their native Python libraries. Currently, m2cgen works only with the float64 (double) data type. You can try casting your input data to this type manually and checking the results again. Also, some small differences can arise from the specific implementation of floating-point arithmetic in the target language.
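    A minimal sketch of such a check, assuming the generated code was saved as a module named model and the original estimator is still available (both names are illustrative):

    import numpy as np

    x = np.array([0.1, 2.0, 3.5], dtype=np.float64)  # cast the input to double precision

    print(estimator.predict(x.reshape(1, -1))[0])  # prediction of the original model
    print(model.score(list(x)))                    # prediction of the generated code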

    Comments
    • Code generated for XGBoost models returns invalid scores when tree_method is set to "hist"

      I have trained xgboost models in Python and am using the CLI interface to convert the serialized models to pure Python. However, when I use the generated Python code, the results differ from the predictions made by the model directly.

      Python 3.7, xgboost 0.90

      My model has a large number of parameters (somewhat over 500). [Screenshot: predicted class probabilities from the original model.]

      [Screenshot: the same predicted probabilities from the Python code generated via m2cgen.]

      We can see that the results are similar but not the same. As a result, a significant number of cases move into different classes between the two sets of predictions.

      I have also tested this with binary classification models and have the same issues.

      opened by eafpres 21
    • In Java interpreter ignore subroutines and perform code split based on the AST size

      After investigating possible solutions for https://github.com/BayesWitnesses/m2cgen/issues/152, I came to the conclusion that with the existing design it's extremely hard to come up with an optimal algorithm to split code into subroutines on the interpreter side (and not in assemblers). The primary reason is that, since we always interpret one expression at a time, it's hard to predict both the depth of the current subtree and the number of expressions that are left to interpret in other branches. I've achieved some progress by splitting expressions into separate subroutines based on the size of the code generated so far (i.e. a code size threshold), but more often than not I'll get some stupid subroutines like this one:

      public static double subroutine2(double[] input) {
          return 22.640634908349323;
      }
      

      That's why I took a simpler approach and attempted to optimize the interpreter that caused trouble in the first place - the R one. I slightly modified its behavior: when the binary expression count threshold is exceeded, it no longer splits them into separate variable assignments, but moves them into their own subroutines. Although this might not be the most optimal way for simpler models (like linear ones), it helps tremendously with gradient boosting and random forest models. Since those models are a summation of independent estimators, we end up putting every N (5 by default) estimators into their own subroutine, improving the execution time this way. @StrikerRUS please let me know what you think.

      opened by izeigerman 14
    • added possibility to write generated code into file

      Closed #110.

      Real-life frustrating example:

      import sys
      
      from sklearn.datasets import load_boston
      
      import lightgbm as lgb
      import m2cgen as m2c
      
      X, y = load_boston(True)
      est = lgb.LGBMRegressor(n_estimators=1000).fit(X, y)
      
      sys.setrecursionlimit(1<<30)
      print(m2c.export_to_python(est))
      
      IOPub data rate exceeded.
      The notebook server will temporarily stop sending output
      to the client in order to avoid crashing it.
      To change this limit, set the config variable
      `--NotebookApp.iopub_data_rate_limit`.
      
      Current values:
      NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
      NotebookApp.rate_limit_window=3.0 (secs)
      

      m2c.export_to_python(est, 'test.txt') works fine in this scenario.

      opened by StrikerRUS 12
    • Dart language support

      For those building Flutter apps that would like to be able to utilize static models trained in scikit on-device, this tool would be a perfect fit. And if the Flutter dev team decides to add a hot code push feature to the framework, models from m2cgen could be updated on the fly.

      opened by mattc-eostar 11
    • added support for PowerShell

      With this PR, Windows users will be able to execute ML models from the command line without the need to install any programming language (PowerShell comes preinstalled on Windows).
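      A minimal sketch of the corresponding export call (variable names are illustrative):

      import m2cgen as m2c

      ps_code = m2c.export_to_powershell(estimator)  # PowerShell source as a string
      print(ps_code)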

      opened by StrikerRUS 11
    • Handle missing values replacement in LightGBM

      Sometimes an exported LGBMRegressor model's predictions don't match the predictions from the original model. This happens when the model encounters values that were missing during training. A more detailed discussion can be found here: https://github.com/microsoft/LightGBM/issues/2921

      This is by no means a complete fix for the problem; it only addresses this part of the LightGBM behavior: "for numerical features, if not missing is seen in training, the missing value will be converted to zero, and then check it with the threshold. So it is not always the left side."

      The fix has also been tested on a fairly big regression model with numerical features and it works as expected.

      How to reproduce:

      import numpy as np
      import lightgbm as lgb
      import m2cgen as m2c
      from sklearn.datasets import load_diabetes
      
      dataset = load_diabetes()
      
      gbm = lgb.LGBMRegressor(num_leaves=51,
                              learning_rate=0.05,
                              n_estimators=100)
      gbm.fit(dataset['data'], dataset['target'])
      
      
      test = np.array([-2.175, 0.797, np.NaN, 1.193, 0.0, 0.0, 0.0, np.NaN, np.NaN, np.NaN])
      
      print(gbm.predict(np.array([test]))[0])
      
      code = m2c.export_to_python(gbm)
      
      with open('model.py', 'w') as fp:
          fp.write(code)
      
      import model as m
      
      print(m.score(test))
      
      opened by Aulust 10
    • Code generated from XGBoost model includes "None"

      When transpiling XGBRegressor and XGBClassifier models such as the following basic example:

      from xgboost import XGBRegressor
      from sklearn import datasets
      import m2cgen as m2c
      
      iris_data = datasets.load_iris(return_X_y=True)
      
      mod = XGBRegressor(booster="gblinear", max_depth=2)
      X, y = iris_data
      mod.fit(X[:120], y[:120])
      
      code = m2c.export_to_c(mod)
      
      print(code)
      

      the resulting C code includes a Pythonesque None:

      double score(double * input) {
          return (None) + (((((-0.391196) + ((input[0]) * (-0.0196191))) + ((input[1]) * (-0.11313))) + ((input[2]) * (0.137024))) + ((input[3]) * (0.645197)));
      }
      

      Probably I am missing some basic step?

      opened by robinvanemden 10
    • added Visual Basic code generator

      The motivation behind this PR is to give users with limited programming skills access to strong ML models inside Office applications (mainly Excel).

      Also, if I'm not mistaken, VBA projects can be used in SOLIDWORKS.

      After merging this PR users will be able to use ML models inside Excel in the following way.

      Usage Example

      As usual, generate a model via supported ML algorithm:

      from sklearn.datasets import load_boston
      from sklearn.svm import SVR
      
      import m2cgen as m2c
      
      X, y = load_boston(True)
      X = X[:4, :2]
      y = y[:4]
      
      reg = SVR()
      reg.fit(X, y)
      

      After that, output the VBA code representation of the model via the m2cgen Python package:

      print(m2c.export_to_vba(reg))
      
      Function score(ByRef input_vector() As Double) As Double
          Dim var0 As Double
          var0 = (0) - (0.3333333333333333)
          score = ((((28.70000000001455) + ((Exp((var0) * (((Application.WorksheetFunction.Power((0.00632) - (input_vector(0)), 2)) + (Application.WorksheetFunction.Power((18.0) - (input_vector(1)), 2))) + (Application.WorksheetFunction.Power((2.31) - (input_vector(2)), 2))))) * (-1.0))) + ((Exp((var0) * (((Application.WorksheetFunction.Power((0.02731) - (input_vector(0)), 2)) + (Application.WorksheetFunction.Power((0.0) - (input_vector(1)), 2))) + (Application.WorksheetFunction.Power((7.07) - (input_vector(2)), 2))))) * (-1.0))) + ((Exp((var0) * (((Application.WorksheetFunction.Power((0.02729) - (input_vector(0)), 2)) + (Application.WorksheetFunction.Power((0.0) - (input_vector(1)), 2))) + (Application.WorksheetFunction.Power((7.07) - (input_vector(2)), 2))))) * (1.0))) + ((Exp((var0) * (((Application.WorksheetFunction.Power((0.03237) - (input_vector(0)), 2)) + (Application.WorksheetFunction.Power((0.0) - (input_vector(1)), 2))) + (Application.WorksheetFunction.Power((2.18) - (input_vector(2)), 2))))) * (1.0))
      End Function
      

      Create empty Visual Basic file example_module.bas and paste the copied output there.

      Now open Excel, enable Developer tab and click Developer -> Visual Basic (Alt + F11). In VBA editor click File -> Import File and choose previously created example_module.bas file.

      After doing that, one more required action is writing a proxy function that converts an Excel Range object to an Array and calls the model. For instance, such a function for regression, with row-based features placed inside Excel, can be:

      Function SCOREROW(features As Range) As Double
          Dim arr() As Double
          ReDim Preserve arr(features.Columns.Count - 1)
          Dim i As Integer
          For i = 0 To UBound(arr)
              arr(i) = features(1, i + 1)
          Next i
          SCOREROW = score(arr)
      End Function
      

      Now this proxy function can be used on an Excel sheet like any built-in Excel function.

      Let's compare Excel predictions with ones from the native Python model:

      reg.predict(X)
      
      array([27.7       , 28.70034543, 28.70034543, 29.7       ])
      

      Seems that everything is fine!

      opened by StrikerRUS 10
    • Fix #168. Enforce float32 type for split condition values for GBT models created using XGBoost

      As it turns out, the issue reported in https://github.com/BayesWitnesses/m2cgen/issues/168 is not unique to the "hist" tree construction algorithm. It just seems that with the "hist" method the likelihood of reproducing it is much higher due to the reliance on feature histograms. I was able to reproduce the same discrepancy with non-hist methods on a larger sample of test data.

      The issue occurs due to a double precision error and reproduces every time when the feature value matches the split condition in one of the tree's nodes.

      Example: feature value = 0.671, split condition = 0.671000004. When we hit this condition in the generated code, the outcome of 0.671 < 0.671000004 is "true" (the "yes" branch), while in XGBoost the same condition leads to the "no" branch.

      After some investigation I noticed that XGBoost's DMatrix forces all values to be float32 (https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/core.py#L565). At the same time, in our assemblers we rely on default 64-bit floats. Forcing the split condition to be float32 seems to address the issue; at least I couldn't reproduce it so far.
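      A minimal sketch (using the numbers from the example above) of why the float32 cast matters:

      import numpy as np

      value = 0.671            # raw feature value, kept as float64 in the generated code
      threshold = 0.671000004  # split condition as stored by XGBoost

      print(value < threshold)                          # True  -> generated code takes the "yes" branch
      print(np.float32(value) < np.float32(threshold))  # False -> XGBoost, working in float32, takes the "no" branch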

      opened by izeigerman 9
    • add option to save generated code into file

      I'm sorry if I missed this functionality, but the CLI version definitely doesn't have it (I saw the related code only in generate_code_examples.py). I guess it would be very useful to eliminate the copy-paste phase, especially for large models.

      Of course, piping is a solution, but not for development in Jupyter Notebook, for example.

      enhancement good first issue 
      opened by StrikerRUS 9
    • add: Make function_name parametrized

      Hello everyone,

      First of all, thanks a ton for putting this tool/library together -- especially in resource-stranded environments, it does have a potential to literally save lives!

      One small problem I was fighting with while using it was the score function it uses in the generated modules. When they are used as drop-in replacements for trained models, using score is a bit strange, as the API generally provides functions like predict or predict_proba. It would therefore be of great help to me if this name could be changed dynamically so that I would not have to do so manually.

      Please do let me know if something like this sounds like a sensible addition. I'd be happy to update the code so that it reflects your vision, so please feel free to let me know whenever that may be the case.

      Thanks!


      • Currently m2cgen generates a module in various languages that has a "score"/"Score" function/method. This is not always desirable, as many of the trained models that are to be exported may provide their predictions via API functions with different names (such as predict).

      • This commit adds a way of specifying the name of the function both via the CLI and in the exporters (that is, in the export_to_ functions) by specifying the function_name option/parameter, while keeping the default set to "score"/"Score" for backwards compatibility.

      Signed-off-by: mr.Shu [email protected]
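      A minimal sketch of the resulting usage (assuming the function_name parameter introduced by this PR; the default remains "score"):

      import m2cgen as m2c

      code = m2c.export_to_python(estimator, function_name="predict")  # generated module now defines predict(input)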

      opened by mrshu 8
    • Bump lightgbm from 3.3.2 to 3.3.4

      Bumps lightgbm from 3.3.2 to 3.3.4 (an automated dependency update by Dependabot).

      dependencies
      opened by dependabot[bot] 0
    • Bump scipy from 1.9.1 to 1.10.0

      Bumps scipy from 1.9.1 to 1.10.0 (an automated dependency update by Dependabot).

      dependencies
      opened by dependabot[bot] 0
    • Bump numpy from 1.23.3 to 1.24.1

      Bumps numpy from 1.23.3 to 1.24.1 (an automated dependency update by Dependabot).

      dependencies
      opened by dependabot[bot] 0
    • Bump xgboost from 1.6.2 to 1.7.2

      Bumps xgboost from 1.6.2 to 1.7.2 (an automated dependency update by Dependabot).

      dependencies
      opened by dependabot[bot] 0
    • Bump flake8 from 5.0.4 to 6.0.0

      Bumps flake8 from 5.0.4 to 6.0.0 (an automated dependency update by Dependabot).

      dependencies
      opened by dependabot[bot] 0
    • Feature Request: support for multioutput regression

      Nice library thanks!

      Perhaps I missed something, but it looks like multi-output regression is unsupported? If so, is it on the roadmap? Happy to help if needed.

      opened by ageron 0
    Releases(v0.10.0)
    • v0.10.0(Apr 25, 2022)

      • Python 3.6 is no longer supported.
      • Added support for Python 3.9 and 3.10.
      • Trained models can now be transpiled into Rust and Elixir 🎉
      • Model support:
        • Added support for SGDRegressor from the lightning package.
        • Added support for extremely randomized trees in the LightGBM package.
        • Added support for OneClassSVM from the scikit-learn package.
      • Various improvements to handle the latest versions of the supported models.
      • Various CI/CD improvements including migration from coveralls to codecov, automated generation of the code examples and automated GitHub Release creation.
      • Minor codebase cleanup.
      • Significantly reduced the number of redundant parentheses and return statements in the generated code.
      • Latest Dart language versions are supported.
      • Programming languages can provide native implementation of sigmoid and softmax functions.
      • Improved code generation speed by adding new lines at the end of a generated code.
      Source code(tar.gz)
      Source code(zip)
      m2cgen-0.10.0-py3-none-any.whl(90.07 KB)
      m2cgen-0.10.0.tar.gz(54.48 KB)
    • v0.9.0(Sep 18, 2020)

      • Python 3.5 is no longer supported.
      • Trained models can now be transpiled into F# 🎉.
      • Model support:
        • Added support for GLM models from the scikit-learn package.
        • Introduced support for a variety of objectives in LightGBM models.
        • The cauchy function is now supported for GLM models.
      • Improved conversion of floating point numbers into string literals. This leads to improved accuracy of results returned by generated code.
      • Improved handling of missing values in LightGBM models. Kudos to our first time contributor @Aulust 🎉
      • Various improvements of the code generation runtime.
      Source code(tar.gz)
      Source code(zip)
    • v0.8.0(Jun 18, 2020)

      • This release is the last one which supports Python 3.5. Next release will require Python >= 3.6.
      • Trained models can now be transpiled into Haskell and Ruby 🎉
      • Various improvements of the code generation runtime:
        • Introduced caching of the interpreter handler names.
        • A string buffer is now used to store generated code.
        • We moved away from using the string.Template.
      • The numpy dependency is no longer required at runtime for the generated Python code.
      • Improved model support:
        • Enabled multiclass support for XGBoost Random Forest models.
        • Added support of Boosted Random Forest models from the XGBoost package.
        • Added support of GLM models from the statsmodels package.
      • Introduced fallback expressions for a variety of functions which rely on simpler language constructs. This should simplify implementation of new interpreters since the number of functions that must be provided by the standard library or by a developer of the given interpreter has been reduced. Note that fallback expressions are optional and can be overridden by a manually written implementation or a corresponding function from the standard library. Among functions for which fallback AST expressions have been introduced are: abs, tanh, sqrt, exp, sigmoid and softmax.

      Kudos to @StrikerRUS who's responsible for all these amazing updates 💪

      Source code(tar.gz)
      Source code(zip)
    • v0.7.0(Apr 7, 2020)

      • Bug fixes:
        • Thresholds for XGBoost trees are forced to be float32 now (https://github.com/BayesWitnesses/m2cgen/issues/168).
        • Fixed support for newer versions of XGBoost, in which the default value for the base_score parameter became None (https://github.com/BayesWitnesses/m2cgen/issues/182).
      • Models can now be transpiled into the Dart language. Kudos to @MattConflitti for this great addition 🎉
      • Support for following models has been introduced:
        • Models from the statsmodels package are now supported. The list of added models includes: GLS, GLSAR, OLS, ProcessMLE, QuantReg and WLS.
        • Models from the lightning package: AdaGradRegressor/AdaGradClassifier, CDRegressor/CDClassifier, FistaRegressor/FistaClassifier, SAGARegressor/SAGAClassifier, SAGRegressor/SAGClassifier, SDCARegressor/SDCAClassifier, SGDClassifier, LinearSVR/LinearSVC and KernelSVC.
        • RANSACRegressor from the scikit-learn package.
      • The name of the scoring function can now be changed via a parameter. Thanks @mrshu 💪
      • The SubroutineExpr expression has been removed from AST. The logic of how to split the generated code into subroutines is now focused in interpreters and was completely removed from assemblers.
      Source code(tar.gz)
      Source code(zip)
    • v0.6.0(Feb 17, 2020)

      • Trained models can now be transpiled into R, PowerShell and PHP. Major effort delivered solely by @StrikerRUS .
      • In the Java interpreter, introduced logic that splits code into methods based on heuristics, without relying on SubroutineExpr from the AST.
      • Added support of LightGBM and XGBoost Random Forest models.
      • XGBoost linear models are now supported.
      • LassoLarsCV, Perceptron and PassiveAggressiveClassifier estimators from scikit-learn package are now supported.
      Source code(tar.gz)
      Source code(zip)
    • v0.5.0(Dec 1, 2019)

      Quite a few awesome updates in this release. Many thanks to @StrikerRUS and @chris-smith-zocdoc for making this release happen.

      • Visual Basic and C# joined the list of supported languages. Thanks @StrikerRUS for all the hard work!
      • The numpy dependency is no longer required for generated Python code when no linear algebra is involved. Thanks @StrikerRUS for this update.
      • Fixed the bug when generated Java code exceeded the JVM method size constraints in case when individual estimators of a GBT model contained a large number of leaves. Kudos to @chris-smith-zocdoc for discovering and fixing this issue.
      Source code(tar.gz)
      Source code(zip)
    • v0.4.0(Sep 28, 2019)

    • v0.3.1(Aug 15, 2019)

      • Fixed generation of XGBoost models in case when feature names are not specified in a model object (https://github.com/BayesWitnesses/m2cgen/pull/93). Thanks @akhvorov for contributing the fix.
      Source code(tar.gz)
      Source code(zip)
    • v0.3.0(May 21, 2019)

    • v0.2.1(Apr 17, 2019)

      • For XGBoost models, added support for the best_ntree_limit attribute to limit the number of estimators used during prediction. Thanks @arshamg for helping with that.
      Source code(tar.gz)
      Source code(zip)
    • v0.2.0(Mar 22, 2019)

      • Golang joins the family of languages supported by m2cgen 🎉 Credit goes to @matbur for making such a significant contribution 🥇
      • For generated C code the custom assign_array function that was used to assign vector values has been replaced with plain memcpy.
      Source code(tar.gz)
      Source code(zip)
    • v0.1.1(Mar 5, 2019)

    • v0.1.0(Feb 12, 2019)

    Owner
    Bayes' Witnesses