[UNMAINTAINED] Automated machine learning for analytics & production

Overview

auto_ml

Automated machine learning for production and analytics

Installation

  • pip install auto_ml

Getting started

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset

df_train, df_test = get_boston_dataset()

column_descriptions = {
    'MEDV': 'output',
    'CHAS': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

ml_predictor.train(df_train)

ml_predictor.score(df_test, df_test.MEDV)

Show off some more features!

auto_ml is designed for production. Here's an example that includes serializing and loading the trained model, then getting predictions on single dictionaries, roughly the process you'd likely follow to deploy the trained model.

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset
from auto_ml.utils_models import load_ml_model

# Load data
df_train, df_test = get_boston_dataset()

# Tell auto_ml which column is 'output'
# Also note columns that aren't purely numerical
# Examples include ['nlp', 'date', 'categorical', 'ignore']
column_descriptions = {
  'MEDV': 'output'
  , 'CHAS': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

ml_predictor.train(df_train)

# Score the model on test data
test_score = ml_predictor.score(df_test, df_test.MEDV)

# auto_ml is specifically tuned for running in production
# It can get predictions on an individual row (passed in as a dictionary)
# A single prediction like this takes ~1 millisecond
# Here we will demonstrate saving the trained model, and loading it again
file_name = ml_predictor.save()

trained_model = load_ml_model(file_name)

# .predict and .predict_proba take in either:
# A pandas DataFrame
# A list of dictionaries
# A single dictionary (optimized for speed in production environments)
predictions = trained_model.predict(df_test)
print(predictions)

3rd Party Packages- Deep Learning with TensorFlow & Keras, XGBoost, LightGBM, CatBoost

auto_ml has all of these awesome libraries integrated! Generally, just pass one of them in for model_names:

ml_predictor.train(data, model_names=['DeepLearningClassifier'])

Available options are

  • DeepLearningClassifier and DeepLearningRegressor
  • XGBClassifier and XGBRegressor
  • LGBMClassifier and LGBMRegressor
  • CatBoostClassifier and CatBoostRegressor

All of these projects are ready for production. These projects all have prediction time in the 1 millisecond range for a single prediction, and are able to be serialized to disk and loaded into a new environment after training.

Depending on your machine, they can occasionally be difficult to install, so they are not included in auto_ml's default installation. You are responsible for installing them yourself. auto_ml will run fine without them installed (we check what's installed before choosing which algorithm to use).

Feature Responses

Get linear-model-esque interpretations from non-linear models. See the docs for more information and caveats.

Classification

Binary and multiclass classification are both supported. Note that for now, labels must be integers (0 and 1 for binary classification). auto_ml will automatically detect whether it is a binary or multiclass classification problem; you just have to pass in type_of_estimator='classifier':

ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=column_descriptions)
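Because labels must be integers for now, string labels have to be mapped before training. The following is a minimal sketch of one way to do that for a list of row dictionaries. The helper name encode_labels and the example column name 'spam' are hypothetical and are not part of auto_ml.

```python
def encode_labels(rows, label_column):
    """Return copies of the rows with the label replaced by an integer code.

    Hypothetical helper, not part of auto_ml. Codes are assigned in sorted
    order of the distinct labels, so the mapping is deterministic.
    """
    distinct = sorted({row[label_column] for row in rows})
    mapping = {label: code for code, label in enumerate(distinct)}
    # Build new dictionaries so the caller's rows are left untouched
    encoded = [dict(row, **{label_column: mapping[row[label_column]]}) for row in rows]
    return encoded, mapping


# Illustrative rows only
rows = [
    {'text': 'win a prize now', 'spam': 'spam'},
    {'text': 'see you at lunch', 'spam': 'ham'},
]
encoded_rows, label_mapping = encode_labels(rows, 'spam')
print(label_mapping)
```

Keeping the returned mapping around makes it possible to translate integer predictions back into the original labels later.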

Feature Learning

Also known as "finally found a way to make this deep learning stuff useful for my business". Deep Learning is great at learning important features from your data. But the way it turns these learned features into a final prediction is relatively basic. Gradient boosting is great at turning features into accurate predictions, but it doesn't do any feature learning.

In auto_ml, you can now automatically use both types of models for what they're great at. If you pass feature_learning=True, fl_data=some_dataframe to .train(), we will do exactly that: train a deep learning model on your fl_data. We won't ask it for predictions (the standard stacking approach). Instead, we'll use its penultimate layer to get its 10 most useful features. Then we'll train a gradient boosted model (or any other model of your choice) on those features plus all the original features.

Across some problems, we've witnessed this lead to a 5% gain in accuracy, while still making predictions in 1-4 milliseconds, depending on model complexity.

ml_predictor.train(df_train, feature_learning=True, fl_data=df_fl_data)

This feature only supports regression and binary classification currently. The rest of auto_ml supports multiclass classification.
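The data flow behind feature learning can be sketched without any deep learning library. In the sketch below, learned_features is a hypothetical stand-in for the penultimate layer of a trained network (here just two hand-written expressions), and augment appends its outputs to the original features. The resulting vectors are what a gradient boosted model would then be trained on. All names and values are illustrative, and this shows only the idea, not auto_ml's internals.

```python
def learned_features(row):
    """Stand-in for a trained network's penultimate layer (hypothetical)."""
    return [row['rooms'] * row['area'], row['area'] - row['age']]


def augment(rows, feature_names, extractor):
    """Return one feature vector per row: original features plus learned ones."""
    return [
        [row[name] for name in feature_names] + extractor(row)
        for row in rows
    ]


# Illustrative rows only
rows = [
    {'rooms': 3, 'area': 70.0, 'age': 10},
    {'rooms': 5, 'area': 120.0, 'age': 2},
]
feature_names = ['rooms', 'area', 'age']
X = augment(rows, feature_names, learned_features)
print(X[0])
```

Each row keeps its three original features and gains two learned ones, so the final model sees five columns.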

Categorical Ensembling

Ever wanted to train one model for every store/customer, but didn't want to maintain hundreds of thousands of independent models? With ml_predictor.train_categorical_ensemble(), we will handle that for you. You'll still have just one consistent API, ml_predictor.predict(data), but behind this single API will be one model for each category you included in your training data.

Just tell us which column holds the category you want to split on, and we'll handle the rest. As always, saving the model, loading it in a different environment, and getting speedy predictions live in production is baked right in.

ml_predictor.train_categorical_ensemble(df_train, categorical_column='store_name')
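To make the "one model per category behind a single API" idea concrete, here is a minimal sketch in plain Python. The class CategoricalEnsemble is hypothetical, each per-category "model" is just the mean of the output column, and the fallback for unseen categories is an assumption of this sketch rather than documented auto_ml behavior.

```python
from collections import defaultdict
from statistics import mean


class CategoricalEnsemble:
    """One tiny model (a mean predictor) per category behind one predict() call.

    Illustrative only; a real ensemble would train a full model per category.
    """

    def __init__(self, categorical_column, output_column):
        self.categorical_column = categorical_column
        self.output_column = output_column
        self.models = {}
        self.fallback = None

    def train(self, rows):
        # Group the output values by category
        by_category = defaultdict(list)
        for row in rows:
            by_category[row[self.categorical_column]].append(row[self.output_column])
        # "Train" one model per category
        self.models = {category: mean(values) for category, values in by_category.items()}
        # Assumption of this sketch: unseen categories get the global mean
        self.fallback = mean(row[self.output_column] for row in rows)

    def predict(self, row):
        # Route the row to the model for its category
        return self.models.get(row[self.categorical_column], self.fallback)


# Illustrative rows only
rows = [
    {'store_name': 'north', 'sales': 10.0},
    {'store_name': 'north', 'sales': 14.0},
    {'store_name': 'south', 'sales': 30.0},
]
ensemble = CategoricalEnsemble('store_name', 'sales')
ensemble.train(rows)
print(ensemble.predict({'store_name': 'north'}))
```

The caller only ever sees train() and predict(); the routing by category stays an internal detail.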

More details available in the docs

http://auto-ml.readthedocs.io/en/latest/

Advice

Before you go any further, try running the code. Load up some data (either a DataFrame, or a list of dictionaries, where each dictionary is a row of data). Make a column_descriptions dictionary that tells us which attribute name in each row represents the value we're trying to predict. Pass all that into auto_ml, and see what happens!

Everything else in these docs assumes you have done at least the above. Start there and everything else will build on top. But this part gets you the output you're probably interested in, without unnecessary complexity.
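As a sanity check before passing data in, it can help to confirm that the column_descriptions dictionary matches the rows. The sketch below assumes the data is a list of dictionaries and that exactly one column is marked 'output'. The function check_column_descriptions is hypothetical and not part of auto_ml, and the rows are illustrative.

```python
def check_column_descriptions(rows, column_descriptions):
    """Check a column_descriptions dict against a list of row dictionaries.

    Hypothetical pre-flight check, not part of auto_ml.
    Returns the name of the output column.
    """
    outputs = [name for name, kind in column_descriptions.items() if kind == 'output']
    if len(outputs) != 1:
        raise ValueError('expected exactly one output column, found %d' % len(outputs))
    # Every described column should be present in every row
    for name in column_descriptions:
        missing = sum(1 for row in rows if name not in row)
        if missing:
            raise KeyError('%s is missing from %d row(s)' % (name, missing))
    return outputs[0]


# Illustrative rows only
rows = [
    {'MEDV': 24.0, 'CHAS': 0, 'RM': 6.575},
    {'MEDV': 21.6, 'CHAS': 0, 'RM': 6.421},
]
column_descriptions = {'MEDV': 'output', 'CHAS': 'categorical'}
output_column = check_column_descriptions(rows, column_descriptions)
print(output_column)
```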

Docs

The full docs are available at https://auto_ml.readthedocs.io. Again though, I'd strongly recommend running this on an actual dataset before referencing the docs any further.

What this project does

Automates the whole machine learning process, making it super easy to use for both analytics and getting real-time predictions in production.

A quick overview of buzzwords. This project automates:

  • Analytics (pass in data, and auto_ml will tell you the relationship of each variable to what it is you're trying to predict).
  • Feature Engineering (particularly around dates, and NLP).
  • Robust Scaling (turning all values into their scaled versions between the range of 0 and 1, in a way that is robust to outliers, and works with sparse data).
  • Feature Selection (picking only the features that actually prove useful).
  • Data formatting (turning a DataFrame or a list of dictionaries into a sparse matrix, one-hot encoding categorical variables, taking the natural log of y for regression problems, etc).
  • Model Selection (which model works best for your problem- we try roughly a dozen apiece for classification and regression problems, including favorites like XGBoost if it's installed on your machine).
  • Hyperparameter Optimization (what hyperparameters work best for that model).
  • Big Data (feed it lots of data- it's fairly efficient with resources).
  • Unicorns (you could conceivably train it to predict what is a unicorn and what is not).
  • Ice Cream (mmm, tasty...).
  • Hugs (this makes it much easier to do your job, hopefully leaving you more time to hug those you care about).
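The Robust Scaling item above can be illustrated with a short sketch: scale to the range 0 to 1 using percentiles instead of the raw minimum and maximum, and clip anything outside that range. The choice of the 5th and 95th percentiles and the function name robust_scale are assumptions made for illustration, not a description of auto_ml's actual implementation, and sparse-matrix handling is left out.

```python
def robust_scale(values, lower_pct=5, upper_pct=95):
    """Scale values to [0, 1] using percentiles rather than the raw min and max.

    Values outside the percentile range are clipped to 0 or 1, so a single
    extreme outlier cannot squash every other value toward 0.
    """
    ordered = sorted(values)

    def percentile(pct):
        # Nearest-rank percentile on the sorted values
        index = round(pct / 100 * (len(ordered) - 1))
        return ordered[index]

    low, high = percentile(lower_pct), percentile(upper_pct)
    if high == low:
        # Constant column: nothing to scale
        return [0.0 for _ in values]
    return [min(1.0, max(0.0, (v - low) / (high - low))) for v in values]


# Twenty ordinary values plus one extreme outlier
values = list(range(1, 21)) + [1000]
scaled = robust_scale(values)
print(scaled[10], scaled[-1])
```

With plain min-max scaling, the outlier 1000 would push every other value below 0.02; with percentile bounds the ordinary values still spread across the whole range.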

Running the tests

If you've cloned the source code and are making any changes (highly encouraged!), or just want to make sure everything works in your environment, run nosetests -v tests.

CI is also set up, so if you're developing on this, you can just open a PR, and the tests will run automatically on Travis-CI.

The tests are relatively comprehensive, though as with everything with auto_ml, I happily welcome your contributions here!

Comments
  • Comparison with other automatic ML libraries?

    First, thank you very much for the hard work and awesome project. I think it will get a lot of use in my workflow.

    I was surveying the landscape of automatic ML solutions, and found your package along with tpot and auto-sklearn. I am trying to figure out what kind of strengths and weaknesses all these packages have. Would you mind discussing what auto_ml does differently and/or better?

    Thanks again.

    opened by sergeyf 12
  • ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape

    When I train with DeepLearningRegressor with a 5k dataset everything works fine but when I do it on 50k dataset I get this error.

    Caused by op u'dense_1/random_normal/RandomStandardNormal', defined at:
      File "salary_predict.py", line 38, in <module>
        ml_predictor.train(df_train, model_names=['DeepLearningRegressor'])
      File "/home/ubuntu/deeparted/auto_ml/predictor.py", line 471, in train
        self.trained_final_model = self.train_ml_estimator(estimator_names, self._scorer, X_df, y)
      File "/home/ubuntu/deeparted/auto_ml/predictor.py", line 674, in train_ml_estimator
        trained_final_model = self.fit_single_pipeline(X_df, y, estimator_names[0], feature_learning=feature_learning)
      File "/home/ubuntu/deeparted/auto_ml/predictor.py", line 548, in fit_single_pipeline
        ppl.fit(X_df, y)
      File "/home/ubuntu/deeparted/auto_ml/utils_model_training.py", line 88, in fit
        self.model.fit(X_fit, y, callbacks=[early_stopping])
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/wrappers/scikit_learn.py", line 138, in fit
        self.model = self.build_fn(**self.filter_sk_params(self.build_fn))
      File "/home/ubuntu/deeparted/auto_ml/utils_models.py", line 559, in make_deep_learning_model
        model.add(Dense(hidden_layers[0], input_dim=num_cols, kernel_initializer='normal', kernel_regularizer=regularizers.l2(0.01)))
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/models.py", line 433, in add
        layer(x)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/engine/topology.py", line 558, in __call__
        self.build(input_shapes[0])
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/layers/core.py", line 827, in build
        constraint=self.kernel_constraint)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
        return func(*args, **kwargs)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/engine/topology.py", line 391, in add_weight
        weight = K.variable(initializer(shape), dtype=dtype, name=name)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/initializers.py", line 75, in __call__
        dtype=dtype, seed=self.seed)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 3356, in random_normal
        dtype=dtype, seed=seed)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/random_ops.py", line 76, in random_normal
        shape_tensor, dtype, seed=seed1, seed2=seed2)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_random_ops.py", line 220, in _random_standard_normal
        name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
        op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2514, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
        self._traceback = _extract_stack()
    

    ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[47302,1] [[Node: dense_1/random_normal/RandomStandardNormal = RandomStandardNormalT=DT_INT32, dtype=DT_FLOAT, seed=87654321, seed2=5687716, _device="/job:localhost/replica:0/task:0/gpu:0"]]

    Tensorflow: Version: 1.1.0 Cuda: 8.0 Cudann: 5.1.10

    System Config: Im using P2 (p2.8xlarge) 8 NVIDIA K80 GPUs(192 GB) 64 vCPUs 732 GiB of host memory

    Training: batch_size: 50 Dataset size: 50k No of columns: 4 (1 Output, 2 Categorical, 1 Float)

    Github Issues: https://github.com/tensorflow/tensorflow/issues/4735 https://github.com/tensorflow/tensorflow/issues/1355 and many more on github

    None of this solved the issue. Can anyone help me on this.

    opened by sameerpallav 12
  • User validation on fl_data

    Do you have an example of using feature learning? I assumed I could just do feature_learning on the training dataset but I get an error like so when running it on the boston dataset:

    ml_predictor.train(df_train, feature_learning=True, fl_data=df_train)

    
    Traceback (most recent call last):
      File "/home/data/.local/lib/python3.6/site-packages/pandas/indexes/base.py", line 2134, in get_loc
        return self._engine.get_loc(key)
      File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
      File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
      File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
      File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
    KeyError: 'MEDV'
    
    opened by calz1 11
  • TypeError: cannot perform reduce with flexible type OR AttributeError: 'Predictor' object has no attribute 'grid_search_pipelines'

    Very cool package!

    I am trying out auto_ml with this dataset on SMS spam. I added a header row to the file to give it column names and then do the following:

    import pandas as p  
    import dill  
    from sklearn.model_selection import train_test_split   
    from auto_ml import Predictor 
    
    df = p.read_table('/home/data/auto_ml/sms.txt')
    df_train, df_test = train_test_split(df, test_size=0.5, random_state=42)
    column_descriptions = {
      'spam': 'output'
      , 'text': 'nlp'
    }
    
    ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=column_descriptions)
    ml_predictor.train(df_train)
    

    You can see it sort of works because it is telling me about feature importance but then gives :

    ....
    nlp_text_txt: 0.0373
    nlp_text_free: 0.0441
    Traceback (most recent call last):
      File "", line 1, in
      File "/usr/local/lib/python2.7/dist-packages/auto_ml/predictor.py", line 597, in train
        if len(self.grid_search_pipelines) > 1:
    AttributeError: 'Predictor' object has no attribute 'grid_search_pipelines'

    Originally I was trying: ml_predictor.train(df_train,ml_for_analytics=True)

    and got:

    test_score = ml_predictor.score(df_test, df_test.spam)
    Traceback (most recent call last):
      File "", line 1, in
      File "/usr/local/lib/python2.7/dist-packages/auto_ml/predictor.py", line 1014, in score
        score, probas = self._scorer.score(self.trained_pipeline, X_test, y_test, advanced_scoring=advanced_scoring)
      File "/usr/local/lib/python2.7/dist-packages/auto_ml/utils_scoring.py", line 268, in score
        score = self.scoring_func(y, predictions)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 1884, in brier_score_loss
        pos_label = y_true.max()
      File "/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py", line 26, in _amax
        return umr_maximum(a, axis, None, out, keepdims)
    TypeError: cannot perform reduce with flexible type

    opened by calz1 11
  • error during LGBM predict_proba

    Hi all..

    After long hours of training my model with lightgbm, I ran predict_proba and at first ran into data_rate_limit in Jupyter. I then changed that limit and had to train the model again, but this time I ran into another error:

    Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'

    can someone help me please? thanks

    opened by vkocaman 10
  • AttributeError: 'XGBRegressor' object has no attribute 'get_fscore'

    Testing out auto_ml with XGBoost and ran into this issue. This is against a fresh clone of the XGBoost repository so it looks like their API changed.

    predictor.train(x_train, verbose=True, model_names=['XGBRegressor'])

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in _get_xgb_feat_importances(self, clf)
        890             # xgb.XGBClassifier.fit() or xgb.XGBRegressor().fit()
    --> 891             fscore = clf.booster().get_fscore()
        892         except:
    
    TypeError: 'str' object is not callable
    
    During handling of the above exception, another exception occurred:
    
    AttributeError                            Traceback (most recent call last)
    <ipython-input-37-eafdc24b187b> in <module>()
    ----> 1 predictor.train(x_train, verbose=True, model_names=['XGBRegressor'])
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in train(self, raw_training_data, user_input_func, optimize_final_model, write_gs_param_results_to_file, perform_feature_selection, verbose, X_test, y_test, ml_for_analytics, take_log_of_y, model_names, perform_feature_scaling, calibrate_final_model, _scorer, scoring, verify_features, training_params, grid_search_params, compare_all_models, cv, feature_learning, fl_data)
        469 
        470         # This is our main logic for how we train the final model
    --> 471         self.trained_final_model = self.train_ml_estimator(estimator_names, self._scorer, X_df, y)
        472 
        473         # Calibrate the probability predictions from our final model
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in train_ml_estimator(self, estimator_names, scoring, X_df, y, feature_learning)
        672         # Use Case 1: Super straightforward: just train a single, non-optimized model
        673         if len(estimator_names) == 1 and self.optimize_final_model != True:
    --> 674             trained_final_model = self.fit_single_pipeline(X_df, y, estimator_names[0], feature_learning=feature_learning)
        675 
        676         # Use Case 2: Compare a bunch of models, but don't optimize any of them
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in fit_single_pipeline(self, X_df, y, model_name, feature_learning)
        554 
        555         self.trained_final_model = ppl
    --> 556         self.print_results(model_name)
        557 
        558         return ppl
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in print_results(self, model_name)
        578 
        579         elif self.ml_for_analytics and model_name in ['RandomForestClassifier', 'RandomForestRegressor', 'XGBClassifier', 'XGBRegressor', 'GradientBoostingRegressor', 'GradientBoostingClassifier', 'LGBMRegressor', 'LGBMClassifier']:
    --> 580             self._print_ml_analytics_results_random_forest()
        581 
        582 
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in _print_ml_analytics_results_random_forest(self)
        938         # XGB's Classifier has a proper .feature_importances_ property, while the XGBRegressor does not.
        939         if final_model_obj.model_name in ['XGBRegressor', 'XGBClassifier']:
    --> 940             self._get_xgb_feat_importances(final_model_obj.model)
        941 
        942         else:
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in _get_xgb_feat_importances(self, clf)
        893             # Handles case when clf has been created by calling xgb.train.
        894             # Thus, clf is an instance of xgb.Booster.
    --> 895             fscore = clf.get_fscore()
        896 
        897         trained_feature_names = self._get_trained_feature_names()
    
    AttributeError: 'XGBRegressor' object has no attribute 'get_fscore'
    
    opened by volker48 9
  • Error on install - Windows 10

    I have progressed through the install, although I got stuck on not having Visual C++ 14 installed. I now get the following error at the end of the install. Can you please help? What more info do you need?

    Command "c:\users\username\appdata\local\programs\python\python35-32\python.exe -u -c "import setuptools, tokenize;file='C:\Users\username\AppData\Local\Temp\pip-build-j_5l4z6_\scipy\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record C:\Users\username\AppData\Local\Temp\pip-5r95bpz0-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\username\AppData\Local\Temp\pip-build-j_5l4z6_\scipy\

    opened by bitsam 9
  • 'FinalModelATC' object has no attribute 'feature_ranges'

    I'm trying to run your "Getting Started" example on the numerai training data and getting the following error:

    AttributeError                            Traceback (most recent call last)
    <ipython-input-39-aab5c9ba7e0f> in <module>()
          6 # Can pass in type_of_estimator='regressor' as well
          7 
    ----> 8 ml_predictor.train(df_dict)
          9 # Wait for the machine to learn all the complex and beautiful patterns in your data...
         10 
    
    /Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in train(***failed resolving arguments***)
        553 
        554 
    --> 555         self.perform_grid_search_by_model_names(estimator_names, scoring, X_df, y)
        556 
        557         # If we ran GridSearchCV, we will have to pick the best model
    
    /Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in perform_grid_search_by_model_names(self, estimator_names, scoring, X_df, y)
        671 
        672             if self.ml_for_analytics and model_name in ('LogisticRegression', 'RidgeClassifier', 'LinearRegression', 'Ridge'):
    --> 673                 self._print_ml_analytics_results_regression()
        674             elif self.ml_for_analytics and model_name in ['RandomForestClassifier', 'RandomForestRegressor', 'XGBClassifier', 'XGBRegressor', 'GradientBoostingRegressor', 'GradientBoostingClassifier']:
        675                 self._print_ml_analytics_results_random_forest()
    
    /Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in _print_ml_analytics_results_regression(self)
        770             trained_coefficients = self.trained_pipeline.named_steps['final_model'].model.coef_
        771 
    --> 772         feature_ranges = self.trained_pipeline.named_steps['final_model'].feature_ranges
        773 
        774         # TODO(PRESTON): readability. Can probably do this in a single zip statement.
    
    AttributeError: 'FinalModelATC' object has no attribute 'feature_ranges'
    

    Are you familiar with this type of issue?

    opened by akodate 9
  • far future: take in dataframes or other sparse data structures directly

    right now taking in python dictionaries is awesome for its flexibility and ease of development, but is killing us on memory, even if it is a super sparse data structure.

    one workaround we could do for this is described in https://github.com/ClimbsRocks/auto_ml/issues/40, though that feels fairly hacky. taking in a DataFrame seems much more obvious.

    opened by ClimbsRocks 9
  • Fix XGBoost error

    It appears that the current XGBoost package that is installed with pip does not have the feature_importances_ attribute. Therefore, if you install the xgboost package using pip install xgboost, you will be unable to conduct feature extraction from the XGBClassifier or the XGBRegressor object.

    I made a workaround after trying to check for feature_importances_, because if the newest version of XGBoost is installed from source then feature_importances_ works fine, so it will likely exist in future versions. But currently the version available through pip install xgboost does not provide the attribute.

    opened by a-holm 7
  • Got an unexpected keyword argument 'max_iter' in SGDClassifier

    Fail to run the example in README.

    from auto_ml import Predictor
    from auto_ml.utils import get_boston_dataset
    
    df_train, df_test = get_boston_dataset()
    
    column_descriptions = {
        'MEDV': 'output'
        , 'CHAS': 'categorical'
    }
    
    ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
    
    ml_predictor.train(df_train)
    
    ml_predictor.score(df_test, df_test.MEDV)
    

    And here is the error message.

    ➜ python ./automl_demo.py
    Using TensorFlow backend.
    Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.
    
    If you have any issues, or new feature ideas, let us know at https://github.com/ClimbsRocks/auto_ml
    Now using the model training_params that you passed in:
    {}
    After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
    {'presort': False, 'warm_start': True, 'learning_rate': 0.1}
    Traceback (most recent call last):
      File "./automl_demo.py", line 13, in <module>
        ml_predictor.train(df_train)
      File "/usr/local/lib/python2.7/site-packages/auto_ml/predictor.py", line 611, in train
        X_df = self.fit_transformation_pipeline(X_df, y, estimator_names)
      File "/usr/local/lib/python2.7/site-packages/auto_ml/predictor.py", line 834, in fit_transformation_pipeline
        ppl = self._construct_pipeline(model_name=model_names[0], keep_cat_features=self.keep_cat_features)
      File "/usr/local/lib/python2.7/site-packages/auto_ml/predictor.py", line 206, in _construct_pipeline
        final_model = utils_models.get_model_from_name(model_name, training_params=params)
      File "/usr/local/lib/python2.7/site-packages/auto_ml/utils_models.py", line 129, in get_model_from_name
        'SGDClassifier': SGDClassifier(max_iter=1000, tol=0.001),
    TypeError: __init__() got an unexpected keyword argument 'max_iter'
    
    opened by tobegit3hub 7
  • get bad score running the sample code

    1. I configured everything, ran the whole script, and got a negative score on the Boston dataset. Since this is just a sample, is a bad score normal?

    2. By default, does it only use gradient boosting for classification and regression, rather than automatically choosing the best model for training and prediction?

    opened by Aun0124 0
  • pip install automl gets stuck after installing multiprocess-0.70.7

    The following is the last snippet in the pip install logs before the installation gets stuck indefinitely:

    Collecting multiprocess>=0.70.7
      Using cached multiprocess-0.70.11-py3-none-any.whl (98 kB)
      Using cached multiprocess-0.70.10.zip (2.4 MB)
      Using cached multiprocess-0.70.9.tar.gz (1.6 MB)
      Using cached multiprocess-0.70.8.tar.gz (1.6 MB)
      Using cached multiprocess-0.70.7.tar.gz (1.4 MB)

    Even without using the cached copies, the installation gets stuck at this point.

    Update: One possible reason for this error could be that \sklearn_deap2-0.2.2-py3.8\evolutionary_search\cv.py incorrectly tries to import check_scoring in the following manner:

    from sklearn.metrics.scorer import check_scoring

    instead of this:

    from sklearn.metrics import check_scoring

    opened by akshatpv 2
  • docs: fix simple typo, puncutation -> punctuation

    There is a small typo in docs/source/formatting_data.rst.

    Should read punctuation rather than puncutation.

    Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

    opened by timgates42 1
  • Update DataFrameVectorizer.py

    DeprecationWarning: The module is deprecated in version 0.21 and removed in version 0.23. This module was removed in the latest scikit-learn version. please remove this module.

    opened by karthikreddykuna 1
Releases(v2.7.0)
  • v2.7.0(Sep 12, 2017)

    Ensembling's back for its alpha release, evolutionary algorithms are doing our hyperparameter search now, and we've handled a bunch of dependency updates and made a bunch of smaller performance tweaks.

  • v2.4.0(Jul 14, 2017)

    Using quantile regression, we can now return prediction intervals.

    Another minor change is adding in a column of absolute changes for feature_responses

  • v2.3.5(Jul 9, 2017)

  • v2.2.1(Jun 13, 2017)

    Avoids double training deep learning models, changes how we sort and order features for analytics reporting, and adds a new _all_small_categories category to categorical ensembling.

  • v2.2.0(Jun 6, 2017)

  • 2.1.5(May 18, 2017)

  • 2.1.2(May 3, 2017)

  • 2.1(Apr 19, 2017)

    Feature learning and categorical ensembling are really cool features that each get us 2-5% accuracy gains!

    For full info, check the docs.

  • v2.0.0(Apr 4, 2017)

    Enough incremental improvements have added up that we're now ready to mark a 2.0 release!

    Part of the progress also means deprecating a few unused features that were adding unnecessary complexity and preventing us from implementing new features like ensembling properly.

    New changes for the 2.0 release:

    • Refactored and cleaned up code. Ensembling should now be much easier to add in, and in a way that's fast enough to be used in production (getting predictions from 10 models should take less than 10x as long as getting predictions from 1 model)
    • Deprecated compute_power
    • Deprecated several methods for grid searching over transformation_pipeline hyperparameters (different methods for feature selection, whether or not to do feature scaling, etc.). We just directly made a decision to prioritize the final model hyperparameter search.
    • Deprecated the current implementation of ensembling. It was implemented in such a way that it was not quick enough to make predictions in prod, and thus, did not meet the primary use cases of this project. Part of removing it allows us to reimplement ensembling in a way that is prod-ready.
    • Deprecated X_test and y_test, except for working with calibrate_final_model.
    • Added better documentation on features that were in silent alpha release previously.
    • Improved test coverage!

    Major changes since the 1.0 release:

    • Integrations for deep learning (using TensorFlow and Keras)
    • Integration of Microsoft's LightGBM, which appears to be a better version of XGBoost
    • Quite a bit more user logging, warning, and input validation/input cleaning
    • Quite a few edge case bug fixes and minor performance improvements
    • Fully automated test suite with decent test coverage!
    • Better documentation
    • Support for pandas DataFrames, which are much more space efficient than lists of dictionaries
  • v1.12.2(Mar 16, 2017)

    This will be our final release before v2.

    Includes many recent changes: deep learning with Keras/TensorFlow, more efficient hyperparameter optimization, Microsoft's LightGBM, more advanced logging for scoring, and quite a few minor usability improvements (like improved logging when input is not as expected).

  • v1.3(Oct 11, 2016)

Owner
Preston Parry
Rock Climber, Biker, Community Builder, Teacher, data scientist & machine learning geek