[UNMAINTAINED] Automated machine learning for analytics & production

Overview

auto_ml

Automated machine learning for production and analytics


Installation

  • pip install auto_ml

Getting started

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset

df_train, df_test = get_boston_dataset()

column_descriptions = {
    'MEDV': 'output',
    'CHAS': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

ml_predictor.train(df_train)

ml_predictor.score(df_test, df_test.MEDV)

Show off some more features!

auto_ml is designed for production. Here's an example that includes serializing and loading the trained model, then getting predictions on single dictionaries, which is roughly the process you'd likely follow to deploy the trained model.

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset
from auto_ml.utils_models import load_ml_model

# Load data
df_train, df_test = get_boston_dataset()

# Tell auto_ml which column is 'output'
# Also note columns that aren't purely numerical
# Examples include ['nlp', 'date', 'categorical', 'ignore']
column_descriptions = {
  'MEDV': 'output'
  , 'CHAS': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

ml_predictor.train(df_train)

# Score the model on test data
test_score = ml_predictor.score(df_test, df_test.MEDV)

# auto_ml is specifically tuned for running in production
# It can get predictions on an individual row (passed in as a dictionary)
# A single prediction like this takes ~1 millisecond
# Here we will demonstrate saving the trained model, and loading it again
file_name = ml_predictor.save()

trained_model = load_ml_model(file_name)

# .predict and .predict_proba take in either:
# A pandas DataFrame
# A list of dictionaries
# A single dictionary (optimized for speed in production environments)
predictions = trained_model.predict(df_test)
print(predictions)

3rd Party Packages- Deep Learning with TensorFlow & Keras, XGBoost, LightGBM, CatBoost

auto_ml has all of these awesome libraries integrated! Generally, just pass one of them in for model_names:

ml_predictor.train(data, model_names=['DeepLearningClassifier'])

Available options are

  • DeepLearningClassifier and DeepLearningRegressor
  • XGBClassifier and XGBRegressor
  • LGBMClassifier and LGBMRegressor
  • CatBoostClassifier and CatBoostRegressor

All of these projects are ready for production. They all have prediction times in the 1 millisecond range for a single prediction, and can be serialized to disk and loaded into a new environment after training.

Depending on your machine, they can occasionally be difficult to install, so they are not included in auto_ml's default installation. You are responsible for installing them yourself. auto_ml will run fine without them installed (we check what's installed before choosing which algorithm to use).
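
For example, to enable the gradient boosting integrations (these are the standard PyPI package names; the deep learning models additionally need TensorFlow and Keras):

  • pip install xgboost lightgbm catboost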

Feature Responses

Get linear-model-esque interpretations from non-linear models. See the docs for more information and caveats.

Classification

Binary and multiclass classification are both supported. Note that for now, labels must be integers (0 and 1 for binary classification). auto_ml will automatically detect if it is a binary or multiclass classification problem - you just have to pass in ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=column_descriptions)
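
Here's a minimal sketch, assuming a hypothetical df_train/df_test pair with an integer is_spam column (0/1 labels) and a free-text text column:

from auto_ml import Predictor

# 'is_spam' and 'text' are made-up column names for illustration
column_descriptions = {
    'is_spam': 'output'
    , 'text': 'nlp'
}

ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=column_descriptions)
ml_predictor.train(df_train)

# .predict() returns the predicted class, .predict_proba() the class probabilities
predictions = ml_predictor.predict_proba(df_test)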

Feature Learning

Also known as "finally found a way to make this deep learning stuff useful for my business". Deep Learning is great at learning important features from your data. But the way it turns these learned features into a final prediction is relatively basic. Gradient boosting is great at turning features into accurate predictions, but it doesn't do any feature learning.

In auto_ml, you can now automatically use both types of models for what they're great at. If you pass feature_learning=True, fl_data=some_dataframe to .train(), we will do exactly that: train a deep learning model on your fl_data. We won't ask it for predictions (the standard stacking approach); instead, we'll use its penultimate layer to get its 10 most useful features. Then we'll train a gradient boosted model (or any other model of your choice) on those features plus all the original features.

Across some problems, we've witnessed this lead to a 5% gain in accuracy, while still making predictions in 1-4 milliseconds, depending on model complexity.

ml_predictor.train(df_train, feature_learning=True, fl_data=df_fl_data)

For now, this feature only supports regression and binary classification; the rest of auto_ml supports multiclass classification as well.
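
One reasonable sketch, under the assumption that fl_data should be rows held out from df_train rather than the training set itself (train_test_split is scikit-learn's; the 80/20 split is an arbitrary choice):

from sklearn.model_selection import train_test_split

# Hold out a separate slice of the data for the deep learning feature-learning step
# (assumption: fl_data should not reuse the same rows as df_train)
df_train, df_fl_data = train_test_split(df_train, test_size=0.2, random_state=42)

ml_predictor.train(df_train, feature_learning=True, fl_data=df_fl_data)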

Categorical Ensembling

Ever wanted to train one model for every store/customer, but didn't want to maintain hundreds of thousands of independent models? With ml_predictor.train_categorical_ensemble(), we will handle that for you. You'll still have just one consistent API, ml_predictor.predict(data), but behind this single API will be one model for each category you included in your training data.

Just tell us which column holds the category you want to split on, and we'll handle the rest. As always, saving the model, loading it in a different environment, and getting speedy predictions live in production is baked right in.

ml_predictor.train_categorical_ensemble(df_train, categorical_column='store_name')
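
A fuller sketch (the sales output column and the DataFrames here are hypothetical):

from auto_ml import Predictor

column_descriptions = {
    'sales': 'output'
    , 'store_name': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
ml_predictor.train_categorical_ensemble(df_train, categorical_column='store_name')

# Still one consistent API; behind it, one trained model per store
predictions = ml_predictor.predict(df_test)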

More details available in the docs

http://auto-ml.readthedocs.io/en/latest/

Advice

Before you go any further, try running the code. Load up some data (either a DataFrame, or a list of dictionaries, where each dictionary is a row of data). Make a column_descriptions dictionary that tells us which attribute name in each row represents the value we're trying to predict. Pass all that into auto_ml, and see what happens!
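
For example, with a list of dictionaries (a toy sketch; the rows and the will_buy column are invented, and you'd want far more rows in practice):

from auto_ml import Predictor

# Each dictionary is one row of data
data = [
    {'age': 24, 'city': 'Seattle', 'will_buy': 1}
    , {'age': 31, 'city': 'Portland', 'will_buy': 0}
    , {'age': 45, 'city': 'Seattle', 'will_buy': 1}
]

column_descriptions = {
    'will_buy': 'output'
    , 'city': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=column_descriptions)
ml_predictor.train(data)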

Everything else in these docs assumes you have done at least the above. Start there and everything else will build on top. But this part gets you the output you're probably interested in, without unnecessary complexity.

Docs

The full docs are available at https://auto_ml.readthedocs.io. Again though, I'd strongly recommend running this on an actual dataset before referencing the docs any further.

What this project does

Automates the whole machine learning process, making it super easy to use both for analytics and for getting real-time predictions in production.

A quick overview of buzzwords, this project automates:

  • Analytics (pass in data, and auto_ml will tell you the relationship of each variable to what it is you're trying to predict).
  • Feature Engineering (particularly around dates, and NLP).
  • Robust Scaling (turning all values into their scaled versions between the range of 0 and 1, in a way that is robust to outliers, and works with sparse data).
  • Feature Selection (picking only the features that actually prove useful).
  • Data formatting (turning a DataFrame or a list of dictionaries into a sparse matrix, one-hot encoding categorical variables, taking the natural log of y for regression problems, etc).
  • Model Selection (which model works best for your problem- we try roughly a dozen apiece for classification and regression problems, including favorites like XGBoost if it's installed on your machine).
  • Hyperparameter Optimization (what hyperparameters work best for that model).
  • Big Data (feed it lots of data- it's fairly efficient with resources).
  • Unicorns (you could conceivably train it to predict what is a unicorn and what is not).
  • Ice Cream (mmm, tasty...).
  • Hugs (this makes it much easier to do your job, hopefully leaving you more time to hug those you care about).

Running the tests

If you've cloned the source code and are making any changes (highly encouraged!), or just want to make sure everything works in your environment, run nosetests -v tests.

CI is also set up, so if you're developing on this, you can just open a PR, and the tests will run automatically on Travis-CI.

The tests are relatively comprehensive, though as with everything with auto_ml, I happily welcome your contributions here!

Comments
  • Comparison with other automatic ML libraries?

    First, thank you very much for the hard work and awesome project. I think it will get a lot of use in my workflow.

    I was surveying the landscape of automatic ML solutions, and found your package along with tpot and auto-sklearn. I am trying to figure out what kind of strengths and weaknesses all these packages have. Would you mind discussing what auto_ml does differently and/or better?

    Thanks again.

    opened by sergeyf 12
  • ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape

    When I train with DeepLearningRegressor on a 5k dataset, everything works fine, but when I do it on a 50k dataset I get this error.

    Caused by op u'dense_1/random_normal/RandomStandardNormal', defined at:
      File "salary_predict.py", line 38, in <module>
        ml_predictor.train(df_train, model_names=['DeepLearningRegressor'])
      File "/home/ubuntu/deeparted/auto_ml/predictor.py", line 471, in train
        self.trained_final_model = self.train_ml_estimator(estimator_names, self._scorer, X_df, y)
      File "/home/ubuntu/deeparted/auto_ml/predictor.py", line 674, in train_ml_estimator
        trained_final_model = self.fit_single_pipeline(X_df, y, estimator_names[0], feature_learning=feature_learning)
      File "/home/ubuntu/deeparted/auto_ml/predictor.py", line 548, in fit_single_pipeline
        ppl.fit(X_df, y)
      File "/home/ubuntu/deeparted/auto_ml/utils_model_training.py", line 88, in fit
        self.model.fit(X_fit, y, callbacks=[early_stopping])
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/wrappers/scikit_learn.py", line 138, in fit
        self.model = self.build_fn(**self.filter_sk_params(self.build_fn))
      File "/home/ubuntu/deeparted/auto_ml/utils_models.py", line 559, in make_deep_learning_model
        model.add(Dense(hidden_layers[0], input_dim=num_cols, kernel_initializer='normal', kernel_regularizer=regularizers.l2(0.01)))
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/models.py", line 433, in add
        layer(x)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/engine/topology.py", line 558, in __call__
        self.build(input_shapes[0])
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/layers/core.py", line 827, in build
        constraint=self.kernel_constraint)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
        return func(*args, **kwargs)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/engine/topology.py", line 391, in add_weight
        weight = K.variable(initializer(shape), dtype=dtype, name=name)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/initializers.py", line 75, in __call__
        dtype=dtype, seed=self.seed)
      File "/home/ubuntu/deeparted/tensorflow/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 3356, in random_normal
        dtype=dtype, seed=seed)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/random_ops.py", line 76, in random_normal
        shape_tensor, dtype, seed=seed1, seed2=seed2)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_random_ops.py", line 220, in _random_standard_normal
        name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
        op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2514, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
        self._traceback = _extract_stack()
    

    ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[47302,1] [[Node: dense_1/random_normal/RandomStandardNormal = RandomStandardNormalT=DT_INT32, dtype=DT_FLOAT, seed=87654321, seed2=5687716, _device="/job:localhost/replica:0/task:0/gpu:0"]]

    TensorFlow version: 1.1.0, CUDA: 8.0, cuDNN: 5.1.10

    System config: I'm using a P2 instance (p2.8xlarge): 8 NVIDIA K80 GPUs (192 GB), 64 vCPUs, 732 GiB of host memory.

    Training: batch_size: 50, dataset size: 50k, number of columns: 4 (1 output, 2 categorical, 1 float).

    GitHub issues: https://github.com/tensorflow/tensorflow/issues/4735 https://github.com/tensorflow/tensorflow/issues/1355 and many more on GitHub.

    None of these solved the issue. Can anyone help me with this?

    opened by sameerpallav 12
  • User validation on fl_data

    Do you have an example of using feature learning? I assumed I could just do feature_learning on the training dataset, but I get an error like this when running it on the Boston dataset:

    ml_predictor.train(df_train, feature_learning=True, fl_data=df_train)

    
    Traceback (most recent call last):
      File "/home/data/.local/lib/python3.6/site-packages/pandas/indexes/base.py", line 2134, in get_loc
        return self._engine.get_loc(key)
      File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
      File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
      File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
      File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
    KeyError: 'MEDV'
    
    opened by calz1 11
  • TypeError: cannot perform reduce with flexible type OR AttributeError: 'Predictor' object has no attribute 'grid_search_pipelines'

    Very cool package!

    I am trying out auto_ml with this dataset on SMS spam. I added a header row to the file to give it column names, and then did the following:

    import pandas as p  
    import dill  
    from sklearn.model_selection import train_test_split   
    from auto_ml import Predictor 
    
    df = p.read_table('/home/data/auto_ml/sms.txt')
    df_train, df_test = train_test_split(df, test_size=0.5, random_state=42)
    column_descriptions = {
      'spam': 'output'
      , 'text': 'nlp'
    }
    
    ml_predictor = Predictor(type_of_estimator='classifier', column_descriptions=column_descriptions)
    ml_predictor.train(df_train)
    

    You can see it sort of works, because it is telling me about feature importance, but then it gives:

    .... nlp_text_txt: 0.0373 nlp_text_free: 0.0441

    Traceback (most recent call last):
      File "", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/auto_ml/predictor.py", line 597, in train
        if len(self.grid_search_pipelines) > 1:
    AttributeError: 'Predictor' object has no attribute 'grid_search_pipelines'

    Originally I was trying: ml_predictor.train(df_train,ml_for_analytics=True)

    and got:

    test_score = ml_predictor.score(df_test, df_test.spam)

    Traceback (most recent call last):
      File "", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/auto_ml/predictor.py", line 1014, in score
        score, probas = self._scorer.score(self.trained_pipeline, X_test, y_test, advanced_scoring=advanced_scoring)
      File "/usr/local/lib/python2.7/dist-packages/auto_ml/utils_scoring.py", line 268, in score
        score = self.scoring_func(y, predictions)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 1884, in brier_score_loss
        pos_label = y_true.max()
      File "/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py", line 26, in _amax
        return umr_maximum(a, axis, None, out, keepdims)
    TypeError: cannot perform reduce with flexible type

    opened by calz1 11
  • error during LGBM predict_proba

    Hi all..

    After long hours of training my model with LightGBM, I just ran predict_proba, and at first I ran into data_rate_limit in Jupyter. Then I changed that limit and had to train the model again, but this time I ran into another error:

    Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'

    can someone help me please? thanks

    opened by vkocaman 10
  • AttributeError: 'XGBRegressor' object has no attribute 'get_fscore'

    Testing out auto_ml with XGBoost and ran into this issue. This is against a fresh clone of the XGBoost repository so it looks like their API changed.

    predictor.train(x_train, verbose=True, model_names=['XGBRegressor'])

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in _get_xgb_feat_importances(self, clf)
        890             # xgb.XGBClassifier.fit() or xgb.XGBRegressor().fit()
    --> 891             fscore = clf.booster().get_fscore()
        892         except:
    
    TypeError: 'str' object is not callable
    
    During handling of the above exception, another exception occurred:
    
    AttributeError                            Traceback (most recent call last)
    <ipython-input-37-eafdc24b187b> in <module>()
    ----> 1 predictor.train(x_train, verbose=True, model_names=['XGBRegressor'])
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in train(self, raw_training_data, user_input_func, optimize_final_model, write_gs_param_results_to_file, perform_feature_selection, verbose, X_test, y_test, ml_for_analytics, take_log_of_y, model_names, perform_feature_scaling, calibrate_final_model, _scorer, scoring, verify_features, training_params, grid_search_params, compare_all_models, cv, feature_learning, fl_data)
        469 
        470         # This is our main logic for how we train the final model
    --> 471         self.trained_final_model = self.train_ml_estimator(estimator_names, self._scorer, X_df, y)
        472 
        473         # Calibrate the probability predictions from our final model
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in train_ml_estimator(self, estimator_names, scoring, X_df, y, feature_learning)
        672         # Use Case 1: Super straightforward: just train a single, non-optimized model
        673         if len(estimator_names) == 1 and self.optimize_final_model != True:
    --> 674             trained_final_model = self.fit_single_pipeline(X_df, y, estimator_names[0], feature_learning=feature_learning)
        675 
        676         # Use Case 2: Compare a bunch of models, but don't optimize any of them
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in fit_single_pipeline(self, X_df, y, model_name, feature_learning)
        554 
        555         self.trained_final_model = ppl
    --> 556         self.print_results(model_name)
        557 
        558         return ppl
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in print_results(self, model_name)
        578 
        579         elif self.ml_for_analytics and model_name in ['RandomForestClassifier', 'RandomForestRegressor', 'XGBClassifier', 'XGBRegressor', 'GradientBoostingRegressor', 'GradientBoostingClassifier', 'LGBMRegressor', 'LGBMClassifier']:
    --> 580             self._print_ml_analytics_results_random_forest()
        581 
        582 
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in _print_ml_analytics_results_random_forest(self)
        938         # XGB's Classifier has a proper .feature_importances_ property, while the XGBRegressor does not.
        939         if final_model_obj.model_name in ['XGBRegressor', 'XGBClassifier']:
    --> 940             self._get_xgb_feat_importances(final_model_obj.model)
        941 
        942         else:
    
    /home/ubuntu/venv/lib/python3.5/site-packages/auto_ml/predictor.py in _get_xgb_feat_importances(self, clf)
        893             # Handles case when clf has been created by calling xgb.train.
        894             # Thus, clf is an instance of xgb.Booster.
    --> 895             fscore = clf.get_fscore()
        896 
        897         trained_feature_names = self._get_trained_feature_names()
    
    AttributeError: 'XGBRegressor' object has no attribute 'get_fscore'
    
    opened by volker48 9
  • Error on install - Windows 10

    I have progressed through the install, although I got stuck with not having Visual C++ 14 installed. I now get the following error at the end of the install. Can you please help? What more info do you need?

    Command "c:\users\username\appdata\local\programs\python\python35-32\python.exe -u -c "import setuptools, tokenize;file='C:\Users\username\AppData\Local\Temp\pip-build-j_5l4z6_\scipy\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record C:\Users\username\AppData\Local\Temp\pip-5r95bpz0-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\username\AppData\Local\Temp\pip-build-j_5l4z6_\scipy\

    opened by bitsam 9
  • 'FinalModelATC' object has no attribute 'feature_ranges'

    I'm trying to run your "Getting Started" example on the numerai training data and getting the following error:

    AttributeError                            Traceback (most recent call last)
    <ipython-input-39-aab5c9ba7e0f> in <module>()
          6 # Can pass in type_of_estimator='regressor' as well
          7 
    ----> 8 ml_predictor.train(df_dict)
          9 # Wait for the machine to learn all the complex and beautiful patterns in your data...
         10 
    
    /Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in train(***failed resolving arguments***)
        553 
        554 
    --> 555         self.perform_grid_search_by_model_names(estimator_names, scoring, X_df, y)
        556 
        557         # If we ran GridSearchCV, we will have to pick the best model
    
    /Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in perform_grid_search_by_model_names(self, estimator_names, scoring, X_df, y)
        671 
        672             if self.ml_for_analytics and model_name in ('LogisticRegression', 'RidgeClassifier', 'LinearRegression', 'Ridge'):
    --> 673                 self._print_ml_analytics_results_regression()
        674             elif self.ml_for_analytics and model_name in ['RandomForestClassifier', 'RandomForestRegressor', 'XGBClassifier', 'XGBRegressor', 'GradientBoostingRegressor', 'GradientBoostingClassifier']:
        675                 self._print_ml_analytics_results_random_forest()
    
    /Users/alex/anaconda/envs/dsi/lib/python2.7/site-packages/auto_ml/predictor.pyc in _print_ml_analytics_results_regression(self)
        770             trained_coefficients = self.trained_pipeline.named_steps['final_model'].model.coef_
        771 
    --> 772         feature_ranges = self.trained_pipeline.named_steps['final_model'].feature_ranges
        773 
        774         # TODO(PRESTON): readability. Can probably do this in a single zip statement.
    
    AttributeError: 'FinalModelATC' object has no attribute 'feature_ranges'
    

    Are you familiar with this type of issue?

    opened by akodate 9
  • far future: take in dataframes or other sparse data structures directly

    Right now, taking in Python dictionaries is awesome for its flexibility and ease of development, but it is killing us on memory, even if it is a super sparse data structure.

    One workaround we could do for this is described in https://github.com/ClimbsRocks/auto_ml/issues/40, though that feels fairly hacky. Taking in a DataFrame seems much more obvious.

    opened by ClimbsRocks 9
  • Fix XGBoost error

    It appears that the current XGBoost package that is installed with pip does not have the feature_importance_ attribute. Therefore if you install the xgboost package using pip install xgboost you will be unable to conduct feature extraction from the XGBClassifier or the XGBRegressor object.

    I made a workaround by checking for feature_importance_, because if the newest version of XGBoost is installed from source then feature_importance_ works fine, so it will likely exist in future versions. But currently the version available via pip install xgboost does not provide the attribute.

    opened by a-holm 7
  • Got an unexpected keyword argument 'max_iter' in SGDClassifier

    Failed to run the example in the README.

    from auto_ml import Predictor
    from auto_ml.utils import get_boston_dataset
    
    df_train, df_test = get_boston_dataset()
    
    column_descriptions = {
        'MEDV': 'output'
        , 'CHAS': 'categorical'
    }
    
    ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
    
    ml_predictor.train(df_train)
    
    ml_predictor.score(df_test, df_test.MEDV)
    

    And here is the error message.

    ➜ python ./automl_demo.py
    Using TensorFlow backend.
    Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.
    
    If you have any issues, or new feature ideas, let us know at https://github.com/ClimbsRocks/auto_ml
    Now using the model training_params that you passed in:
    {}
    After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
    {'presort': False, 'warm_start': True, 'learning_rate': 0.1}
    Traceback (most recent call last):
      File "./automl_demo.py", line 13, in <module>
        ml_predictor.train(df_train)
      File "/usr/local/lib/python2.7/site-packages/auto_ml/predictor.py", line 611, in train
        X_df = self.fit_transformation_pipeline(X_df, y, estimator_names)
      File "/usr/local/lib/python2.7/site-packages/auto_ml/predictor.py", line 834, in fit_transformation_pipeline
        ppl = self._construct_pipeline(model_name=model_names[0], keep_cat_features=self.keep_cat_features)
      File "/usr/local/lib/python2.7/site-packages/auto_ml/predictor.py", line 206, in _construct_pipeline
        final_model = utils_models.get_model_from_name(model_name, training_params=params)
      File "/usr/local/lib/python2.7/site-packages/auto_ml/utils_models.py", line 129, in get_model_from_name
        'SGDClassifier': SGDClassifier(max_iter=1000, tol=0.001),
    TypeError: __init__() got an unexpected keyword argument 'max_iter'
    
    opened by tobegit3hub 7
  • get bad score running the sample code

    1. I configured everything and ran the whole script, and got a negative score on the Boston dataset. Is this just expected for the sample, i.e. is getting a bad score here normal?

    2. Does the default only use gradient boosting for classification and regression, rather than automatically choosing the best model for training and prediction?

    opened by Aun0124 0
  • pip install automl gets stuck after installing multiprocess-0.70.7

    The following is the last snippet in the pip install logs before the installation gets stuck indefinitely:

    Collecting multiprocess>=0.70.7
      Using cached multiprocess-0.70.11-py3-none-any.whl (98 kB)
      Using cached multiprocess-0.70.10.zip (2.4 MB)
      Using cached multiprocess-0.70.9.tar.gz (1.6 MB)
      Using cached multiprocess-0.70.8.tar.gz (1.6 MB)
      Using cached multiprocess-0.70.7.tar.gz (1.4 MB)

    Even without using the cached copies, the installation gets stuck at this point.

    Update: One possible reason for this error could be that \sklearn_deap2-0.2.2-py3.8\evolutionary_search\cv.py incorrectly tries to import check_scoring in the following manner:

    from sklearn.metrics.scorer import check_scoring

    instead of this:

    from sklearn.metrics import check_scoring

    opened by akshatpv 2
  • docs: fix simple typo, puncutation -> punctuation

    There is a small typo in docs/source/formatting_data.rst.

    Should read punctuation rather than puncutation.

    Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

    opened by timgates42 1
  • Update DataFrameVectorizer.py

    DeprecationWarning: The module is deprecated in version 0.21 and removed in version 0.23. This module has been removed in the latest scikit-learn version. Please remove this module.

    opened by karthikreddykuna 1
Releases(v2.7.0)
  • v2.7.0(Sep 12, 2017)

    Ensembling's back for its alpha release, evolutionary algorithms are doing our hyperparameter search now, we've handled a bunch of dependency updates, and we've made a bunch of smaller performance tweaks.

    Source code(tar.gz)
    Source code(zip)
  • v2.4.0(Jul 14, 2017)

    Using quantile regression, we can now return prediction intervals.

    Another minor change is adding a column of absolute changes for feature_responses.

    Source code(tar.gz)
    Source code(zip)
  • v2.3.5(Jul 9, 2017)

  • v2.2.1(Jun 13, 2017)

    Avoids double-training deep learning models, changes how we sort and order features for analytics reporting, and adds a new _all_small_categories category to categorical ensembling.

    Source code(tar.gz)
    Source code(zip)
  • v2.2.0(Jun 6, 2017)

  • 2.1.5(May 18, 2017)

  • 2.1.2(May 3, 2017)

  • 2.1(Apr 19, 2017)

    Feature learning and categorical ensembling are really cool features that each get us 2-5% accuracy gains!

    For full info, check the docs.

    Source code(tar.gz)
    Source code(zip)
  • v2.0.0(Apr 4, 2017)

    Enough incremental improvements have added up that we're now ready to mark a 2.0 release!

    Part of the progress also means deprecating a few unused features that were adding unnecessary complexity and preventing us from implementing new features like ensembling properly.

    New changes for the 2.0 release:

    • Refactored and cleaned up code. Ensembling should now be much easier to add in, and in a way that's fast enough to be used in production (getting predictions from 10 models should take less than 10x as long as getting predictions from 1 model)
    • Deprecated compute_power
    • Deprecated several methods for grid searching over transformation_pipeline hyperparameters (different methods for feature selection, whether or not to do feature scaling, etc.). We just directly made a decision to prioritize the final model hyperparameter search.
    • Deprecated the current implementation of ensembling. It was implemented in such a way that it was not quick enough to make predictions in prod, and thus, did not meet the primary use cases of this project. Part of removing it allows us to reimplement ensembling in a way that is prod-ready.
    • Deprecated X_test and y_test, except for working with calibrate_final_model.
    • Added better documentation on features that were in silent alpha release previously.
    • Improved test coverage!

    Major changes since the 1.0 release:

    • Integrations for deep learning (using TensorFlow and Keras)
    • Integration of Microsoft's LightGBM, which appears to be a possibly better version of XGBoost
    • Quite a bit more user logging, warning, and input validation/input cleaning
    • Quite a few edge case bug fixes and minor performance improvements
    • Fully automated test suite with decent test coverage!
    • Better documentation
    • Support for pandas DataFrames- much more space efficient than lists of dictionaries
    Source code(tar.gz)
    Source code(zip)
    auto_ml-2.0.0-py2.py3-none-any.whl(47.43 KB)
    auto_ml-2.0.0.tar.gz(41.64 KB)
  • v1.12.2(Mar 16, 2017)

    This will be our final release before v2.

    Includes many recent changes- Deep Learning with Keras/TensorFlow, more efficient hyperparameter optimization, Microsoft's LightGBM, more advanced logging for scoring, and quite a few minor usability improvements (like improved logging when input is not as expected).

    Source code(tar.gz)
    Source code(zip)
  • v1.3(Oct 11, 2016)

Owner
Preston Parry
Rock Climber, Biker, Community Builder, Teacher, data scientist & machine learning geek