ETNA – time series forecasting framework

Overview

ETNA Time Series Library

Predict your time series the easiest way

Homepage | Documentation | Tutorials | Contribution Guide | Release Notes

ETNA is an easy-to-use time series forecasting framework. It includes built-in toolkits for time series preprocessing, feature generation, a variety of predictive models with a unified interface (from classic machine learning to SOTA neural networks), model combination methods, and smart backtesting. ETNA is designed to make working with time series simple, productive, and fun.

ETNA is the first open-source Python framework from the Tinkoff.ru Artificial Intelligence Center. The library started as an internal product in our company - we now use it in more than 10 projects, so we release updates often. Contributions are welcome - check our Contribution Guide.

Installation

ETNA is on PyPI, so you can use pip to install it.

pip install --upgrade pip
pip install etna

Get started

Here's some example code for a quick start.

import pandas as pd
from etna.datasets.tsdataset import TSDataset
from etna.models import ProphetModel
from etna.pipeline import Pipeline

# Read the data
df = pd.read_csv("examples/data/example_dataset.csv")

# Create a TSDataset
df = TSDataset.to_dataset(df)
ts = TSDataset(df, freq="D")

# Choose a horizon
HORIZON = 8

# Fit the pipeline
pipeline = Pipeline(model=ProphetModel(), horizon=HORIZON)
pipeline.fit(ts)

# Make the forecast
forecast_ts = pipeline.forecast()
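
The quick start above passes a dataframe to TSDataset.to_dataset. As a minimal sketch, assuming the example CSV uses the long format with timestamp, segment, and target columns, such a dataframe can be built by hand with pandas alone. The segment names and values below are made up for illustration.

```python
import pandas as pd

# Three daily timestamps shared by two hypothetical segments.
timestamps = pd.date_range("2021-01-01", periods=3, freq="D")

# Long format: one row per (timestamp, segment) pair.
long_df = pd.DataFrame(
    {
        "timestamp": list(timestamps) * 2,
        "segment": ["segment_a"] * 3 + ["segment_b"] * 3,
        "target": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    }
)
```

A dataframe shaped like long_df is what the df variable in the quick start is assumed to hold right after pd.read_csv.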

Tutorials

We have also prepared a set of tutorials for an easy introduction:

Notebook Interactive launch
Get started Binder
Backtest Binder
EDA Binder
Outliers Binder
Clustering Binder
Deep learning models Binder
Ensembles Binder

Documentation

ETNA documentation is available here.

Acknowledgments

ETNA.Team

Andrey Alekseev, Nikita Barinov, Dmitriy Bunin, Aleksandr Chikov, Vladislav Denisov, Martin Gabdushev, Sergey Kolesnikov, Artem Makhin, Ivan Mitskovets, Albina Munirova, Nikolay Romantsov, Julia Shenshina

ETNA.Contributors

Artem Levashov, Aleksey Podkidyshev

License

Feel free to use our library in your commercial and private applications.

ETNA is covered by Apache 2.0. Read more about this license here.

Comments
  • Notebook with forecasting strategies

    Before submitting (must do checklist)

    • [x] Did you read the contribution guide?
    • [x] Did you update the docs? We use Numpy format for all the methods and classes.
    • [x] Did you write any new necessary tests?
    • [ ] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    #825

    opened by scanhex12 46
  • Update Notebooks with new EDA methods

    Before submitting (must do checklist)

    • [x] Did you read the contribution guide?
    • [ ] Did you update the docs? We use Numpy format for all the methods and classes.
    • [ ] Did you write any new necessary tests?
    • [ ] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    closes #711

    opened by DBcreator 12
  • Fix notebooks in inference track

    Before submitting (must do checklist)

    • [x] Did you read the contribution guide?
    • [x] Did you update the docs? We use Numpy format for all the methods and classes.
    • [x] Did you write any new necessary tests?
    • [x] Did you update the CHANGELOG?

    Proposed Changes

    See #973.

    Closing issues

    Closes #973.

    opened by Mr-Geekman 12
  • Improve sample_acf and sample_pacf plots

    Before submitting (must do checklist)

    • [x] Did you read the contribution guide?
    • [x] Did you update the docs? We use Numpy format for all the methods and classes.
    • [x] Did you write any new necessary tests?
    • [x] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    closes #682

    opened by DBcreator 6
  • Classification notebook

    Before submitting (must do checklist)

    • [ ] Did you read the contribution guide?
    • [ ] Did you update the docs? We use Numpy format for all the methods and classes.
    • [ ] Did you write any new necessary tests?
    • [ ] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    opened by alex-hse-repository 6
  • Poc: base classes for deep models and rnn and deepstate with examples

    Before submitting (must do checklist)

    • [ ] Did you read the contribution guide?
    • [x] Did you update the docs? We use Numpy format for all the methods and classes.
    • [x] Did you write any new necessary tests?
    • [x] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    opened by martins0n 6
  • Enhance `TSDataset` to work with hierarchical series

    Before submitting (must do checklist)

    • [ ] Did you read the contribution guide?
    • [ ] Did you update the docs? We use Numpy format for all the methods and classes.
    • [ ] Did you write any new necessary tests?
    • [ ] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    closes #1028

    opened by alex-hse-repository 5
  • Speed up columns slices: `etna.datasets.utils.select_columns`

    🚀 Feature Request

    In many places we use df.loc[:, pd.IndexSlice[segments, column]] to select a column from all segments. This appears to be very slow when there are many segments.

    We should find the places where we use it and make sure that it can be replaced with df.loc[:, pd.IndexSlice[:, column]] without problems.

    There was a problem with the second option: #188. We should investigate whether it still exists and under which conditions:

    1. Does it apply when only one column is selected? (SklearnTransform selects many.)
    2. Can it be avoided by some trick when taking slices (sorting columns, for example)?

    Proposal

    1. Find all places with the slow slice df.loc[:, pd.IndexSlice[segments, column]] where column is a scalar. Replace them with a function (you can add it to etna.datasets.utils). Try to replace the slow slice inside the function with the fast slice df.loc[:, pd.IndexSlice[:, column]]. Make sure that in that case columns are not reordered in different pandas versions.
    2. Do the same for a list of values in column (e.g. SklearnTransform) and investigate the reordering issue during testing. We want to avoid it without putting all the segments into the slice.
    3. Benchmark the changed transforms (or other calls) to confirm that they become faster. Add the benchmarking code and its results to the comments of the PR. For example, you can take a dataframe with 50000 segments, 100 timestamps, 5 additional int columns, 5 additional float columns, and 5 additional category columns.
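
The two slice forms discussed in this issue can be compared on a toy frame. This is only a sketch under stated assumptions: the frame is tiny, its columns are lexicographically sorted, and the segment and feature names are invented. It checks that both forms select the same columns for a scalar column name; it does not measure speed or reproduce the reordering problem from #188.

```python
import numpy as np
import pandas as pd

# Toy wide-format frame: columns are a sorted (segment, feature) MultiIndex.
segments = ["segment_a", "segment_b", "segment_c"]
features = ["exog", "target"]
columns = pd.MultiIndex.from_product([segments, features], names=["segment", "feature"])
index = pd.date_range("2022-01-01", periods=4, freq="D")
df = pd.DataFrame(np.arange(24, dtype=float).reshape(4, 6), index=index, columns=columns)

# Slow form: every segment is listed explicitly.
slow = df.loc[:, pd.IndexSlice[segments, "target"]]

# Fast form: a full slice over the segment level.
fast = df.loc[:, pd.IndexSlice[:, "target"]]

# Compare the selected column labels and the underlying values.
same_columns = list(slow.columns) == list(fast.columns)
same_values = np.array_equal(slow.to_numpy(), fast.to_numpy())
```

A benchmark for the proposal would time the same two expressions on a much larger frame, such as the 50000-segment case suggested above.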

    Test cases

    1. Make sure that current tests pass for scalar case.
    2. Make sure that current tests pass for list case.
    3. Add tests on function for selection of one column.
    4. Add tests on function for selection of multiple columns (in SklearnTransform we had some tests on reordering, it can be useful).

    Additional context

    No response

    enhancement important 
    opened by Mr-Geekman 5
  • Create assemble_pipelines 717

    Before submitting (must do checklist)

    • [ ] Did you read the contribution guide?
    • [ ] Did you update the docs? We use Numpy format for all the methods and classes.
    • [ ] Did you write any new necessary tests?
    • [ ] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    closes #717

    opened by scanhex12 5
  • Fix bugs and documentation for `plot_backtest` and `plot_backtest_interactive`

    IMPORTANT: Please do not create a Pull Request without creating an issue first.

    Before submitting (must do checklist)

    • [x] Did you read the contribution guide?
    • [x] Did you update the docs? We use Numpy format for all the methods and classes.
    • [x] Did you write any new necessary tests?
    • [x] Did you update the CHANGELOG?

    Type of Change

    • [ ] Examples / docs / tutorials / contributors update
    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] Improvement (non-breaking change which improves an existing feature)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Proposed Changes

    See #664.

    Related Issue

    #664.

    Closing issues

    Closes #664.

    bug documentation 
    opened by Mr-Geekman 5
  • add flake8-bugbear

    IMPORTANT: Please do not create a Pull Request without creating an issue first.

    Before submitting (must do checklist)

    • [x] Did you read the contribution guide?
    • [ ] Did you update the docs? We use Numpy format for all the methods and classes.
    • [ ] Did you write any new necessary tests?
    • [ ] Did you update the CHANGELOG?

    Type of Change

    • [x] Examples / docs / tutorials / contributors update
    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] Improvement (non-breaking change which improves an existing feature)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Proposed Changes

    Related Issue

    Closing issues

    opened by iKintosh 5
  • Create example notebook about hierarchical pipeline

    🚀 Feature Request

    Create an example notebook explaining how to work with hierarchical time series in etna

    Proposal

    • The notebook should explain the following:
    1. What hierarchical time series are
    2. How to store hierarchical time series in etna (hierarchical long format) and how to convert them to the etna wide format with to_hierarchical_dataset
    3. How HierarchicalStructure works and how it can be created
    4. How to create a TSDataset with a hierarchical structure and how exog data works for a hierarchical dataset
    5. Which methods exist for forecasting hierarchical time series, which of them are available in the library, and how to use them
    6. Compare HierarchicalPipeline and Pipeline for the top-down and bottom-up cases

    Test cases

    No response

    Additional context

    No response

    enhancement notebook 
    opened by alex-hse-repository 0
  • Create `generate_hierarchical_df` method

    🚀 Feature Request

    Create a method to generate a random hierarchical dataset

    Proposal

    1. In etna/datasets/datasets_generation.py create method:
    def generate_hierarchical_df(periods: int, n_segments: List[int], freq: str = "D", start_time: str = "2000-01-01", ar_coef: Optional[list] = None, sigma: float = 1, random_seed: int = 1) -> pd.DataFrame

    Parameters:

    • n_segments -- number of segments on each level
    • Other parameters are the same as in generate_ar_df

    Description:

    • Validate n_segments: the number of segments on each level should be lower than on the next level
    • Generate segments on the last level using generate_ar_df
    • Generate a random tree with the configuration of nodes on levels taken from n_segments
    • In the dataframe, replace the segment column with columns describing the structure of the tree (one column for each level)
    • Node names on the levels should be generated as "level_<level_id>_<segment_id>" -- you can come up with better ideas for naming
    • On the bottom level, leave the default segment names from generate_ar_df
    2. Add an example of creating a dataset with hierarchical structure using generate_hierarchical_df to the docs of TSDataset here
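
The validation and random-tree steps of this proposal could look roughly like the sketch below. It uses only the standard library. The function name random_level_parents and its return shape are hypothetical, not part of etna. For simplicity it names nodes on every level with the "level_<level_id>_<segment_id>" pattern, whereas the proposal keeps the default generate_ar_df names on the bottom level.

```python
import random
from typing import Dict, List


def random_level_parents(n_segments: List[int], random_seed: int = 1) -> List[Dict[str, str]]:
    """For each pair of adjacent levels, map every child node to a parent node."""
    # Validation: each level must have fewer segments than the next one.
    for upper, lower in zip(n_segments, n_segments[1:]):
        if upper >= lower:
            raise ValueError("Each level must have fewer segments than the next one")

    rng = random.Random(random_seed)
    mappings = []
    for level_id, (n_parents, n_children) in enumerate(zip(n_segments, n_segments[1:])):
        parents = [f"level_{level_id}_{i}" for i in range(n_parents)]
        children = [f"level_{level_id + 1}_{i}" for i in range(n_children)]
        # Give every parent one child first so that no parent is left empty,
        # then attach the remaining children to random parents.
        shuffled = children[:]
        rng.shuffle(shuffled)
        mapping = dict(zip(shuffled[:n_parents], parents))
        for child in shuffled[n_parents:]:
            mapping[child] = rng.choice(parents)
        mappings.append(mapping)
    return mappings


corner_case = random_level_parents([1, 2])
```

The n_segments=[1, 2] corner case mentioned in the test cases is deterministic here: with a single parent, both children map to it whatever the seed.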

    Test cases

    1. The method generates a dataframe with correct properties:
    • number of segments
    • number of periods
    • columns (timestamp, level columns, target)
    • level columns contain correct values (for some corner cases where randomness has no influence, like n_segments=[1, 2])
    2. Check that we can convert this dataframe to the wide format using to_hierarchical_dataset

    Additional context

    No response

    enhancement 
    opened by alex-hse-repository 0
  • [BUG] AttributeError: 'NaiveModel'

    🐛 Bug Report

    While going through the get-started tutorial, I get the error AttributeError: 'NaiveModel' object has no attribute 'context_size'

    Expected behavior

    future_ts = train_ts.make_future(future_steps=HORIZON, tail_steps=model.context_size)

    How can I fix this error?

    How To Reproduce

    from etna.models import NaiveModel

    HORIZON = 8

    # Fit the model
    model = NaiveModel(lag=12)
    model.fit(train_ts)

    # Make the forecast
    future_ts = train_ts.make_future(future_steps=HORIZON, tail_steps=model.context_size)
    forecast_ts = model.forecast(future_ts, prediction_size=HORIZON)

    Environment

    python 3.9
    etna: 1.13.0

    Additional context

    No response

    Checklist

    • [x] Bug appears at the latest library version
    bug 
    opened by vukeep 0
  • Create `HierarchicalPipeline`

    🚀 Feature Request

    Create pipeline to process hierarchical time series

    Proposal

    Create class:

    class HierarchicalPipeline(Pipeline):
        def __init__(
            self,
            reconciler: BaseReconciler,
            model: ModelType,
            transforms: Sequence[Transform] = (),
            horizon: int = 1,
        ):
            ...

    1. Implement method fit:
    • Fit the reconciler using the reconciler.fit method
    • Aggregate the dataset on the source_level using reconciler.aggregate
    • Call the fit method of the superclass with the generated dataset
    2. Implement method raw_forecast():
    • Call the forecast method of the superclass
    3. Implement method forecast():
    • Call raw_forecast
    • Generate the target dataset using reconciler.reconcile
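
The raw_forecast and reconcile steps can be illustrated with a toy bottom-up example in plain Python. This is not etna code: the hierarchy, the naive "repeat the last value" forecaster, and all names are assumptions made for illustration. The aggregate step is skipped because the input is already on the source (bottom) level.

```python
from typing import Dict, List

# Hypothetical two-level hierarchy: one "total" node with two bottom-level segments.
HIERARCHY: Dict[str, List[str]] = {"total": ["a", "b"]}


def naive_forecast(series: List[float], horizon: int) -> List[float]:
    """Stand-in for a fitted model: repeat the last observed value."""
    return [series[-1]] * horizon


def bottom_up_forecast(bottom: Dict[str, List[float]], horizon: int) -> Dict[str, List[float]]:
    # raw_forecast step: forecast every series on the source (bottom) level.
    raw = {name: naive_forecast(values, horizon) for name, values in bottom.items()}
    # reconcile step: build the target (top) level by summing the children per step.
    return {
        parent: [sum(step) for step in zip(*(raw[child] for child in children))]
        for parent, children in HIERARCHY.items()
    }


forecast = bottom_up_forecast({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]}, horizon=2)
# forecast == {"total": [33.0, 33.0]}
```

A top-down reconciler would go the other way: forecast the top level and split it across the children by some proportions.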

    Test cases

    1. Test that after fit the pipeline saves the correct ts on the source_level of the reconciler
    2. Test that raw_forecast generates a forecast on the source_level of the reconciler
    3. Test that forecast generates a forecast on the target_level of the reconciler
    4. Test that backtest works and produces correct metrics (you can use a constant dataset, for example)

    All the tests should cover both top-down and bottom-up reconcilers with correct source and target levels

    Additional context

    blocked by #1037 #1038 #1044

    enhancement 
    opened by alex-hse-repository 0
  • Add `params_to_tune` method

    🚀 Feature Request

    For the AutoML track we need to know which hyperparameters to tune for transforms, models, and pipelines. We should add a params_to_tune method to all of these classes.

    Proposal

    • Add a default method for Transform and the *AbstractModel classes; it should return an empty dict
    • Add the method to AbstractPipeline
    • Add the method for Pipeline; it should iterate over the transforms and the model and collect all params
    • Add an implementation for Ensemble and Stacking that raises NotImplementedError
    • The return value of params_to_tune is supposed to look like
     params = {
        "model.n_iterations": optuna.distributions.CategoricalDistribution((10, 100, 200)),
        "transforms.0.mode": optuna.distributions.CategoricalDistribution(("per-segment", "macro")),
    }
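
The way a pipeline might collect and prefix these parameters can be sketched with toy classes. ToyTransform, ToyModel, and ToyPipeline are hypothetical stand-ins, not etna classes, and plain tuples replace the optuna distributions so the example has no dependencies.

```python
from typing import Any, Dict, Sequence


class ToyTransform:
    def params_to_tune(self) -> Dict[str, Any]:
        return {"mode": ("per-segment", "macro")}


class ToyModel:
    def params_to_tune(self) -> Dict[str, Any]:
        return {"n_iterations": (10, 100, 200)}


class ToyPipeline:
    def __init__(self, model: ToyModel, transforms: Sequence[ToyTransform] = ()):
        self.model = model
        self.transforms = transforms

    def params_to_tune(self) -> Dict[str, Any]:
        # Prefix model params with "model." and transform params with "transforms.<index>.".
        params = {f"model.{name}": dist for name, dist in self.model.params_to_tune().items()}
        for i, transform in enumerate(self.transforms):
            for name, dist in transform.params_to_tune().items():
                params[f"transforms.{i}.{name}"] = dist
        return params


params = ToyPipeline(ToyModel(), [ToyTransform()]).params_to_tune()
```

The resulting keys follow the dotted naming from the example dict above, which keeps the search space flat even though the objects are nested.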
    

    Test cases

    • check the default implementation for Transform and Models
    • check that Pipeline combines params correctly
    • check that Ensemble and Stacking raise an error

    Additional context

    No response

    enhancement 
    opened by martins0n 0
  • `set_params` method

    🚀 Feature Request

    Add a method to change the parameters of etna objects

    Proposal

    • BaseMixin.set_params: Callable[Dict] -> BaseMixin
    • It works like assert Pipeline(model=CatboostMultiSegmentModel(n_iterations=100)).set_params({'model.n_iterations': 1000}) == Pipeline(model=CatboostMultiSegmentModel(n_iterations=1000))
    • We plan to do the conversion via the to_dict method (dict -> patched dict -> new object created from the dict)
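
The "patched dict" step can be sketched on plain nested dictionaries. The helper patch_nested is hypothetical and handles only dict nesting (not list indices such as "transforms.0.mode"). It silently skips unknown paths, in line with the second test case below.

```python
import copy
from typing import Any, Dict


def patch_nested(config: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``config`` with dotted keys such as "model.n_iterations" applied."""
    patched = copy.deepcopy(config)
    for dotted_key, value in updates.items():
        *parents, leaf = dotted_key.split(".")
        node: Any = patched
        # Walk down the nested dicts; a missing step turns node into None.
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
        # Only overwrite parameters that already exist; unknown paths are skipped.
        if isinstance(node, dict) and leaf in node:
            node[leaf] = value
    return patched


config = {"model": {"n_iterations": 100}, "horizon": 7}
patched = patch_nested(config, {"model.n_iterations": 1000})
```

Because the helper works on a deep copy, the original config is left untouched, which matches the idea of creating a new object rather than mutating the old one.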

    Test cases

    • check that set_params changes params
    • check that nothing happens if the input dict has unknown parameters

    Additional context

    No response

    enhancement 
    opened by martins0n 0
Releases(1.14.0)
  • 1.14.0(Dec 16, 2022)

    Highlights:

    • Add python 3.10 support (#1005)
    • Add experimental module with TimeSeriesBinaryClassifier and PredictabilityAnalyzer (#985), see example notebook for the details (#997)
    • Inference track results: add predict method to pipelines, teach some models to work with context, change hierarchy of base models, update notebook examples (#979)

    Full changelog:

    Added

    • Add python 3.10 support (#1005)
    • Add SumTransform (#1021)
    • Add plot_change_points_interactive (#988)
    • Add experimental module with TimeSeriesBinaryClassifier and PredictabilityAnalyzer (#985)
    • Inference track results: add predict method to pipelines, teach some models to work with context, change hierarchy of base models, update notebook examples (#979)
    • Add get_ruptures_regularization into experimental module (#1001)
    • Add example classification notebook for experimental classification feature (#997)

    Changed

    • Change returned model in get_model of BATSModel, TBATSModel (#987)
    • Add acf_plot, deprecated sample_acf_plot, sample_pacf_plot (#1004)
    • Change returned model in get_model of HoltWintersModel, HoltModel, SimpleExpSmoothingModel (#986)

    Fixed

    • Fix MinMaxDifferenceTransform import (#1030)
    • Fix release docs and docker images cron job (#982)
    • Fix forecast first point with CatBoostPerSegmentModel (#1010)
    • Fix hanging EDA notebook (#1027)
    • Fix hanging EDA notebook v2 + cache clean script (#1034)
  • 1.13.0(Oct 10, 2022)

    Highlights:

    • etna.auto module for pipeline greedy search with a default pipelines pool
    • wandb sweeps and optuna examples

    Full changelog:

    Added

    • Add greater_is_better property for Metric (#921)
    • etna.auto for greedy search, etna.auto.pool with default pipelines, etna.auto.optuna wrapper for optuna (#895)
    • Add MinMaxDifferenceTransform (#955)
    • Add wandb sweeps and optuna examples (#338)

    Changed

    • Make slicing faster in TSDataset._merge_exog, FilterFeaturesTransform, AddConstTransform, LambdaTransform, LagTransform, LogTransform, SklearnTransform, WindowStatisticsTransform; make CICD test different pandas versions (#900)
    • Mark some tests as long (#929)
    • Fix to_dict with nn models and add unsafe conversion for callbacks (#949)

    Fixed

    • Fix to_dict with function as parameter (#941)
    • Fix native networks to work with generated future equals to horizon (#936)
    • Fix SARIMAXModel to work with exogenous data on pmdarima>=2.0 (#940)
    • Teach catboost to work with encoders (#957)
  • 1.12.0(Sep 5, 2022)

    Highlights:

    • ETNA native MLPModel
    • to_dict method in all the etna objects
    • DirectEnsemble implementing the direct forecasting strategy
    • Notebook about forecasting strategies

    Full changelog:

    Added

    • Function to transform etna objects to dict (#818)
    • MLPModel (#860)
    • DeadlineMovingAverageModel (#827)
    • DirectEnsemble (#824)
    • CICD: untagged docker image cleaner (#856)
    • Notebook about forecasting strategies (#864)
    • Add ChangePointSegmentationTransform, RupturesChangePointsModel (#821)

    Changed

    • Teach AutoARIMAModel to work with out-sample predictions (#830)
    • Make TSDataset.to_flatten faster for big datasets (#848)

    Fixed

    • Type hints for external users by PEP 561 (#868)
    • Type hints for Pipeline.model match models.nn (#768)
    • Fix behavior of SARIMAXModel if simple_differencing=True is set (#837)
    • Bug python3.7 and TypedDict import (#867)
    • Fix deprecated pytorch lightning trainer flags (#866)
    • ProphetModel doesn't work with cap and floor regressors (#842)
    • Fix problem with encoding category types in OHE (#843)
    • Change Docker cuda image version from 11.1 to 11.6.2 (#838)
    • Optimize time complexity of determine_num_steps (#864)
    • All warnings as errors (#880)
    • Update .gitignore with .DS_Store and checkpoints (#883)
    • Delete ROADMAP.md (#904)
    • Fix ci invalid cache (#896)
  • 1.11.1(Aug 3, 2022)

  • 1.11.0(Jul 25, 2022)

    Highlights:

    • ETNA native RNN and base classes for deep learning models
    • Lambda transform
    • Prophet 1.1 support without c++ compiler dependency
    • Prediction intervals for DeepAR and TFTModel
    • Add known_future parameter to CLI

    Full changelog:

    Added

    • LSTM based RNN and native deep models base classes (#776)
    • Lambda transform (#762)
    • assemble pipelines (#774)
    • Tests on in-sample, out-sample predictions with gap for all models (#785)

    Changed

    • Add columns and mode parameters in plot_correlation_matrix (#726)
    • Add CatBoostPerSegmentModel and CatBoostMultiSegmentModel classes, deprecate CatBoostModelPerSegment and CatBoostModelMultiSegment (#779)
    • Allow Prophet update to 1.1 (#799)
    • Make LagTransform, LogTransform, AddConstTransform vectorized (#756)
    • Improve the behavior of plot_feature_relevance visualizing p-values (#795)
    • Update poetry.core version (#780)
    • Make native prediction intervals for DeepAR (#761)
    • Make native prediction intervals for TFTModel (#770)
    • Test cases for testing inference of models (#794)
    • Wandb.log to WandbLogger (#816)

    Fixed

    • Fix missing prophet in docker images (#767)
    • Add known_future parameter to CLI (#758)
    • FutureWarning: The frame.append method is deprecated. Use pandas.concat instead (#764)
    • Correct ordering if multi-index in backtest (#771)
    • Raise errors in models.nn if they can't make in-sample and some cases out-sample predictions (#813)
    • Teach BATS/TBATS to work with in-sample, out-sample predictions correctly (#806)
    • Github actions cache issue with poetry update (#778)
  • 1.10.0(Jun 15, 2022)

    Highlights:

    • BATS, TBATS and AutoArima models
    • Fix of empirical prediction intervals

    Full changelog:

    Added

    • Add Sign metric (#730)
    • Add AutoARIMA model (#679)
    • Add parameters start, end to some eda methods (#665)
    • Add BATS and TBATS model adapters (#678)
    • Jupyter extension for black (#742)

    Changed

    • Change color of lines in plot_anomalies and plot_clusters, add grid to all plots, make trend line thicker in plot_trend (#705)
    • Change format of holidays for holiday_plot (#708)
    • Make feature selection transforms return columns in inverse_transform (#688)
    • Add xticks parameter for plot_periodogram, clip frequencies to be >= 1 (#706)
    • Make TSDataset method to_dataset work with copy of the passed dataframe (#741)

    Fixed

    • Fix bug when ts.plot does not save figure (#714)
    • Fix bug in plot_clusters (#675)
    • Fix bugs and documentation for cross_corr_plot (#691)
    • Fix bugs and documentation for plot_backtest and plot_backtest_interactive (#700)
    • Make STLTransform to work with NaNs at the beginning (#736)
    • Fix tiny prediction intervals (#722)
    • Fix deepcopy issue for fitted deepmodel (#735)
    • Fix making backtest if all segments start with NaNs (#728)
    • Fix logging issues in backtest when empirical prediction intervals are used (#747)
  • 1.9.0(May 17, 2022)

    Added

    • Add plot_metric_per_segment (#658)
    • Add metric_per_segment_distribution_plot (#666)

    Changed

    • Remove parameter normalize in linear models (#686)

    Fixed

    • Add missed forecast_params in forecast CLI method (#671)
    • Add _per_segment_average method to the Metric class (#684)
    • Fix get_statistics_relevance_table working with NaNs and categoricals (#672)
    • Fix bugs and documentation for stl_plot (#685)
    • Fix cuda docker images (#694)
  • 1.8.0(Apr 28, 2022)

    Added

    • Width and Coverage metrics for prediction intervals (#638)
    • Masked backtest (#613)
    • Add seasonal_plot (#628)
    • Add plot_periodogram (#606)
    • Add support of quantiles in backtest (#652)
    • Add prediction_actual_scatter_plot (#610)
    • Add plot_holidays (#624)
    • Add instruction about documentation formatting to contribution guide (#648)
    • Seasonal strategy in TimeSeriesImputerTransform (#639)

    Changed

    • Add logging to Metric.__call__ (#643)
    • Add in_column to plot_anomalies, plot_anomalies_interactive (#618)
    • Add logging to TSDataset.inverse_transform (#642)

    Fixed

    • Passing non default params for default models STLTransform (#641)
    • Fixed bug in SARIMAX model with horizon=1 (#637)
    • Fixed bug in models get_model method (#623)
    • Fixed unsafe comparison in plots (#611)
    • Fixed plot_trend does not work with Linear and TheilSen transforms (#617)
    • Improve computation time for rolling window statistics (#625)
    • Don't fill first timestamps in TimeSeriesImputerTransform (#634)
    • Fix documentation formatting (#636)
    • Fix bug with exog features in AutoRegressivePipeline (#647)
    • Fix missed dependencies (#656)
    • Fix custom_transform_and_model notebook (#651)
    • Fix MyBinder bug with dependencies (#650)
  • 1.7.0(Mar 16, 2022)

    Highlights:

    • New plots (a lot!): imputation, trend, change points, residuals, qq-plot, feature relevance, stl.
    • New regressors logic in TSDatasets, Transforms and Models
    • Added jupyter notebook with regressors example
    • Prediction intervals visualization in plot_forecast
    • Detrending could be polynomial
    • Added installation instruction for M1
    • Fixed TSDataset when plot method does not plot all required segments
    • VotingEnsemble allows to set weights of estimator as weights of pipelines

    Full changelog:

    Added

    • Regressors logic to TSDatasets init (https://github.com/tinkoff-ai/etna/pull/357)
    • FutureMixin into some transforms (https://github.com/tinkoff-ai/etna/pull/361)
    • Regressors updating in TSDataset transform loops (https://github.com/tinkoff-ai/etna/pull/374)
    • Regressors handling in TSDataset make_future and train_test_split (https://github.com/tinkoff-ai/etna/pull/447)
    • Prediction intervals visualization in plot_forecast (https://github.com/tinkoff-ai/etna/pull/538)
    • Add plot_imputation (https://github.com/tinkoff-ai/etna/pull/598)
    • Add plot_time_series_with_change_points function (https://github.com/tinkoff-ai/etna/pull/534)
    • Add plot_trend (https://github.com/tinkoff-ai/etna/pull/565)
    • Add find_change_points function (https://github.com/tinkoff-ai/etna/pull/521)
    • Add option day_number_in_year to DateFlagsTransform (https://github.com/tinkoff-ai/etna/pull/552)
    • Add plot_residuals (https://github.com/tinkoff-ai/etna/pull/539)
    • Add get_residuals (https://github.com/tinkoff-ai/etna/pull/597)
    • Create PerSegmentBaseModel, PerSegmentPredictionIntervalModel (https://github.com/tinkoff-ai/etna/pull/537)
    • Create MultiSegmentModel (https://github.com/tinkoff-ai/etna/pull/551)
    • Add qq_plot (https://github.com/tinkoff-ai/etna/pull/604)
    • Add regressors example notebook (https://github.com/tinkoff-ai/etna/pull/577)
    • Create EnsembleMixin (https://github.com/tinkoff-ai/etna/pull/574)
    • Add option season_number to DateFlagsTransform (https://github.com/tinkoff-ai/etna/pull/567)
    • Create BasePipeline, add prediction intervals to all the pipelines, move parameter n_fold to forecast (https://github.com/tinkoff-ai/etna/pull/578)
    • Add stl_plot (https://github.com/tinkoff-ai/etna/pull/575)
    • Add plot_features_relevance (https://github.com/tinkoff-ai/etna/pull/579)
    • Add community section to README.md (https://github.com/tinkoff-ai/etna/pull/580)
    • Create AbstractPipeline (https://github.com/tinkoff-ai/etna/pull/573)
    • Option "auto" for the weights parameter of VotingEnsemble, which enables using feature importance as weights of base estimators (https://github.com/tinkoff-ai/etna/pull/587)

    Changed

    • Change the way ProphetModel works with regressors (https://github.com/tinkoff-ai/etna/pull/383)
    • Change the way SARIMAXModel works with regressors (https://github.com/tinkoff-ai/etna/pull/380)
    • Change the way Sklearn models work with regressors (https://github.com/tinkoff-ai/etna/pull/440)
    • Change the way FeatureSelectionTransform works with regressors, rename variables replacing the "regressor" to "feature" (https://github.com/tinkoff-ai/etna/pull/522)
    • Add table option to ConsoleLogger (https://github.com/tinkoff-ai/etna/pull/544)
    • Installation instruction (https://github.com/tinkoff-ai/etna/pull/526)
    • Update plot_forecast for multi-forecast mode (https://github.com/tinkoff-ai/etna/pull/584)
    • Trainer kwargs for deep models (https://github.com/tinkoff-ai/etna/pull/540)
    • Update CONTRIBUTING.md (https://github.com/tinkoff-ai/etna/pull/536)
    • Rename _CatBoostModel, _HoltWintersModel, _SklearnModel (https://github.com/tinkoff-ai/etna/pull/543)
    • Add logging to TSDataset.make_future, log repr of transform instead of class name (https://github.com/tinkoff-ai/etna/pull/555)
    • Rename _SARIMAXModel and _ProphetModel, make SARIMAXModel and ProphetModel inherit from PerSegmentPredictionIntervalModel (https://github.com/tinkoff-ai/etna/pull/549)
    • Update get_started section in README (https://github.com/tinkoff-ai/etna/pull/569)
    • Make detrending polynomial (https://github.com/tinkoff-ai/etna/pull/566)
    • Update documentation about transforms that generate regressors, update examples with them (https://github.com/tinkoff-ai/etna/pull/572)
    • Fix that segment is string (https://github.com/tinkoff-ai/etna/pull/602)
    • Make LabelEncoderTransform and OneHotEncoderTransform multi-segment (https://github.com/tinkoff-ai/etna/pull/554)

    Fixed

    • Fix TSDataset._update_regressors logic removing the regressors (https://github.com/tinkoff-ai/etna/pull/489)
    • Fix TSDataset.info, TSDataset.describe methods (https://github.com/tinkoff-ai/etna/pull/519)
    • Fix regressors handling for OneHotEncoderTransform and HolidayTransform (https://github.com/tinkoff-ai/etna/pull/518)
    • Fix wandb summary issue with custom plots (https://github.com/tinkoff-ai/etna/pull/535)
    • Small notebook fixes (https://github.com/tinkoff-ai/etna/pull/595)
    • Fix import Literal in plotters (https://github.com/tinkoff-ai/etna/pull/558)
    • Fix plot method bug when plot method does not plot all required segments (https://github.com/tinkoff-ai/etna/pull/596)
    • Fix dependencies for ARM (https://github.com/tinkoff-ai/etna/pull/599)
    • [BUG] nn models make forecast without inverse_transform (https://github.com/tinkoff-ai/etna/pull/541)
  • 1.6.3(Feb 14, 2022)

    Highlights:

    • Fix for version incompatibility of scipy and statsmodels

    Full changelog:

    Fixed

    • Fixed adding unnecessary lag=1 in statistics (#523)
    • Fixed wrong MeanTransform behaviour when using alpha parameter (#523)
    • Fix processing add_noise=True parameter in datasets generation (#520)
    • Fix scipy version (#525)
  • 1.6.2(Feb 9, 2022)

  • 1.6.1(Feb 3, 2022)

    Full changelog:

    Added

    • Allow choosing start and end in TSDataset.plot method (#488)

    Changed

    • Make TSDataset.to_flatten faster (#475)
    • Allow logger percentile metric aggregation to work with NaNs (#483)

    Fixed

    • Can't make forecasting with pipelines, data with nans, and Imputers (#473)
  • 1.6.0(Jan 28, 2022)

    Highlights:

    • New transforms for feature engineering: DifferencingTransform, OneHotEncoderTransform, LabelEncoderTransform, MADTransform.
    • New transform for feature selection: MRMRFeatureSelectionTransform.
    • Warnings in docstrings about possible look-ahead bias when using some transforms.
    • Version updates of sklearn and pytorch-forecasting, plus minor PytorchForecastingTransform API changes.
    • Fixes for SARIMAX non-default parameters.
    • TSDataset.describe method for high-level information about the provided time series: % of missing values, number of segments, first and last dates, etc.
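
    To illustrate the idea behind a differencing transform, here is a minimal pure-Python sketch. It is not ETNA's DifferencingTransform implementation; the function names difference and inverse_difference and the period parameter are hypothetical and chosen only for this example. The sketch shows why the first values must be kept: without them the differenced series cannot be inverted.

```python
from typing import List, Tuple


def difference(values: List[float], period: int = 1) -> Tuple[List[float], List[float]]:
    """Return the first `period` values (kept for inversion) and the differenced tail."""
    if period < 1 or period >= len(values):
        raise ValueError("period must be in [1, len(values) - 1]")
    head = values[:period]
    diffs = [values[i] - values[i - period] for i in range(period, len(values))]
    return head, diffs


def inverse_difference(head: List[float], diffs: List[float]) -> List[float]:
    """Rebuild the original series by adding each difference to the value `period` steps back."""
    restored = list(head)
    for i, d in enumerate(diffs):
        # restored[i] is the value len(head) steps before the one being rebuilt
        restored.append(restored[i] + d)
    return restored
```

    Calling difference on a series and then inverse_difference on its result should return the original series, which is the property any invertible differencing step needs.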

    Full changelog:

    Added

    • Method TSDataset.info (#409)
    • DifferencingTransform (#414)
    • OneHotEncoderTransform and LabelEncoderTransform (#431)
    • MADTransform (#441)
    • MRMRFeatureSelectionTransform (#439)
    • Possibility to change metric representation in backtest using Metric.name (#454)
    • Warning section in documentation about look-ahead bias (#464)
    • Parameter figsize to all the plotters (#465)

    Changed

    • Change method TSDataset.describe (#409)
    • Group Transforms according to their impact (#420)
    • Change the way LagTransform, DateFlagsTransform and TimeFlagsTransform generate column names (#421)
    • Clarify the behaviour of TimeSeriesImputerTransform in case of all NaN values (#427)
    • Fixed bug in title in sample_acf_plot method (#432)
    • Pytorch-forecasting and sklearn version update + some pytorch transform API changes (#445)

    Fixed

    • Add relevance_params in GaleShapleyFeatureSelectionTransform (#410)
    • Docs for statistics transforms (#441)
    • Handling NaNs in trend transforms (#456)
    • Logger fails with StackingEnsemble (#460)
    • SARIMAX parameters fix (#459)
    • [BUG] Check pytorch-forecasting models with freq > "1D" (#463)
  • 1.5.0(Dec 24, 2021)

    Highlights:

    • We have extended our family of loggers with S3FileLogger and LocalFileLogger. They partially duplicate the behaviour of WandbLogger: you can run multiple experiments (for example via Optuna, HyperOpt or a custom loop) with different hyperparameters and transformers, save the results locally or on S3, and analyze them afterwards.
    • HolidayTransform, based on the holidays library.
    • Bug fixes for prediction intervals: they now change after inverse_transform in the same way as the target.
    • We changed the behaviour of fit_transform:
      • previously, an error was raised if any time series ended with NaN values
      • now the check is made only before the forecasting phase, so you can fill NaNs with TimeSeriesImputerTransform and make predictions without errors.
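
    The reasoning behind this change can be sketched in plain Python: a series that ends with NaN is acceptable during transformation as long as the gap is filled before forecasting. The code below is an illustration only, not ETNA code. Forward fill is an assumed imputation strategy picked for the example, and forward_fill and ends_with_nan are hypothetical helper names.

```python
import math
from typing import List


def forward_fill(values: List[float]) -> List[float]:
    """Replace each NaN with the most recent non-NaN value (leading NaNs stay NaN)."""
    filled = []
    last = math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return filled


def ends_with_nan(values: List[float]) -> bool:
    """The kind of check that is now deferred until just before forecasting."""
    return bool(values) and math.isnan(values[-1])


raw = [1.0, 2.0, math.nan, 4.0, math.nan]
imputed = forward_fill(raw)
# The raw series fails the check, the imputed one passes it.
print(ends_with_nan(raw), ends_with_nan(imputed))
```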

    N.B.

    Special thanks to @Gewissta and his videos about timeseries analysis with ETNA library

    Full changelog:

    Added

    • Holiday Transform (#359)
    • S3FileLogger and LocalFileLogger (#372)
    • Parameter changepoint_prior_scale to ProphetModel (#408)

    Changed

    • Set strict_optional = True for mypy (#381)
    • Move checking the series endings to make_future step (#413)

    Fixed

    • Sarimax bug in future prediction with quantiles (#391)
    • Catboost version too high (#394)
    • Add sorting of classes in left bar in docs (#397)
    • nn notebook in docs (#396)
    • SklearnTransform column name generation (#398)
    • Inverse transform doesn't affect quantiles (#395)
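
    As a rough illustration of why the quantile fix matters: if the target was log-transformed before modelling, the interval bounds live on the log scale too and must be mapped back together with the point forecast. The sketch below is plain Python, not ETNA internals. The column names target, target_0.025 and target_0.975 and the helper inverse_log are assumptions made for this example.

```python
import math
from typing import Dict, List


def inverse_log(columns: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Apply exp to every column so interval bounds are restored along with the target."""
    return {name: [math.exp(v) for v in values] for name, values in columns.items()}


# Hypothetical forecast on the log scale: point forecast plus two quantile columns.
log_forecast = {
    "target": [math.log(100.0)],
    "target_0.025": [math.log(80.0)],
    "target_0.975": [math.log(125.0)],
}
restored = inverse_log(log_forecast)
```

    Because exp is monotonic, the lower bound stays below the point forecast and the upper bound stays above it after the inverse step.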
  • 1.4.2(Dec 9, 2021)

  • 1.4.1(Dec 9, 2021)

    • Made Model, PerSegmentModel, PerSegmentWrapper imports more convenient
    • Docs now have all neural networks models
    • Speed up _check_regressors and _merge_exog
  • 1.4.0(Dec 3, 2021)

    Hi! In this release we have focused on speed and bug fixes.

    Added

    • ACF plot

    Changed

    • Add ts.inverse_transform as final step at Pipeline.fit method
    • Make test_ts optional in plot_forecast
    • Speed up inference for multisegment regression models
    • Speed up Pipeline._get_backtest_forecasts
    • Speed up SegmentEncoderTransform
    • Wandb Logger does not work unless pytorch is installed

    Fixed

    • Get rid of lambda in DensityOutliersTransform and get_anomalies_density
    • Fixed import in transforms
    • Pickle DTWClustering

    Removed

    • Remove TimeSeriesCrossValidation
  • 1.3.3(Nov 24, 2021)

    Added:

    • RelevanceTable can return rank
    • GaleShapleyFeatureSelectionTransform based on the Gale-Shapley algorithm
    • FilterFeaturesTransform for selecting features from TSDataset while feature engineering
    • ResampleWithDistributionTransform helps to resample features according to the other feature distribution
    • Spell checks in ci

    Changed:

    • Rename confidence interval to prediction interval, start working with quantiles instead of interval_width
    • Changed format of forecast and test dataframes in WandbLogger
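
    The move from interval_width to quantiles can be related by a simple formula: a symmetric interval of width w corresponds to the quantile pair ((1 - w) / 2, (1 + w) / 2). The helper below is a hypothetical sketch of that conversion, not part of the ETNA API.

```python
from typing import Tuple


def interval_width_to_quantiles(interval_width: float) -> Tuple[float, float]:
    """Map a symmetric interval width (e.g. 0.95) to its lower and upper quantile levels."""
    if not 0.0 < interval_width < 1.0:
        raise ValueError("interval_width must be strictly between 0 and 1")
    lower = (1.0 - interval_width) / 2.0
    upper = 1.0 - lower
    return lower, upper
```

    For example, a 95% interval corresponds to quantile levels of roughly 0.025 and 0.975. Working with quantiles directly also allows asymmetric pairs that a single width cannot express.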
  • 1.3.2(Nov 18, 2021)

  • 1.3.1(Nov 12, 2021)

  • 1.3.0(Nov 12, 2021)

    We are happy to announce 1.3.0 version of the etna library!

    We focused on making etna even more user friendly as well as added new features.

    We have added:

    • CLI for backtesting
    • MeanSegmentEncoderTransform
    • Several feature relevance algorithms
    • TreeFeatureSelectionTransform

    We have fixed:

    • Bugs in loggers when aggregate_metrics=True
    • Bug when TSDataset did not create future if exogenous data has empty future
    • Links in CLI documentation
  • 1.3.0-alpha.0(Oct 28, 2021)

    In progress...

    In this prerelease we are testing optional dependencies. Be careful!

    Docs available at https://unstable--etna-docs.netlify.app

  • 1.2.0(Oct 27, 2021)

    Boom! Huge update!

    Added

    • Even more documentation
    • Even more Jupyter Notebooks with examples
    • Pipeline class, helps unite models and transforms
    • Ensemble classes, helps unite models
    • AutoRegressivePipeline
    • Add confidence intervals to pipelines, models and transforms
    • Add new Transforms
    • Add clustering methods

    Changed

    • backtest moved to Pipeline class
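
    For readers new to backtesting, the general idea can be sketched as splitting the timeline into several train/test folds, each test fold covering one forecast horizon. The code below is a generic expanding-window illustration in plain Python under assumed parameters (n_points, n_folds, horizon). It does not reproduce how Pipeline.backtest is implemented, and expanding_folds is a hypothetical name.

```python
from typing import List, Tuple


def expanding_folds(n_points: int, n_folds: int, horizon: int) -> List[Tuple[range, range]]:
    """Return (train_indices, test_indices) pairs; the last fold ends at the last point."""
    if n_folds < 1 or horizon < 1:
        raise ValueError("n_folds and horizon must be positive")
    if n_folds * horizon >= n_points:
        raise ValueError("not enough points to leave a non-empty first training set")
    folds = []
    for k in range(n_folds):
        test_start = n_points - (n_folds - k) * horizon
        train = range(0, test_start)  # everything before the test window
        test = range(test_start, test_start + horizon)
        folds.append((train, test))
    return folds
```

    Each fold trains only on points that precede its test window, which is what keeps the evaluation free of look-ahead.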

    Fixed

    • pandas bugs
    • TSDataset.to_dataset bug

    More in our Changelog

  • 1.2.0-alpha.1(Oct 18, 2021)

  • 1.2.0-alpha.0(Oct 14, 2021)

    Added

    • BinsegTrendTransform, ChangePointsTrendTransform (#87)
    • Interactive plot for anomalies (#95)
    • Examples to TSDataset methods with doctest (#92)
    • WandbLogger (#71)
    • Pipeline (#78)
    • Sequence anomalies (#96), Histogram anomalies (#79)
    • 'is_weekend' feature in DateFlagsTransform (#101)
    • Documentation example for models and note about inplace nature of forecast (#112)
    • Property regressors to TSDataset (#82)
    • Clustering (#110)
    • Outliers notebook (#123)
    • Method inverse_transform in TimeSeriesImputerTransform (#135)
    • VotingEnsemble (#150)
    • Forecast command for cli (#133)
    • MyPy checks in CI/CD and lint commands (#39)
    • TrendTransform (#139)
    • Running notebooks in ci (#134)
    • Cluster plotter to EDA (#169)
    • Pipeline.backtest method (#161, #192)
    • STLTransform class (#158)
    • NN_examples notebook (#159)
    • Example for ProphetModel (#178)
    • Instruction notebook for custom model and transform creation (#180)
    • Add inverse_transform in *OutliersTransform (#160)
    • Examples for CatBoostModelMultiSegment and CatBoostModelPerSegment (#181)

    Changed

    • Delete offset from WindowStatisticsTransform (#111)
    • Add Pipeline example in Get started notebook (#115)
    • Internal implementation of BinsegTrendTransform (#141)
    • Colorbar scaling in Correlation heatmap plotter (#143)
    • Add Correlation heatmap in EDA notebook (#144)
    • Add __repr__ for Pipeline (#151)
    • Defined random state for every test case (#155)
    • Add confidence intervals to Prophet (#153)
    • Add confidence intervals to SARIMA (#172)

    Fixed

    • Set default value of TSDataset.head method (#170)
    • Categorical and fillna issues with pandas >=1.2 (#190)
  • 1.1.3(Oct 8, 2021)

  • 1.1.2(Oct 8, 2021)

    Just some bug fixes:

    Changed

    • SklearnTransform out column names (#99)
    • Update EDA notebook (#96)
    • Add 'regressor_' prefix to output columns of LagTransform, DateFlagsTransform, SpecialDaysTransform, SegmentEncoderTransform

    Fixed

    • Add more obvious Exception Error for forecasting with unfitted model (#102)
    • Fix bug with hardcoded frequency in PytorchForecastingTransform (#107)
    • Bug with inverse_transform method of TimeSeriesImputerTransform (#148)
  • 1.1.2-alpha.0(Oct 7, 2021)

    In progress... Fixing bugs

    Changed

    • SklearnTransform out column names (#99)
    • Update EDA notebook (#96)
    • Add 'regressor_' prefix to output columns of LagTransform, DateFlagsTransform, SpecialDaysTransform, SegmentEncoderTransform

    Fixed

    • Add more obvious Exception Error for forecasting with unfitted model (#102)
    • Fix bug with hardcoded frequency in PytorchForecastingTransform (#107)
    • Bug with inverse_transform method of TimeSeriesImputerTransform (#148)
  • 1.1.0(Sep 22, 2021)

    In this release we focused on adding even more features to our library. Please meet new models and transforms:

    Added

    • MedianOutliersTransform, DensityOutliersTransform (#30)
    • Issues and Pull Request templates
    • TSDataset checks (#24, #20)
    • Pytorch-Forecasting models (#29)
    • SARIMAX model (#10)
    • Logging, including ConsoleLogger (#46)
    • Correlation heatmap plotter (#77)

    Changed

    • Backtest is fully parallel
    • New default hyperparameters for CatBoost

    Fixed

    • Documentation fixes (#55, #53, #52)
    • Solved warning in LogTransform and AddConstantTransform (#26)
    • Bug where regressors do not have enough history (#35)
    • make_future(1) and make_future(2) bug
    • Fix working with 'cap' and 'floor' features in Prophet model (#62)
    • Fix saving init params for SARIMAXModel (#81)
    • Imports of nn models, PytorchForecastingTransform and Transform (#80)
Owner
Tinkoff.AI
Tinkoff AI Center