ETNA is an easy-to-use time series forecasting framework.

Overview

ETNA Time Series Library

Homepage | Documentation | Tutorials | Contribution Guide | Release Notes

ETNA is an easy-to-use time series forecasting framework. It includes built-in toolkits for time series preprocessing, feature generation, a variety of predictive models with a unified interface (from classic machine learning to SOTA neural networks), model combination methods and smart backtesting. ETNA is designed to make working with time series simple, productive, and fun.

ETNA is the first Python open-source framework of Tinkoff.ru Artificial Intelligence Center. The library started as an internal product in our company; we now use it in more than 10 projects, so we release updates often. Contributions are welcome: check our Contribution Guide.

Installation

ETNA is on PyPI, so you can use pip to install it.

pip install --upgrade pip
pip install etna-ts

Get started

Here's some example code for a quick start.

import pandas as pd
from etna.datasets.tsdataset import TSDataset
from etna.models import ProphetModel

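# Read the data: a long-format table with "timestamp", "segment" and "target" columns
df = pd.read_csv("example_dataset.csv")
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Convert to the wide format and create a TSDataset with daily frequency
df = TSDataset.to_dataset(df)
ts = TSDataset(df, freq="D")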

# Choose a horizon
HORIZON = 8

# Fit the model
model = ProphetModel()
model.fit(ts)

# Make the forecast
future_ts = ts.make_future(HORIZON)
forecast_ts = model.forecast(future_ts)
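
To make the "Create a TSDataset" step more concrete, here is a rough pandas-only illustration of the long-to-wide reshaping that happens before a TSDataset is built. It is a sketch on made-up toy data, not ETNA's actual implementation: the values and the `wide_df` name are assumptions for illustration, and `TSDataset.to_dataset` should be used in real code.

```python
import pandas as pd

# Toy long-format data (made up): one row per (timestamp, segment) pair.
long_df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2021-01-01", "2021-01-02"] * 2),
        "segment": ["a", "a", "b", "b"],
        "target": [1.0, 2.0, 10.0, 20.0],
    }
)

# Pivot to a wide layout: timestamps form the index,
# (segment, feature) pairs form two-level columns.
wide_df = (
    long_df.set_index(["timestamp", "segment"])
    .unstack("segment")   # columns become (feature, segment)
    .swaplevel(axis=1)    # reorder to (segment, feature)
    .sort_index(axis=1)
)
wide_df.columns.names = ["segment", "feature"]

print(wide_df)
```

Each segment ends up as its own block of columns, which is why the column slices discussed in the issues further down use a two-level `pd.IndexSlice`.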

Tutorials

We have also prepared a set of tutorials for an easy introduction:

01. Get started

  • Creating TSDataset and time series plotting
  • Forecast single time series - Simple forecast, Prophet, Catboost
  • Forecast multiple time series

02. Backtest

  • What is backtest and how it works
  • How to run a validation
  • Validation visualisation

03. EDA

  • Visualization
    • Plot
    • Partial autocorrelation
    • Cross-correlation
    • Distribution
  • Outliers
    • Median method
    • Density method

Documentation

ETNA documentation is available here.

Acknowledgments

ETNA.Team

Alekseev Andrey, Shenshina Julia, Gabdushev Martin, Kolesnikov Sergey, Bunin Dmitriy, Chikov Aleksandr, Barinov Nikita, Romantsov Nikolay, Makhin Artem, Denisov Vladislav, Mitskovets Ivan, Munirova Albina

ETNA.Contributors

Levashov Artem, Podkidyshev Aleksey

License

Feel free to use our library in your commercial and private applications.

ETNA is covered by Apache 2.0. Read more about this license here

Comments
  • Notebook with forecasting strategies

    Before submitting (must do checklist)

    • [x] Did you read the contribution guide?
    • [x] Did you update the docs? We use Numpy format for all the methods and classes.
    • [x] Did you write any new necessary tests?
    • [ ] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    #825

    opened by scanhex12 46
  • Update Notebooks with new EDA methods

    Before submitting (must do checklist)

    • [x] Did you read the contribution guide?
    • [ ] Did you update the docs? We use Numpy format for all the methods and classes.
    • [ ] Did you write any new necessary tests?
    • [ ] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    closes #711

    opened by DBcreator 12
  • Fix notebooks in inference track

    Before submitting (must do checklist)

    • [x] Did you read the contribution guide?
    • [x] Did you update the docs? We use Numpy format for all the methods and classes.
    • [x] Did you write any new necessary tests?
    • [x] Did you update the CHANGELOG?

    Proposed Changes

    Look #973.

    Closing issues

    Closes #973.

    opened by Mr-Geekman 12
  • Improve sample_acf and sample_pacf plots

    Before submitting (must do checklist)

    • [x] Did you read the contribution guide?
    • [x] Did you update the docs? We use Numpy format for all the methods and classes.
    • [x] Did you write any new necessary tests?
    • [x] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    closes #682

    opened by DBcreator 6
  • Classification notebook

    Before submitting (must do checklist)

    • [ ] Did you read the contribution guide?
    • [ ] Did you update the docs? We use Numpy format for all the methods and classes.
    • [ ] Did you write any new necessary tests?
    • [ ] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    opened by alex-hse-repository 6
  • Poc: base classes for deep models and rnn and deepstate with examples

    Before submitting (must do checklist)

    • [ ] Did you read the contribution guide?
    • [x] Did you update the docs? We use Numpy format for all the methods and classes.
    • [x] Did you write any new necessary tests?
    • [x] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    opened by martins0n 6
  • Enhance `TSDataset` to work with hierarchical series

    Before submitting (must do checklist)

    • [ ] Did you read the contribution guide?
    • [ ] Did you update the docs? We use Numpy format for all the methods and classes.
    • [ ] Did you write any new necessary tests?
    • [ ] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    closes #1028

    opened by alex-hse-repository 5
  • Speed up columns slices: `etna.datasets.utils.select_columns`

    🚀 Feature Request

    In many places we use df.loc[:, pd.IndexSlice[segments, column]] to select a column from all the segments. This turns out to be very slow when there are many segments.

    We should find the places where we use it and make sure that it can be safely replaced with df.loc[:, pd.IndexSlice[:, column]].

    There was a problem with the second option: #188. We should investigate whether it still exists and under which conditions:

    1. Does it apply when only one column is selected? (SklearnTransform selects many.)
    2. Can it be avoided by some trick when taking slices (sorting columns, for example)?

    Proposal

    1. Find all places with the slow slice df.loc[:, pd.IndexSlice[segments, column]] where column is a scalar. Replace them with a function (you can add it to etna.datasets.utils). Try to replace the slow slice inside the function with the fast slice df.loc[:, pd.IndexSlice[:, column]]. Make sure that this does not reorder columns in different pandas versions.
    2. Do the same for a list of values in column (e.g. SklearnTransform) and investigate the reordering issue during testing. We want to avoid it without putting all the segments into the slice.
    3. Benchmark the changed transforms (or other calls) to confirm that they became faster. Add the benchmarking code and its results to the PR comments. E.g. you can take a dataframe with 50000 segments, 100 timestamps, 5 additional int columns, 5 additional float columns, 5 additional category columns.
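
    The two slicing patterns from the proposal can be compared on a small frame. This is a minimal sketch in plain pandas, assuming a wide frame with sorted (segment, feature) columns; the segment names, the "exog" feature and the variable names are made up for illustration, and the check says nothing about unsorted columns or other pandas versions.

```python
import numpy as np
import pandas as pd

# Small wide frame with sorted two-level columns: (segment, feature).
segments = [f"segment_{i}" for i in range(3)]
features = ["exog", "target"]
columns = pd.MultiIndex.from_product([segments, features], names=["segment", "feature"])
index = pd.date_range("2021-01-01", periods=4, freq="D")
df = pd.DataFrame(np.arange(24.0).reshape(4, 6), index=index, columns=columns)

# Pattern reported as slow: every segment is listed explicitly.
explicit = df.loc[:, pd.IndexSlice[segments, "target"]]

# Candidate replacement: a full slice over the segment level.
full = df.loc[:, pd.IndexSlice[:, "target"]]

# The replacement is only safe if values and column order are unchanged.
same_values = explicit.equals(full)
same_order = list(explicit.columns) == list(full.columns)
print(same_values, same_order)
```

    For the benchmarking step, the two `df.loc` calls can be wrapped in `timeit` on a frame with many more segments.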

    Test cases

    1. Make sure that current tests pass for scalar case.
    2. Make sure that current tests pass for list case.
    3. Add tests on function for selection of one column.
    4. Add tests on function for selection of multiple columns (in SklearnTransform we had some tests on reordering, it can be useful).

    Additional context

    No response

    enhancement important 
    opened by Mr-Geekman 5
  • Create assemble_pipelines 717

    Before submitting (must do checklist)

    • [ ] Did you read the contribution guide?
    • [ ] Did you update the docs? We use Numpy format for all the methods and classes.
    • [ ] Did you write any new necessary tests?
    • [ ] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    closes #717

    opened by scanhex12 5
  • Fix bugs and documentation for `plot_backtest` and `plot_backtest_interactive`

    IMPORTANT: Please do not create a Pull Request without creating an issue first.

    Before submitting (must do checklist)

    • [x] Did you read the contribution guide?
    • [x] Did you update the docs? We use Numpy format for all the methods and classes.
    • [x] Did you write any new necessary tests?
    • [x] Did you update the CHANGELOG?

    Type of Change

    • [ ] Examples / docs / tutorials / contributors update
    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] Improvement (non-breaking change which improves an existing feature)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Proposed Changes

    Look #664.

    Related Issue

    #664.

    Closing issues

    Closes #664.

    bug documentation 
    opened by Mr-Geekman 5
  • add flake8-bugbear

    IMPORTANT: Please do not create a Pull Request without creating an issue first.

    Before submitting (must do checklist)

    • [x] Did you read the contribution guide?
    • [ ] Did you update the docs? We use Numpy format for all the methods and classes.
    • [ ] Did you write any new necessary tests?
    • [ ] Did you update the CHANGELOG?

    Type of Change

    • [x] Examples / docs / tutorials / contributors update
    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] Improvement (non-breaking change which improves an existing feature)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Proposed Changes

    Related Issue

    Closing issues

    opened by iKintosh 5
  • Create `BottomUpReconciliator`

    Before submitting (must do checklist)

    • [ ] Did you read the contribution guide?
    • [ ] Did you update the docs? We use Numpy format for all the methods and classes.
    • [ ] Did you write any new necessary tests?
    • [ ] Did you update the CHANGELOG?

    Proposed Changes

    Closing issues

    closes #1037

    opened by brsnw250 2
  • Create example notebook about hierarchical pipeline

    🚀 Feature Request

    Create example notebook explaining how to work with hierarchical time series in etna

    Proposal

    • Notebook should explain the following:
    1. What hierarchical time series are
    2. How to store hierarchical time series in etna (hierarchical long format) and how to convert them to the etna wide format with to_hierarchical_dataset
    3. How HierarchicalStructure works and how it can be created
    4. How to create a TSDataset with a hierarchical structure and how exog data works for a hierarchical dataset
    5. What methods exist to forecast hierarchical time series, which of them we have in the library and how to use them
    6. Compare HierarchicalPipeline and Pipeline for top-down and bottom-up cases

    Test cases

    No response

    Additional context

    No response

    enhancement notebook 
    opened by alex-hse-repository 0
  • Create `generate_hierarchical_df` method

    🚀 Feature Request

    Create method to generate random hierarchical dataset

    Proposal

    1. In etna/datasets/datasets_generation.py create method:
    def generate_hierarchical_df(periods: int, n_segments: List[int], freq: str = "D", start_time: str = "2000-01-01", ar_coef: Optional[list] = None, sigma: float = 1, random_seed: int = 1) -> pd.DataFrame

    Parameters:

    • n_segments -- number of segments on each level
    • Other parameters are the same as in generate_ar_df

    Description:

    • Validate n_segments: the number of segments on each level should be lower than on the next level
    • Generate segments on the last level using generate_ar_df
    • Generate a random tree with the configuration of nodes on levels taken from n_segments
    • In the dataframe, replace the column segment with columns describing the structure of the tree (one column for each level)
    • Node names in the levels should be generated as "level_<level_id>_<segment_id>" -- you can come up with better ideas for naming
    • On the bottom level leave the default segment names from generate_ar_df

    2. Add an example of creating a dataset with hierarchical structure using generate_hierarchical_df to the docs of TSDataset
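
    The "random tree" step of the description could look roughly like the sketch below. It is pure Python and covers only the parent assignment, not the proposed generate_hierarchical_df itself. The helper name `random_parent_assignment` is hypothetical, and the bottom-level names `segment_<id>` are an assumption about what generate_ar_df produces.

```python
import random
from typing import Dict, List


def random_parent_assignment(n_segments: List[int], random_seed: int = 1) -> List[Dict[str, str]]:
    """Return one child -> parent mapping per pair of adjacent levels (hypothetical helper)."""
    # Validation from the issue: each level has fewer segments than the next one.
    if any(upper >= lower for upper, lower in zip(n_segments, n_segments[1:])):
        raise ValueError("Each level must have fewer segments than the next level")

    rng = random.Random(random_seed)
    last_level = len(n_segments) - 1

    def node_name(level: int, idx: int) -> str:
        # Assumption: bottom-level names look like generate_ar_df's "segment_<id>";
        # upper levels follow the "level_<level_id>_<segment_id>" pattern from the issue.
        return f"segment_{idx}" if level == last_level else f"level_{level}_{idx}"

    mappings = []
    for level in range(last_level):
        parents = [node_name(level, i) for i in range(n_segments[level])]
        children = [node_name(level + 1, i) for i in range(n_segments[level + 1])]
        # Give every parent one child first so that no node is left childless,
        # then attach the remaining children to random parents.
        rng.shuffle(children)
        assigned = dict(zip(children, parents))
        for child in children[len(parents):]:
            assigned[child] = rng.choice(parents)
        mappings.append(assigned)
    return mappings


mappings = random_parent_assignment([1, 2, 5])
print(mappings)
```

    The corner case n_segments=[1, 2] from the test cases is deterministic here: both second-level nodes can only be attached to the single root.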

    Test cases

    1. Method generates a dataframe with correct properties:
    • number of segments
    • number of periods
    • columns (timestamp, level columns, target)
    • level columns contain correct values (for some corner cases where randomness has no influence, like n_segments=[1, 2])
    2. Check that we can convert this dataframe to the wide format using to_hierarchical_dataset

    Additional context

    No response

    enhancement 
    opened by alex-hse-repository 0
  • [BUG] AttributeError: 'NaiveModel'

    🐛 Bug Report

    While going through the getting started tutorial, I get the error AttributeError: 'NaiveModel' object has no attribute 'context_size'

    Expected behavior

    future_ts = train_ts.make_future(future_steps=HORIZON, tail_steps=model.context_size)

    How can I fix this error?

    How To Reproduce

    HORIZON = 8
    from etna.models import NaiveModel

    # Fit the model
    model = NaiveModel(lag=12)
    model.fit(train_ts)

    # Make the forecast
    future_ts = train_ts.make_future(future_steps=HORIZON, tail_steps=model.context_size)
    forecast_ts = model.forecast(future_ts, prediction_size=HORIZON)

    Environment

    python 3.9 etna: 1.13.0

    Additional context

    No response

    Checklist

    • [x] Bug appears at the latest library version
    bug 
    opened by vukeep 0
  • Create `BottomUpReconciler`

    🚀 Feature Request

    Create reconciler implementing Bottom-Up approach

    Proposal

    1. Create class BottomUpReconciler(BaseReconciler)
      • Check that source_level is lower than target_level
    2. Implement method fit:
      • Receive a dataset at a level that is lower than or equal to both target_level and source_level
      • Aggregate the dataset to the source_level
      • Set the mapping matrix as the summing matrix from the source level to the target level
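
    To illustrate what a summing matrix from the source level to the target level means, here is a minimal pure-Python sketch for two adjacent levels. The node names, the `summing_matrix` and `apply_matrix` helpers and the toy numbers are all assumptions for illustration; they are not part of the proposed class.

```python
from typing import Dict, List


def summing_matrix(child_to_parent: Dict[str, str], source: List[str], target: List[str]) -> List[List[int]]:
    """Entry [i][j] is 1 when source node j contributes to target node i, else 0."""
    return [[1 if child_to_parent[s] == t else 0 for s in source] for t in target]


def apply_matrix(matrix: List[List[int]], values: List[float]) -> List[float]:
    """Multiply the matrix by a vector of source-level values."""
    return [sum(weight * value for weight, value in zip(row, values)) for row in matrix]


# Toy hierarchy: a and b roll up into X, c rolls up into Y.
child_to_parent = {"a": "X", "b": "X", "c": "Y"}
source_nodes = ["a", "b", "c"]
target_nodes = ["X", "Y"]

matrix = summing_matrix(child_to_parent, source_nodes, target_nodes)
target_values = apply_matrix(matrix, [1.0, 2.0, 4.0])
print(matrix, target_values)
```

    With this layout, bottom-up reconciliation of a forecast is a single matrix-vector product per timestamp.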

    Test cases

    • Test constructor works correctly with different source and target levels
    • Test method fit saves the correct matrix in mapping_matrix for different source/target levels
      • source=target
      • source<target
      • source>target

    Additional context

    No response

    enhancement 
    opened by alex-hse-repository 0
  • Create `HierarchicalPipeline`

    🚀 Feature Request

    Create pipeline to process hierarchical time series

    Proposal

    Create class:

    class HierarchicalPipeline(Pipeline):
        def __init__(
                    self,
                    reconciler: BaseReconciler,
                    model: ModelType,
                    transforms: Sequence[Transform] = (),
                    horizon: int = 1
        ):
    
    1. Implement method fit:
      • Fit the reconciler using the reconciler.fit method
      • Aggregate the dataset to the source_level using reconciler.aggregate
      • Call the fit method of the super class with the generated dataset
    2. Implement method raw_forecast():
      • Call the forecast method of the super class
    3. Implement method forecast():
      • Call raw_forecast
      • Generate the target dataset using reconciler.reconcile
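
    The fit / raw_forecast / forecast flow described above can be sketched with toy stand-ins. Everything below is hypothetical: `ToyPipeline`, `ToyBottomUpReconciler` and `ToyHierarchicalPipeline` are simplified placeholders working on plain dicts, not ETNA's Pipeline, BaseReconciler or TSDataset, and the "model" just repeats the last observed value.

```python
from typing import Dict, List

Series = Dict[str, List[float]]  # segment name -> values


class ToyBottomUpReconciler:
    """Hypothetical reconciler: source level is the bottom level, target is its parent level."""

    def __init__(self, child_to_parent: Dict[str, str]):
        self.child_to_parent = child_to_parent

    def fit(self, ts: Series) -> "ToyBottomUpReconciler":
        return self  # nothing to learn in this toy

    def aggregate(self, ts: Series) -> Series:
        # Keep only the source-level (bottom) segments.
        return {name: values for name, values in ts.items() if name in self.child_to_parent}

    def reconcile(self, ts: Series) -> Series:
        # Sum child series into their parents.
        result: Series = {}
        for child, values in ts.items():
            parent = self.child_to_parent[child]
            current = result.get(parent, [0.0] * len(values))
            result[parent] = [a + b for a, b in zip(current, values)]
        return result


class ToyPipeline:
    """Hypothetical base pipeline: forecasts by repeating the last value."""

    def __init__(self, horizon: int = 1):
        self.horizon = horizon
        self.ts: Series = {}

    def fit(self, ts: Series) -> "ToyPipeline":
        self.ts = ts
        return self

    def forecast(self) -> Series:
        return {name: [values[-1]] * self.horizon for name, values in self.ts.items()}


class ToyHierarchicalPipeline(ToyPipeline):
    def __init__(self, reconciler: ToyBottomUpReconciler, horizon: int = 1):
        super().__init__(horizon=horizon)
        self.reconciler = reconciler

    def fit(self, ts: Series) -> "ToyHierarchicalPipeline":
        self.reconciler.fit(ts)
        source_ts = self.reconciler.aggregate(ts)  # dataset on the source level
        super().fit(source_ts)
        return self

    def raw_forecast(self) -> Series:
        return super().forecast()  # forecast on the source level

    def forecast(self) -> Series:
        return self.reconciler.reconcile(self.raw_forecast())  # forecast on the target level


pipeline = ToyHierarchicalPipeline(ToyBottomUpReconciler({"a": "total", "b": "total"}), horizon=2)
pipeline.fit({"a": [1.0, 2.0], "b": [3.0, 5.0]})
raw = pipeline.raw_forecast()
reconciled = pipeline.forecast()
print(raw, reconciled)
```

    The point of the split is that raw_forecast stays on the source level while forecast returns the target level, which is what the first three test cases below check.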

    Test cases

    1. Test that after fit pipeline saves correct ts on the source_level of reconciler
    2. Test that raw_forecast generates forecast on the source_level of reconciler
    3. Test that forecast generates forecast on the target_level of reconciler
    4. Test that backtest works and produces correct metrics (you can use a constant dataset, for example)

    All the tests should cover both top-down and bottom-up reconcilers with correct source and target levels

    Additional context

    blocked by #1037 #1038 #1044

    enhancement 
    opened by alex-hse-repository 0
Releases(1.14.0)
  • 1.14.0(Dec 16, 2022)

    Highlights:

    • Add python 3.10 support (#1005)
    • Add experimental module with TimeSeriesBinaryClassifier and PredictabilityAnalyzer (#985), see example notebook for the details (#997)
    • Inference track results: add predict method to pipelines, teach some models to work with context, change hierarchy of base models, update notebook examples (#979)

    Full changelog:

    Added

    • Add python 3.10 support (#1005)
    • Add SumTranform(#1021)
    • Add plot_change_points_interactive (#988)
    • Add experimental module with TimeSeriesBinaryClassifier and PredictabilityAnalyzer (#985)
    • Inference track results: add predict method to pipelines, teach some models to work with context, change hierarchy of base models, update notebook examples (#979)
    • Add get_ruptures_regularization into experimental module (#1001)
    • Add example classification notebook for experimental classification feature (#997)

    Changed

    • Change returned model in get_model of BATSModel, TBATSModel (#987)
    • Add acf_plot, deprecated sample_acf_plot, sample_pacf_plot (#1004)
    • Change returned model in get_model of HoltWintersModel, HoltModel, SimpleExpSmoothingModel (#986)

    Fixed

    • Fix MinMaxDifferenceTransform import (#1030)
    • Fix release docs and docker images cron job (#982)
    • Fix forecast first point with CatBoostPerSegmentModel (#1010)
    • Fix hanging EDA notebook (#1027)
    • Fix hanging EDA notebook v2 + cache clean script (#1034)
  • 1.13.0(Oct 10, 2022)

    Highlights:

    • etna.auto module for greedy pipeline search with a default pipelines pool
    • wandb sweeps and optuna examples

    Full changelog:

    Added

    • Add greater_is_better property for Metric (#921)
    • etna.auto for greedy search, etna.auto.pool with default pipelines, etna.auto.optuna wrapper for optuna (#895)
    • Add MinMaxDifferenceTransform (#955)
    • Add wandb sweeps and optuna examples (#338)

    Changed

    • Make slicing faster in TSDataset._merge_exog, FilterFeaturesTransform, AddConstTransform, LambdaTransform, LagTransform, LogTransform, SklearnTransform, WindowStatisticsTransform; make CICD test different pandas versions (#900)
    • Mark some tests as long (#929)
    • Fix to_dict with nn models and add unsafe conversion for callbacks (#949)

    Fixed

    • Fix to_dict with function as parameter (#941)
    • Fix native networks to work with generated future equals to horizon (#936)
    • Fix SARIMAXModel to work with exogenous data on pmdarima>=2.0 (#940)
    • Teach catboost to work with encoders (#957)
  • 1.12.0(Sep 5, 2022)

    Highlights:

    • ETNA native MLPModel
    • to_dict method in all the etna objects
    • DirectEnsemble implementing the direct forecasting strategy
    • Notebook about forecasting strategies

    Full changelog:

    Added

    • Function to transform etna objects to dict(#818)
    • MLPModel(#860)
    • DeadlineMovingAverageModel (#827)
    • DirectEnsemble (#824)
    • CICD: untagged docker image cleaner (#856)
    • Notebook about forecasting strategies (#864)
    • Add ChangePointSegmentationTransform, RupturesChangePointsModel (#821)

    Changed

    • Teach AutoARIMAModel to work with out-sample predictions (#830)
    • Make TSDataset.to_flatten faster for big datasets (#848)

    Fixed

    • Type hints for external users by PEP 561 (#868)
    • Type hints for Pipeline.model match models.nn(#768)
    • Fix behavior of SARIMAXModel if simple_differencing=True is set (#837)
    • Bug python3.7 and TypedDict import (#867)
    • Fix deprecated pytorch lightning trainer flags (#866)
    • ProphetModel doesn't work with cap and floor regressors (#842)
    • Fix problem with encoding category types in OHE (#843)
    • Change Docker cuda image version from 11.1 to 11.6.2 (#838)
    • Optimize time complexity of determine_num_steps(#864)
    • All warning as errors(#880)
    • Update .gitignore with .DS_Store and checkpoints (#883)
    • Delete ROADMAP.md (#904)
    • Fix ci invalid cache (#896)
  • 1.11.1(Aug 3, 2022)

  • 1.11.0(Jul 25, 2022)

    Highlights:

    • ETNA native RNN and base classes for deep learning models
    • Lambda transform
    • Prophet 1.1 support without c++ compiler dependency
    • Prediction intervals for DeepAR and TFTModel
    • Add known_future parameter to CLI

    Full changelog:

    Added

    • LSTM based RNN and native deep models base classes (#776)
    • Lambda transform (#762)
    • assemble pipelines (#774)
    • Tests on in-sample, out-sample predictions with gap for all models (#785)

    Changed

    • Add columns and mode parameters in plot_correlation_matrix (#726)
    • Add CatBoostPerSegmentModel and CatBoostMultiSegmentModel classes, deprecate CatBoostModelPerSegment and CatBoostModelMultiSegment (#779)
    • Allow Prophet update to 1.1 (#799)
    • Make LagTransform, LogTransform, AddConstTransform vectorized (#756)
    • Improve the behavior of plot_feature_relevance visualizing p-values (#795)
    • Update poetry.core version (#780)
    • Make native prediction intervals for DeepAR (#761)
    • Make native prediction intervals for TFTModel (#770)
    • Test cases for testing inference of models (#794)
    • Wandb.log to WandbLogger (#816)

    Fixed

    • Fix missing prophet in docker images (#767)
    • Add known_future parameter to CLI (#758)
    • FutureWarning: The frame.append method is deprecated. Use pandas.concat instead (#764)
    • Correct ordering if multi-index in backtest (#771)
    • Raise errors in models.nn if they can't make in-sample and some cases out-sample predictions (#813)
    • Teach BATS/TBATS to work with in-sample, out-sample predictions correctly (#806)
    • Github actions cache issue with poetry update (#778)
  • 1.10.0(Jun 15, 2022)

    Highlights:

    • BATS, TBATS and AutoArima models
    • Fix of empirical prediction intervals

    Full changelog:

    Added

    • Add Sign metric (#730)
    • Add AutoARIMA model (#679)
    • Add parameters start, end to some eda methods (#665)
    • Add BATS and TBATS model adapters (#678)
    • Jupyter extension for black (#742)

    Changed

    • Change color of lines in plot_anomalies and plot_clusters, add grid to all plots, make trend line thicker in plot_trend (#705)
    • Change format of holidays for holiday_plot (#708)
    • Make feature selection transforms return columns in inverse_transform(#688)
    • Add xticks parameter for plot_periodogram, clip frequencies to be >= 1 (#706)
    • Make TSDataset method to_dataset work with copy of the passed dataframe (#741)

    Fixed

    • Fix bug when ts.plot does not save figure (#714)
    • Fix bug in plot_clusters (#675)
    • Fix bugs and documentation for cross_corr_plot (#691)
    • Fix bugs and documentation for plot_backtest and plot_backtest_interactive (#700)
    • Make STLTransform to work with NaNs at the beginning (#736)
    • Fix tiny prediction intervals (#722)
    • Fix deepcopy issue for fitted deepmodel (#735)
    • Fix making backtest if all segments start with NaNs (#728)
    • Fix logging issues in backtest when empirical prediction intervals are used (#747)
  • 1.9.0(May 17, 2022)

    Added

    • Add plot_metric_per_segment (#658)
    • Add metric_per_segment_distribution_plot (#666)

    Changed

    • Remove parameter normalize in linear models (#686)

    Fixed

    • Add missed forecast_params in forecast CLI method (#671)
    • Add _per_segment_average method to the Metric class (#684)
    • Fix get_statistics_relevance_table working with NaNs and categoricals (#672)
    • Fix bugs and documentation for stl_plot (#685)
    • Fix cuda docker images (#694)
  • 1.8.0(Apr 28, 2022)

    Added

    • Width and Coverage metrics for prediction intervals (#638)
    • Masked backtest (#613)
    • Add seasonal_plot (#628)
    • Add plot_periodogram (#606)
    • Add support of quantiles in backtest (#652)
    • Add prediction_actual_scatter_plot (#610)
    • Add plot_holidays (#624)
    • Add instruction about documentation formatting to contribution guide (#648)
    • Seasonal strategy in TimeSeriesImputerTransform (#639)

    Changed

    • Add logging to Metric.__call__ (#643)
    • Add in_column to plot_anomalies, plot_anomalies_interactive (#618)
    • Add logging to TSDataset.inverse_transform (#642)

    Fixed

    • Passing non default params for default models STLTransform (#641)
    • Fixed bug in SARIMAX model with horizon=1 (#637)
    • Fixed bug in models get_model method (#623)
    • Fixed unsafe comparison in plots (#611)
    • Fixed plot_trend does not work with Linear and TheilSen transforms (#617)
    • Improve computation time for rolling window statistics (#625)
    • Don't fill first timestamps in TimeSeriesImputerTransform (#634)
    • Fix documentation formatting (#636)
    • Fix bug with exog features in AutoRegressivePipeline (#647)
    • Fix missed dependencies (#656)
    • Fix custom_transform_and_model notebook (#651)
    • Fix MyBinder bug with dependencies (#650)
  • 1.7.0(Mar 16, 2022)

    Highlights:

    • New plots (a lot!): imputation, trend, change points, residuals, qq-plot, feature relevance, stl.
    • New regressors logic in TSDatasets, Transforms and Models
    • Added jupyter notebook with regressors example
    • Prediction intervals visualization in plot_forecast
    • Detrending could be polynomial
    • Added installation instruction for M1
    • Fixed TSDataset when plot method does not plot all required segments
    • VotingEnsemble allows to set weights of estimator as weights of pipelines

    Full changelog:

    Added

    • Regressors logic to TSDatasets init (https://github.com/tinkoff-ai/etna/pull/357)
    • FutureMixin into some transforms (https://github.com/tinkoff-ai/etna/pull/361)
    • Regressors updating in TSDataset transform loops (https://github.com/tinkoff-ai/etna/pull/374)
    • Regressors handling in TSDataset make_future and train_test_split (https://github.com/tinkoff-ai/etna/pull/447)
    • Prediction intervals visualization in plot_forecast (https://github.com/tinkoff-ai/etna/pull/538)
    • Add plot_imputation (https://github.com/tinkoff-ai/etna/pull/598)
    • Add plot_time_series_with_change_points function (https://github.com/tinkoff-ai/etna/pull/534)
    • Add plot_trend (https://github.com/tinkoff-ai/etna/pull/565)
    • Add find_change_points function (https://github.com/tinkoff-ai/etna/pull/521)
    • Add option day_number_in_year to DateFlagsTransform (https://github.com/tinkoff-ai/etna/pull/552)
    • Add plot_residuals (https://github.com/tinkoff-ai/etna/pull/539)
    • Add get_residuals (https://github.com/tinkoff-ai/etna/pull/597)
    • Create PerSegmentBaseModel, PerSegmentPredictionIntervalModel (https://github.com/tinkoff-ai/etna/pull/537)
    • Create MultiSegmentModel (https://github.com/tinkoff-ai/etna/pull/551)
    • Add qq_plot (https://github.com/tinkoff-ai/etna/pull/604)
    • Add regressors example notebook (https://github.com/tinkoff-ai/etna/pull/577)
    • Create EnsembleMixin (https://github.com/tinkoff-ai/etna/pull/574)
    • Add option season_number to DateFlagsTransform (https://github.com/tinkoff-ai/etna/pull/567)
    • Create BasePipeline, add prediction intervals to all the pipelines, move parameter n_fold to forecast (https://github.com/tinkoff-ai/etna/pull/578)
    • Add stl_plot (https://github.com/tinkoff-ai/etna/pull/575)
    • Add plot_features_relevance (https://github.com/tinkoff-ai/etna/pull/579)
    • Add community section to README.md (https://github.com/tinkoff-ai/etna/pull/580)
    • Create AbstractPipeline (https://github.com/tinkoff-ai/etna/pull/573)
    • Option "auto" to weights parameter of VotingEnsemble, enables to use feature importance as weights of base estimators (https://github.com/tinkoff-ai/etna/pull/587)

    Changed

    • Change the way ProphetModel works with regressors (https://github.com/tinkoff-ai/etna/pull/383)
    • Change the way SARIMAXModel works with regressors (https://github.com/tinkoff-ai/etna/pull/380)
    • Change the way Sklearn models work with regressors (https://github.com/tinkoff-ai/etna/pull/440)
    • Change the way FeatureSelectionTransform works with regressors, rename variables replacing the "regressor" to "feature" (https://github.com/tinkoff-ai/etna/pull/522)
    • Add table option to ConsoleLogger (https://github.com/tinkoff-ai/etna/pull/544)
    • Installation instruction (https://github.com/tinkoff-ai/etna/pull/526)
    • Update plot_forecast for multi-forecast mode (https://github.com/tinkoff-ai/etna/pull/584)
    • Trainer kwargs for deep models (https://github.com/tinkoff-ai/etna/pull/540)
    • Update CONTRIBUTING.md (https://github.com/tinkoff-ai/etna/pull/536)
    • Rename _CatBoostModel, _HoltWintersModel, _SklearnModel (https://github.com/tinkoff-ai/etna/pull/543)
    • Add logging to TSDataset.make_future, log repr of transform instead of class name (https://github.com/tinkoff-ai/etna/pull/555)
    • Rename _SARIMAXModel and _ProphetModel, make SARIMAXModel and ProphetModel inherit from PerSegmentPredictionIntervalModel (https://github.com/tinkoff-ai/etna/pull/549)
    • Update get_started section in README (https://github.com/tinkoff-ai/etna/pull/569)
    • Make detrending polynomial (https://github.com/tinkoff-ai/etna/pull/566)
    • Update documentation about transforms that generate regressors, update examples with them (https://github.com/tinkoff-ai/etna/pull/572)
    • Fix that segment is string (https://github.com/tinkoff-ai/etna/pull/602)
    • Make LabelEncoderTransform and OneHotEncoderTransform multi-segment (https://github.com/tinkoff-ai/etna/pull/554)

    Fixed

    • Fix TSDataset._update_regressors logic removing the regressors (https://github.com/tinkoff-ai/etna/pull/489)
    • Fix TSDataset.info, TSDataset.describe methods (https://github.com/tinkoff-ai/etna/pull/519)
    • Fix regressors handling for OneHotEncoderTransform and HolidayTransform (https://github.com/tinkoff-ai/etna/pull/518)
    • Fix wandb summary issue with custom plots (https://github.com/tinkoff-ai/etna/pull/535)
    • Small notebook fixes (https://github.com/tinkoff-ai/etna/pull/595)
    • Fix import Literal in plotters (https://github.com/tinkoff-ai/etna/pull/558)
    • Fix plot method bug when plot method does not plot all required segments (https://github.com/tinkoff-ai/etna/pull/596)
    • Fix dependencies for ARM (https://github.com/tinkoff-ai/etna/pull/599)
    • [BUG] nn models make forecast without inverse_transform (https://github.com/tinkoff-ai/etna/pull/541)
  • 1.6.3(Feb 14, 2022)

    Highlights:

    • Fix for version incompatibility of scipy and statsmodels

    Full changelog:

    Fixed

    • Fixed adding unnecessary lag=1 in statistics (#523)
    • Fixed wrong MeanTransform behaviour when using alpha parameter (#523)
    • Fix processing add_noise=True parameter in datasets generation (#520)
    • Fix scipy version (#525)
  • 1.6.2(Feb 9, 2022)

  • 1.6.1(Feb 3, 2022)

    Full changelog:

    Added

    • Allow choosing start and end in TSDataset.plot method (#488)

    Changed

    • Make TSDataset.to_flatten faster (#475)
    • Allow logger percentile metric aggregation to work with NaNs (#483)

    Fixed

    • Can't make forecasting with pipelines, data with nans, and Imputers (#473)
  • 1.6.0(Jan 28, 2022)

    Highlights:

    • New transforms for feature engineering: DifferencingTransform, OneHotEncoderTransform, LabelEncoderTransform, MADTransform.
    • New transform for feature selection: MRMRFeatureSelectionTransform.
    • Warnings in docstrings about possible look-ahead bias when using some transforms.
    • Version updates for sklearn and pytorch-forecasting, with minor API changes in PytorchForecastingTransform.
    • Fixes for SARIMAX non-default parameters.
    • TSDataset.describe method for high-level information about the provided time series: % of missing values, number of segments, first and last dates, etc.
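
As a rough illustration of the idea behind a differencing transform such as DifferencingTransform, here is a minimal, library-free sketch. The function names `difference` and `undo_difference` are hypothetical and are not part of ETNA; the sketch only shows that differencing with a given period can be undone when the first `period` values are kept.

```python
from typing import List


def difference(values: List[float], period: int = 1) -> List[float]:
    """Return values[t] - values[t - period] for every t >= period."""
    return [values[t] - values[t - period] for t in range(period, len(values))]


def undo_difference(diffs: List[float], head: List[float], period: int = 1) -> List[float]:
    """Rebuild the series from its differences and its first `period` values."""
    if len(head) != period:
        raise ValueError("head must contain exactly `period` values")
    restored = list(head)
    for d in diffs:
        # The value `period` steps back plus the stored difference gives the next value.
        restored.append(restored[-period] + d)
    return restored


# Hypothetical toy series, used only for illustration.
series = [10.0, 12.0, 15.0, 19.0, 24.0]
diffs = difference(series, period=1)
restored = undo_difference(diffs, head=series[:1], period=1)
```

In this sketch the inverse step needs the stored head values, which is why a differencing transform generally has to be fitted before its inverse can be applied.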

    Full changelog:

    Added

    • Method TSDataset.info (#409)
    • DifferencingTransform (#414)
    • OneHotEncoderTransform and LabelEncoderTransform (#431)
    • MADTransform (#441)
    • MRMRFeatureSelectionTransform (#439)
    • Possibility to change metric representation in backtest using Metric.name (#454)
    • Warning section in documentation about look-ahead bias (#464)
    • Parameter figsize to all the plotters (#465)

    Changed

    • Change method TSDataset.describe (#409)
    • Group Transforms according to their impact (#420)
    • Change the way LagTransform, DateFlagsTransform and TimeFlagsTransform generate column names (#421)
    • Clarify the behaviour of TimeSeriesImputerTransform in case of all NaN values (#427)
    • Fixed bug in title in sample_acf_plot method (#432)
    • Pytorch-forecasting and sklearn version update + some pytorch transform API changes (#445)

    Fixed

    • Add relevance_params in GaleShapleyFeatureSelectionTransform (#410)
    • Docs for statistics transforms (#441)
    • Handling NaNs in trend transforms (#456)
    • Logger fails with StackingEnsemble (#460)
    • SARIMAX parameters fix (#459)
    • [BUG] Check pytorch-forecasting models with freq > "1D" (#463)
  • 1.5.0(Dec 24, 2021)

    Highlights:

    • We extended our family of loggers by adding S3FileLogger and LocalFileLogger. They partially duplicate the behaviour of WandbLogger: you can run multiple experiments (for example via Optuna, HyperOpt or a custom loop) with different hyperparameters and transformers, save the results locally or on S3, and analyze them afterwards.
    • HolidayTransform based on the holidays library.
    • Bug fixes for prediction intervals: inverse_transform now changes them in the same way as the target.
    • We changed the behaviour of fit_transform:
      • before, we raised an error if some time series ended with NaN values
      • now the check is made only before the forecasting phase, so you can fill NaNs with TimeSeriesImputerTransform and make predictions without errors.
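
The fit_transform change above can be pictured with a small, library-free sketch. Everything here is an assumption made for illustration: `forward_fill` is a hypothetical stand-in for an imputer such as TimeSeriesImputerTransform, and `check_before_forecast` is a hypothetical stand-in for a check that runs only before forecasting. Neither is ETNA's actual code.

```python
import math
from typing import Dict, List


def ends_with_nan(values: List[float]) -> bool:
    """True if the series is non-empty and its last value is NaN."""
    return bool(values) and math.isnan(values[-1])


def forward_fill(values: List[float]) -> List[float]:
    """Replace each NaN with the last seen non-NaN value (leading NaNs stay NaN)."""
    filled = []
    last = math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return filled


def check_before_forecast(segments: Dict[str, List[float]]) -> None:
    """Raise ValueError if any segment still ends with NaN at forecast time."""
    bad = [name for name, values in segments.items() if ends_with_nan(values)]
    if bad:
        raise ValueError(f"Segments ending with NaN: {bad}")


# Hypothetical data: one segment ends with a missing value.
raw = {
    "segment_a": [1.0, 2.0, math.nan],
    "segment_b": [3.0, 4.0, 5.0],
}

# Imputation happens first, so the check that runs before forecasting passes.
imputed = {name: forward_fill(values) for name, values in raw.items()}
check_before_forecast(imputed)
```

Calling `check_before_forecast(raw)` instead would raise, which mirrors the case where no imputation was applied before forecasting.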

    N.B.

    Special thanks to @Gewissta for his videos about time series analysis with the ETNA library

    Full changelog:

    Added

    • Holiday Transform (#359)
    • S3FileLogger and LocalFileLogger (#372)
    • Parameter changepoint_prior_scale to ProphetModel (#408)

    Changed

    • Set strict_optional = True for mypy (#381)
    • Move checking the series endings to make_future step (#413)

    Fixed

    • Sarimax bug in future prediction with quantiles (#391)
    • Catboost version too high (#394)
    • Add sorting of classes in left bar in docs (#397)
    • nn notebook in docs (#396)
    • SklearnTransform column name generation (#398)
    • Inverse transform doesn't affect quantiles (#395)
  • 1.4.2(Dec 9, 2021)

  • 1.4.1(Dec 9, 2021)

    • Made Model, PerSegmentModel, PerSegmentWrapper imports more convenient
    • Docs now have all neural networks models
    • Speed up _check_regressors and _merge_exog
  • 1.4.0(Dec 3, 2021)

    Hi! In this release we have focused on speed and bug fixes.

    Added

    • ACF plot

    Changed

    • Add ts.inverse_transform as final step at Pipeline.fit method
    • Make test_ts optional in plot_forecast
    • Speed up inference for multisegment regression models
    • Speed up Pipeline._get_backtest_forecasts
    • Speed up SegmentEncoderTransform
    • Wandb Logger does not work unless pytorch is installed

    Fixed

    • Get rid of lambda in DensityOutliersTransform and get_anomalies_density
    • Fixed import in transforms
    • Pickle DTWClustering

    Removed

    • Remove TimeSeriesCrossValidation
  • 1.3.3(Nov 24, 2021)

    Added:

    • RelevanceTable can return rank
    • GaleShapleyFeatureSelectionTransform based on the Gale-Shapley algorithm
    • FilterFeaturesTransform for selecting features from TSDataset while feature engineering
    • ResampleWithDistributionTransform helps to resample features according to the other feature distribution
    • Spell checks in CI

    Changed:

    • Rename confidence interval to prediction interval, start working with quantiles instead of interval_width
    • Changed format of forecast and test dataframes in WandbLogger
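
To make the move from interval_width to quantiles concrete, here is a minimal sketch of how a single width could map to a pair of quantiles. It assumes a symmetric interval, which the release note does not state, and `interval_width_to_quantiles` is a hypothetical helper, not an ETNA function.

```python
from typing import Tuple


def interval_width_to_quantiles(interval_width: float) -> Tuple[float, float]:
    """Map a symmetric interval width in (0, 1) to its lower and upper quantiles."""
    if not 0.0 < interval_width < 1.0:
        raise ValueError("interval_width must be strictly between 0 and 1")
    # Half of the probability mass left outside the interval goes to each tail.
    tail = (1.0 - interval_width) / 2.0
    return tail, 1.0 - tail


# A 95% interval corresponds to roughly the 2.5% and 97.5% quantiles.
lower, upper = interval_width_to_quantiles(0.95)
```

Working with quantiles directly is more general than a single width, since it also allows asymmetric intervals or more than two quantile levels.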
  • 1.3.2(Nov 18, 2021)

  • 1.3.1(Nov 12, 2021)

  • 1.3.0(Nov 12, 2021)

    We are happy to announce version 1.3.0 of the etna library!

    We focused on making etna even more user-friendly and added new features.

    We have added:

    • CLI for backtesting
    • MeanSegmentEncoderTransform
    • Several feature relevance algorithms
    • TreeFeatureSelectionTransform

    We have fixed:

    • Bugs in loggers when aggregate_metrics=True
    • Bug when TSDataset did not create future if exogenous data has empty future
    • Links in CLI documentation
  • 1.3.0-alpha.0(Oct 28, 2021)

    In progress...

    In this prerelease we are testing optional dependencies. Be careful!

    Docs available at https://unstable--etna-docs.netlify.app

  • 1.2.0(Oct 27, 2021)

    Boom! Huge update!

    Added

    • Even more documentation
    • Even more Jupyter Notebooks with examples
    • Pipeline class, which helps combine models and transforms
    • Ensemble classes, which help combine models
    • AutoRegressivePipeline
    • Add confidence intervals to pipelines, models and transforms
    • Add new Transforms
    • Add clustering methods

    Changed

    • backtest moved to Pipeline class

    Fixed

    • pandas bugs
    • TSDataset.to_dataset bug

    More in our Changelog

  • 1.2.0-alpha.1(Oct 18, 2021)

  • 1.2.0-alpha.0(Oct 14, 2021)

    Added

    • BinsegTrendTransform, ChangePointsTrendTransform (#87)
    • Interactive plot for anomalies (#95)
    • Examples to TSDataset methods with doctest (#92)
    • WandbLogger (#71)
    • Pipeline (#78)
    • Sequence anomalies (#96), Histogram anomalies (#79)
    • 'is_weekend' feature in DateFlagsTransform (#101)
    • Documentation example for models and note about inplace nature of forecast (#112)
    • Property regressors to TSDataset (#82)
    • Clustering (#110)
    • Outliers notebook (#123)
    • Method inverse_transform in TimeSeriesImputerTransform (#135)
    • VotingEnsemble (#150)
    • Forecast command for cli (#133)
    • MyPy checks in CI/CD and lint commands (#39)
    • TrendTransform (#139)
    • Running notebooks in ci (#134)
    • Cluster plotter to EDA (#169)
    • Pipeline.backtest method (#161, #192)
    • STLTransform class (#158)
    • NN_examples notebook (#159)
    • Example for ProphetModel (#178)
    • Instruction notebook for custom model and transform creation (#180)
    • Add inverse_transform in *OutliersTransform (#160)
    • Examples for CatBoostModelMultiSegment and CatBoostModelPerSegment (#181)
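
As a rough picture of what a voting-style ensemble such as VotingEnsemble does, the sketch below combines several forecasts by a weighted average. This is a library-free illustration under that assumption: the `vote` function and the toy forecasts are hypothetical and do not reproduce ETNA's implementation or API.

```python
from typing import List, Optional, Sequence


def vote(
    forecasts: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Combine forecasts of equal length by a weighted average per time step."""
    if not forecasts:
        raise ValueError("at least one forecast is required")
    if weights is None:
        # Equal weights give a plain average.
        weights = [1.0] * len(forecasts)
    if len(weights) != len(forecasts):
        raise ValueError("one weight per forecast is required")
    horizon = len(forecasts[0])
    if any(len(f) != horizon for f in forecasts):
        raise ValueError("all forecasts must cover the same horizon")
    total = sum(weights)
    return [
        sum(w * f[t] for w, f in zip(weights, forecasts)) / total
        for t in range(horizon)
    ]


# Hypothetical forecasts from two models over a horizon of three steps.
naive_forecast = [10.0, 10.0, 10.0]
trend_forecast = [12.0, 14.0, 16.0]

combined = vote([naive_forecast, trend_forecast])
weighted = vote([naive_forecast, trend_forecast], weights=[1.0, 3.0])
```

With equal weights the result is the plain mean of the two forecasts; larger weights pull the combined forecast towards the corresponding model.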

    Changed

    • Delete offset from WindowStatisticsTransform (#111)
    • Add Pipeline example in Get started notebook (#115)
    • Internal implementation of BinsegTrendTransform (#141)
    • Colorbar scaling in Correlation heatmap plotter (#143)
    • Add Correlation heatmap in EDA notebook (#144)
    • Add __repr__ for Pipeline (#151)
    • Defined random state for every test case (#155)
    • Add confidence intervals to Prophet (#153)
    • Add confidence intervals to SARIMA (#172)

    Fixed

    • Set default value of TSDataset.head method (#170)
    • Categorical and fillna issues with pandas >=1.2 (#190)
  • 1.1.3(Oct 8, 2021)

  • 1.1.2(Oct 8, 2021)

    Just some bug fixes:

    Changed

    • SklearnTransform out column names (#99)
    • Update EDA notebook (#96)
    • Add 'regressor_' prefix to output columns of LagTransform, DateFlagsTransform, SpecialDaysTransform, SegmentEncoderTransform

    Fixed

    • Add more obvious Exception Error for forecasting with unfitted model (#102)
    • Fix bug with hardcoded frequency in PytorchForecastingTransform (#107)
    • Bug with inverse_transform method of TimeSeriesImputerTransform (#148)
  • 1.1.2-alpha.0(Oct 7, 2021)

    In progress... Fixing bugs

    Changed

    • SklearnTransform out column names (#99)
    • Update EDA notebook (#96)
    • Add 'regressor_' prefix to output columns of LagTransform, DateFlagsTransform, SpecialDaysTransform, SegmentEncoderTransform

    Fixed

    • Add more obvious Exception Error for forecasting with unfitted model (#102)
    • Fix bug with hardcoded frequency in PytorchForecastingTransform (#107)
    • Bug with inverse_transform method of TimeSeriesImputerTransform (#148)
  • 1.1.0(Sep 22, 2021)

    In this release we focused on adding even more features to our library. Please meet new models and transforms:

    Added

    • MedianOutliersTransform, DensityOutliersTransform (#30)
    • Issues and Pull Request templates
    • TSDataset checks (#24, #20)
    • Pytorch-Forecasting models (#29)
    • SARIMAX model (#10)
    • Logging, including ConsoleLogger (#46)
    • Correlation heatmap plotter (#77)

    Changed

    • Backtest is fully parallel
    • New default hyperparameters for CatBoost

    Fixed

    • Documentation fixes (#55, #53, #52)
    • Solved warning in LogTransform and AddConstantTransform (#26)
    • Regressors do not have enough history bug (#35)
    • make_future(1) and make_future(2) bug
    • Fix working with 'cap' and 'floor' features in Prophet model (#62)
    • Fix saving init params for SARIMAXModel (#81)
    • Imports of nn models, PytorchForecastingTransform and Transform (#80)
Owner
Tinkoff.AI
Tinkoff AI Center