Greykite: A flexible, intuitive and fast forecasting library

Overview

Why Greykite?

The Greykite library provides flexible, intuitive and fast forecasts through its flagship algorithm, Silverkite.

The Silverkite algorithm works well on most time series and is especially well suited to those with changepoints in trend or seasonality, event/holiday effects, and temporal dependencies. Its forecasts are interpretable, which makes them useful for trusted decision-making and insights.

The Greykite library provides a framework that makes it easy to develop a good forecast model, with exploratory data analysis, outlier/anomaly preprocessing, feature extraction and engineering, grid search, evaluation, benchmarking, and plotting. Other open source algorithms can be supported through Greykite’s interface to take advantage of this framework, as listed below.

For a demo, please see our quickstart.

Distinguishing Features

  • Flexible design
    • Provides time series regressors to capture trend, seasonality, holidays, changepoints, and autoregression, and lets you add your own.
    • Fits the forecast using a machine learning model of your choice.
  • Intuitive interface
    • Provides powerful plotting tools to explore seasonality, interactions, changepoints, etc.
    • Provides model templates (default parameters) that work well based on data characteristics and forecast requirements (e.g. daily long-term forecast).
    • Produces interpretable output, with model summary to examine individual regressors, and component plots to visually inspect the combined effect of related regressors.
  • Fast training and scoring
    • Facilitates interactive prototyping, grid search, and benchmarking. Grid search is useful for model selection and semi-automatic forecasting of multiple metrics.
  • Extensible framework
    • Exposes multiple forecast algorithms in the same interface, making it easy to try algorithms from different libraries and compare results.
    • The same pipeline provides preprocessing, cross-validation, backtest, forecast, and evaluation with any algorithm.

Algorithms currently supported within Greykite’s modeling framework:

Notable Components

Greykite offers components that could be used within other forecasting libraries or even outside the forecasting context.

  • ModelSummary() - R-like summaries of scikit-learn and statsmodels regression models.
  • ChangepointDetector() - changepoint detection based on adaptive lasso, with visualization.
  • SimpleSilverkiteForecast() - Silverkite algorithm with forecast_simple and predict methods.
  • SilverkiteForecast() - low-level interface to Silverkite algorithm with forecast and predict methods.

Usage Examples

You can obtain forecasts with only a few lines of code:

from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum

# df = ...  # your input timeseries!
metadata = MetadataParam(
    time_col="ts",     # time column in `df`
    value_col="y"      # value in `df`
)
forecaster = Forecaster()  # creates forecasts and stores the result
forecaster.run_forecast_config(
    df=df,
    config=ForecastConfig(
        # uses the SILVERKITE model template parameters
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=365,  # forecasts 365 steps ahead
        coverage=0.95,         # 95% prediction intervals
        metadata_param=metadata
    )
)
# Access the result
forecaster.forecast_result
# ...

For a demo, please see our quickstart.

Setup and Installation

Greykite is available on PyPI and can be installed with pip:

pip install greykite

For more installation tips, see installation.

Documentation

Please find our full documentation here.

Learn More

Citation

Please cite Greykite in your publications if it helps your research:

@misc{reza2021greykite-github,
  author = {Reza Hosseini and
            Albert Chen and
            Kaixu Yang and
            Sayan Patra and
            Rachit Arora},
  title  = {Greykite: a flexible, intuitive and fast forecasting library},
  url    = {https://github.com/linkedin/greykite},
  year   = {2021}
}

License

Copyright (c) LinkedIn Corporation. All rights reserved. Licensed under the BSD 2-Clause License.

Comments
  • Why pin runtime dependencies so tightly?

    Hi,

    Looking at the setup.py file, it looks like the following are all required runtime dependencies, all of which need to be pinned very precisely:

    requirements = [
        "Cython==0.29.23",
        "cvxpy==1.1.12",
        "fbprophet==0.5",
        "holidays==0.9.10",  # 0.10.2,
        "ipykernel==4.8.2",
        "ipython==7.1.1",
        "ipywidgets==7.2.1",
        "jupyter==1.0.0",
        "jupyter-client==6.1.5",
        "jupyter-console==6.",  # used version 6 to avoid conflict with ipython version
        "jupyter-core==4.7.1",
        "matplotlib==3.4.1",
        "nbformat==5.1.3",
        "notebook==5.4.1",
        "numpy==1.20.2",
        "osqp==0.6.1",
        "overrides==2.8.0",
        "pandas==1.1.3",
        "patsy==0.5.1",
        "Pillow==8.0.1",
        "plotly==3.10.0",
        "pystan==2.18.0.0",
        "pyzmq==22.0.3",
        "scipy==1.5.4",
        "seaborn==0.9.0",
        "six==1.15.0",
        "scikit-learn==0.24.1",
        "Sphinx==3.2.1",
        "sphinx-gallery==0.6.1",
        "sphinx-rtd-theme==0.4.2",
        "statsmodels==0.12.2",
        "testfixtures==6.14.2",
        "tornado==5.1.1",
        "tqdm==4.52.0"
    ]
    

    My question is - why pin them so tightly, and are all of them really necessary? E.g. do I really need sphinx-gallery? Such tight pins make it very difficult to integrate into any existing project. Why not just require a lower bound for many/most of these?
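    One way to read the suggestion in this question, sketched under assumptions: keep only the libraries imported at runtime in install_requires with lower bounds, and move documentation and notebook tooling into optional extras. The split below and the extra names "docs" and "notebook" are hypothetical, not Greykite's actual packaging; the version numbers are simply the pins quoted above reused as minimums.

```python
# Hypothetical split of the pinned list into runtime and optional groups.
# Which packages are truly needed at runtime is an assumption made here.
install_requires = [
    "numpy>=1.20.2",
    "pandas>=1.1.3",
    "scipy>=1.5.4",
    "scikit-learn>=0.24.1",
    "statsmodels>=0.12.2",
    "patsy>=0.5.1",
    "plotly>=3.10.0",
]

# Optional groups, installed only on request.
extras_require = {
    "docs": ["Sphinx>=3.2.1", "sphinx-gallery>=0.6.1", "sphinx-rtd-theme>=0.4.2"],
    "notebook": ["jupyter>=1.0.0", "ipykernel>=4.8.2", "ipywidgets>=7.2.1"],
}

# Both objects would be passed to setuptools.setup(
#     install_requires=install_requires, extras_require=extras_require)
exact_pins = [r for r in install_requires if "==" in r]
print(exact_pins)  # empty: no runtime requirement is pinned to one exact version
```

    With a layout like this, the Sphinx toolchain would be pulled in only when the "docs" extra is requested, which is the kind of relaxation the question asks about.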

    opened by MarcoGorelli 15
  • Regressors Already Forecasted, No Lag Needed. But, getting AttributeError: 'dict' object has no attribute 'lagged_regressors'

    Hi, thank you for providing this wonderful forecasting package! I've been having the best time exploring the greykite package!

    However, I ran into a tiny issue with regressors:

    I'm forecasting with a regressor on this type of data: fake data. My regressor is already forecasted, so no lagging is needed, but I keep getting the error "AttributeError: 'dict' object has no attribute 'lagged_regressors'".

    FYI, this is the config I've been using:

    metadata = MetadataParam(
        time_col='date', 
        value_col='fake_target',
        freq='D', 
        train_end_date = '2022-01-10'
    )
    evaluation_period = EvaluationPeriodParam(
        cv_max_splits=0 # This is to disable CV for demo purposes and just train it on the full data
    )
    evaluation_metric = EvaluationMetricParam(
        cv_selection_metric=EvaluationMetricEnum.RootMeanSquaredError.name
    )
    
    model_components = dict(
        growth=dict(growth_term=None),
        regressors=dict(
            regressor_cols=["fake_regressor"]
        ),
        autoregression=dict(autoreg_dict=None),
        lagged_regressors = dict(lagged_regressor_dict = None) # This is the confusing part, I explicitly said NO lagged terms
    )
    
    config = ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=5, 
        coverage=0.95, 
        metadata_param=metadata,
        evaluation_period_param=evaluation_period,
        evaluation_metric_param=evaluation_metric,
        model_components_param=model_components
    )
    
    # And run
    forecaster = Forecaster()
    result = forecaster.run_forecast_config( 
        df=train,
        config=config
    )
    
    

    And here's a screenshot of the error: [Screenshot 2022-12-06 at 3 06 17 PM]

    I was wondering where I went wrong?

    Thank you in advance for your support and have a wonderful day! All the best, Kathy Gao

    opened by KathyGCY 7
  • Greykite Forecaster Model is Unpickle-able

    Even a basic implementation of greykite (see below) does not pickle properly, due to some of the design choices within Greykite (e.g. nested functions and namedtuple definitions within function class calls).

    Was this a purposeful design choice? Is there another method to save a trained model state and reuse the model to create inferences downstream? Integrations with deployment tools become much more challenging if we need to retrain the model every time and can't save the model state. Looking for guidance here on best practice - thanks!

    Here's code to reproduce the issue:

    from greykite.framework.templates.autogen.forecast_config import ForecastConfig
    from greykite.framework.templates.autogen.forecast_config import MetadataParam
    from greykite.framework.templates.forecaster import Forecaster
    from greykite.framework.templates.model_templates import ModelTemplateEnum
    
    import pandas as pd
    import numpy as np
    
    date_list = pd.date_range(start='2020-01-01', end='2022-01-01', freq='W-FRI')
    df_train = pd.DataFrame(
        {
            'week_end_date': date_list,
            'data': np.random.rand(len(date_list))
        }
    )
    
    metadata = MetadataParam(
        time_col="week_end_date",
        value_col=df_train.columns[-1],
        freq='W-FRI'
    )
    
    fc = Forecaster()
    result = fc.run_forecast_config(
        df=df_train,
        config=ForecastConfig(
            model_template=ModelTemplateEnum.SILVERKITE.name,
            forecast_horizon=52,
            coverage=0.95,         # 95% prediction intervals
            metadata_param=metadata
        )
    )
    
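    import dill
    # Write the fitted model to disk
    with open("pickle_out.b", "wb") as fp:
        dill.dump(result.model, fp)
    # Reopen in read mode to load it back (a handle opened with "wb" cannot be read)
    with open("pickle_out.b", "rb") as fp:
        output_set = dill.load(fp)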
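
    The two patterns named in this report, nested functions and namedtuple classes defined inside a function, can be shown failing in isolation. The sketch below uses only the standard-library pickle module, not dill, and the helper names are invented for illustration. It shows the general mechanism rather than the exact failure inside Greykite.

```python
import pickle
from collections import namedtuple


def make_scaler(factor):
    """Returns a nested function (a closure over `factor`)."""
    def scale(x):
        return x * factor
    return scale


def make_point(x, y):
    """Defines a namedtuple class inside a function and returns an instance."""
    Point = namedtuple("Point", ["x", "y"])
    return Point(x, y)


def can_pickle(obj):
    """True if pickle.dumps succeeds on `obj`, False if it raises."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(can_pickle({"a": 1}))          # plain data pickles fine
print(can_pickle(make_scaler(2)))    # nested function: not reachable by module-level name
print(can_pickle(make_point(1, 2)))  # function-local class: same lookup problem
```

    pickle stores functions and classes by reference to their importable name, so anything that only exists inside another function's scope cannot be found again at load time.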
    
    opened by kurtejung 6
  • Seasonality changepoint detection does not seem to work with cross-validation for Silverkite

    Hi,

    First of all, thank you for open-sourcing this library. It's really complete and well thought out (as is the Silverkite algorithm itself).

    However, I think I have spotted a potential bug:

    It seems that the option seasonality_changepoints_dict in ModelComponentsParam breaks some functionality within pandas when running Silverkite with cross-validation.

    Here's a complete example (using Greykite 0.2.0):

    import pandas as pd
    import numpy as np
    
    # Load airline passengers dataset (with monthly data):
    air_passengers = pd.read_csv("https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv")
    air_passengers["Month"] = pd.to_datetime(air_passengers["Month"])
    air_passengers = air_passengers.set_index("Month").asfreq("MS").reset_index()
    
    # Prepare Greykite configs:
    from greykite.framework.templates.autogen.forecast_config import (ComputationParam, 
                                                                      EvaluationMetricParam, 
                                                                      EvaluationPeriodParam,
                                                                      ForecastConfig, 
                                                                      MetadataParam, 
                                                                      ModelComponentsParam)
    
    # Metadata:
    metadata_params = MetadataParam(date_format=None,  # infer
                                    freq="MS",
                                    time_col="Month",
                                    train_end_date=None,
                                    value_col="Passengers")
    
    # Eval metric:
    evaluation_metric_params = EvaluationMetricParam(agg_func=np.sum,   # Sum all forecasts...
                                                     agg_periods=12,    # ...Over 12 months
                                                     cv_report_metrics=["MeanSquaredError", "MeanAbsoluteError", "MeanAbsolutePercentError"],
                                                     cv_selection_metric="MeanAbsolutePercentError",
                                                     null_model_params=None,
                                                     relative_error_tolerance=None)
    
    # Eval procedure (CV & backtest):
    evaluation_period_params = EvaluationPeriodParam(cv_expanding_window=False,
                                                     cv_horizon=0,   # No CV for now. CHANGE THIS
                                                     cv_max_splits=5,
                                                     cv_min_train_periods=24,
                                                     cv_periods_between_splits=6,
                                                     cv_periods_between_train_test=0,
                                                     cv_use_most_recent_splits=False,
                                                     periods_between_train_test=0,
                                                     test_horizon=12)
    
    # Config for seasonality changepoints
    seasonality_components_df = pd.DataFrame({"name": ["conti_year"],
                                              "period": [1.0],
                                              "order": [5],
                                              "seas_names": ["yearly"]})
    
    # Model components (quite long):
    model_components_params = ModelComponentsParam(autoregression={"autoreg_dict": "auto"},
                                                   
                                                   changepoints={"changepoints_dict":  [{"method":"auto",
                                                                                         "potential_changepoint_n": 50,
                                                                                         "no_changepoint_proportion_from_end": 0.2,
                                                                                         "regularization_strength": 0.01}],
                                                                 
                                                                 # Seasonality changepoints
                                                                 "seasonality_changepoints_dict": [{"regularization_strength": 0.6,
                                                                                                    "no_changepoint_proportion_from_end": 0.8,
                                                                                                    "seasonality_components_df": seasonality_components_df,
                                                                                                    "potential_changepoint_n": 50,
                                                                                                    "resample_freq":"MS"},
                                                                                                   ]
                                                                },
                                                   
                                                   custom={"fit_algorithm_dict": [{"fit_algorithm": "linear"},
                                                                                  ],
                                                           "feature_sets_enabled": "auto",
                                                           "min_admissible_value": 0.0},
                                                   
                                                   events={"holiday_lookup_countries": None,
                                                           "holidays_to_model_separately": None,
                                                           },
                                                   
                                                   growth={"growth_term":["linear"]},
                                                   
                                                   hyperparameter_override={"input__response__outlier__z_cutoff": [100.0],
                                                                            "input__response__null__impute_algorithm": ["ts_interpolate"]},
                                                   
                                                   regressors=None,
                                                   
                                                   lagged_regressors=None,
                                                   
                                                   seasonality={"yearly_seasonality": [5],
                                                                "quarterly_seasonality": ["auto"],
                                                                "monthly_seasonality": False,
                                                                "weekly_seasonality": False,
                                                                "daily_seasonality": False},
                                                   
                                                   uncertainty=None)
    
    # Computation
    computation_params = ComputationParam(n_jobs=1,
                                          verbose=3)
    
    
    # Define forecaster:
    from greykite.framework.templates.forecaster import Forecaster
    
    # defines forecast configuration
    config=ForecastConfig(model_template="SILVERKITE",
                          forecast_horizon=12,
                          coverage=0.8,
                          metadata_param=metadata_params,
                          evaluation_metric_param=evaluation_metric_params,
                          evaluation_period_param=evaluation_period_params,
                          model_components_param=model_components_params,
                          computation_param=computation_params,
                         )
    
    # Run:
    # creates forecast
    forecaster = Forecaster()
    result = forecaster.run_forecast_config(df=air_passengers, 
                                            config=config 
                                            )
    

    If we run the piece of code above, everything works as expected. However, if we activate cross-validation (increasing cv_horizon to 5 for instance), Greykite crashes. This happens unless we remove seasonality changepoints (through removing seasonality_changepoints_dict).

    The crash traceback looks as follows:

    5 fits failed out of a total of 5.
    The score on these train-test partitions for these parameters will be set to nan.
    If these failures are not expected, you can try to debug them by setting error_score='raise'.
    
    Below are more details about the failures:
    --------------------------------------------------------------------------------
    5 fits failed with the following error:
    Traceback (most recent call last):
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\model_selection\_validation.py", line 681, in _fit_and_score
        estimator.fit(X_train, y_train, **fit_params)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\pipeline.py", line 394, in fit
        self._final_estimator.fit(Xt, y, **fit_params_last_step)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\sklearn\estimator\simple_silverkite_estimator.py", line 239, in fit
        self.model_dict = self.silverkite.forecast_simple(
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\forecast\silverkite\forecast_simple_silverkite.py", line 708, in forecast_simple
        trained_model = super().forecast(**parameters)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\forecast\silverkite\forecast_silverkite.py", line 719, in forecast
        seasonality_changepoint_result = get_seasonality_changepoints(
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\changepoint\adalasso\changepoint_detector.py", line 1177, in get_seasonality_changepoints
        result = cd.find_seasonality_changepoints(**seasonality_changepoint_detection_args)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\common\python_utils.py", line 787, in fn_ignore
        return fn(*args, **kwargs)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\changepoint\adalasso\changepoint_detector.py", line 736, in find_seasonality_changepoints
        seasonality_df = build_seasonality_feature_df_with_changes(
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\changepoint\adalasso\changepoints_utils.py", line 237, in build_seasonality_feature_df_with_changes
        fs_truncated_df.loc[(features_df["datetime"] < date).values, cols] = 0
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexing.py", line 719, in __setitem__
        indexer = self._get_setitem_indexer(key)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexing.py", line 646, in _get_setitem_indexer
        self._ensure_listlike_indexer(key)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexing.py", line 709, in _ensure_listlike_indexer
        self.obj._mgr = self.obj._mgr.reindex_axis(
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\internals\base.py", line 89, in reindex_axis
        return self.reindex_indexer(
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\internals\managers.py", line 670, in reindex_indexer
        self.axes[axis]._validate_can_reindex(indexer)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexes\base.py", line 3785, in _validate_can_reindex
        raise ValueError("cannot reindex from a duplicate axis")
    ValueError: cannot reindex from a duplicate axis
    
    
    C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning:
    
    One or more of the test scores are non-finite: [nan]
    
    C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning:
    
    One or more of the train scores are non-finite: [nan]
    

    It would be great to be able to cross-validate when seasonality changepoint detection is activated, as it makes it possible to learn multiplicative seasonalities, for instance, in a similar fashion to Prophet or Orbit.

    Thank you!

    opened by julioasotodv 6
  • "cv_selection_metric" & "cv_report_metrics"

    Hello all,

    I am running Greykite without cross-validation (cv_max_splits = 0) because I am using the LassoCV() algorithm, which by itself uses 5-fold CV. The ForecastConfig() is as follows; in particular, evaluation_metric is all set to None because cv_max_splits = 0:

    [Screenshot: ForecastConfig() settings]

    However, the output on the console suggests that at least 3 metrics are evaluated. My response contains zeros, so I do not want MAPE and MedAPE to be reported, and I do not want "Correlation" to be reported either. As a matter of fact, since the loss function in LassoCV() is MSE (L2-norm), I am not interested in anything other than MSE. If the loss function in LassoCV() could be changed to MAE (L1-norm), I would be interested in the MAE instead of the MSE:

    [Screenshot: console output listing the evaluated metrics]
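
    The concern about percentage metrics on a response that contains zeros can be illustrated with plain Python. The helper names below are illustrative and are not Greykite functions; the sketch only shows that MAPE divides by the actual values, while MSE and MAE do not.

```python
def mape(actual, forecast):
    """Mean absolute percent error in percent; None when any actual value is 0."""
    if any(a == 0 for a in actual):
        return None  # a zero actual would mean dividing by zero
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean squared error (the L2-type loss mentioned in the post)."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mae(actual, forecast):
    """Mean absolute error (the L1-type alternative)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


actual = [0.0, 2.0, 4.0]      # contains a zero, like the response described above
forecast = [1.0, 2.0, 3.0]

print(mape(actual, forecast))  # undefined for this series, reported as None
print(mse(actual, forecast))   # well defined
print(mae(actual, forecast))   # well defined
```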

    Do you have any suggestions, please?

    Best regards, Dario

    opened by dromare 6
  • Greykite suitable for pure linear increasing series?

    Hello

    I'm working on some house price time series using Greykite, but for some reason the forecast I get is just a median price between the upper and lower bounds (ARIMA). Is this a known issue with Greykite when the series is purely linearly increasing?

    Thank you, Aktham Momani

    [Image: greykite_forecast]

    opened by akthammomani 6
  • Does multi-stage forecasting support weekly aggregation as well?

    Hi Team,

    Can you please confirm whether multi-stage forecasting works with weekly aggregation as well?

    I tried it with data that has daily frequency: for one stage I kept the daily frequency, and for the next stage I used the weekly aggregation of the daily data.

    But I am getting the error below:

    TypeError: '<' not supported between instances of 'pandas._libs.tslibs.offsets.Day' and 'pandas._libs.tslibs.offsets.Week'

    opened by canamika27 4
  • Wrong assignment to summary prediction categories

    Hi all,

    I have added 69 regressors to ModelComponentsParam, so my model instance is as follows: ModelComponentsParam(autoregression={'autoreg_dict': 'auto'}, changepoints={'changepoints_dict': [None, {'method': 'auto'}]}, custom={'fit_algorithm_dict': [{'fit_algorithm': 'elastic_net', 'fit_algorithm_params': {'l1_ratio': array([0.01, 0.1 , 0.2 , 0.3 , 0.4 , 0.5 , 0.6 , 0.7 , 0.8 , 0.9 , 0.99]), 'n_alphas': 100, 'alphas': None, 'fit_intercept': True, 'cv': None, 'tol': 0.001, 'max_iter': 1000}}], 'feature_sets_enabled': 'auto', 'min_admissible_value': None, 'max_admissible_value': None}, events={'holiday_lookup_countries': [], 'holidays_to_model_separately': None, 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, growth={'growth_term': None}, hyperparameter_override={'input__response__outlier__use_fit_baseline': False, 'input__response__outlier__z_cutoff': None, 'input__response__null__impute_algorithm': None, 'input__regressors_numeric__outlier__use_fit_baseline': False, 'input__regressors_numeric__outlier__z_cutoff': None, 'input__regressors_numeric__null__impute_algorithm': None, 'input__regressors_numeric__normalize__normalize_algorithm': 'RobustScaler', 'input__regressors_numeric__normalize__normalize_params': {'quantile_range': (10.0, 90.0)}, 'degenerate__drop_degenerate': False}, regressors={'regressor_cols': ['ownSame_5_NET_PRICE_BAG', 'media_total_spend', 'Discount_Depth', 'weather_wghtd_avg_tmp_flslk_2m_f_max_low', 'ownSame_3_NET_PRICE_BAG', 'media_tv_traditional', 'hol_EasterSunday', 'cg_STRNGNCY_LGCY_INDX_MEAN', 'hol_RestorationofIndependence_LAG1', 'NET_PRICE_BAG', 'hol_AllSaintsDay_LAG3', 'hol_holiday_count', 'hol_AssumptionDay_lead4', 'hol_AssumptionDay_lead2', 'cc_AVG_IR_MEAN', 'hol_spain_perc_wknd', 'media_digital', 'hol_CorpusChristi_LAG2', 'hol_AssumptionDay_lead3', 'hol_CorpusChristi_LAG3', 'inp_actual_inventory', 'weather_wghtd_avg_tmp_flslk_2m_f_min_high', 
'gm_AVG_PRKS_PCNT_CHNG_FRM_BSLNE_MEAN', 'weather_wghtd_avg_tmp_flslk_2m_f_max_high', 'hol_holiday_count_lead4', 'hol_spain_hol_flag', 'weather_wghtd_avg_tmp_flslk_2m_f_mean_mid', 'weather_wghtd_avg_tmp_flslk_2m_f_mean_high', 'hol_ConstitutionDay_lead2', 'hol_portugal_perc_longwknd', 'gm_AVG_RSDNTL_AND_PHRMCY_PCNT_CHNG_FRM_BSLNE_MEAN', 'weather_wghtd_avg_tmp_flslk_2m_f_min_low', 'weather_presence_of_snow', 'hol_ImmaculateConception', 'hol_portugal_fri_mon_flag', 'ownSame_2_NET_PRICE_BAG', 'gm_AVG_RTL_AND_RCRTN_PCNT_CHNG_FRM_BSLNE_MEAN', 'hol_spain_fri_mon_flag', 'hol_NationalDay_LAG4', 'Dollar_Discount', 'hol_portugal_perc_wknd', 'weather_wghtd_avg_tmp_flslk_2m_f_min_mid', 'hol_NationalDay_LAG3', 'ownSame_1_NET_PRICE_BAG', 'hol_portugal_hol_flag', 'cg_STRNGNCY_INDX_MEAN', 'hol_ConstitutionDay', 'weather_wghtd_avg_tmp_flslk_2m_f_mean_low', 'hol_Epiphany_lead4', 'cg_CNTNMT_HLTH_INDX_MEAN', 'cc_AVG_CFR_MEAN', 'hol_spain_perc_longwknd', 'weather_wghtd_avg_tmp_flslk_2m_f_max_mid', 'cg_GOVT_RSPNS_INDX_MEAN', 'inp_projected_inventory', 'ph_Unemployment_persons', 'ownSame_4_NET_PRICE_BAG', 'cc_REC_MEAN', 'inp_actual_inventory_flag', 'hol_NationalDay_LAG2', 'hol_AssumptionDay_lead1', 'hol_GoodFriday', 'hol_CorpusChristi_LAG4', 'hol_spain_wknd_flag', 'hol_portugal_wknd_flag', 'weather_wghtd_avg_cld_cvr_tot_pct_max_mid', 'weather_wghtd_avg_cld_cvr_tot_pct_max_low', 'weather_wghtd_avg_cld_cvr_tot_pct_max_high', 'media_total_spend_lag4']}, lagged_regressors=None, seasonality={'yearly_seasonality': 'auto', 'quarterly_seasonality': 'auto', 'monthly_seasonality': 'auto', 'weekly_seasonality': False, 'daily_seasonality': False}, uncertainty=None)

    When checking the model output summary, I only find 65 regressors in the regressor_features category. Of the missing 4, 3 have ended up in the trend_features category, and 1 in the lag_features category:

    summary = result.model[-1].summary()
    for key, val in summary.pred_category.items():
        print(key)
        print(len(val))
        print(val)
    

    intercept
    1
    ['Intercept']
    time_features
    0
    []
    event_features
    0
    []
    trend_features
    3
    ['weather_wghtd_avg_cld_cvr_tot_pct_max_mid', 'weather_wghtd_avg_cld_cvr_tot_pct_max_low', 'weather_wghtd_avg_cld_cvr_tot_pct_max_high']
    seasonality_features
    44
    ['sin1_tom_monthly', 'cos1_tom_monthly', 'sin2_tom_monthly', 'cos2_tom_monthly', 'sin1_toq_quarterly', 'cos1_toq_quarterly', 'sin2_toq_quarterly', 'cos2_toq_quarterly', 'sin3_toq_quarterly', 'cos3_toq_quarterly', 'sin4_toq_quarterly', 'cos4_toq_quarterly', 'sin5_toq_quarterly', 'cos5_toq_quarterly', 'sin1_ct1_yearly', 'cos1_ct1_yearly', 'sin2_ct1_yearly', 'cos2_ct1_yearly', 'sin3_ct1_yearly', 'cos3_ct1_yearly', 'sin4_ct1_yearly', 'cos4_ct1_yearly', 'sin5_ct1_yearly', 'cos5_ct1_yearly', 'sin6_ct1_yearly', 'cos6_ct1_yearly', 'sin7_ct1_yearly', 'cos7_ct1_yearly', 'sin8_ct1_yearly', 'cos8_ct1_yearly', 'sin9_ct1_yearly', 'cos9_ct1_yearly', 'sin10_ct1_yearly', 'cos10_ct1_yearly', 'sin11_ct1_yearly', 'cos11_ct1_yearly', 'sin12_ct1_yearly', 'cos12_ct1_yearly', 'sin13_ct1_yearly', 'cos13_ct1_yearly', 'sin14_ct1_yearly', 'cos14_ct1_yearly', 'sin15_ct1_yearly', 'cos15_ct1_yearly']
    lag_features
    1
    ['media_total_spend_lag4']
    regressor_features
    65
    ['ownSame_5_NET_PRICE_BAG', 'media_total_spend', 'Discount_Depth', 'weather_wghtd_avg_tmp_flslk_2m_f_max_low', 'ownSame_3_NET_PRICE_BAG', 'media_tv_traditional', 'hol_EasterSunday', 'cg_STRNGNCY_LGCY_INDX_MEAN', 'hol_RestorationofIndependence_LAG1', 'NET_PRICE_BAG', 'hol_AllSaintsDay_LAG3', 'hol_holiday_count', 'hol_AssumptionDay_lead4', 'hol_AssumptionDay_lead2', 'cc_AVG_IR_MEAN', 'hol_spain_perc_wknd', 'media_digital', 'hol_CorpusChristi_LAG2', 'hol_AssumptionDay_lead3', 'hol_CorpusChristi_LAG3', 'inp_actual_inventory', 'weather_wghtd_avg_tmp_flslk_2m_f_min_high', 'gm_AVG_PRKS_PCNT_CHNG_FRM_BSLNE_MEAN', 'weather_wghtd_avg_tmp_flslk_2m_f_max_high', 'hol_holiday_count_lead4', 'hol_spain_hol_flag', 'weather_wghtd_avg_tmp_flslk_2m_f_mean_mid', 'weather_wghtd_avg_tmp_flslk_2m_f_mean_high', 'hol_ConstitutionDay_lead2', 'hol_portugal_perc_longwknd', 'gm_AVG_RSDNTL_AND_PHRMCY_PCNT_CHNG_FRM_BSLNE_MEAN', 'weather_wghtd_avg_tmp_flslk_2m_f_min_low', 'weather_presence_of_snow', 'hol_ImmaculateConception', 'hol_portugal_fri_mon_flag', 'ownSame_2_NET_PRICE_BAG', 'gm_AVG_RTL_AND_RCRTN_PCNT_CHNG_FRM_BSLNE_MEAN', 'hol_spain_fri_mon_flag', 'hol_NationalDay_LAG4', 'Dollar_Discount', 'hol_portugal_perc_wknd', 'weather_wghtd_avg_tmp_flslk_2m_f_min_mid', 'hol_NationalDay_LAG3', 'ownSame_1_NET_PRICE_BAG', 'hol_portugal_hol_flag', 'cg_STRNGNCY_INDX_MEAN', 'hol_ConstitutionDay', 'weather_wghtd_avg_tmp_flslk_2m_f_mean_low', 'hol_Epiphany_lead4', 'cg_CNTNMT_HLTH_INDX_MEAN', 'cc_AVG_CFR_MEAN', 'hol_spain_perc_longwknd', 'weather_wghtd_avg_tmp_flslk_2m_f_max_mid', 'cg_GOVT_RSPNS_INDX_MEAN', 'inp_projected_inventory', 'ph_Unemployment_persons', 'ownSame_4_NET_PRICE_BAG', 'cc_REC_MEAN', 'inp_actual_inventory_flag', 'hol_NationalDay_LAG2', 'hol_AssumptionDay_lead1', 'hol_GoodFriday', 'hol_CorpusChristi_LAG4', 'hol_spain_wknd_flag', 'hol_portugal_wknd_flag']
    interaction_features
    0
    []
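
    Finding which requested regressors ended up outside regressor_features can be automated rather than read off the printout. The sketch below assumes that pred_category behaves like a dict mapping category names to lists of column names, as the loop above suggests. The function names are hypothetical, and the sample data is a three-column excerpt of the output above.

```python
def locate_regressors(requested, pred_category):
    """Maps each requested regressor to the categories whose lists contain it."""
    return {
        name: [cat for cat, cols in pred_category.items() if name in cols]
        for name in requested
    }


def misplaced(requested, pred_category, expected="regressor_features"):
    """Keeps only the regressors not found exclusively in the expected category."""
    located = locate_regressors(requested, pred_category)
    return {name: cats for name, cats in located.items() if cats != [expected]}


# Excerpt of the reported output, reduced to three columns
requested = [
    "media_total_spend",
    "media_total_spend_lag4",
    "weather_wghtd_avg_cld_cvr_tot_pct_max_mid",
]
pred_category = {
    "trend_features": ["weather_wghtd_avg_cld_cvr_tot_pct_max_mid"],
    "lag_features": ["media_total_spend_lag4"],
    "regressor_features": ["media_total_spend"],
}

for name, cats in misplaced(requested, pred_category).items():
    print(name, cats)
```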

    Best, Dario

    opened by dromare 4
  • Setting of "cv_max_splits" when using "fit_algorithm": "lasso"

    Hi all,

    When setting fit_algorithm_params={"cv": 5} to use 5-fold CV with sklearn LassoCV() on the training set, how should the global parameter "cv_max_splits" be set: to zero, to None (equivalent to 3), or to 5?

    Best regards, Dario

    opened by dromare 4
  • Getting Various Warnings while running time series prediction

    • I'm trying to fit GreyKite Model to my time series data.

    • I have attached the csv file for reference.

    • Even though the model works, it raises a bunch of warnings that I'd like to avoid.

    • Since some of my target values are zero it tells me that MAPE is undefined.

    • Also, since I'm only forecasting one step into the future, it gives me an UndefinedMetricWarning: "R^2 score is not well-defined with less than two samples."

    • I have attached a few images displaying the warnings.

    • Any help to get rid of these warnings would be appreciated!

    • This is the code I'm using to fit the data:

    import pandas as pd
    from greykite.framework.templates.autogen.forecast_config import ForecastConfig, MetadataParam
    from greykite.framework.templates.forecaster import Forecaster
    from greykite.framework.templates.model_templates import ModelTemplateEnum


    class GreyKiteModel(AnomalyModel):  # AnomalyModel is the poster's own base class (not shown)

        def __init__(self, *args, model_kwargs=None, **kwargs) -> None:
            super().__init__(*args, **kwargs)
            # Use None as the default to avoid sharing one mutable dict across instances
            self.model_kwargs = model_kwargs if model_kwargs is not None else {}

        def predict(self, df: pd.DataFrame) -> pd.DataFrame:
            """Takes in pd.DataFrame with 2 columns, dt and y, and returns a
            pd.DataFrame with 4 columns: dt, y, yhat_lower, yhat_upper.

            :param df: Input Dataframe with dt, y columns
            :type df: pd.DataFrame
            :return: Output Dataframe with dt, y, yhat_lower, yhat_upper
            columns
            :rtype: pd.DataFrame
            """
            df = df.rename(columns={"dt": "ds"})
            metadata = MetadataParam(time_col="ds",  # name of the time column
                                     value_col="y",  # name of the value column
                                     freq="D"        # "H" for hourly, "D" for daily, "W" for weekly, etc.
                                     )
            forecaster = Forecaster()  # Creates forecasts and stores the result
            result = forecaster.run_forecast_config(df=df,  # result is also stored as forecaster.forecast_result
                                                    config=ForecastConfig(model_template=ModelTemplateEnum.SILVERKITE.name,
                                                                          forecast_horizon=1,  # forecasts 1 step
                                                                          coverage=0.95,
                                                                          metadata_param=metadata
                                                                          )
                                                    )
            forecast_df = result.forecast.df
            forecast_df = forecast_df.drop(columns=['actual'])
            forecast_df.rename(columns={'ds': 'dt',
                                        'forecast': 'y',
                                        'forecast_lower': 'yhat_lower',
                                        'forecast_upper': 'yhat_upper'},
                               inplace=True)
            return forecast_df
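
One way to silence specific, known warnings around a call is the standard-library `warnings` module. The sketch below is not Greykite code: `noisy_metric` and `run_quietly` are hypothetical helpers, and treating the MAPE message as a `UserWarning` is an assumption. For the scikit-learn case, the category would be `sklearn.exceptions.UndefinedMetricWarning`.

```python
import warnings


def noisy_metric(actual, forecast):
    """Hypothetical stand-in for a metric that warns when MAPE is undefined."""
    if any(a == 0 for a in actual):
        warnings.warn("MAPE is undefined when actuals contain zero.", UserWarning)
        return None
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def run_quietly(func, *args, category=Warning, message=""):
    """Runs func with matching warnings ignored; filters are restored on exit."""
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text
        warnings.filterwarnings("ignore", message=message, category=category)
        return func(*args)


# Only the targeted warning is suppressed; anything else still surfaces.
result = run_quietly(
    noisy_metric, [0, 2], [1, 2],
    category=UserWarning, message="MAPE is undefined",
)
```

Scoping the filter to one call, rather than calling `warnings.filterwarnings` globally, keeps unrelated warnings visible elsewhere in the program.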
    

    df.csv

    Screenshot from 2021-08-21 12-39-55

    Screenshot from 2021-08-21 12-39-10

    opened by Amatullah 4
  • Load Model from a GCP Cloud Function

    I'm trying to deploy my greykite model on GCP via a cloud function. The existing read and write functions only work for local directories and not cloud blob storage options. I've adjusted the write function to write to cloud storage but the load function is proving to be a bit challenging.
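
A common workaround when a loader only understands local directories is to copy the stored files into a temporary directory first and point the loader at that directory. The sketch below uses only the standard library. `materialize_blobs` is a hypothetical helper, and the `{relative_path: bytes}` mapping stands in for whatever the storage client returns; no Greykite or Google Cloud API is shown or implied.

```python
import os
import tempfile


def materialize_blobs(blobs, dest_dir):
    """Writes a {relative_path: bytes} mapping into dest_dir, recreating subfolders."""
    for rel_path, payload in blobs.items():
        target = os.path.join(dest_dir, *rel_path.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as handle:
            handle.write(payload)
    return dest_dir


# Hypothetical usage: `fetched` would come from the cloud storage client.
fetched = {"model/part.pkl": b"\x80\x04N.", "model/sub/part.type": b"\x80\x04N."}
with tempfile.TemporaryDirectory() as tmp_dir:
    local_root = materialize_blobs(fetched, tmp_dir)
    # The local-directory load function would be called here,
    # while the temporary directory still exists.
    restored = sorted(os.listdir(os.path.join(local_root, "model")))
```

The load call has to happen inside the `with` block, because the temporary directory is deleted on exit.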

    opened by kenzie-q 4
  • Enforcing positive output, or how to tell the model not to produce negative results

    I am working with people count data, they are clearly positive integers, including 0. How can GreyKite support >=0 data? In other words, is it possible to tell the model that negative values are forbidden by adding a very large penalty in the cost function?
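
Two generic approaches that do not depend on any particular library are sketched below: clipping forecasts at zero after prediction, and fitting on a log-transformed series and inverting the transform afterwards. The helper names are hypothetical and this is not a Greykite feature. Note that the `log1p`/`expm1` round trip only guarantees values above -1, so the sketch still clips at zero.

```python
import math


def clip_at_zero(values):
    """Post-processing: replaces negative forecasts with 0."""
    return [max(0.0, v) for v in values]


def to_log_scale(values):
    """Transforms non-negative counts with log(1 + y) before model fitting."""
    return [math.log1p(v) for v in values]


def from_log_scale(values):
    """Inverts log(1 + y); expm1 can return values in (-1, 0), hence the clip."""
    return [max(0.0, math.expm1(v)) for v in values]


counts = [0, 3, 10]
transformed = to_log_scale(counts)         # fit the model on this series
recovered = from_log_scale(transformed)    # apply to forecasts on the log scale
clipped = clip_at_zero([-2.5, 0.0, 4.0])
```

Prediction intervals need the same treatment as point forecasts: the lower and upper bounds are back-transformed or clipped in the same way.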

    opened by CarloNicolini 1
  • ResourceWarning: Unclosed file... messages when deserializing a model.

    The load_obj function in pickle_utils.py makes heavy use of this pattern: dill.load(open(os.path.join(dir_name, file), 'rb')). Since dill does not close the file handle passed to the load method, this raises a ResourceWarning for each part of the model being loaded. In the following case, it produced 2856 of those warnings, flooding the logs.

    c:\windows\serviceprofiles\networkservice\.conda\envs\aml_test_38\lib\site-packages\greykite\framework\templates\pickle_utils.py:313: ResourceWarning:
    2022-08-20T16:56:02.4762240Z 
    2022-08-20T16:56:02.4762597Z unclosed file <_io.BufferedReader name='C:\\Windows\\SERVIC~2\\NETWOR~1\\AppData\\Local\\Temp\\tmp55hy74nb\\tmp7q08pus4.type'>
    2022-08-20T16:56:02.4762842Z 
    2022-08-20T16:56:02.4763280Z c:\windows\serviceprofiles\networkservice\.conda\envs\aml_test_38\lib\site-packages\greykite\framework\templates\pickle_utils.py:313: ResourceWarning:
    2022-08-20T16:56:02.4764328Z 
    2022-08-20T16:56:02.4764749Z unclosed file <_io.BufferedReader name='C:\\Windows\\SERVIC~2\\NETWOR~1\\AppData\\Local\\Temp\\tmp55hy74nb\\tmp7q08pus4\\df_dropna__value__.type'>
    2022-08-20T16:56:02.4765023Z 
    2022-08-20T16:56:02.4765471Z c:\windows\serviceprofiles\networkservice\.conda\envs\aml_test_38\lib\site-packages\greykite\framework\templates\pickle_utils.py:313: ResourceWarning:
    2022-08-20T16:56:02.4765803Z 
    2022-08-20T16:56:02.4766176Z unclosed file <_io.BufferedReader name='C:\\Windows\\SERVIC~2\\NETWOR~1\\AppData\\Local\\Temp\\tmp55hy74nb\\tmp7q08pus4\\df__value__.type'>
    2022-08-20T16:56:02.4766440Z 
    2022-08-20T16:56:02.4766859Z c:\windows\serviceprofiles\networkservice\.conda\envs\aml_test_38\lib\site-packages\greykite\framework\templates\pickle_utils.py:313: ResourceWarning:
    2022-08-20T16:56:02.4767206Z 
    2022-08-20T16:56:02.4767571Z unclosed file <_io.BufferedReader name='C:\\Windows\\SERVIC~2\\NETWOR~1\\AppData\\Local\\Temp\\tmp55hy74nb\\tmp7q08pus4\\features_df__value__.type'>
    2022-08-20T16:56:02.4767866Z 
    2022-08-20T16:56:02.4768299Z c:\windows\serviceprofiles\networkservice\.conda\envs\aml_test_38\lib\site-packages\greykite\framework\templates\pickle_utils.py:313: ResourceWarning:
    2022-08-20T16:56:02.4768626Z 
    2022-08-20T16:56:02.4769008Z unclosed file <_io.BufferedReader name='C:\\Windows\\SERVIC~2\\NETWOR~1\\AppData\\Local\\Temp\\tmp55hy74nb\\tmp7q08pus4\\fitted_df__value__.type'>
    2022-08-20T16:56:02.4769279Z 
    2022-08-20T16:56:02.4769714Z c:\windows\serviceprofiles\networkservice\.conda\envs\aml_test_38\lib\site-packages\greykite\framework\templates\pickle_utils.py:313: ResourceWarning:
    2022-08-20T16:56:02.4770043Z 
    2022-08-20T16:56:02.4770431Z unclosed file <_io.BufferedReader name='C:\\Windows\\SERVIC~2\\NETWOR~1\\AppData\\Local\\Temp\\tmp55hy74nb\\tmp7q08pus4\\fs_components_df__value__.type'>
    2022-08-20T16:56:02.4770714Z 
    2022-08-20T16:56:02.4771148Z c:\windows\serviceprofiles\networkservice\.conda\envs\aml_test_38\lib\site-packages\greykite\framework\templates\pickle_utils.py:313: ResourceWarning:
    2022-08-20T16:56:02.4771476Z 
    2022-08-20T16:56:02.4771863Z unclosed file <_io.BufferedReader name='C:\\Windows\\SERVIC~2\\NETWOR~1\\AppData\\Local\\Temp\\tmp55hy74nb\\tmp7q08pus4\\ml_model_summary__value__.type'>
    2022-08-20T16:56:02.4772143Z 
    2022-08-20T16:56:02.4772588Z c:\windows\serviceprofiles\networkservice\.conda\envs\aml_test_38\lib\site-packages\greykite\framework\templates\pickle_utils.py:313: ResourceWarning:
    2022-08-20T16:56:02.4772917Z 
    2022-08-20T16:56:02.4773290Z unclosed file <_io.BufferedReader name='C:\\Windows\\SERVIC~2\\NETWOR~1\\AppData\\Local\\Temp\\tmp55hy74nb\\tmp7q08pus4\\uncertainty_model__value__.type'>
    2022-08-20T16:56:02.4773589Z 
    2022-08-20T16:56:02.4774004Z c:\windows\serviceprofiles\networkservice\.conda\envs\aml_test_38\lib\site-packages\greykite\framework\templates\pickle_utils.py:313: ResourceWarning:
    2022-08-20T16:56:02.4774354Z 
    2022-08-20T16:56:02.4774721Z unclosed file <_io.BufferedReader name='C:\\Windows\\SERVIC~2\\NETWOR~1\\AppData\\Local\\Temp\\tmp55hy74nb\\tmp7q08pus4\\x_design_info__value__.type'>
    2022-08-20T16:56:02.4775010Z 
    2022-08-20T16:56:02.4775425Z c:\windows\serviceprofiles\networkservice\.conda\envs\aml_test_38\lib\site-packages\greykite\framework\templates\pickle_utils.py:313: ResourceWarning:
    2022-08-20T16:56:02.4776102Z 
    2022-08-20T16:56:02.4776491Z unclosed file <_io.BufferedReader name='C:\\Windows\\SERVIC~2\\NETWOR~1\\AppData\\Local\\Temp\\tmp55hy74nb\\tmp7q08pus4\\x_mat__value__.type'>
    2022-08-20T16:56:02.4776755Z 
    2022-08-20T16:56:02.4777188Z c:\windows\serviceprofiles\networkservice\.conda\envs\aml_test_38\lib\site-packages\greykite\framework\templates\pickle_utils.py:376: ResourceWarning:
    2022-08-20T16:56:02.4777512Z 
    2022-08-20T16:56:02.4777900Z unclosed file <_io.BufferedReader name='C:\\Windows\\SERVIC~2\\NETWOR~1\\AppData\\Local\\Temp\\tmp55hy74nb\\tmp7q08pus4\\adjust_anomalous_info__key__.pkl'>
    2022-08-20T16:56:02.4778437Z 
    2022-08-20T16:56:02.4778891Z c:\windows\serviceprofiles\networkservice\.conda\envs\aml_test_38\lib\site-packages\greykite\framework\templates\pickle_utils.py:395: ResourceWarning:
    2022-08-20T16:56:02.4779216Z 
    2022-08-20T16:56:02.4779615Z unclosed file <_io.BufferedReader name='C:\\Windows\\SERVIC~2\\NETWOR~1\\AppData\\Local\\Temp\\tmp55hy74nb\\tmp7q08pus4\\adjust_anomalous_info__value__.pkl'>
    ...
    

    We were using Greykite 0.2.0, but I see the issue is still present in version 0.4.0.

    If instead we had a simple read_pickle or dill_load function that would handle closing the file:

    def read_pickle(path, mode):
        """Loads the pickled files and closes the file handle.
    
        Parameters
        ----------
        path : `str`
            The path to the pickled file.
        mode : `str`
            The mode of the open function.
        """
        with open(path, mode) as file:
            data = dill.load(file)
        return data
    

    It would then just be a question of changing the calls from dill.load(open(...)) to read_pickle(...). Is there any reason not to use this pattern?

    opened by briantani 1
  • Forecasts appear too smooth for dataset original variability

    I have some time series forecasts of weekly data where the forecasts are too smooth or not really pronounced (not sure how to describe it). Another time series, however, with the same dates, the same general value range and the same model parameters, shows "correct" variability. Both datasets show the same warnings when run (generally about daily event timestamps due to weekly data, and a warning that the data is highly fragmented).

    In another dataset (which I currently cannot access), removing the last entry from the time series fixed the variability. However, this does not work for Dataset1 below.

    Is this a bug or intended behaviour? If this is intended, which parameters would I have to tweak to mitigate this? Thanks for your help.

    Dataset1: relatively smooth forecast (False)

    Dataset2: correct variability forecast (True)

    I am using the following settings for the model (Greykite v.0.4.0):

    - ForecastConfig(model_template=ModelTemplateEnum.SILVERKITE.name,  forecast_horizon=16, coverage=0.95)
    - metadata = MetadataParam(anomaly_info=None, date_format=None, freq='W', time_col='date', train_end_date=None, value_col='value')
    - model_components = 
    { 'autoregression': None,
        'changepoints': {
            'changepoints_dict': {
                'method': 'auto',
                'yearly_seasonality_order': 15,
                'regularization_strength': 0.5,
                'resample_freq': '7D',
                'potential_changepoint_n': 25,
                'no_changepoint_proportion_from_end': 0.2
            }
        },
        'custom': {
            'fit_algorithm_dict': {
                'fit_algorithm': 'ridge'
            }
        },
        'events': {
            'holidays_to_model_separately': 'ALL_HOLIDAYS_IN_COUNTRIES',
            'holiday_lookup_countries': ['DE'],
            'holiday_pre_num_days': 4,
            'holiday_post_num_days': 1,
            'holiday_pre_post_num_dict': None,
            'daily_event_df_dict': {
                'Mothersday': date event_name
                0 2017-05-14 Mothersday
                1 2018-05-13 Mothersday
                2 2019-05-12 Mothersday
                3 2020-05-10 Mothersday
                4 2021-05-09 Mothersday
                5 2022-05-08 Mothersday
                6 2023-05-14 Mothersday
                7 2024-05-12 Mothersday
                8 2025-05-11 Mothersday,
                'Black Friday': date event_name
                0 2017-11-24 Black Friday
                1 2018-11-23 Black Friday
                2 2019-11-29 Black Friday
                3 2020-11-27 Black Friday
                4 2021-11-26 Black Friday
                5 2022-11-25 Black Friday
                6 2023-11-24 Black Friday
                7 2024-11-29 Black Friday
                8 2025-11-29 Black Friday
            }
        },
        'growth': None,
        'hyperparameter_override': None,
        'regressors': None,
        'lagged_regressors': None,
        'seasonality': None,
        'uncertainty': None
    }
    

    DataSet1 (False):

    df1 = pd.DataFrame.from_dict({'date':{0:'2017-12-31',1:'2018-01-07',2:'2018-01-14',3:'2018-01-21',4:'2018-01-28',5:'2018-02-04',6:'2018-02-11',7:'2018-02-18',8:'2018-02-25',9:'2018-03-04',10:'2018-03-11',11:'2018-03-18',12:'2018-03-25',13:'2018-04-01',14:'2018-04-08',15:'2018-04-15',16:'2018-04-22',17:'2018-04-29',18:'2018-05-06',19:'2018-05-13',20:'2018-05-20',21:'2018-05-27',22:'2018-06-03',23:'2018-06-10',24:'2018-06-17',25:'2018-06-24',26:'2018-07-01',27:'2018-07-08',28:'2018-07-15',29:'2018-07-22',30:'2018-07-29',31:'2018-08-05',32:'2018-08-12',33:'2018-08-19',34:'2018-08-26',35:'2018-09-02',36:'2018-09-09',37:'2018-09-16',38:'2018-09-23',39:'2018-09-30',40:'2018-10-07',41:'2018-10-14',42:'2018-10-21',43:'2018-10-28',44:'2018-11-04',45:'2018-11-11',46:'2018-11-18',47:'2018-11-25',48:'2018-12-02',49:'2018-12-09',50:'2018-12-16',51:'2018-12-23',52:'2018-12-30',53:'2019-01-06',54:'2019-01-13',55:'2019-01-20',56:'2019-01-27',57:'2019-02-03',58:'2019-02-10',59:'2019-02-17',60:'2019-02-24',61:'2019-03-03',62:'2019-03-10',63:'2019-03-17',64:'2019-03-24',65:'2019-03-31',66:'2019-04-07',67:'2019-04-14',68:'2019-04-21',69:'2019-04-28',70:'2019-05-05',71:'2019-05-12',72:'2019-05-19',73:'2019-05-26',74:'2019-06-02',75:'2019-06-09',76:'2019-06-16',77:'2019-06-23',78:'2019-06-30',79:'2019-07-07',80:'2019-07-14',81:'2019-07-21',82:'2019-07-28',83:'2019-08-04',84:'2019-08-11',85:'2019-08-18',86:'2019-08-25',87:'2019-09-01',88:'2019-09-08',89:'2019-09-15',90:'2019-09-22',91:'2019-09-29',92:'2019-10-06',93:'2019-10-13',94:'2019-10-20',95:'2019-10-27',96:'2019-11-03',97:'2019-11-10',98:'2019-11-17',99:'2019-11-24',100:'2019-12-01',101:'2019-12-08',102:'2019-12-15',103:'2019-12-22',104:'2019-12-29',105:'2020-01-05',106:'2020-01-12',107:'2020-01-19',108:'2020-01-26',109:'2020-02-02',110:'2020-02-09',111:'2020-02-16',112:'2020-02-23',113:'2020-03-01',114:'2020-03-08',115:'2020-03-15',116:'2020-03-22',117:'2020-03-29',118:'2020-04-05',119:'2020-04-12',120:'2020-04-19',121:'2020-0
4-26',122:'2020-05-03',123:'2020-05-10',124:'2020-05-17',125:'2020-05-24',126:'2020-05-31',127:'2020-06-07',128:'2020-06-14',129:'2020-06-21',130:'2020-06-28',131:'2020-07-05',132:'2020-07-12',133:'2020-07-19',134:'2020-07-26',135:'2020-08-02',136:'2020-08-09',137:'2020-08-16',138:'2020-08-23',139:'2020-08-30',140:'2020-09-06',141:'2020-09-13',142:'2020-09-20',143:'2020-09-27',144:'2020-10-04',145:'2020-10-11',146:'2020-10-18',147:'2020-10-25',148:'2020-11-01',149:'2020-11-08',150:'2020-11-15',151:'2020-11-22',152:'2020-11-29',153:'2020-12-06',154:'2020-12-13',155:'2020-12-20',156:'2020-12-27',157:'2021-01-03',158:'2021-01-10',159:'2021-01-17',160:'2021-01-24',161:'2021-01-31',162:'2021-02-07',163:'2021-02-14',164:'2021-02-21',165:'2021-02-28',166:'2021-03-07',167:'2021-03-14',168:'2021-03-21',169:'2021-03-28',170:'2021-04-04',171:'2021-04-11',172:'2021-04-18',173:'2021-04-25',174:'2021-05-02',175:'2021-05-09',176:'2021-05-16',177:'2021-05-23',178:'2021-05-30',179:'2021-06-06',180:'2021-06-13',181:'2021-06-20',182:'2021-06-27',183:'2021-07-04',184:'2021-07-11',185:'2021-07-18',186:'2021-07-25',187:'2021-08-01',188:'2021-08-08',189:'2021-08-15',190:'2021-08-22',191:'2021-08-29',192:'2021-09-05',193:'2021-09-12',194:'2021-09-19',195:'2021-09-26',196:'2021-10-03',197:'2021-10-10',198:'2021-10-17',199:'2021-10-24',200:'2021-10-31',201:'2021-11-07',202:'2021-11-14',203:'2021-11-21',204:'2021-11-28',205:'2021-12-05',206:'2021-12-12',207:'2021-12-19',208:'2021-12-26'},'value':{0:8057,1:8543,2:7500,3:8090,4:9858,5:9869,6:8414,7:7599,8:8805,9:8367,10:7650,11:7360,12:8205,13:7547,14:6710,15:5767,16:6902,17:7093,18:6013,19:6337,20:5172,21:6260,22:6120,23:6551,24:6363,25:6785,26:6675,27:6586,28:5872,29:5460,30:6195,31:5756,32:5856,33:5477,34:6472,35:6183,36:5732,37:5356,38:6440,39:6531,40:6473,41:6177,42:6343,43:7956,44:6439,45:6745,46:6628,47:8026,48:6859,49:6915,50:9287,51:7145,52:6082,53:5345,54:5546,55:5574,56:7117,57:6770,58:6660,59:6140,60:7521,61:6136,62:5511,63:5304,64:
6010,65:5868,66:5615,67:5232,68:5202,69:5820,70:5514,71:4909,72:4558,73:5287,74:5342,75:4692,76:4599,77:4877,78:5545,79:5871,80:5430,81:4465,82:5759,83:5225,84:4832,85:4486,86:4818,87:5314,88:5070,89:4873,90:6263,91:7893,92:6766,93:7463,94:8210,95:8325,96:6934,97:6408,98:5899,99:7113,100:6919,101:7253,102:8818,103:7302,104:7541,105:6010,106:6122,107:5899,108:6794,109:6915,110:7089,111:7788,112:6756,113:6683,114:5414,115:2699,116:2075,117:2972,118:3004,119:3290,120:3620,121:3467,122:4297,123:4053,124:3803,125:4732,126:4120,127:4465,128:5465,129:4656,130:5495,131:5204,132:5005,133:4581,134:4761,135:4708,136:4375,137:4603,138:5127,139:5383,140:5162,141:5345,142:5358,143:5867,144:6885,145:6122,146:5395,147:5954,148:5318,149:5175,150:5152,151:6669,152:6654,153:5248,154:5436,155:3814,156:5276,157:4573,158:4096,159:4079,160:4574,161:4583,162:4201,163:4592,164:4880,165:5087,166:5116,167:5461,168:5337,169:5321,170:4897,171:4876,172:4756,173:4283,174:6367,175:5509,176:5856,177:5429,178:5338,179:4899,180:4094,181:4936,182:5940,183:5737,184:5471,185:5131,186:5799,187:5813,188:5209,189:5141,190:5408,191:5957,192:5442,193:5692,194:5408,195:6587,196:6365,197:6173,198:5907,199:8504,200:6358,201:7015,202:6714,203:6263,204:6826,205:6296,206:6818,207:6688,208:6895}})
    

    Dataset2 (True):

    df2 = pd.DataFrame.from_dict({'date':{0:'2017-12-31',1:'2018-01-07',2:'2018-01-14',3:'2018-01-21',4:'2018-01-28',5:'2018-02-04',6:'2018-02-11',7:'2018-02-18',8:'2018-02-25',9:'2018-03-04',10:'2018-03-11',11:'2018-03-18',12:'2018-03-25',13:'2018-04-01',14:'2018-04-08',15:'2018-04-15',16:'2018-04-22',17:'2018-04-29',18:'2018-05-06',19:'2018-05-13',20:'2018-05-20',21:'2018-05-27',22:'2018-06-03',23:'2018-06-10',24:'2018-06-17',25:'2018-06-24',26:'2018-07-01',27:'2018-07-08',28:'2018-07-15',29:'2018-07-22',30:'2018-07-29',31:'2018-08-05',32:'2018-08-12',33:'2018-08-19',34:'2018-08-26',35:'2018-09-02',36:'2018-09-09',37:'2018-09-16',38:'2018-09-23',39:'2018-09-30',40:'2018-10-07',41:'2018-10-14',42:'2018-10-21',43:'2018-10-28',44:'2018-11-04',45:'2018-11-11',46:'2018-11-18',47:'2018-11-25',48:'2018-12-02',49:'2018-12-09',50:'2018-12-16',51:'2018-12-23',52:'2018-12-30',53:'2019-01-06',54:'2019-01-13',55:'2019-01-20',56:'2019-01-27',57:'2019-02-03',58:'2019-02-10',59:'2019-02-17',60:'2019-02-24',61:'2019-03-03',62:'2019-03-10',63:'2019-03-17',64:'2019-03-24',65:'2019-03-31',66:'2019-04-07',67:'2019-04-14',68:'2019-04-21',69:'2019-04-28',70:'2019-05-05',71:'2019-05-12',72:'2019-05-19',73:'2019-05-26',74:'2019-06-02',75:'2019-06-09',76:'2019-06-16',77:'2019-06-23',78:'2019-06-30',79:'2019-07-07',80:'2019-07-14',81:'2019-07-21',82:'2019-07-28',83:'2019-08-04',84:'2019-08-11',85:'2019-08-18',86:'2019-08-25',87:'2019-09-01',88:'2019-09-08',89:'2019-09-15',90:'2019-09-22',91:'2019-09-29',92:'2019-10-06',93:'2019-10-13',94:'2019-10-20',95:'2019-10-27',96:'2019-11-03',97:'2019-11-10',98:'2019-11-17',99:'2019-11-24',100:'2019-12-01',101:'2019-12-08',102:'2019-12-15',103:'2019-12-22',104:'2019-12-29',105:'2020-01-05',106:'2020-01-12',107:'2020-01-19',108:'2020-01-26',109:'2020-02-02',110:'2020-02-09',111:'2020-02-16',112:'2020-02-23',113:'2020-03-01',114:'2020-03-08',115:'2020-03-15',116:'2020-03-22',117:'2020-03-29',118:'2020-04-05',119:'2020-04-12',120:'2020-04-19',121:'2020-0
4-26',122:'2020-05-03',123:'2020-05-10',124:'2020-05-17',125:'2020-05-24',126:'2020-05-31',127:'2020-06-07',128:'2020-06-14',129:'2020-06-21',130:'2020-06-28',131:'2020-07-05',132:'2020-07-12',133:'2020-07-19',134:'2020-07-26',135:'2020-08-02',136:'2020-08-09',137:'2020-08-16',138:'2020-08-23',139:'2020-08-30',140:'2020-09-06',141:'2020-09-13',142:'2020-09-20',143:'2020-09-27',144:'2020-10-04',145:'2020-10-11',146:'2020-10-18',147:'2020-10-25',148:'2020-11-01',149:'2020-11-08',150:'2020-11-15',151:'2020-11-22',152:'2020-11-29',153:'2020-12-06',154:'2020-12-13',155:'2020-12-20',156:'2020-12-27',157:'2021-01-03',158:'2021-01-10',159:'2021-01-17',160:'2021-01-24',161:'2021-01-31',162:'2021-02-07',163:'2021-02-14',164:'2021-02-21',165:'2021-02-28',166:'2021-03-07',167:'2021-03-14',168:'2021-03-21',169:'2021-03-28',170:'2021-04-04',171:'2021-04-11',172:'2021-04-18',173:'2021-04-25',174:'2021-05-02',175:'2021-05-09',176:'2021-05-16',177:'2021-05-23',178:'2021-05-30',179:'2021-06-06',180:'2021-06-13',181:'2021-06-20',182:'2021-06-27',183:'2021-07-04',184:'2021-07-11',185:'2021-07-18',186:'2021-07-25',187:'2021-08-01',188:'2021-08-08',189:'2021-08-15',190:'2021-08-22',191:'2021-08-29',192:'2021-09-05',193:'2021-09-12',194:'2021-09-19',195:'2021-09-26',196:'2021-10-03',197:'2021-10-10',198:'2021-10-17',199:'2021-10-24',200:'2021-10-31',201:'2021-11-07',202:'2021-11-14',203:'2021-11-21',204:'2021-11-28',205:'2021-12-05',206:'2021-12-12',207:'2021-12-19',208:'2021-12-26'},'value':{0:5772,1:5902,2:5820,3:5802,4:5860,5:6005,6:6052,7:6244,8:6600,9:6253,10:6099,11:6167,12:6011,13:5844,14:5956,15:5813,16:5861,17:5870,18:5877,19:5834,20:5721,21:5885,22:6013,23:5933,24:5902,25:5994,26:6157,27:6048,28:6020,29:6041,30:6103,31:5994,32:5923,33:5881,34:5934,35:5871,36:5856,37:5818,38:6010,39:5884,40:5815,41:5816,42:5880,43:5897,44:5869,45:5851,46:6007,47:5973,48:5825,49:5913,50:6136,51:5647,52:5700,53:5731,54:5681,55:5788,56:5838,57:5740,58:5665,59:5645,60:5876,61:5799,62:5749,63:5694,64:
5741,65:5724,66:5848,67:5875,68:5800,69:5851,70:5941,71:5816,72:5683,73:5731,74:5839,75:5705,76:5680,77:5775,78:5866,79:5810,80:5680,81:5606,82:5709,83:5677,84:5661,85:5616,86:5633,87:5710,88:5638,89:5626,90:5612,91:5654,92:5696,93:5670,94:5625,95:5682,96:5698,97:5774,98:5734,99:5947,100:6050,101:5925,102:5956,103:5571,104:5652,105:5656,106:5642,107:5649,108:5881,109:5814,110:5774,111:5759,112:5928,113:5866,114:5973,115:5968,116:5869,117:6165,118:5875,119:5749,120:5779,121:5747,122:5876,123:5892,124:5779,125:5812,126:5742,127:5625,128:5578,129:5576,130:5589,131:5545,132:5545,133:5554,134:5553,135:5551,136:5480,137:5509,138:5610,139:5611,140:5612,141:5710,142:5839,143:5795,144:6008,145:5918,146:5840,147:5844,148:5905,149:5844,150:5818,151:5853,152:6161,153:6385,154:6303,155:5681,156:5729,157:5898,158:5873,159:5856,160:5899,161:5910,162:5926,163:6254,164:6107,165:6303,166:6229,167:6172,168:6122,169:6086,170:6394,171:6324,172:6182,173:6118,174:6351,175:6015,176:6027,177:5896,178:6061,179:6015,180:5946,181:5897,182:5913,183:5829,184:5770,185:5776,186:5865,187:5863,188:5818,189:5817,190:5811,191:5839,192:5811,193:5793,194:5792,195:6198,196:6344,197:5433,198:6192,199:6293,200:6034,201:6281,202:6138,203:6147,204:6359,205:6381,206:6485,207:6380,208:6006}})
    
    opened by jdegene 1
  • No seasonal terms are included with seasonality options set to 'auto' with monthly data

    Greykite documentation states that the seasonality "auto" option is meant to let the template decide, based on input data frequency and the amount of training data, whether to model that seasonality with default Fourier order: https://linkedin.github.io/greykite/docs/0.1.0/html/pages/model_components/0300_seasonality.html?highlight=seasonality

    However, with monthly data, this option always defaults to False, both for QUARTERLY_SEASONALITY and YEARLY_SEASONALITY, even when the amount of training data (num_training_days) is greater than the minimum required (default_min_days). Why? Read below.

    These are the Silverkite default settings for minimum training data requirements, as defined in \greykite\algo\forecast\silverkite\constants\silverkite_seasonality.py

    SilverkiteSeasonality(name='ct1', period=1.0, order=15, seas_names='yearly', default_min_days=548)
    SilverkiteSeasonality(name='toq', period=1.0, order=5, seas_names='quarterly', default_min_days=180)
    

    num_training_days is calculated in \greykite\common\time_properties_forecast.py, whereas the actual test is in \greykite\algo\forecast\silverkite\forecast_simple_silverkite.py (here, num_days is the num_training_days calculated above):

    num_days >= seas.value.default_min_days
                        and seas.name in freq_auto_seas_names
    

    The result of the test is always False for monthly data because freq_auto_seas_names is empty, so the condition seas.name in freq_auto_seas_names is never met. The reason can be seen in \greykite\algo\forecast\silverkite\constants\silverkite_time_frequency.py, where, e.g., for weekly data freq_auto_seas_names is the following set:

    auto_fourier_seas={SeasonalityEnum.MONTHLY_SEASONALITY.name,
                               SeasonalityEnum.QUARTERLY_SEASONALITY.name,
                               SeasonalityEnum.YEARLY_SEASONALITY.name})
    

    whereas for monthly, quarterly and yearly data freq_auto_seas_names = {}, e.g. for monthly data:

    auto_fourier_seas={
                # QUARTERLY_SEASONALITY and YEARLY_SEASONALITY are excluded from defaults
                # It's better to use `C(month)` as a categorical feature indicating the month
            })
    

    Therefore, "based on input data frequency" in the first line of this issue really means: if the data frequency is one of MINUTE, HOUR, DAY, WEEK, but not MONTH, QUARTER, YEAR or MULTIYEAR.
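
The test described above can be reproduced in isolation. The sketch below is a simplified re-creation, not the library's code: `Seasonality`, `AUTO_SEAS_BY_FREQ` and `resolve_auto` are hypothetical names, while the thresholds (180 and 548 days) and the 789 training days are the values quoted in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seasonality:
    name: str
    default_min_days: int


SEASONALITIES = [
    Seasonality("QUARTERLY_SEASONALITY", 180),
    Seasonality("YEARLY_SEASONALITY", 548),
]

# Seasonalities that "auto" may switch on, per data frequency (as quoted above).
AUTO_SEAS_BY_FREQ = {
    "WEEK": {"MONTHLY_SEASONALITY", "QUARTERLY_SEASONALITY", "YEARLY_SEASONALITY"},
    "MONTH": set(),  # empty for monthly data
}


def resolve_auto(freq, num_days):
    """Applies `num_days >= default_min_days and name in allowed` to each seasonality."""
    allowed = AUTO_SEAS_BY_FREQ[freq]
    return {
        seas.name: num_days >= seas.default_min_days and seas.name in allowed
        for seas in SEASONALITIES
    }


monthly = resolve_auto("MONTH", 789)
weekly = resolve_auto("WEEK", 789)
```

With 789 training days both thresholds are met, so `weekly` enables both terms, while `monthly` disables both purely because of the empty set.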

    The "better" option in \greykite\algo\forecast\silverkite\constants\silverkite_time_frequency.py when using monthly data is thus to add an extra C(month) column as a categorical feature indicating the month.

    Question: Why is this a "better" option than the following definition?

    auto_fourier_seas={SeasonalityEnum.QUARTERLY_SEASONALITY.name,
                               SeasonalityEnum.YEARLY_SEASONALITY.name})
    

    I see the following alternatives when dealing with monthly data:

    1. Add an extra C(month) column as a categorical feature indicating the month; this has the disadvantage that the extra column should only be added when both QUARTERLY_SEASONALITY and YEARLY_SEASONALITY options are set to "auto" and not to "True" or "False" (quarterly and/or yearly seasonality terms are added automatically by Greykite when the respective option is set to "True", according to the valid_seas dictionary defined in \greykite\common\enums.py, while the term in question is not added when "False").
    2. Add QUARTERLY_SEASONALITY and YEARLY_SEASONALITY terms (currently excluded from defaults) to the empty auto_fourier_seas dictionary; but Greykite developers seem to prefer option 1.
    3. Forget about the user setting the seasonality options ("auto", "True", "False") manually - this is applicable to all input data frequencies, not just monthly:
    • [ ] Let the user configure the Fourier order and the minimum number of cycles for each seasonality
    • [ ] Set the corresponding seasonality option to either "True" or "False" automatically, according to principles learned from the current logic, i.e., input data frequency, valid_seas and num_training_points >= default_min_points
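
To compare alternatives 1 and 2 concretely, the sketch below builds both feature sets for a single date using only the standard library. It is an illustration under stated assumptions, not Greykite's implementation: the function names are hypothetical, the month indicator assumes treatment coding with January as the reference level, and the Fourier terms use the fraction of the year elapsed.

```python
import math
from datetime import date


def month_indicators(d):
    """C(month)-style dummies: 11 columns, January is the reference level."""
    return [1 if d.month == m else 0 for m in range(2, 13)]


def yearly_fourier(d, order):
    """sin/cos pairs of the fraction of the year elapsed, up to `order`."""
    start, end = date(d.year, 1, 1), date(d.year + 1, 1, 1)
    t = (d - start).days / (end - start).days
    columns = []
    for k in range(1, order + 1):
        columns.append(math.sin(2 * math.pi * k * t))
        columns.append(math.cos(2 * math.pi * k * t))
    return columns


march = date(2021, 3, 1)
dummy_cols = month_indicators(march)      # 11 columns
fourier_cols = yearly_fourier(march, 15)  # 30 columns at the default order of 15
```

One possible reading of the "better" comment (an assumption, not a statement from the maintainers): monthly data has only 12 distinct positions per year, so 11 indicator columns already capture any yearly pattern, whereas 30 Fourier columns at order 15 would be largely redundant.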

    One may argue that num_training_points varies between training sets when using CV splits; however, the following example shows that both num_training_points and num_training_days are invariant between splits, even with cv_expanding_window=True:

    [CV 1/3] ... valid_seas={'YEARLY_SEASONALITY', 'QUARTERLY_SEASONALITY'})>, 'num_training_points': 26, 'num_training_days': 789.0, 'days_per_observation': 28.0, ...
    [CV 2/3] ... valid_seas={'YEARLY_SEASONALITY', 'QUARTERLY_SEASONALITY'})>, 'num_training_points': 26, 'num_training_days': 789.0, 'days_per_observation': 28.0, ...
    [CV 3/3] ... valid_seas={'YEARLY_SEASONALITY', 'QUARTERLY_SEASONALITY'})>, 'num_training_points': 26, 'num_training_days': 789.0, 'days_per_observation': 28.0, ...
    

    This means that the test num_training_points >= default_min_points can be applied only once directly from train_end_date before entering the CV loop (the current Fitting 3 folds for each of 1 candidates, totalling 3 fits section apparently tests the seasonality terms at each split, but the test values are invariant, as mentioned above).

    opened by dromare 0
  • Extract components from forecast

    Hi, I was wondering if it is possible to extract the different modeling components (e.g. trend, holidays, seasonalities) from the forecasted time series. It's possible to do this in the Prophet framework, see: https://github.com/facebook/prophet/issues/1920

    The reason is that I would like to use a custom trend component calculated outside of Greykite.
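
For any additive linear model, a forecast can be split into components by summing coefficient × feature value within groups of related columns. The sketch below shows that idea with plain dictionaries. `component_contributions`, the prefix groups and the numbers are all hypothetical, and this is not a Greykite API; only the feature-name style (for example `sin1_ct1_yearly`, `hol_EasterSunday`) follows names that appear elsewhere on this page.

```python
from collections import defaultdict


def component_contributions(coefs, features, groups):
    """Sums coef * feature value per group; unmatched names go to 'other'."""
    totals = defaultdict(float)
    for name, coef in coefs.items():
        group = next(
            (g for g, prefixes in groups.items() if name.startswith(prefixes)),
            "other",
        )
        totals[group] += coef * features.get(name, 0.0)
    return dict(totals)


# Hypothetical coefficients and one row of feature values.
GROUPS = {"seasonality": ("sin", "cos"), "holidays": ("hol_",), "trend": ("ct",)}
coefs = {"Intercept": 10.0, "ct1": 2.0, "sin1_ct1_yearly": 0.5,
         "cos1_ct1_yearly": -1.0, "hol_EasterSunday": 3.0}
row = {"Intercept": 1.0, "ct1": 1.5, "sin1_ct1_yearly": 1.0,
       "cos1_ct1_yearly": 0.0, "hol_EasterSunday": 1.0}

parts = component_contributions(coefs, row, GROUPS)
forecast = sum(parts.values())
```

Because the groups partition the columns, the parts add up to the full prediction, so a trend computed outside the model could replace the `trend` part before the parts are summed again.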

    opened by Jonathan-MW 4
Releases(v0.4.0)
  • v0.4.0(Jul 21, 2022)

    PyPI: https://pypi.org/project/greykite/0.4.0/ Release notes: https://github.com/linkedin/greykite/blob/master/HISTORY.rst Contributors: @KaixuYang, @Reza1317, @sayanpatra, @njusu, @al-bert

  • v0.3.0(Dec 22, 2021)

    PyPI: https://pypi.org/project/greykite/0.3.0/ Release notes: https://github.com/linkedin/greykite/blob/master/HISTORY.rst Contributors: @KaixuYang , @njusu , @Reza1317 , @al-bert , @sayanpatra , @Saadorj , @dromare , @martinmenchon

  • v0.2.0(Dec 22, 2021)

    PyPI: https://pypi.org/project/greykite/0.2.0/ Release notes: https://github.com/linkedin/greykite/blob/master/HISTORY.rst Contributors: @KaixuYang , @sayanpatra , @Reza1317 , @Saadorj

  • v0.1.1(Dec 22, 2021)

    PyPI: https://pypi.org/project/greykite/0.1.1/ Authors: @Reza1317 , @al-bert , @KaixuYang , @sayanpatra Other contributors: @Saadorj, @rachitb1

    Blog post for this release: https://engineering.linkedin.com/blog/2021/greykite--a-flexible--intuitive--and-fast-forecasting-library Paper for this release: https://arxiv.org/abs/2105.01098

Owner
LinkedIn
Auto updating website that tracks closed & open issues/PRs on scikit-learn/scikit-learn.

Repository Status for Scikit-learn Live webpage Auto updating website that tracks closed & open issues/PRs on scikit-learn/scikit-learn. Running local

Thomas J. Fan 6 Dec 27, 2022
Uber Open Source 1.6k Dec 31, 2022
Repository for DCA0305, an undergraduate course about Machine Learning Workflows and Pipelines

Federal University of Rio Grande do Norte Technology Center Department of Computer Engineering and Automation Machine Learning Based Systems Design Re

Ivanovitch Silva 81 Oct 18, 2022
Python package for machine learning for healthcare using an OMOP common data model

This library was developed in order to facilitate rapid prototyping in Python of predictive machine-learning models using longitudinal medical data from an OMOP CDM-standard database.

Sontag Lab 75 Jan 03, 2023
This is a Cricket Score Predictor that predicts the first innings score of a T20 Cricket match using Machine Learning

This is a Cricket Score Predictor that predicts the first innings score of a T20 Cricket match using Machine Learning. It is a Web Application.

Developer Junaid 3 Aug 04, 2022
A unified framework for machine learning with time series

Welcome to sktime A unified framework for machine learning with time series We provide specialized time series algorithms and scikit-learn compatible

The Alan Turing Institute 6k Jan 06, 2023
SynapseML - an open source library to simplify the creation of scalable machine learning pipelines

Synapse Machine Learning SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. Sy

Microsoft 3.9k Dec 30, 2022
This is the code repository for Interpretable Machine Learning with Python, published by Packt.

Interpretable Machine Learning with Python, published by Packt

Packt 299 Jan 02, 2023
Simple, light-weight config handling through python data classes with to/from JSON serialization/deserialization.

Simple but maybe too simple config management through python data classes. We use it for machine learning.

Eren Gölge 67 Nov 29, 2022
Flightfare-Prediction - A flight fare prediction web application using machine learning, Python, and Flask

Flight_fare-Prediction It is a flight fare prediction web application using machine learning, Python, and Flask. Using machine learning I have created a F

1 Dec 06, 2022
This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment to test the algorithm

Martin Huber 59 Dec 09, 2022
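Webshell detection with machine learning

ai-webshell-detect Webshell detection with machine learning, using TextCNN plus a simple binary classification network, built on Keras; it took seven days. Detection principle: features are extracted from file entropy, file length, and file statements; the entropy and length are fed into the binary classification network, and the statements are fed into TextCNN. Project principles, introduction, and how it was built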

Huoji's 56 Dec 14, 2022
Kaggle Competition using 15 numerical predictors to predict a continuous outcome.

Kaggle-Comp.-Data-Mining Kaggle Competition using 15 numerical predictors to predict a continuous outcome as part of a final project for a stats data

moisey alaev 1 Dec 28, 2021
Cryptocurrency price prediction and exceptions in Python

Cryptocurrency price prediction and exceptions in Python This is coursework for a foundations of computing module. Through this coursework I worked on m

Panagiotis Sotirellos 1 Nov 07, 2021
Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models

Highly interpretable, sklearn-compatible classifier based on decision rules This is a scikit-learn compatible wrapper for the Bayesian Rule List class

Tamas Madl 482 Nov 19, 2022
Pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code

pandas-method-chaining pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code. It is a fork from pandas-v

Francis 5 May 14, 2022
A collection of Scikit-Learn compatible time series transformers and tools.

tsfeast A collection of Scikit-Learn compatible time series transformers and tools. Installation Create a virtual environment and install: From PyPi p

Chris Santiago 0 Mar 30, 2022
Retrieve annotated intron sequences and classify them as minor (U12-type) or major (U2-type)

(intron I nterrogator and C lassifier) intronIC is a program that can be used to classify intron sequences as minor (U12-type) or major (U2-type), usi

Graham Larue 4 Jul 26, 2022
Breast-Cancer-Classification - Classification on the SKLearn breast cancer dataset, which contains 569 examples and 32 features, using 6 different algorithms

Breast-Cancer-Classification - Classification on the SKLearn breast cancer dataset, which contains 569 examples and 32 features, using 6 different algorithms

Mert Sezer Ardal 1 Jan 31, 2022
Coursera Machine Learning - Python code

Coursera Machine Learning This repository contains python implementations of certain exercises from the course by Andrew Ng. For a number of assignmen

Jordi Warmenhoven 859 Dec 10, 2022