Merlion: A Machine Learning Framework for Time Series Intelligence

Merlion: A Machine Learning Library for Time Series

Table of Contents

  1. Introduction
  2. Installation
  3. Documentation
  4. Getting Started
    1. Anomaly Detection
    2. Forecasting
  5. Evaluation and Benchmarking
  6. Technical Report and Citing Merlion

Introduction

Merlion is a Python library for time series intelligence. It provides an end-to-end machine learning framework that includes loading and transforming data, building and training models, post-processing model outputs, and evaluating model performance. It supports various time series learning tasks, including forecasting and anomaly detection for both univariate and multivariate time series. This library aims to provide engineers and researchers a one-stop solution to rapidly develop models for their specific time series needs, and benchmark them across multiple time series datasets.

Merlion's key features are

  • Standardized and easily extensible data loading & benchmarking for a wide range of forecasting and anomaly detection datasets.
  • A library of diverse models for both anomaly detection and forecasting, unified under a shared interface. Models include classic statistical methods, tree ensembles, and deep learning approaches. Advanced users may fully configure each model as desired.
  • Abstract DefaultDetector and DefaultForecaster models that are efficient, robustly achieve good performance, and provide a starting point for new users.
  • AutoML for automated hyperparameter tuning and model selection.
  • Practical, industry-inspired post-processing rules for anomaly detectors that make anomaly scores more interpretable, while also reducing the number of false positives.
  • Easy-to-use ensembles that combine the outputs of multiple models to achieve more robust performance.
  • Flexible evaluation pipelines that simulate the live deployment & re-training of a model in production, and evaluate performance on both forecasting and anomaly detection.
  • Native support for visualizing model predictions.

The table below provides a visual overview of how Merlion's key features compare to other libraries for time series anomaly detection and/or forecasting.

Libraries compared (columns): Merlion, Alibi Detect, Kats, statsmodels, GluonTS, RRCF, STUMPY, Greykite, Prophet, pmdarima

Features compared (rows): Univariate Forecasting, Multivariate Forecasting, Univariate Anomaly Detection, Multivariate Anomaly Detection, AutoML, Ensembles, Benchmarking, Visualization

Installation

Merlion consists of two sub-repos: merlion implements the library's core time series intelligence features, and ts_datasets provides standardized data loaders for multiple time series datasets. These loaders load time series as pandas.DataFrames with accompanying metadata.

You can install merlion from PyPI by calling pip install sfdc-merlion. You may install from source by cloning this repo, navigating to the root directory, and calling pip install ., or pip install -e . to install in editable mode. You may install additional dependencies for plotting & visualization via pip install sfdc-merlion[plot], or by calling pip install ".[plot]" from the root directory of this repo.

To install the data loading package ts_datasets, clone this repo, navigate to its root directory, and call pip install -e ts_datasets/. This package must be installed in editable mode (i.e. with the -e flag) if you don't want to manually specify the root directory of every dataset when initializing its data loader.

Note the following external dependencies:

  1. Some of our forecasting models depend on OpenMP. If using conda, please conda install -c conda-forge lightgbm before installing our package. This will ensure that OpenMP is configured to work with the lightgbm package (one of our dependencies) in your conda environment. If using Mac, please install Homebrew and call brew install libomp so that the OpenMP library is available for the model.

  2. Some of our anomaly detection models depend on the Java Development Kit (JDK). For Ubuntu, call sudo apt-get install openjdk-11-jdk. For Mac OS, install Homebrew and call brew tap adoptopenjdk/openjdk && brew install --cask adoptopenjdk11.

Documentation

For example code and an introduction to Merlion, see the Jupyter notebooks in examples, and the guided walkthrough here. You may find detailed API documentation (including the example code) here. The technical report outlines Merlion's overall architecture and presents experimental results on time series anomaly detection & forecasting for both univariate and multivariate time series.

Getting Started

Here, we provide some minimal examples using Merlion default models, to help you get started with both anomaly detection and forecasting.

Anomaly Detection

We begin by importing Merlion’s TimeSeries class and the data loader for the Numenta Anomaly Benchmark (NAB). We can then divide a specific time series from this dataset into training and testing splits.

from merlion.utils import TimeSeries
from ts_datasets.anomaly import NAB

# Data loader returns pandas DataFrames, which we convert to Merlion TimeSeries
time_series, metadata = NAB(subset="realKnownCause")[3]
train_data = TimeSeries.from_pd(time_series[metadata.trainval])
test_data = TimeSeries.from_pd(time_series[~metadata.trainval])
test_labels = TimeSeries.from_pd(metadata.anomaly[~metadata.trainval])

We can then initialize and train Merlion’s DefaultDetector, which is an anomaly detection model that balances performance with efficiency. We also obtain its predictions on the test split.

from merlion.models.defaults import DefaultDetectorConfig, DefaultDetector
model = DefaultDetector(DefaultDetectorConfig())
model.train(train_data=train_data)
test_pred = model.get_anomaly_label(time_series=test_data)

Next, we visualize the model's predictions.

from merlion.plot import plot_anoms
import matplotlib.pyplot as plt
fig, ax = model.plot_anomaly(time_series=test_data)
plot_anoms(ax=ax, anomaly_labels=test_labels)
plt.show()

anomaly figure

Finally, we can quantitatively evaluate the model. The precision and recall come from the fact that the model fired 3 alarms, with 2 true positives, 1 false negative, and 1 false positive. We also evaluate the mean time the model took to detect each anomaly that it correctly detected.

from merlion.evaluate.anomaly import TSADMetric
p = TSADMetric.Precision.value(ground_truth=test_labels, predict=test_pred)
r = TSADMetric.Recall.value(ground_truth=test_labels, predict=test_pred)
f1 = TSADMetric.F1.value(ground_truth=test_labels, predict=test_pred)
mttd = TSADMetric.MeanTimeToDetect.value(ground_truth=test_labels, predict=test_pred)
print(f"Precision: {p:.4f}, Recall: {r:.4f}, F1: {f1:.4f}\n"
      f"Mean Time To Detect: {mttd}")
Precision: 0.6667, Recall: 0.6667, F1: 0.6667
Mean Time To Detect: 1 days 10:30:00
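
As a quick sanity check, the reported scores follow directly from the alarm counts quoted above (2 true positives, 1 false positive, 1 false negative). The sketch below is plain Python arithmetic and does not call Merlion; the variable names are illustrative only.

```python
# Counts taken from the example above: 3 alarms fired, 2 of them correct,
# and 1 true anomaly missed.
true_positives = 2
false_positives = 1
false_negatives = 1

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.4f}, Recall: {recall:.4f}, F1: {f1:.4f}")
# Precision: 0.6667, Recall: 0.6667, F1: 0.6667
```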

Forecasting

We begin by importing Merlion’s TimeSeries class and the data loader for the M4 dataset. We can then divide a specific time series from this dataset into training and testing splits.

from merlion.utils import TimeSeries
from ts_datasets.forecast import M4

# Data loader returns pandas DataFrames, which we convert to Merlion TimeSeries
time_series, metadata = M4(subset="Hourly")[0]
train_data = TimeSeries.from_pd(time_series[metadata.trainval])
test_data = TimeSeries.from_pd(time_series[~metadata.trainval])

We can then initialize and train Merlion’s DefaultForecaster, which is a forecasting model that balances performance with efficiency. We also obtain its predictions on the test split.

from merlion.models.defaults import DefaultForecasterConfig, DefaultForecaster
model = DefaultForecaster(DefaultForecasterConfig())
model.train(train_data=train_data)
test_pred, test_err = model.forecast(time_stamps=test_data.time_stamps)

Next, we visualize the model’s predictions.

import matplotlib.pyplot as plt
fig, ax = model.plot_forecast(time_series=test_data, plot_forecast_uncertainty=True)
plt.show()

forecast figure

Finally, we quantitatively evaluate the model. sMAPE measures the error of the prediction on a scale of 0 to 100 (lower is better), while MSIS evaluates the quality of the 95% confidence band on a scale of 0 to 100 (lower is better).

# Evaluate the model's predictions quantitatively
from scipy.stats import norm
from merlion.evaluate.forecast import ForecastMetric

# Compute the sMAPE of the predictions (0 to 100, smaller is better)
smape = ForecastMetric.sMAPE.value(ground_truth=test_data, predict=test_pred)

# Compute the MSIS of the model's 95% confidence interval (0 to 100, smaller is better)
lb = TimeSeries.from_pd(test_pred.to_pd() + norm.ppf(0.025) * test_err.to_pd().values)
ub = TimeSeries.from_pd(test_pred.to_pd() + norm.ppf(0.975) * test_err.to_pd().values)
msis = ForecastMetric.MSIS.value(ground_truth=test_data, predict=test_pred,
                                 insample=train_data, lb=lb, ub=ub)
print(f"sMAPE: {smape:.4f}, MSIS: {msis:.4f}")
sMAPE: 6.2855, MSIS: 19.1584
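
To make the confidence-band step above more concrete, here is a small sketch of how a 95% interval is built from a point forecast and its standard error. It uses the standard library's NormalDist in place of scipy.stats.norm, and the forecast and error values are made up for illustration; they are not Merlion output.

```python
from statistics import NormalDist

# Hypothetical point forecasts and standard errors (not Merlion output).
forecast = [100.0, 102.0, 105.0]
stderr = [2.0, 2.5, 3.0]

# Standard normal quantiles for a two-sided 95% interval.
z_lo = NormalDist().inv_cdf(0.025)  # about -1.96
z_hi = NormalDist().inv_cdf(0.975)  # about +1.96

# Same arithmetic as the lb/ub lines above: forecast + quantile * error.
lower = [f + z_lo * s for f, s in zip(forecast, stderr)]
upper = [f + z_hi * s for f, s in zip(forecast, stderr)]

for f, lo, hi in zip(forecast, lower, upper):
    print(f"forecast={f:.1f}, 95% band=[{lo:.2f}, {hi:.2f}]")
```

The band widens as the standard error grows, which is the property an interval metric such as MSIS is meant to assess.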

Evaluation and Benchmarking

One of Merlion's key features is an evaluation pipeline that simulates the live deployment of a model on historical data. This enables you to compare models on the datasets relevant to them, under the conditions that they may encounter in a production environment. Our evaluation pipeline proceeds as follows:

  1. Train an initial model on recent historical training data (designated as the training split of the time series)
  2. At a regular interval (e.g. once per day), retrain the entire model on the most recent data. This can be either the entire history of the time series, or a more limited window (e.g. 4 weeks).
  3. Obtain the model's predictions (anomaly scores or forecasts) for the time series values that occur between re-trainings. You may customize whether this should be done in batch (predicting all values at once), streaming (updating the model's internal state after each data point without fully re-training it), or some intermediate cadence.
  4. Compare the model's predictions against the ground truth (labeled anomalies for anomaly detection, or the actual time series values for forecasting), and report quantitative evaluation metrics.
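
The four steps above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not Merlion's evaluation code: the function name rolling_evaluation is hypothetical, the "model" simply forecasts the mean of its training data, and predictions are made in batch between retrainings.

```python
from statistics import mean


def rolling_evaluation(series, n_train, retrain_every, window=None):
    """Simulate deployment: retrain periodically, predict in between.

    `series` is a list of floats. The "model" is a hypothetical stand-in
    that forecasts the mean of its training data.
    """
    predictions, targets = [], []
    for start in range(n_train, len(series), retrain_every):
        # Steps 1-2: (re)train on the full history, or on a limited window.
        if window is None:
            history = series[:start]
        else:
            history = series[max(0, start - window):start]
        model_mean = mean(history)
        # Step 3: batch-predict every value until the next retraining.
        chunk = series[start:start + retrain_every]
        predictions.extend([model_mean] * len(chunk))
        targets.extend(chunk)
    # Step 4: compare predictions with the ground truth (mean absolute error).
    return mean(abs(p - t) for p, t in zip(predictions, targets))


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
mae_full = rolling_evaluation(data, n_train=4, retrain_every=2)
mae_window = rolling_evaluation(data, n_train=4, retrain_every=2, window=2)
print(mae_full, mae_window)  # 3.5 2.0
```

On this trending toy series, the limited window gives a lower error than the full history, which illustrates why the retraining window is worth treating as a tunable choice.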

We provide scripts that allow you to use this pipeline to evaluate arbitrary models on arbitrary datasets. For example, invoking

python benchmark_anomaly.py --dataset NAB_realAWSCloudwatch --model IsolationForest --retrain_freq 1d

will evaluate the anomaly detection performance of the IsolationForest (retrained once a day) on the "realAWSCloudwatch" subset of the NAB dataset. Similarly, invoking

python benchmark_forecast.py --dataset M4_Hourly --model ETS

will evaluate the batch forecasting performance (i.e. no retraining) of ETS on the "Hourly" subset of the M4 dataset. You can find the results produced by running these scripts in the Experiments section of the technical report.

Technical Report and Citing Merlion

You can find more details in our technical report: https://arxiv.org/abs/2109.09265

If you're using Merlion in your research or applications, please cite using this BibTeX:

@article{bhatnagar2021merlion,
      title={Merlion: A Machine Learning Library for Time Series},
      author={Aadyot Bhatnagar and Paul Kassianik and Chenghao Liu and Tian Lan and Wenzhuo Yang
              and Rowan Cassius and Doyen Sahoo and Devansh Arpit and Sri Subramanian and Gerald Woo
              and Amrita Saha and Arun Kumar Jagota and Gokulakrishnan Gopalakrishnan and Manpreet Singh
              and K C Krithika and Sukumar Maddineni and Daeki Cho and Bo Zong and Yingbo Zhou
              and Caiming Xiong and Silvio Savarese and Steven Hoi and Huan Wang},
      year={2021},
      eprint={2109.09265},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
Comments
  • [BUG]. DefaultForecaster returns naive forecast.

    Describe the bug DefaultForecaster returns naive estimate when one might expect otherwise.

    To Reproduce See notebook

    Expected behavior It may be fine. Perhaps it awaits more data or there is a bug in the usage. I'm yet to trace in and the example was created by a fellow contributor.

    Desktop (please complete the following information): Colab.

    opened by microprediction 10
  • Add incremental training option to DAGMM model

    Currently, the DAGMM model object gets created every time one calls the train method https://github.com/salesforce/Merlion/blob/e21f7be80e8b64ca1075091f11b74b0b9ce1708e/merlion/models/anomaly/dagmm.py#L132

    However, it makes it impossible to:

    • update the model by training it on a new dataset
    • train the model on several timeseries with the same timestamps and column names

    The proposed change adds an option to perform incremental training passing corresponding dictionary to the train method which disables the model recreation. The change does not affect the existing API so the behavior stays the same if no train_config is passed.
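
A minimal sketch of the pattern described above, using a hypothetical ToyDetector rather than the real DAGMM class: train() rebuilds the model by default and skips the rebuild only when the caller opts in. The "incremental" key is an assumed name for illustration; the actual train_config keys used by the PR are not shown here.

```python
class ToyDetector:
    """Hypothetical stand-in for a model whose train() rebuilds its state."""

    def __init__(self):
        self.n_seen = 0
        self.total = 0.0

    def _build_model(self):
        # Resetting state mimics re-creating the underlying network.
        self.n_seen = 0
        self.total = 0.0

    def train(self, values, train_config=None):
        train_config = train_config or {}
        # "incremental" is an assumed key name, used for illustration only.
        if not train_config.get("incremental", False):
            self._build_model()
        self.n_seen += len(values)
        self.total += sum(values)
        return self.total / self.n_seen


model = ToyDetector()
model.train([1.0, 2.0, 3.0])  # fresh fit
updated = model.train([5.0], train_config={"incremental": True})  # keeps state
reset = model.train([5.0])  # default behavior: state is rebuilt
print(updated, reset)  # 2.75 5.0
```

Because the rebuild stays the default, callers that never pass train_config see no change in behavior.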

    The same change can also be applied to some of the other models.

    @aadyotb would love to hear your opinion.

    cla:signed 
    opened by isenilov 9
  • Request to provide a tutorial of some sort to implement AutoML variants of ETS and prophet for univariate forecasting

    Hey, I had been going through the paper "Merlion: A Machine Learning Library for Time Series" and I came across AutoML variants of the ETS and Prophet models for univariate forecasting. It would be of great help if you could show a tutorial for implementing them on a simple univariate dataset like the "air-passengers dataset". I have also tried AutoSarima from the merlion.models.automl module on the same dataset, but it gives very large errors compared to the auto_arima model from the pmdarima library and even the basic statsmodels.tsa SARIMAX methods. What could be the reason, given that the air-passengers dataset isn't very complicated to forecast?

    my code for autosarima.

    max_iter = [10,20,50,100,200,400,1000]
    list_autosarima_merlion_models = []  #stores all models with diff parameters
    parameters_autosarima_merlion_models = [] #stores different params used for diff models
    
    for mi in max_iter:
        config1 = AutoSarimaConfig(max_forecast_steps=len(test_df), order=("auto", "auto", "auto"),
                               seasonal_order=("auto", "auto", "auto", 12), approximation=True, maxiter=mi)
        model1  = SeasonalityLayer(model = AutoSarima(model = Sarima(config1)))
        train_pred, train_err = model1.train(train_df_merlion, train_config={"enforce_stationarity": True,"enforce_invertibility": True})
        list_autosarima_merlion_models.append(model1)
        parameters_autosarima_merlion_models.append(f'{mi} maximum iterations')
    

    Link to the paper that I had gone through. https://arxiv.org/abs/2109.09265

    opened by riteshchhetri10 9
  • Merlion dashboard app

    This PR implements a web-based visualization dashboard for Merlion. Users can get it set up by installing Merlion with the optional dashboard dependency, i.e. pip install salesforce-merlion[dashboard]. Then, they can start it up with python -m merlion.dashboard, which will start up the dashboard on port 8050. The dashboard has 3 tabs: a file manager where users can upload CSV files & visualize time series; a forecasting tab where users can try different forecasting algorithms on different datasets; and an anomaly detection tab where users can try different anomaly detection algorithms on different datasets. This dashboard thus provides a no-code interface for users to rapidly experiment with different algorithms on their own data, and examine performance both qualitatively (through visualizations) and quantitatively (through evaluation metrics).

    We also provide a Dockerfile which runs the dashboard as a microservice on port 80. The Docker image can be built with docker build . -t merlion-dash -f docker/dashboard/Dockerfile from the Merlion root directory. It can be deployed with docker run -dp 80:80 merlion-dash.

    opened by yangwenzhuo08 5
  • [BUG] AttributeError: 'DefaultForecasterConfig' object has no attribute 'granularity'

    Running 0_ForecastIntro.ipynb

    Training the default model throws an exception:

    AttributeError: 'DefaultForecasterConfig' object has no attribute 'granularity'
    

    To Reproduce

    from merlion.models.defaults import DefaultForecasterConfig, DefaultForecaster
    model = DefaultForecaster(DefaultForecasterConfig())
    model.train(train_data=train_data)
    
    • OS: Colab
    • Merlion Version 1.0.2 and 1.1.0
    opened by swarmt 5
  • [BUG] Sarima will not fit to monthly data (infinite loop?)

    Describe the bug A simple Sarima model won't fit the air passenger data set (seems to loop infinitely in L-BFGS when estimating the parameters). I guess it could be that the internal pre-processing is the culprit, because the same data will be fit when using ARIMA from statsmodels directly.

    To Reproduce

    import pandas as pd
    from merlion.utils import TimeSeries
    from merlion.models.forecast.sarima import Sarima, SarimaConfig
    from statsmodels.tsa.arima.model import ARIMA
    
    data = pd.read_csv('https://raw.githubusercontent.com/facebookresearch/Kats/main/kats/data/air_passengers.csv', names=['time', 'value'], index_col='time', skiprows=1, parse_dates=True)
    data = data.asfreq('MS')
    data = TimeSeries.from_pd(data['value'])
    
    model_sm = ARIMA(data.to_pd(), order=(0, 1, 1), seasonal_order=(2, 1, 0, 12))
    model_sm.fit().summary() # this will work
    
    model_merlion = Sarima(SarimaConfig(order=(0, 1, 1), seasonal_order=(2, 1, 0, 12)))
    model_merlion.train(data) # this won't work
    

    Expected behavior Fitting the Sarima model on monthly data should work.

    Desktop (please complete the following information):

    • OS: Ubuntu 18.04.5 LTS (Bionic Beaver)
    • Merlion: 1.0.0
    • Statsmodels: 0.13.0
    opened by datenzauberai 5
  • Fix Prophet.resample_time_stamps bug

    Why?

    When giving time_stamps=1 for method Prophet.resample_time_stamps, it returns an empty np array.

    What?

    This PR fixes a bug on the resample_time_stamps method in the Prophet class when passing an integer to the time_stamps arg.

    How?

    • Fix the value passed to periods argument in pd.date_range function.
    • Add new unit test for Prophet.resample_time_stamps asserting this functionality
    cla:signed 
    opened by rafaelleinio 4
  • [BUG] I can't import any of the merlion subpackages

    Describe the bug I was able to import merlion, but installing and using any of the following subpackages was unsuccessful, and I am getting an error that no module is available:

    from merlion.utils.time_series import TimeSeries
    from merlion.evaluate.forecast import ForecastMetric
    from merlion.models.automl.autosarima import AutoSarima, AutoSarimaConfig
    from merlion.models.automl.seasonality_mixin import SeasonalityLayer
    from merlion.models.forecast.sarima import Sarima

    To Reproduce Steps to reproduce the behavior

    Expected behavior My expectation is to be able to install these sub packages.

    Desktop (please complete the following information):

    • OS: [e.g. Ubuntu 16.04 LTS]
    • Merlion Version [e.g. 1.0.0]
    • Spyder=5

    opened by Zohoor-NezahdHalafi 4
  • [BUG] `TimeSeries.from_pd()` does not use pandas frequency codes

    I'm not sure if I'm missing something but with monthly data, it appears that Merlion does not recognize freq='1MS' as per the pandas offset aliases listed here.

    When I run:

    from merlion.utils import TimeSeries
    ts = TimeSeries.from_pd(df.set_index('cal_month_begin_date'), freq='1MS')
    

    The output is incrementing from 1970-01-01 00:00:00.000 by 1 millisecond, when "MS" is the pandas code for "month start"
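
For reference, the following sketch shows how pandas itself interprets these codes. It only demonstrates pandas behavior and makes no claim about how Merlion parses the freq argument internally.

```python
import pandas as pd

# "MS" is the pandas offset alias for month start.
offset = pd.tseries.frequencies.to_offset("MS")
print(type(offset).__name__)  # MonthBegin

# A monthly index built with that alias steps month by month.
index = pd.date_range("2021-01-01", periods=3, freq="MS")
month_starts = list(index.strftime("%Y-%m-%d"))
print(month_starts)  # ['2021-01-01', '2021-02-01', '2021-03-01']

# Lowercase "ms" in a timedelta string means milliseconds instead.
one_ms = pd.Timedelta("1ms")
print(one_ms == pd.Timedelta(milliseconds=1))  # True
```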

    opened by andrewargeros 4
  • Implement AutoETS for searching various error, trend, seasonal components

    1. fix bugs for ets and periodicity detection
    2. implement AutoETS that is able to explore various error, trend, seasonal components. It addresses the failure case in sales forecasting data
    3. improve the way train_pre_process and train_post_process are called by autoML models
    4. don't throw an exception when attempting to obtain IQR from a model which returns a None error; return (None, None) instead.
    opened by chenghaoliu89 3
  • [BUG] AttributeError: 'DefaultForecasterConfig' object has no attribute 'invert_transform'

    Describe the bug

    fig, ax = model.plot_forecast(time_series=dataset_test, plot_forecast_uncertainty=True)
    
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-49-a413bfb44d12> in <module>
    ----> 1 fig, ax = model.plot_forecast(time_series=dataset_test, plot_forecast_uncertainty=True)
    
    /usr/local/lib/python3.7/site-packages/merlion/models/forecast/base.py in plot_forecast(self, time_series, time_stamps, time_series_prev, plot_forecast_uncertainty, plot_time_series_prev, figsize, ax)
        498             time_series_prev=time_series_prev,
        499             plot_forecast_uncertainty=plot_forecast_uncertainty,
    --> 500             plot_time_series_prev=plot_time_series_prev,
        501         )
        502         title = f"{type(self).__name__}: Forecast of {self.target_name}"
    
    /usr/local/lib/python3.7/site-packages/merlion/models/forecast/base.py in get_figure(self, time_series, time_stamps, time_series_prev, plot_forecast_uncertainty, plot_time_series_prev)
        400         ), "Must provide at least one of time_series or time_stamps"
        401         if time_stamps is None:
    --> 402             if self.invert_transform:
        403                 time_stamps = time_series.time_stamps
        404                 y = time_series.univariates[time_series.names[self.target_seq_index]]
    
    /usr/local/lib/python3.7/site-packages/merlion/models/layers.py in __getattr__(self, item)
        271         if callable(attr):
        272             return attr
    --> 273         return self.__getattribute__(item)
        274 
        275     def train(self, train_data: TimeSeries, *args, **kwargs):
    
    /usr/local/lib/python3.7/site-packages/merlion/models/forecast/base.py in invert_transform(self)
         88         :return: Whether to automatically invert the ``transform`` before returning a forecast.
         89         """
    ---> 90         return self.config.invert_transform
         91 
         92     @property
    
    /usr/local/lib/python3.7/site-packages/merlion/models/layers.py in __getattr__(self, item)
        113         elif base_model is None and item in self.model_kwargs:
        114             return self.model_kwargs.get(item)
    --> 115         return self.__getattribute__(item)
        116 
        117     def __setattr__(self, key, value):
    
    AttributeError: 'DefaultForecasterConfig' object has no attribute 'invert_transform'
    

    To Reproduce Notebook using a different dataset https://github.com/salesforce/Merlion/blob/main/examples/forecast/0_ForecastIntro.ipynb

    Expected behavior Visualization of the actual and forecasted values.

    Desktop (please complete the following information):

    • OS: macOS Big Sur
    • Merlion version: 1.2.0
    opened by bkowshik 3
  • [BUG] Unable to use merlion dashboard - missing release?

    Describe the bug I'm unable to install the Merlion dashboard - it seems that it wasn't released. Last release is 1.3.1 before the dashboard was merged. It might be that you don't intend to release it yet, but in that case the README on the main branch is confusing.

    To Reproduce

    pip install "salesforce-merlion[dashboard]"
    python -m merlion.dashboard            
    > python: No module named merlion.dashboard
    

    Workaround: Install from source

    opened by anton164 1
  • [BUG]  ModuleNotFoundError: No module named 'merlion'

    Describe the bug I am sure I have installed the package "salesforce-merlion". However, when I run "from merlion.utils import UnivariateTimeSeries", I get the error "ModuleNotFoundError: No module named 'merlion'".

    Desktop (please complete the following information):

    • OS: [Windows 11]
    • Merlion Version [1.3.1]
    opened by BruceBinBoxing 0
  • Adding Deep Learning Support

    A major shortcoming of Merlion's forecasting module is a lack of support for deep models that have been popular in the research literature. This PR addresses this gap by adding basic primitives which are shared by most deep learning models, and implementing the recently proposed DeepAR, Informer, Autoformer, and ETSFormer models.

    cla:signed 
    opened by yihaocs 4
  • [FEATURE REQUEST] Unsupervised Evaluation Metrics

    Is your feature request related to a problem? Please describe. The current evaluation metrics in evaluate/anomaly.py assume that a ground truth is available. However, in many time series anomaly detection problems there is no ground truth.

    It would be great if the Merlion evaluation base classes were more general and supportive of this use-case. As of now we effectively have to implement our own evaluation methods.

    Describe the solution you'd like I think ideally methods/classes such as TSADEvaluator.evaluate, TSADScoreAccumulator and accumulate_tsad_score should not assume that there is a ground truth - other interfaces in the Merlion package typically take test labels as an optional argument. Similarly, the evaluation classes should be able to compute unsupervised descriptive statistics if a ground truth is not passed.

    opened by anton164 4
  • [FEATURE REQUEST] - GridSearch, Time Series Cross Validation, and Back Testing

    Describe the solution you'd like I would like to see grid search, time series cross-validation, and backtesting like those found in skforecast / scikit-learn. It would improve this package's effectiveness in a production environment.

    Describe alternatives you've considered skforecast

    opened by drobbster 1
  • [FEATURE REQUEST] somewhat unintuitive default forecasting model

    Is your feature request related to a problem? Please describe. It is somewhat unintuitive that the model returns a standard error in the univariate case but not in the multivariate case (due to using a tree-based forecast in the multivariate case).

    Describe the solution you'd like Ideally the default forecaster should return the same output "format" (forecast and standard error) independent of the input data. This would likely require changing the default forecaster for the multivariate case (A probabilistic tree based method like for example ngboost [https://github.com/stanfordmlgroup/ngboost] might be worth investigating for this.)

    Describe alternatives you've considered Given that the "ideal" solution is quite complex a lower effort alternative would be to make this behavior more clear in the description and documentation of the default forecast model.

    Additional context I think having a default model as an entry point to the forecasting case is a good idea. But the difference in forecast outputs for univariate and multivariate case somewhat defeats this purpose. Intuitively one would probably assume, that the default model has the same features in both cases.

    opened by MBasalla 2
Releases(v1.3.1)
  • v1.3.1(Oct 3, 2022)

    What's Changed

    • Add support for exogenous regressors by @aadyotb in https://github.com/salesforce/Merlion/pull/125
    • Simplify implementations of SARIMA, ETS, VectorAR. by @aadyotb in https://github.com/salesforce/Merlion/pull/122
    • Fix edge case bug in layered models. by @aadyotb in https://github.com/salesforce/Merlion/pull/123
    • Fix incorrect reference to BOCPD model in factory by @jonwiggins in https://github.com/salesforce/Merlion/pull/124

    New Contributors

    • @jonwiggins made their first contribution in https://github.com/salesforce/Merlion/pull/124

    Full Changelog: https://github.com/salesforce/Merlion/compare/v1.3.0...v1.3.1

    Source code(tar.gz)
    Source code(zip)
  • v1.3.0(Sep 1, 2022)

    What's Changed

    • Improve robustness of Spark API by @aadyotb in https://github.com/salesforce/Merlion/pull/118
    • Add easy-to-use data loaders for custom datasets. by @aadyotb in https://github.com/salesforce/Merlion/pull/120
    • More AutoML features included in AutoETS and AutoProphet. More streamlined implementation of AutoML models. by @chenghaoliu89 in https://github.com/salesforce/Merlion/pull/119 and @aadyotb in https://github.com/salesforce/Merlion/pull/121

    Major Changes Since v1.2.0

    • Added a pyspark API for Merlion, to perform forecasting or anomaly detection for many time series in parallel. The API is compatible with the spark-on-k8s-operator.
    • Expanded functionality of AutoETS and AutoProphet models, to automatically select more hyperparameters besides just seasonality.
    • Expanded evaluation framework to accommodate multivariate forecasting. This enables automatic model selection for multivariate time series.
    • Updated Prophet dependency to simplify installation for Python 3.7+.
    • Various bugfixes.

    Full Changelog: https://github.com/salesforce/Merlion/compare/v1.2.5...v1.3.0

  • v1.2.5(Jul 15, 2022)

    What's Changed

    • Fix a bug in layered models which prevents certain sub-model params (e.g. target_seq_index) from being set correctly.
    • Fix an argument parsing bug in the pyspark anomaly detection app.
    • Makes max_forecast_steps an optional parameter for tree models.
    • Makes exceptions from ModelFactory more descriptive.
    • Allow the use of ForecastMetric for evaluating multivariate time series by specifying a target_seq_index.

    Full Changelog: https://github.com/salesforce/Merlion/compare/v1.2.4...v1.2.5

  • v1.2.4(Jul 15, 2022)

    What's Changed

    • Enhance spark API & add compatibility with spark-on-k8s-operator. by @aadyotb in https://github.com/salesforce/Merlion/pull/114
    • Enhance tree forecasters to work with max_forecast_steps=None and return_prev=True. by @aadyotb in https://github.com/salesforce/Merlion/pull/114
    • Fix Prophet.resample_time_stamps bug by @rafaelleinio in https://github.com/salesforce/Merlion/pull/112
    • Add policy to replace missing time series values with 0. by @aadyotb in https://github.com/salesforce/Merlion/pull/113
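
The zero-fill policy for missing values mentioned above can be illustrated with a minimal sketch. The helper name `fill_missing_with_zero` and the dictionary-based representation are hypothetical and chosen only for illustration; they are not Merlion's API.

```python
from typing import Dict, List


def fill_missing_with_zero(observed: Dict[int, float], timestamps: List[int]) -> List[float]:
    """Hypothetical helper: align observations to a regular grid of timestamps,
    substituting 0.0 wherever the grid point has no observation."""
    return [observed.get(t, 0.0) for t in timestamps]


# Observations keyed by Unix timestamp (seconds); the value at t=120 is missing.
observed = {0: 1.5, 60: 2.0, 180: 3.0}
grid = list(range(0, 240, 60))  # expected timestamps: 0, 60, 120, 180

filled = fill_missing_with_zero(observed, grid)
print(filled)
```

A policy like this tends to suit count-like data, where a missing value plausibly means "nothing happened"; for smooth signals, interpolation is usually the safer choice.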

    New Contributors

    • @rafaelleinio made their first contribution in https://github.com/salesforce/Merlion/pull/112

    Full Changelog: https://github.com/salesforce/Merlion/compare/v1.2.3...v1.2.4

  • v1.2.3 (Jun 28, 2022)

    What's Changed

    • Update prophet required version to 1.1 by @aadyotb in https://github.com/salesforce/Merlion/pull/110
    • Don't resample timestamps if not needed in forecast(). by @aadyotb in https://github.com/salesforce/Merlion/pull/109

    Full Changelog: https://github.com/salesforce/Merlion/compare/v1.2.2...v1.2.3

  • v1.2.2 (Jun 21, 2022)

    What's Changed

    • Fix AutoSARIMA bugs. by @aadyotb in https://github.com/salesforce/Merlion/pull/106
    • Add beta pyspark API. by @aadyotb in https://github.com/salesforce/Merlion/pull/107
    • Make test coverage reports more accurate. by @aadyotb in https://github.com/salesforce/Merlion/pull/108

    Full Changelog: https://github.com/salesforce/Merlion/compare/v1.2.1...v1.2.2

  • v1.2.1 (Jun 9, 2022)

    What's Changed

    • Fix issues with LayeredModelConfig accessing underlying model attributes. This fixes various bugs with both AutoML models and default models. by @aadyotb in https://github.com/salesforce/Merlion/pull/99 and https://github.com/salesforce/Merlion/pull/104

    Full Changelog: https://github.com/salesforce/Merlion/compare/v1.2.0...v1.2.1

  • v1.2.0 (Jun 1, 2022)

    What's Changed

    • Add intermediate APIs to all models. by @aadyotb in https://github.com/salesforce/Merlion/pull/90
      • Internal changes; should not cause any breaking changes for end users.
      • Implements train(), forecast(), and get_anomaly_score() at the level of the base class for all models. Each of these methods respectively calls an implementation-specific _train(), _forecast(), or _get_anomaly_score().
      • The base classes now include much of what was previously boilerplate code that had to be duplicated for each model (applying pre-processing transforms, converting standard errors to inter-quartile ranges, training post-rules, etc.).
    • Fix forecasting bugs when return_prev=True. by @aadyotb in https://github.com/salesforce/Merlion/pull/97
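
The intermediate API described above, where a public train() or forecast() on the base class handles shared steps and then calls an implementation-specific _train() or _forecast(), follows the template-method pattern. The sketch below shows the general idea only; the class names `ForecasterSketch` and `MeanForecaster` and the list-based data are hypothetical and do not reproduce Merlion's actual base classes.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Optional


class ForecasterSketch(ABC):
    """Hypothetical base class (not Merlion's): owns the shared boilerplate."""

    def __init__(self, transform: Optional[Callable[[List[float]], List[float]]] = None):
        # Pre-processing transform applied before training; identity by default.
        self.transform = transform or (lambda xs: xs)
        self.trained = False

    def train(self, data: List[float]) -> None:
        # Shared step: apply the pre-processing transform, then delegate.
        prepared = self.transform(list(data))
        self._train(prepared)
        self.trained = True

    def forecast(self, steps: int) -> List[float]:
        # Shared step: validate state, then delegate.
        if not self.trained:
            raise RuntimeError("call train() before forecast()")
        return self._forecast(steps)

    @abstractmethod
    def _train(self, data: List[float]) -> None:
        """Model-specific training logic."""

    @abstractmethod
    def _forecast(self, steps: int) -> List[float]:
        """Model-specific forecasting logic."""


class MeanForecaster(ForecasterSketch):
    """Toy model: only the model-specific hooks are implemented."""

    def _train(self, data: List[float]) -> None:
        self.mean = sum(data) / len(data)

    def _forecast(self, steps: int) -> List[float]:
        return [self.mean] * steps


# The transform (subtract 1.0) runs in the base class, not in the subclass.
model = MeanForecaster(transform=lambda xs: [x - 1.0 for x in xs])
model.train([2.0, 3.0, 4.0])
prediction = model.forecast(2)
print(prediction)
```

Because the subclass implements only the underscored hooks, steps such as applying transforms live in one place instead of being duplicated for each model.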

    Major Changes Since v1.1.0

    • Added intermediate APIs to all models.
    • Made installation more lightweight.
    • Changed Python-Java bridge from jpype to py4j, for improved robustness in multiprocessing settings.
    • Various bugfixes.

    Full Changelog: https://github.com/salesforce/Merlion/compare/v1.1.3...v1.2.0

  • v1.1.3 (Apr 19, 2022)

    What's Changed

    • Fix ETS refit params (Issue #78). by @aadyotb in https://github.com/salesforce/Merlion/pull/79
    • Allow TimeSeries to include NaN values. by @aadyotb in https://github.com/salesforce/Merlion/pull/85
    • Fix n_retrain option in benchmark_forecast.py (Issue #82). by @aadyotb in https://github.com/salesforce/Merlion/pull/86

    Full Changelog: https://github.com/salesforce/Merlion/compare/v1.1.2...v1.1.3

  • v1.1.2 (Mar 3, 2022)

    What's Changed

    • Replace absolute path with a relative one in the model saving/loading procedure by @isenilov in https://github.com/salesforce/Merlion/pull/63
    • Combine boosting/bagging trees into one file. by @aadyotb in https://github.com/salesforce/Merlion/pull/68
    • Add incremental training option to DAGMM model by @isenilov in https://github.com/salesforce/Merlion/pull/65
    • Change default min_likelihood to 1e-16 for BOCPD by @cnll0075 in https://github.com/salesforce/Merlion/pull/71
    • Let eval metrics work w/ UnivariateTimeSeries. by @aadyotb in https://github.com/salesforce/Merlion/pull/74
    • Implement reconciliation for hierarchical time series. by @aadyotb in https://github.com/salesforce/Merlion/pull/72

    Full Changelog: https://github.com/salesforce/Merlion/compare/v1.1.1...v1.1.2

  • v1.1.1 (Jan 18, 2022)

    What's Changed

    • Explicitly specify int64 for timestamps (not int) to fix a Windows bug (#58)
    • Fix bugs with MoE_ForecasterEnsemble (#51), default models (#57), and Prophet holidays (#59)
    • Make base installation more lightweight (#61)
    • Use py4j instead of jpype for Python-Java bridge (#62)
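
The release notes do not spell out the Windows timestamp bug, so the following is only an assumed illustration of why an explicit 64-bit integer type matters. On Windows the C `long` is 32 bits wide, and NumPy versions before 2.0 mapped their default integer type to it, so a plain `int` dtype could be too narrow for fine-grained Unix timestamps. The sketch uses `ctypes` fixed-width integers as a stand-in for array dtypes.

```python
import ctypes

# Jan 18, 2022 00:00:00 UTC as a Unix timestamp in nanoseconds.
ts_ns = 1_642_464_000 * 1_000_000_000

# ctypes integer types perform no overflow checking, so a value that does not
# fit is silently truncated, much like a narrowing cast on an array.
as_int32 = ctypes.c_int32(ts_ns).value
as_int64 = ctypes.c_int64(ts_ns).value

print("fits in 32 bits:", as_int32 == ts_ns)
print("fits in 64 bits:", as_int64 == ts_ns)
```

Requesting a 64-bit type explicitly removes the dependence on the platform's default integer width.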

    Full Changelog: https://github.com/salesforce/Merlion/compare/v1.1.0...v1.1.1

  • v1.1.0 (Dec 17, 2021)

    What's Changed

    • Re-architecture of layered models & ensembles, making AutoML easier to use (#47)
    • Added AutoML variants of ETS and Prophet (AutoETS and AutoProphet, similar to the existing AutoSarima)
    • Bug fixes related to resampling (#45)
    • Improved quality of API docs for model configs

    Major Changes Since v1.0.0

    • Added change point detection module.
    • Re-architecture of layered models & ensembles.
    • Expanded AutoML module, with improved ease of use.
    • Various bugfixes.

    Full Changelog: https://github.com/salesforce/Merlion/compare/v1.0.2...v1.1.0

  • v1.0.2 (Nov 9, 2021)

    What's Changed

    • Add change point detection module (#41)
    • Add more config options to Prophet (#43)

    Full Changelog: https://github.com/salesforce/Merlion/compare/v1.0.1...v1.0.2

  • v1.0.1 (Oct 18, 2021)

    What's Changed

    • Update prophet version (#18)
    • Allow sampling granularities that aren't a constant number of seconds, e.g. monthly (#30)
    • Fix bugs with AutoSARIMA implementation (#32)
    • Make data loading code more robust (#35)
    • Fix minor bug in benchmark_anomaly.py (#38)
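
To see why a monthly granularity cannot be expressed as a constant number of seconds, consider the gaps between consecutive month starts. The sketch below uses only the standard library; `month_starts` is a hypothetical helper, not part of Merlion.

```python
from datetime import datetime, timezone
from typing import List


def month_starts(year: int, n_months: int) -> List[datetime]:
    """Hypothetical helper: first day of n_months consecutive months (UTC),
    starting from January of the given year."""
    return [
        datetime(year + m // 12, m % 12 + 1, 1, tzinfo=timezone.utc)
        for m in range(n_months)
    ]


stamps = month_starts(2021, 4)  # Jan, Feb, Mar, Apr 2021
gaps_in_days = [(later - earlier).days for earlier, later in zip(stamps, stamps[1:])]
gaps_in_seconds = [int((later - earlier).total_seconds()) for earlier, later in zip(stamps, stamps[1:])]

print(gaps_in_days)
print(gaps_in_seconds)
```

Since the gaps differ from month to month, a fixed step in seconds would drift away from the calendar boundaries, which is why such granularities need calendar-aware handling.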

    Full Changelog: https://github.com/salesforce/Merlion/compare/v1.0.0...v1.0.1

  • v1.0.0 (Sep 23, 2021)
