
Machine learning notes - seasonality of time series

2022-06-26 13:04:00 Sit and watch the clouds rise

I. Seasonality

A time series exhibits seasonality whenever there is a regular, periodic change in the mean of the series. Seasonal changes generally follow the clock and calendar: repetitions over a day, a week, or a year are common. Seasonality is often driven by the cycles of the natural world over days and years, or by conventions of social behavior surrounding dates and times.

Seasonal patterns in four time series .

We will look at two kinds of features that model seasonality. The first kind, indicators, is best for a season with few observations, such as a weekly season of daily observations. The second kind, Fourier features, is best for a season with many observations, such as an annual season of daily observations.

II. Seasonal plots and seasonal indicators

Just as we used a moving average plot to discover the trend in a series, we can use a seasonal plot to discover seasonal patterns.

A seasonal plot shows segments of the time series plotted against a common period, the period being the "season" you want to observe. The figure shows a seasonal plot of the daily views of Wikipedia's article on Trigonometry: the article's daily views are plotted over a common weekly period.
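The data behind a seasonal plot can be sketched as a simple reshape. The example below is only an illustration on a made-up daily series (the name `views` and its values are assumptions, not the Wikipedia data): it cuts the series into weekly segments so that each row is one week and each column is one day of the week.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: four full weeks starting on a Monday,
# with a lower level on weekends plus a small upward drift.
index = pd.date_range("2016-01-04", periods=28, freq="D")
values = np.where(index.dayofweek < 5, 100.0, 60.0) + np.arange(28) * 0.5
views = pd.Series(values, index=index, name="views")

# One row per week (the season), one column per day of the week (0 = Monday).
frame = pd.DataFrame({
    "week": np.arange(len(index)) // 7,  # weeks counted from the first Monday
    "day": index.dayofweek,
    "views": views.to_numpy(),
})
segments = frame.pivot(index="week", columns="day", values="views")

print(segments)
```

Drawing each row of `segments` as its own line over the seven columns gives a weekly seasonal plot; lines that share the same shape indicate a weekly season.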

1. Seasonal indicators

Seasonal indicators are binary features that represent seasonal differences in the level of a time series. They are what you get if you treat a seasonal period as a categorical feature and apply one-hot encoding.

By one-hot encoding the days of the week, we get weekly seasonal indicators. Creating weekly indicators for the Trigonometry series gives us six new "dummy" features. (Linear regression works best if you drop one of the indicators; we dropped Monday in the table below.)

Date        Tuesday  Wednesday  Thursday  Friday  Saturday  Sunday
2016-01-04  0.0      0.0        0.0       0.0     0.0       0.0
2016-01-05  1.0      0.0        0.0       0.0     0.0       0.0
2016-01-06  0.0      1.0        0.0       0.0     0.0       0.0
2016-01-07  0.0      0.0        1.0       0.0     0.0       0.0
2016-01-08  0.0      0.0        0.0       1.0     0.0       0.0
2016-01-09  0.0      0.0        0.0       0.0     1.0       0.0
2016-01-10  0.0      0.0        0.0       0.0     0.0       1.0
2016-01-11  0.0      0.0        0.0       0.0     0.0       0.0
...         ...      ...        ...       ...     ...       ...
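A table like the one above can be produced with one-hot encoding in pandas. This is a minimal sketch, assuming a daily index over the same dates; the variable names are illustrative, and the tunnel example later in the article builds its indicators with statsmodels instead.

```python
import pandas as pd

# Eight consecutive days starting on Monday, 2016-01-04
index = pd.date_range("2016-01-04", periods=8, freq="D")
day_names = pd.Series(index.day_name(), index=index)

# One-hot encode the day of the week (columns come back in alphabetical order)
indicators = pd.get_dummies(day_names, dtype=float)

# Keep Tuesday..Sunday in calendar order; leaving out Monday avoids
# collinearity with the intercept in a linear regression.
kept = ["Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
indicators = indicators[kept]

print(indicators)
```

Rows for Mondays are all zeros, so Monday acts as the baseline level and each remaining column measures a difference from it.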

Adding seasonal indicators to the training data helps the model distinguish the mean values within a seasonal period:

Ordinary linear regression learns the mean value at each time in the season.

2. Fourier features and the periodogram

The kind of feature we discuss now is better suited to long seasons with many observations, where indicators would be impractical. Instead of creating a feature for each date, Fourier features try to capture the overall shape of the seasonal curve with just a few features.

Let's take a look at a plot of the annual season in the Trigonometry series. Notice the repetitions at various frequencies: a long up-and-down movement three times a year, short weekly movements 52 times a year, and perhaps others.

Annual seasonality in the Wiki Trigonometry series.

It is these frequencies within a season that we try to capture with Fourier features. The idea is to include in our training data periodic curves that have the same frequencies as the season we are trying to model. The curves we use are those of the trigonometric functions sine and cosine.

Fourier features are pairs of sine and cosine curves, one pair for each potential frequency in the season, starting with the longest. Fourier pairs modeling annual seasonality have the frequencies once per year, twice per year, three times per year, and so on.

The first two Fourier pairs for annual seasonality. Top: frequency of once per year. Bottom: frequency of twice per year.

If we add a set of these sine/cosine curves to our training data, the linear regression algorithm will work out the weights that fit the seasonal component in the target series. The figure illustrates how linear regression used four Fourier pairs to model the annual seasonality in the Wiki Trigonometry series.

Top: curves for four Fourier pairs, each a sum of sine and cosine weighted by regression coefficients. Each curve models a different frequency. Bottom: the sum of these curves approximates the seasonal pattern.
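To make the weight-fitting step concrete, here is a small sketch on synthetic data. Everything in it is an assumption made for illustration: a 365-day year, a made-up season built from two frequencies, and plain least squares via NumPy standing in for the linear regression.

```python
import numpy as np

days = np.arange(3 * 365)        # three years of daily time steps
k = 2 * np.pi * days / 365.0     # angle of the once-per-year frequency

# Hypothetical "true" season: once-per-year and twice-per-year components
season = 3.0 * np.sin(k) - 2.0 * np.cos(k) + 1.5 * np.sin(2 * k)
rng = np.random.default_rng(0)
y = 10.0 + season + rng.normal(scale=0.1, size=days.size)

# Feature matrix: intercept plus two Fourier pairs
X = np.column_stack([
    np.ones_like(k),
    np.sin(k), np.cos(k),          # pair 1: once per year
    np.sin(2 * k), np.cos(2 * k),  # pair 2: twice per year
])

# Ordinary least squares recovers the weight of each curve
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 2))
```

The fitted coefficients should land close to the values used to build the series (10, 3, -2, 1.5, and roughly 0 for the unused cosine), which is the sense in which the regression "finds" the seasonal component.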

We needed only eight features (four sine/cosine pairs) to get a good estimate of the annual seasonality. Compare this with the seasonal indicator approach, which would have required hundreds of features (one for each day of the year). By modeling only the "main effect" of the seasonality with Fourier features, you usually need to add far fewer features to your training data, which means less computation time and a lower risk of overfitting.

3. Choosing Fourier features with the periodogram

How many Fourier pairs should we actually include in our feature set? We can answer this question with the periodogram. The periodogram tells you the strength of the frequencies in a time series. Specifically, the value on the y-axis of the graph is (a ^ 2 + b ^ 2) / 2, where a and b are the coefficients of the sine and cosine at that frequency (as in the Fourier components figure above).

Periodogram for the Wiki Trigonometry series.

From left to right, the periodogram drops off after Quarterly, four times a year. That is why we chose four Fourier pairs to model the annual season. We ignore the weekly frequency because it is better modeled with indicators.

Computing Fourier features

Knowing how Fourier features are computed is not essential to using them, but if seeing the details would clarify things, the cell below illustrates how a set of Fourier features can be derived from the index of a time series. (In our applications, however, we will use a library function from statsmodels.)
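The (a ^ 2 + b ^ 2) / 2 relationship can be checked numerically. The sketch below is an illustration under stated assumptions: a synthetic one-year daily series with exactly four cycles per year, a 365-day year, and scipy.signal.periodogram called with a boxcar window and scaling="spectrum", the same settings used by plot_periodogram later in the article.

```python
import numpy as np
from scipy.signal import periodogram

n = 365                      # one year of daily observations (assumed 365 days)
t = np.arange(n) / n         # time measured in years
a, b = 3.0, 4.0              # sine and cosine coefficients (made up)

# A pure seasonal component at four cycles per year ("Quarterly")
y = a * np.sin(2 * np.pi * 4 * t) + b * np.cos(2 * np.pi * 4 * t)

# fs = n samples per year, so frequencies come out in cycles per year
freqs, spectrum = periodogram(y, fs=n, window="boxcar", scaling="spectrum")

peak = spectrum.argmax()
print(freqs[peak], spectrum[peak], (a ** 2 + b ** 2) / 2)
```

The largest value sits at four cycles per year and matches (a ^ 2 + b ^ 2) / 2. On real data the peaks are less clean, because seasonal frequencies rarely fall exactly on the periodogram's frequency grid and noise spreads across all frequencies.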

import numpy as np
import pandas as pd


def fourier_features(index, freq, order):
    # Time steps 0, 1, 2, ... for each observation in the index
    time = np.arange(len(index), dtype=np.float32)
    # Angle of the base frequency: one full cycle every `freq` observations
    k = 2 * np.pi * (1 / freq) * time
    features = {}
    for i in range(1, order + 1):
        # One sine/cosine pair for the i-th multiple of the base frequency
        features.update({
            f"sin_{freq}_{i}": np.sin(i * k),
            f"cos_{freq}_{i}": np.cos(i * k),
        })
    return pd.DataFrame(features, index=index)


# Compute Fourier features to the 4th order (8 new features) for a
# series y with daily observations and annual seasonality:
#
# fourier_features(y.index, freq=365.25, order=4)

III. Example - Tunnel Traffic

We will continue with the Tunnel Traffic dataset. This hidden cell loads the data and defines two functions: seasonal_plot and plot_periodogram.

from pathlib import Path
from warnings import simplefilter

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess

simplefilter("ignore")

# Set Matplotlib defaults
plt.style.use("seaborn-whitegrid")
plt.rc("figure", autolayout=True, figsize=(11, 5))
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=16,
    titlepad=10,
)
plot_params = dict(
    color="0.75",
    style=".-",
    markeredgecolor="0.25",
    markerfacecolor="0.25",
    legend=False,
)
%config InlineBackend.figure_format = 'retina'


# annotations: https://stackoverflow.com/a/49238256/5769929
def seasonal_plot(X, y, period, freq, ax=None):
    if ax is None:
        _, ax = plt.subplots()
    palette = sns.color_palette("husl", n_colors=X[period].nunique(),)
    ax = sns.lineplot(
        x=freq,
        y=y,
        hue=period,
        data=X,
        ci=False,
        ax=ax,
        palette=palette,
        legend=False,
    )
    ax.set_title(f"Seasonal Plot ({period}/{freq})")
    for line, name in zip(ax.lines, X[period].unique()):
        y_ = line.get_ydata()[-1]
        ax.annotate(
            name,
            xy=(1, y_),
            xytext=(6, 0),
            color=line.get_color(),
            xycoords=ax.get_yaxis_transform(),
            textcoords="offset points",
            size=14,
            va="center",
        )
    return ax


def plot_periodogram(ts, detrend='linear', ax=None):
    from scipy.signal import periodogram
    # Observations per year for daily data. Written as a number because
    # newer pandas versions no longer accept the "Y" unit in pd.Timedelta.
    fs = 365.2425
    frequencies, spectrum = periodogram(
        ts,
        fs=fs,
        detrend=detrend,
        window="boxcar",
        scaling='spectrum',
    )
    if ax is None:
        _, ax = plt.subplots()
    ax.step(frequencies, spectrum, color="purple")
    ax.set_xscale("log")
    ax.set_xticks([1, 2, 4, 6, 12, 26, 52, 104])
    ax.set_xticklabels(
        [
            "Annual (1)",
            "Semiannual (2)",
            "Quarterly (4)",
            "Bimonthly (6)",
            "Monthly (12)",
            "Biweekly (26)",
            "Weekly (52)",
            "Semiweekly (104)",
        ],
        rotation=30,
    )
    ax.ticklabel_format(axis="y", style="sci", scilimits=(0, 0))
    ax.set_ylabel("Variance")
    ax.set_title("Periodogram")
    return ax


data_dir = Path("../input/ts-course-data")
tunnel = pd.read_csv(data_dir / "tunnel.csv", parse_dates=["Day"])
tunnel = tunnel.set_index("Day").to_period("D")

Let's take a look at seasonal plots over a week and over a year.

X = tunnel.copy()

# days within a week
X["day"] = X.index.dayofweek  # the x-axis (freq)
X["week"] = X.index.week  # the seasonal period (period)

# days within a year
X["dayofyear"] = X.index.dayofyear
X["year"] = X.index.year
fig, (ax0, ax1) = plt.subplots(2, 1, figsize=(11, 6))
seasonal_plot(X, y="NumVehicles", period="week", freq="day", ax=ax0)
seasonal_plot(X, y="NumVehicles", period="year", freq="dayofyear", ax=ax1);

Now let's look at the periodogram:

plot_periodogram(tunnel.NumVehicles);

The periodogram agrees with the seasonal plots above: a strong weekly season and a weaker annual season. We will model the weekly season with indicators and the annual season with Fourier features. From right to left, the periodogram falls off between Bimonthly (6) and Monthly (12), so let's use 10 Fourier pairs.

We will create our seasonal features with DeterministicProcess, the same utility we used in Lesson 2 to create trend features. To use two seasonal periods (weekly and annual), we need to instantiate one of them as an "additional term":

from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess

fourier = CalendarFourier(freq="A", order=10)  # 10 sin/cos pairs for "A"nnual seasonality

dp = DeterministicProcess(
    index=tunnel.index,
    constant=True,               # dummy feature for bias (y-intercept)
    order=1,                     # trend (order 1 means linear)
    seasonal=True,               # weekly seasonality (indicators)
    additional_terms=[fourier],  # annual seasonality (fourier)
    drop=True,                   # drop terms to avoid collinearity
)

X = dp.in_sample()  # create features for dates in tunnel.index

With the feature set created, we can fit the model and make predictions. We will add a 90-day forecast to see how the model extrapolates beyond the training data.

y = tunnel["NumVehicles"]

model = LinearRegression(fit_intercept=False)
_ = model.fit(X, y)

y_pred = pd.Series(model.predict(X), index=y.index)
X_fore = dp.out_of_sample(steps=90)
y_fore = pd.Series(model.predict(X_fore), index=X_fore.index)

ax = y.plot(color='0.25', style='.', title="Tunnel Traffic - Seasonal Forecast")
ax = y_pred.plot(ax=ax, label="Seasonal")
ax = y_fore.plot(ax=ax, label="Seasonal Forecast", color='C3')
_ = ax.legend()

There is still more we can do with time series to improve our forecasts. The next step is to use the time series themselves as features. Using time series as inputs to a forecast lets us model another component often found in series: cycles.

Original site

Copyright notice
This article was written by [Sit and watch the clouds rise]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/177/202206261209598659.html