Machine learning notes - seasonality of time series
2022-06-26 13:04:00 【Sit and watch the clouds rise】
I. Seasonality
A time series exhibits seasonality whenever there is a regular, periodic change in the mean of the series. Seasonal changes generally follow the clock and the calendar: repetitions over a day, a week, or a year are common. Seasonality is often driven by the cycles of the natural world over days and years, or by conventions of social behaviour surrounding dates and times.

We will look at two kinds of features that model seasonality. The first kind, indicators, is best for a season with few observations, such as a weekly season of daily observations. The second kind, Fourier features, is best for a season with many observations, such as an annual season of daily observations.
II. Seasonal plots and seasonal indicators
Just as we used a moving average plot to discover the trend in a series, we can use a seasonal plot to discover seasonal patterns.
A seasonal plot shows segments of the time series plotted against a common period, the period being the "season" you want to observe. The figure for this section is a seasonal plot of the daily views of Wikipedia's article on Trigonometry: the article's daily views are plotted over a common weekly period.
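Underneath, a seasonal plot is a reshaping of the series: one row per position within the season and one column per repetition of the season. The sketch below illustrates this with pandas. The `views` numbers and the four-week window are made up for illustration; they are not the Wikipedia data.

```python
import numpy as np
import pandas as pd

# Four full weeks of hypothetical daily page views, starting on a Monday.
idx = pd.date_range("2016-01-04", periods=28, freq="D")
# Assumed pattern: more views on weekdays than weekends, plus a slow drift.
views = np.where(idx.dayofweek < 5, 1200, 800) + np.arange(28)

frame = pd.DataFrame({
    "week": np.arange(28) // 7,   # which repetition of the season (0..3)
    "day": idx.dayofweek,         # position within the season (0 = Monday)
    "views": views,
})

# One column per week, one row per day of the week: each column is one
# line of a weekly seasonal plot.
segments = frame.pivot(index="day", columns="week", values="views")

# Averaging across weeks gives the typical weekly shape.
weekly_shape = segments.mean(axis=1)
print(segments)
print(weekly_shape)
```

Plotting each column of `segments` against the day of the week would give the overlaid lines of a seasonal plot; `weekly_shape` summarises the pattern they share.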

1. Seasonal indicators
Seasonal indicators are binary features that represent seasonal differences in the level of a time series. They are what you get if you treat the seasonal period as a categorical feature and apply one-hot encoding.
By one-hot encoding the days of the week, we get weekly seasonal indicators. Creating weekly indicators for the Trigonometry series gives us six new "dummy" features. (Linear regression works best if you drop one of the indicators; we dropped Monday in the table below.)
| Date | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
|---|---|---|---|---|---|---|
| 2016-01-04 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2016-01-05 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2016-01-06 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2016-01-07 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 2016-01-08 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 2016-01-09 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2016-01-10 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 2016-01-11 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... |
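Adding seasonal indicators to the training data helps the model distinguish the mean at each position within the seasonal period. The indicators act as on/off switches, so a linear regression learns one mean level for each day of the week.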
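The indicator table above can be reproduced with one-hot encoding. The following is a minimal sketch using pandas, assuming the same eight dates as in the table; the variable names are illustrative only.

```python
import pandas as pd

# The eight dates shown in the table; 2016-01-04 is a Monday.
dates = pd.date_range("2016-01-04", periods=8, freq="D")

order = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]

# One-hot encode the day name of each date.
day_names = pd.Series(dates.day_name(), index=dates)
indicators = pd.get_dummies(day_names, dtype=float)

# Put the columns in calendar order and drop Monday, so that Monday
# becomes the baseline (a row of all zeros).
indicators = indicators[order[1:]]
print(indicators)
```

Dropping one column avoids perfect collinearity with the intercept: the six remaining indicators plus a constant already determine the day of the week.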

2. Fourier features and the periodogram
The kind of feature we discuss now is better suited to long seasons over many observations, where indicators would be impractical. Instead of creating a feature for each date, Fourier features try to capture the overall shape of the seasonal curve with just a few features.
Let's look at a plot of the annual season in the Trigonometry series. Notice the repetitions of various frequencies: a long up-and-down movement three times a year, short weekly movements 52 times a year, and perhaps others.

It is these frequencies within a season that we try to capture with Fourier features. The idea is to include in our training data periodic curves that have the same frequencies as the season we are trying to model. The curves we use are those of the trigonometric functions sine and cosine.
Fourier features are pairs of sine and cosine curves, one pair for each potential frequency in the season, starting with the longest. Fourier pairs modelling annual seasonality would have the frequencies once per year, twice per year, three times per year, and so on.

If we add a set of these sine/cosine curves to our training data, the linear regression algorithm will work out the weights that fit the seasonal component in the target series. The figure illustrates how linear regression used four Fourier pairs to model the annual seasonality in the Wiki Trigonometry series.
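To make the weight-fitting step concrete, here is a small sketch on synthetic data. It assumes a noise-free series built from known sine and cosine weights (the numbers are invented) and uses NumPy's least-squares solver in place of a full linear regression library; the fitted weights should match the ones used to build the series.

```python
import numpy as np

# Three years of daily time steps; 365.25 observations per annual cycle.
t = np.arange(3 * 365, dtype=float)
base = 2 * np.pi * t / 365.25

# Synthetic target with known (made-up) seasonal weights:
# level 10, a once-per-year sine, and a twice-per-year cosine.
y = 10.0 + 2.0 * np.sin(base) - 1.5 * np.cos(2 * base)

# Feature matrix: constant plus two Fourier pairs (orders 1 and 2).
X = np.column_stack([
    np.ones_like(t),
    np.sin(base), np.cos(base),
    np.sin(2 * base), np.cos(2 * base),
])

# Ordinary least squares: the same fit a linear regression would perform.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(weights, 6))
```

With real data the fit is not exact, but the mechanism is the same: each Fourier column gets a weight, and the weighted sum of the columns is the estimated seasonal curve.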

Notice that we needed only eight features (four sine/cosine pairs) to get a good estimate of the annual seasonality. Compare this with the seasonal indicator method, which would have required hundreds of features (one for each day of the year). By modelling only the "main effect" of the seasonality with Fourier features, you usually need to add far fewer features to your training data, which means less computation time and less risk of overfitting.
3. Choosing Fourier features with the periodogram
How many Fourier pairs should we actually include in our feature set? We can answer this question with the periodogram. The periodogram tells you the strength of the frequencies in a time series. Specifically, the value on the y-axis of the graph is $(a^2 + b^2)/2$, where $a$ and $b$ are the coefficients of the sine and cosine at that frequency (as in the Fourier components plot above).

From left to right, the periodogram drops off after Quarterly, four times a year. That is why we chose four Fourier pairs to model the annual season. We ignore the Weekly frequency, since it is better modelled with indicators.
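The y-axis value of the periodogram can be checked numerically. For a pure sinusoid with sine coefficient a and cosine coefficient b, a one-sided power spectrum should give (a^2 + b^2)/2 at that frequency. The sketch below computes such a spectrum by hand with NumPy's FFT, assuming a rectangular (boxcar) window, no detrending, and a window that holds a whole number of cycles; the coefficients and series length are invented for illustration.

```python
import numpy as np

n = 360                 # number of observations (hypothetical)
t = np.arange(n)
a, b = 3.0, 4.0         # sine and cosine coefficients (made-up values)
cycles = 4              # four full cycles over the window ("quarterly")

angle = 2 * np.pi * cycles * t / n
x = a * np.sin(angle) + b * np.cos(angle)

# Power spectrum: squared FFT magnitude, normalised by n squared.
spectrum = np.abs(np.fft.rfft(x)) ** 2 / n**2
# One-sided spectrum: double every bin except the zero frequency and
# (for even n) the Nyquist bin, which have no negative-frequency twin.
spectrum[1:-1] *= 2

peak = int(np.argmax(spectrum))
print(peak, spectrum[peak], (a**2 + b**2) / 2)
```

The peak lands at the bin for four cycles and its height equals (a^2 + b^2)/2, so a tall bar in the periodogram means large sine and cosine weights at that frequency.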
Computing Fourier features
Knowing how Fourier features are computed is not essential to using them, but if seeing the details would clarify things, the code below illustrates how a set of Fourier features could be derived from the index of a time series. (In our application, however, we will use a library function from statsmodels.)
import numpy as np
import pandas as pd


def fourier_features(index, freq, order):
    # Time steps 0, 1, 2, ... counted in observations
    time = np.arange(len(index), dtype=np.float32)
    # Angle of the base frequency: one full cycle every `freq` observations
    k = 2 * np.pi * (1 / freq) * time
    features = {}
    for i in range(1, order + 1):
        # i-th harmonic: a sine/cosine pair completing i cycles per `freq` observations
        features.update({
            f"sin_{freq}_{i}": np.sin(i * k),
            f"cos_{freq}_{i}": np.cos(i * k),
        })
    return pd.DataFrame(features, index=index)


# Compute Fourier features to the 4th order (8 new features) for a
# series y with daily observations and annual seasonality:
#
# fourier_features(y.index, freq=365.25, order=4)

III. Example - Tunnel Traffic
We will continue with the Tunnel Traffic dataset. The code below loads the data and defines two helper functions: seasonal_plot and plot_periodogram.
from pathlib import Path
from warnings import simplefilter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess
simplefilter("ignore")
# Set Matplotlib defaults
# On Matplotlib 3.6 and later this style is named "seaborn-v0_8-whitegrid"
plt.style.use("seaborn-whitegrid")
plt.rc("figure", autolayout=True, figsize=(11, 5))
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=16,
    titlepad=10,
)
plot_params = dict(
    color="0.75",
    style=".-",
    markeredgecolor="0.25",
    markerfacecolor="0.25",
    legend=False,
)
%config InlineBackend.figure_format = 'retina'


# annotations: https://stackoverflow.com/a/49238256/5769929
def seasonal_plot(X, y, period, freq, ax=None):
    if ax is None:
        _, ax = plt.subplots()
    # One colour per repetition of the season (e.g. per week or per year)
    palette = sns.color_palette("husl", n_colors=X[period].nunique(),)
    ax = sns.lineplot(
        x=freq,
        y=y,
        hue=period,
        data=X,
        ci=False,  # on seaborn 0.12 and later, use errorbar=None instead
        ax=ax,
        palette=palette,
        legend=False,
    )
    ax.set_title(f"Seasonal Plot ({period}/{freq})")
    # Label each line at the right edge of the axes with its period value
    for line, name in zip(ax.lines, X[period].unique()):
        y_ = line.get_ydata()[-1]
        ax.annotate(
            name,
            xy=(1, y_),
            xytext=(6, 0),
            color=line.get_color(),
            xycoords=ax.get_yaxis_transform(),
            textcoords="offset points",
            size=14,
            va="center",
        )
    return ax


def plot_periodogram(ts, detrend='linear', ax=None):
    from scipy.signal import periodogram
    # Sampling rate in observations (days) per year. The original expression,
    # pd.Timedelta("1Y") / pd.Timedelta("1D"), relies on the "Y" unit, which
    # newer pandas releases no longer accept; 365.2425 is the value it produced.
    fs = 365.2425
    frequencies, spectrum = periodogram(
        ts,
        fs=fs,
        detrend=detrend,
        window="boxcar",
        scaling='spectrum',
    )
    if ax is None:
        _, ax = plt.subplots()
    ax.step(frequencies, spectrum, color="purple")
    ax.set_xscale("log")
    # Tick positions are frequencies in cycles per year
    ax.set_xticks([1, 2, 4, 6, 12, 26, 52, 104])
    ax.set_xticklabels(
        [
            "Annual (1)",
            "Semiannual (2)",
            "Quarterly (4)",
            "Bimonthly (6)",
            "Monthly (12)",
            "Biweekly (26)",
            "Weekly (52)",
            "Semiweekly (104)",
        ],
        rotation=30,
    )
    ax.ticklabel_format(axis="y", style="sci", scilimits=(0, 0))
    ax.set_ylabel("Variance")
    ax.set_title("Periodogram")
    return ax


data_dir = Path("../input/ts-course-data")
tunnel = pd.read_csv(data_dir / "tunnel.csv", parse_dates=["Day"])
tunnel = tunnel.set_index("Day").to_period("D")

Let's take a look at seasonal plots over a week and over a year.
X = tunnel.copy()
# days within a week
X["day"] = X.index.dayofweek # the x-axis (freq)
X["week"] = X.index.week # the seasonal period (period)
# days within a year
X["dayofyear"] = X.index.dayofyear
X["year"] = X.index.year
fig, (ax0, ax1) = plt.subplots(2, 1, figsize=(11, 6))
seasonal_plot(X, y="NumVehicles", period="week", freq="day", ax=ax0)
seasonal_plot(X, y="NumVehicles", period="year", freq="dayofyear", ax=ax1);
Now let's look at the periodogram:
plot_periodogram(tunnel.NumVehicles);
The periodogram agrees with the seasonal plots above: a strong weekly season and a weaker annual season. We will model the weekly season with indicators and the annual season with Fourier features. From right to left, the periodogram falls off between Bimonthly (6) and Monthly (12), so let's use 10 Fourier pairs.
We will create our seasonal features with DeterministicProcess, the same utility used in Lesson 2 to create trend features. To use two seasonal periods (weekly and annual), we need to instantiate one of them as an "additional term":
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess
fourier = CalendarFourier(freq="A", order=10)  # 10 sin/cos pairs for "A"nnual seasonality
dp = DeterministicProcess(
    index=tunnel.index,
    constant=True,               # dummy feature for bias (y-intercept)
    order=1,                     # trend (order 1 means linear)
    seasonal=True,               # weekly seasonality (indicators)
    additional_terms=[fourier],  # annual seasonality (fourier)
    drop=True,                   # drop terms to avoid collinearity
)
X = dp.in_sample()  # create features for dates in tunnel.index

With the feature set created, we can fit the model and make predictions. We will add a 90-day forecast to see how the model extrapolates beyond the training data.
y = tunnel["NumVehicles"]
model = LinearRegression(fit_intercept=False)
_ = model.fit(X, y)
y_pred = pd.Series(model.predict(X), index=y.index)
X_fore = dp.out_of_sample(steps=90)
y_fore = pd.Series(model.predict(X_fore), index=X_fore.index)
ax = y.plot(color='0.25', style='.', title="Tunnel Traffic - Seasonal Forecast")
ax = y_pred.plot(ax=ax, label="Seasonal")
ax = y_fore.plot(ax=ax, label="Seasonal Forecast", color='C3')
_ = ax.legend()
There is still more we can do with time series to improve our forecasts. The next step is to use the time series itself as a feature. Using time series as inputs to a forecast lets us model another component often found in series: cycles.