Must-read: time series analysis
2022-07-01 18:37:00 【spssau】
This article mainly introduces the ARIMA model, the most commonly used time series model in practice.
01 What is a time series?
Time series data is a sequence of observations arranged in chronological order that changes over time and whose values are interrelated. By studying the trend in historical data, we can evaluate and forecast future data. Time series data often appears in economics, finance, and business data analysis.
02 Inspection process
First, preprocessing a time series involves two tests: a stationarity test and a white noise test. Stationarity is generally judged from the time series plot and the autocorrelation plot. If the series is nonstationary, difference the data and then run the stationarity test again to judge whether the differenced series is stationary. In practice, the differencing order (d) generally does not exceed 2.
Second, the ARIMA(p,d,q) model is the autoregressive integrated moving average model. Depending on whether the original series is stationary and which terms the regression contains, the ARIMA model can be split into 3 parts: the AR(p) model, the I(d) differencing, and the MA(q) model, so the orders of these three parameters must be determined separately. Generally, suitable p and q are read from the (partial) autocorrelation plots, and a suitable differencing order d is obtained from the ADF test.
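For reference, the three parts can be written as a single equation. This is the standard textbook form, not something taken from SPSSAU; the backshift operator B, the coefficients φ_i and θ_j, the constant c, and the white noise term ε_t are notation introduced here for illustration.

```latex
\phi(B)\,(1 - B)^{d}\, y_t = c + \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i},
\quad
\theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j},
\quad
B\,y_t = y_{t-1}
```

Here φ(B) is the AR(p) part, (1 - B)^d is the I(d) differencing, and θ(B) is the MA(q) part.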
Third, once the autoregressive order p, the differencing order d, and the moving average order q have been determined, the model can be built. SPSSAU's 【ARIMA prediction model】 can automatically find the best model and recommend the best p, d, and q values. (The principle is the minimum-AIC rule: SPSSAU traverses the possible model combinations, fits each one, and keeps the model with the smallest AIC. The details of the model optimization process are omitted here.)
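As a reminder of what the minimum-AIC rule compares, the usual definition of AIC is sketched below. The symbols k (number of estimated parameters) and L-hat (maximized likelihood) are notation introduced here; how SPSSAU counts k and which candidate orders it searches are not stated in the article.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
(\hat{p}, \hat{d}, \hat{q}) = \arg\min_{(p,d,q)} \mathrm{AIC}(p,d,q)
```

A model with more parameters is penalized through the 2k term, so a more complex model is preferred only when it raises the likelihood enough to offset the penalty.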
Fourth, the ARIMA model outputs 3 tables. The 1st table is the fitted model parameter table, which shows the results of model construction, including the regression coefficients, p-values, and so on. The 2nd table is the model Q statistics table, used to check whether the residuals are white noise. The 3rd table contains the model forecasts, giving predicted values for the next 12 periods.
03 Case analysis
(1) Background
We have annual data on per capita health expenditure for 1978~2006, 29 years in total, and want to use the SPSSAU data analysis platform to find a suitable ARIMA model to forecast per capita health expenditure in China.
(2) Data format
Time series data consists of two columns: the time item and the actual analysis item.
For example, in the figure below the year is the time item and “Per capita health costs” is the actual analysis item. The time item does not need to be set during analysis, but the data prepared by the researcher must be in this format, with dates increasing from top to bottom. Then upload the prepared data to the SPSSAU analysis platform.
The unit of a time series is often the year, as in “China's GDP over the years” or “China's population over the years”. Of course, data measured by month, quarter, or week can also be used as time series data, as long as it reflects how the data changes over time.
(3) Stationarity test
The first problem encountered in time series analysis is whether the data is stationary. Stationarity can be assessed with a time series plot by visually inspecting the data. However, the graphical method is highly subjective, so the ADF test (a unit root test) is also used to reach a more accurate judgment.
- Scatter plot (time series plot)
Operation steps: SPSSAU【Visualization】-【Scatter plot】.
A stationary time series typically fluctuates continuously around its mean in the plot, while a nonstationary series often has different means in different time periods, for example a continuous rise or decline.
As the figure above shows, the points have an obvious upward trend, so the series does not meet the stationarity requirement. We therefore take the logarithm of the data first, then difference it and test again.
Figure: Generate variables - logarithmic processing
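The log-then-difference preprocessing described above can be sketched in plain Python. This is only an illustration of the arithmetic, not SPSSAU's implementation; the function names and the sample series are hypothetical and are not the article's data.

```python
import math

def log_transform(values):
    """Natural log of each observation (all values must be positive)."""
    return [math.log(v) for v in values]

def difference(values, d=1):
    """Apply first differencing d times; each pass shortens the series by one."""
    out = list(values)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

# Hypothetical series that doubles every period (not the article's data).
series = [10.0, 20.0, 40.0, 80.0, 160.0]
logged = log_transform(series)

# After taking logs, every first difference is approximately ln(2) = 0.693...,
# so the exponential trend becomes a constant step.
print(difference(logged, d=1))

# Second-order difference of the raw values.
print(difference(series, d=2))  # [10.0, 20.0, 40.0]
```

Each round of differencing drops one observation, so a series of 29 points has 27 points left after second-order differencing.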
- ADF test
Because judging stationarity from the time series plot is subjective, the ADF test (a unit root test) is also used to reach a more accurate judgment.
The unit root test checks whether the time series has a unit root. If a unit root exists, the series is nonstationary, and subsequent analysis such as an ARIMA model usually cannot be carried out directly.
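To make the null hypothesis concrete, the ADF test is usually based on a regression of the following form. This is the common textbook version with a constant term; the symbols α, γ, δ_i, and the lag count k are notation introduced here, and the article does not say which variant (with or without constant or trend) SPSSAU's default type uses.

```latex
\Delta y_t = \alpha + \gamma\, y_{t-1} + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad
H_0:\ \gamma = 0 \ \text{(unit root, nonstationary)}
\quad \text{vs.} \quad
H_1:\ \gamma < 0 \ \text{(stationary)}
```

A small p-value rejects the unit root, which is why a p-value above 0.1 is read as "nonstationary" in this workflow.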
Operation steps:
① Select SPSSAU【Econometric research】-【ADF test】.
② Put “Ln_Per capita health costs” into the analysis box. Set the differencing order to “Automatic”, leave the type at its default, and click Start analysis.
ADF test
Result analysis:
As the table above shows, for the per capita health costs series the ADF test t statistic is -0.778 and the p-value is 0.825; the critical values at the 1%, 5%, and 10% levels are -3.700, -2.976, and -2.628, respectively.
Since p=0.825>0.1, the null hypothesis cannot be rejected and the series is nonstationary. Take the first-order difference of the series and run the ADF test again.
After first-order differencing, the ADF test gives p=0.287>0.1, so the null hypothesis still cannot be rejected and the series is nonstationary. Take the second-order difference and run the ADF test again.
After second-order differencing, the ADF test gives p=0.000<0.01, so the null hypothesis is rejected at the 1% significance level and the series is now stationary. We can therefore take the differencing order d to be 2.
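The step-by-step decision above amounts to picking the smallest differencing order whose ADF p-value falls below the chosen threshold. A minimal sketch follows, using the three p-values reported in the text; the function name `choose_d` and the 0.1 default threshold are choices made here for illustration, and this is not SPSSAU's code.

```python
def choose_d(p_values_by_order, alpha=0.1):
    """Return the smallest differencing order whose ADF p-value is below alpha.

    p_values_by_order[i] is the ADF p-value after differencing i times.
    Returns None if no order passes.
    """
    for d, p in enumerate(p_values_by_order):
        if p < alpha:
            return d
    return None

# ADF p-values reported in the text for d = 0, 1, 2.
adf_p_values = [0.825, 0.287, 0.000]
print(choose_d(adf_p_values))  # 2
```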
(4) (Partial) autocorrelation plots
According to the stationarity test above, the series is stationary after second-order differencing. Next, draw the 【(Partial) autocorrelation plot】 to determine the orders p and q.
Operation steps:
① Select SPSSAU【Econometric research】-【(Partial) autocorrelation plot】
② Put “Ln_Per capita health costs” into the analysis box. Set the differencing order to “2nd order” and click Start analysis.
The general criteria for identifying ARMA orders are described in the following table. In short: for an AR(p) model the ACF tails off and the PACF cuts off after lag p; for an MA(q) model the ACF cuts off after lag q and the PACF tails off; for an ARMA(p,q) model both tail off.
- Tailing off: the coefficients are always nonzero; they do not become exactly 0 beyond some lag but decay gradually and fluctuate around 0. Put simply, the values are not 0, but after a certain lag they vary randomly around 0.
- Cutting off: beyond a certain lag k the coefficients are approximately 0, which is called cutting off at lag k. Put simply, the values drop to 0 right after a certain lag.
A simple way to tell the two apart: if the values do not quickly approach 0, the plot is tailing off; if the values quickly approach 0, it is cutting off.
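The values plotted in ACF and PACF charts can be computed directly. Below is a minimal pure-Python sketch, assuming the common biased ACF estimator and the Durbin-Levinson recursion for the PACF; SPSSAU's exact estimators are not documented in the article. The function names and the simulated AR(1) series are hypothetical.

```python
import random

def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    out = [1.0]
    phi_prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        out.append(phi_kk)
        phi_prev = phi_new
    return out

# Simulated AR(1) series (hypothetical data) to look at the two patterns.
random.seed(0)
x = [0.0]
for _ in range(499):
    x.append(0.7 * x[-1] + random.gauss(0, 1))

print([round(v, 2) for v in acf(x, 5)])   # expected to decay gradually (tailing off)
print([round(v, 2) for v in pacf(x, 5)])  # expected to be near 0 after lag 1 (cutting off)
```

For an AR(1) process the theory predicts a tailing ACF and a PACF that cuts off after lag 1; with a finite sample the later PACF values will be small but not exactly 0.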
Combining the ACF and PACF plots, SPSSAU identifies the orders automatically and suggests an autoregressive order p of 3 and a moving average order q of 3. Together with the differencing order d determined in the previous step, the orders of the three parameters p, d, and q are now fixed, and the ARIMA model can be built.
(5) ARIMA forecasting model
Operation steps:
① Select SPSSAU【Econometric research】-【ARIMA forecast】
② Put “Ln_Per capita health costs” into the time series box. Set the differencing order to “2nd order”, the autoregressive order p to “3”, and the moving average order q to “3”, then click Start analysis.
Result analysis:
The ARIMA model outputs 3 tables: the fitted model parameter table (regression coefficients, p-values, and so on), the model Q statistics table (used to check whether the residuals are white noise), and the forecast table (predicted values for the next 12 periods).
First table: ARIMA(3,2,3) model parameter table
The first table shows the results of this model construction, including the model parameters, information criteria, and other indicators; it usually does not need much attention. The AIC and BIC values are used to compare multiple candidate models, and lower values are better. If the analysis is run several times, the changes in these two values can be compared to describe how the model was optimized. SPSSAU automatically builds and compares multiple candidate models.
Second table: model Q statistics table
After an ARIMA model is built, its residuals are generally required to be white noise, that is, to have no autocorrelation. The second table shows the result of the white noise test based on the Q statistic (null hypothesis: the residuals are white noise).
For example, Q6 tests whether the first 6 autocorrelation coefficients of the residuals are consistent with white noise. A corresponding p-value greater than 0.1 usually means the white noise test is passed (otherwise the residuals are not white noise). In most cases it is enough to look at Q6 directly.
In the Q statistic results, the p-value for Q6 is greater than 0.1, so the null hypothesis cannot be rejected at the 0.1 significance level. The model residuals are white noise and the model basically meets the requirements.
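A Q statistic of this kind can be computed by hand. The sketch below assumes the Ljung-Box form of the statistic and a chi-square reference distribution with degrees of freedom equal to the number of lags; the article does not say which variant SPSSAU uses, and some software subtracts the number of estimated ARMA parameters from the degrees of freedom. The function names and the simulated residuals are hypothetical.

```python
import math
import random

def ljung_box_q(residuals, lags=6):
    """Ljung-Box Q statistic over the first `lags` residual autocorrelations."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(v * v for v in dev)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Simulated residuals (hypothetical data, not the article's model output).
random.seed(1)
resid = [random.gauss(0, 1) for _ in range(100)]
q6 = ljung_box_q(resid, lags=6)
p6 = chi2_sf_even_df(q6, df=6)  # assumption: df equals the number of lags
verdict = "white noise not rejected" if p6 > 0.1 else "white noise rejected"
print(round(q6, 3), round(p6, 3), verdict)
```

The closed-form tail probability avoids a dependency on a statistics library but only works for an even number of lags, which is enough for Q6.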
Third table: forecast value table
The third table is the forecast value table. The purpose of an ARIMA model is to forecast future data, so this is the table researchers care about most. In it, “1 period ahead” is the forecast for 2007, “2 periods ahead” is the forecast for 2008, and so on.
Because the data was log-transformed earlier, the predicted values shown here are not the final forecasts; they must be transformed back to the original scale. (If you did not log-transform the data, you can skip this step.)
Finally, comparing the predicted value with the actual value, the ARIMA model forecasts per capita health expenditure in China for 2007 at 848.93 yuan, which is reasonably close to the actual value of 875.96 yuan, suggesting that the model's forecast is reliable.
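The restore step and the comparison above are simple arithmetic. A minimal sketch, assuming the natural logarithm was used for the transform (as the "Ln_" prefix suggests); the function names are hypothetical, and the two yuan figures are the ones quoted in the text.

```python
import math

def restore_from_log(log_forecast):
    """Invert a natural-log transform: exp(ln(y)) = y."""
    return math.exp(log_forecast)

def relative_error(predicted, actual):
    """Absolute forecast error as a fraction of the actual value."""
    return abs(predicted - actual) / actual

# Figures quoted in the text for 2007 (yuan).
predicted_2007 = 848.93
actual_2007 = 875.96

# The model works on the log scale, so its raw output would be about ln(848.93).
log_scale_forecast = math.log(predicted_2007)
print(round(restore_from_log(log_scale_forecast), 2))        # 848.93
print(f"{relative_error(predicted_2007, actual_2007):.2%}")  # 3.09%
```

A relative error of roughly 3% puts a number on "reasonably close".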
From experience, ARIMA forecasts 1 and 2 periods ahead are relatively reliable; the further ahead the forecast, the lower the accuracy.
Note in particular: by default SPSSAU can automatically find the best ARIMA model and forecast with it. The automatic fitting works on the principle that a smaller AIC is better, comparing many candidate models and selecting the best one. Researchers who want SPSSAU to identify the best ARIMA model automatically can run the analysis directly with 【ARIMA Model】 under 【Econometric research】.
That concludes this article. Visit the SPSSAU official website to learn more.