[Quietly outworking my peers: PyTorch in 20 days] - [day4] - [An example of the time-series data modeling process]

2022-06-30 05:43:00 【aJupyter】

Systematic tutorial: master PyTorch in 20 days
Recently I have been doing a small daily check-in challenge with Brother Zhong and Huige: PyTorch in 20 days. This is day 4. Likes, favorites, and shares are welcome.

The outbreak of novel coronavirus pneumonia in 2020 has affected many aspects of people's lives in every country.

Some people were affected in their income, some emotionally, some psychologically, and some in their weight.

This article uses China's epidemic data from before March 2020 to build a time-series RNN model and predict when the novel coronavirus pneumonia epidemic in China will end.

import os
import datetime
import torchkeras

# Print time 
def printbar():
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print("\n"+"=========="*8 + "%s"%nowtime)

# On macOS, running pytorch and matplotlib together in jupyter requires changing this environment variable
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"

1. Prepare the data

The dataset used in this article comes from tushare.
[Figure: dataset overview]

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt

%matplotlib inline
%config InlineBackend.figure_format = 'svg'

df = pd.read_csv("/home/mw/input/data6936/eat_pytorch_data/data/covid-19.csv",sep = "\t")
df.plot(x = "date",y = ["confirmed_num","cured_num","dead_num"],figsize=(10,6))
plt.xticks(rotation=60) #  rotate the x-axis labels by 60°

dfdata = df.set_index("date")
dfdiff = dfdata.diff(periods=1).dropna()  #  take the first-order difference, then drop the NaN rows (in practice only the first row)
dfdiff = dfdiff.reset_index("date") #  turn the date index back into an ordinary column

dfdiff.plot(x = "date",y = ["confirmed_num","cured_num","dead_num"],figsize=(10,6))
plt.xticks(rotation=60)
dfdiff = dfdiff.drop("date",axis = 1).astype("float32") #  drop the date column and convert to float32
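
The differencing step can be illustrated on a toy table. This is only a sketch: the four cumulative counts and the day labels below are made-up values, not rows from the article's covid-19.csv.

```python
import pandas as pd

# Hypothetical cumulative counts for four days (not the article's data).
toy = pd.DataFrame({
    "date": ["day1", "day2", "day3", "day4"],
    "confirmed_num": [100.0, 130.0, 190.0, 220.0],
})

# First-order difference: each row becomes "today minus yesterday".
# The first row has no predecessor, so it is NaN and dropna() removes it.
toy_diff = toy.set_index("date").diff(periods=1).dropna()
daily_new = toy_diff["confirmed_num"].tolist()
print(daily_new)

# Adding the first cumulative value back to the running sum of the
# differences recovers the original series (without its first row).
recovered = (toy_diff["confirmed_num"].cumsum() + 100.0).tolist()
print(recovered)
```

Differencing turns cumulative totals into daily new counts, which is the quantity the model in this article is trained to predict.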

tips:

df = pd.DataFrame({
    'month': [1, 4, 7, 10],
    'year': [2012, 2014, 2013, 2014],
    'sale': [55, 40, 84, 31]})


# Set a single column as the index
df.set_index('month')
'''
       year  sale
month
1      2012    55
4      2014    40
7      2013    84
10     2014    31
'''

Next we subclass torch.utils.data.Dataset to implement a custom time-series dataset.

torch.utils.data.Dataset is an abstract class. To load custom data, you only need to subclass it and override two methods:

__len__: implements len(dataset) and returns the size of the whole dataset.
__getitem__: fetches the data at a given index, so that dataset[i] returns the i-th sample in the dataset.
Using a dataset that does not override these two methods raises an error.
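
The two methods can be sketched on a toy tensor before applying them to the real data. `WindowDataset`, its constructor arguments, and the 46-row placeholder series are assumptions made for illustration; the row count is chosen so that a window of 8 leaves 38 samples, matching the batch size used in this article.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class WindowDataset(Dataset):
    """Hypothetical sliding-window dataset over a (num_days, num_features) tensor."""

    def __init__(self, series, window_size):
        self.series = series
        self.window_size = window_size

    def __len__(self):
        # Each sample needs window_size input rows plus one label row.
        return len(self.series) - self.window_size

    def __getitem__(self, i):
        feature = self.series[i:i + self.window_size]   # rows i .. i+window_size-1
        label = self.series[i + self.window_size]       # the row right after the window
        return feature, label

# 46 days x 3 features of placeholder values, used only for illustration.
series = torch.arange(46 * 3, dtype=torch.float32).reshape(46, 3)
ds = WindowDataset(series, window_size=8)
dl = DataLoader(ds, batch_size=38)

features, labels = next(iter(dl))
print(len(ds), features.shape, labels.shape)
```

DataLoader stacks the individual samples, so one batch has shape (batch, window, features) for the inputs and (batch, features) for the labels.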

import torch 
from torch import nn 
from torch.utils.data import Dataset,DataLoader,TensorDataset


# Use the previous 8 days of data as the input window to predict the current day's data
WINDOW_SIZE = 8

class Covid19Dataset(Dataset):
        
    def __len__(self):
        return len(dfdiff) - WINDOW_SIZE
    
    def __getitem__(self,i):
        x = dfdiff.loc[i:i+WINDOW_SIZE-1,:]
        feature = torch.tensor(x.values)
        y = dfdiff.loc[i+WINDOW_SIZE,:]
        label = torch.tensor(y.values)
        return (feature,label)
    
ds_train = Covid19Dataset()

# The dataset is small, so all the training data can go into a single batch, which improves performance
dl_train = DataLoader(ds_train,batch_size = 38)

Data processing summary

For time-series data, we use the data from earlier time steps to predict the data at a later time step.
Take the first-order difference of the data, drop the NaN values, and build the dataset (the previous eight days of data serve as the input features of each sample).

2. Define the model

There are usually three ways to build a model with PyTorch:

Use nn.Sequential to build the model layer by layer
Subclass the nn.Module base class to build a custom model
Subclass the nn.Module base class and use model containers to help encapsulate parts of the model
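
As a rough sketch, the three approaches listed above might look like this for the same tiny network. The layer sizes and the class names `PlainNet` and `ContainerNet` are illustrative assumptions, not part of the article's model.

```python
import torch
from torch import nn

# 1) nn.Sequential: layers are applied in the order they are listed.
seq_model = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 3))

# 2) Subclass nn.Module and write forward() by hand.
class PlainNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(3, 4)
        self.fc2 = nn.Linear(4, 3)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

# 3) Subclass nn.Module and keep the layers in a container (here nn.ModuleList).
class ContainerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 3)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 3)
shapes = [tuple(m(x).shape) for m in (seq_model, PlainNet(), ContainerNet())]
print(shapes)
```

All three produce the same output shape; the second form gives the most freedom in forward(), which matters when the forward pass needs extra inputs or custom arithmetic.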

Choose the second way to build the model here .

Because the training loop below is written in class form, we further wrap the model in the Model class from torchkeras to get a high-level model interface similar to the one in Keras.

The Model class itself inherits from nn.Module.

import torch
from torch import nn 
import importlib 
import torchkeras 

torch.random.seed()

class Block(nn.Module):
    def __init__(self):
        super(Block,self).__init__()
    
    def forward(self,x,x_input):
        x_out = torch.max((1+x)*x_input[:,-1,:],torch.tensor(0.0))
        return x_out
    
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 5-layer LSTM (num_layers = 5)
        self.lstm = nn.LSTM(input_size = 3,hidden_size = 3,num_layers = 5,batch_first = True)
        self.linear = nn.Linear(3,3)
        self.block = Block()
        
    def forward(self,x_input):
        x = self.lstm(x_input)[0][:,-1,:] #  keep only the output at the last time step, dropping the sequence-length dimension
        x = self.linear(x)
        y = self.block(x,x_input)
        return y
        
net = Net()
model = torchkeras.Model(net) #  Important detail: wrap the network in torchkeras.Model
print(model)

model.summary(input_shape=(8,3),input_dtype = torch.FloatTensor)
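
The tensor shapes inside Net.forward can be checked with a standalone sketch. The batch size of 4 and the hand-picked values fed to the clipping step are arbitrary assumptions; the LSTM settings mirror the ones in the code above.

```python
import torch
from torch import nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=3, hidden_size=3, num_layers=5, batch_first=True)
x_input = torch.randn(4, 8, 3)          # (batch, window, features); batch of 4 is arbitrary

output, (h_n, c_n) = lstm(x_input)
print(output.shape)   # (batch, window, hidden_size): top-layer hidden state at every step
print(h_n.shape)      # (num_layers, batch, hidden_size): final hidden state of every layer

last_step = output[:, -1, :]            # what Net.forward keeps: the last time step only
same_as_h_n = torch.allclose(last_step, h_n[-1])

# The Block step: scale the last input row by (1 + x) and clip at zero.
x = torch.tensor([[-2.0, 0.5, 0.0]])
last_row = torch.tensor([[10.0, 10.0, 10.0]])
clipped = torch.max((1 + x) * last_row, torch.tensor(0.0))
print(clipped)
```

In other words, the network predicts a relative change x with respect to the most recent day, and the clipping keeps the predicted daily counts from going negative.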

3. Train the model

Training a PyTorch model usually requires the user to write a custom training loop, and the code style of the training loop varies from person to person.

There are 3 typical styles of training loop code: script form, function form, and class form.

Here we use the class form of the training loop.

Modeled on Keras, a high-level model interface named Model is defined. It implements the fit, validate, predict, and summary methods, and is effectively a user-defined high-level API.
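
To make the idea of a class-form training loop concrete, here is a minimal sketch in plain PyTorch. `SimpleTrainer` and its methods are hypothetical and far simpler than torchkeras.Model; the synthetic linear-regression data exists only to exercise the loop.

```python
import torch
from torch import nn

class SimpleTrainer:
    """Hypothetical class-form training loop; not the torchkeras implementation."""

    def __init__(self, net, loss_func, optimizer):
        self.net = net
        self.loss_func = loss_func
        self.optimizer = optimizer

    def train_step(self, features, labels):
        self.net.train()
        self.optimizer.zero_grad()
        loss = self.loss_func(self.net(features), labels)
        loss.backward()
        self.optimizer.step()
        return loss.item()

    def fit(self, epochs, batches):
        # Returns the mean training loss of every epoch.
        history = []
        for _ in range(epochs):
            epoch_losses = [self.train_step(f, l) for f, l in batches]
            history.append(sum(epoch_losses) / len(epoch_losses))
        return history

torch.manual_seed(0)
features = torch.randn(32, 3)
labels = features @ torch.tensor([[1.0], [-2.0], [0.5]])   # synthetic linear target

net = nn.Linear(3, 1)
trainer = SimpleTrainer(net, nn.MSELoss(), torch.optim.SGD(net.parameters(), lr=0.1))
history = trainer.fit(50, [(features, labels)])
print(history[0], history[-1])
```

Bundling the network, loss, and optimizer in one object is what lets a single fit call replace a hand-written loop.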

Note: recurrent neural networks are hard to tune. You need to try several different learning rates, many times, to get good results.

def mspe(y_pred,y_true):
    err_percent = (y_true - y_pred)**2/(torch.max(y_true**2,torch.tensor(1e-7)))
    return torch.mean(err_percent)

model.compile(loss_func = mspe,optimizer = torch.optim.Adagrad(model.parameters(),lr = 0.1))
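
A small worked example may help show what the mspe loss measures. The function body repeats the definition above; the two true/predicted pairs are made-up numbers chosen so that both predictions are off by 20%.

```python
import torch

def mspe(y_pred, y_true):
    # Squared error divided by the squared true value; the 1e-7 floor avoids division by zero.
    err_percent = (y_true - y_pred) ** 2 / torch.max(y_true ** 2, torch.tensor(1e-7))
    return torch.mean(err_percent)

y_true = torch.tensor([10.0, 100.0])
y_pred = torch.tensor([8.0, 80.0])

# Per element: (10-8)^2/10^2 = 0.04 and (100-80)^2/100^2 = 0.04, so the mean is 0.04.
loss = mspe(y_pred, y_true).item()
print(loss)
```

Because the error is relative, a 20% miss costs the same whether the true value is 10 or 100, which suits daily counts whose scale changes a lot over the course of the outbreak.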

dfhistory = model.fit(100,dl_train,log_step_freq=10)

4. Evaluate the model

Evaluating a model generally requires a validation set or test set. Because this example has little data, we only visualize how the loss on the training set changes over the iterations.

%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import matplotlib.pyplot as plt

def plot_metric(dfhistory, metric):
    train_metrics = dfhistory[metric]
    epochs = range(1, len(train_metrics) + 1)
    plt.plot(epochs, train_metrics, 'bo--')
    plt.title('Training '+ metric)
    plt.xlabel("Epochs")
    plt.ylabel(metric)
    plt.legend(["train_"+metric])
    plt.show()

plot_metric(dfhistory,"loss")

5. Use the model

Here we use the model to predict when the epidemic will end, that is, the time at which the number of new confirmed cases reaches 0.

# Use dfresult to record the existing data and the epidemic data predicted below
dfresult = dfdiff[["confirmed_num","cured_num","dead_num"]].copy()
dfresult.tail()

# Predict the trend of new cases for the next 500 days and append the results to dfresult
for i in range(500):
    arr_input = torch.unsqueeze(torch.from_numpy(dfresult.values[-38:,:]),axis=0)
    arr_predict = model.forward(arr_input)

    dfpredict = pd.DataFrame(torch.floor(arr_predict).data.numpy(),
                columns = dfresult.columns)
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    dfresult = pd.concat([dfresult,dfpredict],ignore_index=True)

tips:

torch.unsqueeze(torch.from_numpy(dfresult.values[-38:,:]),axis=0) adds a new dimension at position 0, which serves as the batch dimension
torch.floor rounds down to the nearest integer
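
Both tips can be checked on small tensors. This sketch uses a zero-filled placeholder in place of the last 38 rows of dfresult and writes the call with the `dim` keyword of torch.unsqueeze.

```python
import torch

window = torch.zeros(38, 3)                 # placeholder for the last 38 rows of dfresult
batched = torch.unsqueeze(window, dim=0)    # add a batch dimension at position 0
print(window.shape, batched.shape)

# torch.floor rounds toward negative infinity, so -0.2 becomes -1, not 0.
floored = torch.floor(torch.tensor([1.7, 2.0, -0.2]))
print(floored)
```

The model expects a batch of windows, so a single window of shape (38, 3) has to become (1, 38, 3) before it is passed in.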

dfresult.query("confirmed_num==0").head()

#  New confirmed cases drop to 0 on day 50. Day 45 corresponds to March 10, so that is 5 days later: new confirmed cases are expected to reach 0 on March 15.
#  Note: this forecast is optimistic.
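
The date arithmetic in the comment above can be reproduced with the standard library. The only facts used are the ones stated there (day 45 is March 10, and the first zero appears on day 50); the helper name `day_to_date` is made up for this sketch.

```python
import datetime

# From the comment above: day 45 of the series corresponds to March 10, 2020.
anchor_day = 45
anchor_date = datetime.date(2020, 3, 10)

def day_to_date(day_index):
    # Shift the anchor date by the number of days between the two indices.
    return anchor_date + datetime.timedelta(days=day_index - anchor_day)

zero_confirmed_date = day_to_date(50)
print(zero_confirmed_date)
```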

dfresult.query("cured_num==0").head()

#  New cured cases drop to 0 on day 186, that is, about 1 year later.
#  Note: this forecast is pessimistic, and it is also problematic: summing the predicted daily cured cases gives a total that exceeds the cumulative number of confirmed cases.

6. Save the model

#  Save model parameters 

torch.save(model.net.state_dict(), "./data/model_parameter.pkl")

net_clone = Net()
net_clone.load_state_dict(torch.load("./data/model_parameter.pkl"))
model_clone = torchkeras.Model(net_clone)
model_clone.compile(loss_func = mspe)

#  Evaluate the model
model_clone.evaluate(dl_train)

tips

There is one important detail here:
net_clone = Net()
net_clone.load_state_dict(torch.load("./data/model_parameter.pkl"))
model_clone = torchkeras.Model(net_clone)
The order of these steps cannot be reversed, otherwise an error is raised. In fact, training and saving work the same way without torchkeras.
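
The save and load round trip can be sketched without torchkeras. In this example a plain nn.Linear stands in for the article's Net and the file lives in a temporary directory; both are assumptions made so the snippet runs on its own.

```python
import os
import tempfile
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Linear(3, 3)                       # stand-in for the article's Net
x = torch.randn(2, 3)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "model_parameter.pkl")   # hypothetical temporary path
    torch.save(net.state_dict(), path)      # save only the parameters

    net_clone = nn.Linear(3, 3)             # 1) rebuild the same architecture
    net_clone.load_state_dict(torch.load(path))   # 2) then load the weights into it

outputs_match = torch.equal(net(x), net_clone(x))
print(outputs_match)
```

A state_dict holds only tensors keyed by parameter name, so the architecture must exist before the weights can be loaded into it.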

Summary

Data preprocessing is straightforward: use the previous 8 days of data to predict the next day's data
When building the model, the input and output shapes of the LSTM are very important
When using torchkeras, pay attention to the order of the model-loading steps