Traffic flow prediction pitfalls (III): implementing an LSTM in PyTorch to predict traffic flow
2022-07-29 04:48:00 【__ Meursault__】
It has been a long time since my last update. The previous article reads as if written by someone who had only just encountered deep learning: narrow in scope and rough in content.
Since then I have picked up PyTorch and, I have to admit, it is a lot more flexible than TensorFlow.
This time we will use PyTorch to make a traffic flow prediction; the data is the same as in the previous article.
Baidu Netdisk: https://pan.baidu.com/s/19vKN2eZZPbOg36YEWts4aQ (extraction code: 4uh7)
Let's get started.
Loading the data
When importing the data, the Network column needs to be deleted first.
f = pd.read_csv(r'..\Desktop\AE86.csv')

# Reset the column labels from row 2 of the raw file
def set_columns():
    columns = []
    for i in f.loc[2]:
        columns.append(i.strip())
    return columns

f.columns = set_columns()
f.drop([0, 1, 2], inplace=True)

# Read the flow data as an (N, 1) float array
data = f['Total Carriageway Flow'].astype(np.float64).values[:, np.newaxis]
data.shape  # (2880, 1)
Constructing the dataset
For batch input, PyTorch provides standardized modules for splitting data: Dataset and DataLoader. These two give plenty of PyTorch newcomers a headache; I won't go into much detail here, since the official docs and other blogs cover them well.
A Dataset is like a big box for holding things, say a crate of apples: the crate can hold many small boxes, each small box holds apples, and the number of apples per small box can differ from crate to crate.
A DataLoader decides how the small boxes are taken out of the crate: for example, take 10 small boxes at a time in order, or take 5 at random. A minimal toy sketch of the pair is shown below.
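This sketch is my own toy illustration (not from the original data or code): ten made-up "apples" wrapped in a ready-made TensorDataset, with a DataLoader pulling them out in shuffled batches of four.

import torch
from torch.utils.data import TensorDataset, DataLoader

apples = torch.arange(10, dtype=torch.float).unsqueeze(1)  # ten "apples", shape (10, 1)
box = TensorDataset(apples)                                # the big box
loader = DataLoader(box, batch_size=4, shuffle=True)       # take 4 small boxes at a time, at random
for (batch,) in loader:
    print(batch.squeeze(1))  # e.g. tensor([7., 2., 9., 0.])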
But using Dataset and DataLoader is not the only way to build the data; you could also form the data directly, with the method used in the previous article.
Of course, since we are using PyTorch we might as well use its features, so below we construct our own dataset on top of Dataset.
Defining your own Dataset takes a bit more work: a Dataset subclass must implement two parts, __len__ and __getitem__.
__len__ reports the length of the Dataset.
__getitem__ fetches the sample corresponding to an index.
For convenience, this Dataset also builds in the normalization and denormalization of the data.
Constructing the Dataset
class LoadData(Dataset):
    def __init__(self, data, time_step, divide_days, train_mode):
        self.train_mode = train_mode
        self.time_step = time_step
        self.train_days = divide_days[0]
        self.test_days = divide_days[1]
        self.one_day_length = int(24 * 4)  # 15-minute intervals: 96 samples per day
        # flow_norm = (max_data, min_data)
        self.flow_norm, self.flow_data = LoadData.pre_process_data(data)
        # Without normalization:
        # self.flow_data = data

    def __len__(self):
        if self.train_mode == "train":
            return self.train_days * self.one_day_length - self.time_step
        elif self.train_mode == "test":
            return self.test_days * self.one_day_length
        else:
            raise ValueError("train mode error")

    def __getitem__(self, index):
        if self.train_mode == "train":
            index = index
        elif self.train_mode == "test":
            index += self.train_days * self.one_day_length
        else:
            raise ValueError("train mode error")
        data_x, data_y = LoadData.slice_data(self.flow_data, self.time_step, index,
                                             self.train_mode)
        data_x = LoadData.to_tensor(data_x)
        data_y = LoadData.to_tensor(data_y)
        return {"flow_x": data_x, "flow_y": data_y}

    # The key step: slice out one (input window, target) pair
    @staticmethod
    def slice_data(data, time_step, index, train_mode):
        if train_mode == "train":
            start_index = index
            end_index = index + time_step
        elif train_mode == "test":
            start_index = index - time_step
            end_index = index
        else:
            raise ValueError("train mode error")
        data_x = data[start_index: end_index, :]
        data_y = data[end_index]
        return data_x, data_y

    # Data preprocessing
    @staticmethod
    def pre_process_data(data):
        norm_base = LoadData.normalized_base(data)
        normalized_data = LoadData.normalized_data(data, norm_base[0], norm_base[1])
        return norm_base, normalized_data

    # Compute the maximum and minimum of the raw data
    @staticmethod
    def normalized_base(data):
        max_data = np.max(data, keepdims=True)  # keepdims keeps the dimensions unchanged
        min_data = np.min(data, keepdims=True)
        # max_data.shape ---> (1, 1)
        return max_data, min_data

    # Min-max normalize the data
    @staticmethod
    def normalized_data(data, max_data, min_data):
        data_base = max_data - min_data
        normalized_data = (data - min_data) / data_base
        return normalized_data

    # Denormalize; used for the evaluation metrics and for plotting
    @staticmethod
    def recoverd_data(data, max_data, min_data):
        data_base = max_data - min_data
        recoverd_data = data * data_base + min_data  # inverse of (x - min) / (max - min)
        return recoverd_data

    @staticmethod
    def to_tensor(data):
        return torch.tensor(data, dtype=torch.float)
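A quick check of the normalization pair above, on made-up numbers (my own sanity check, not part of the original post): since normalized_data computes (x - min) / (max - min), recoverd_data must compute x * (max - min) + min to invert it.

x = np.array([[100.], [250.], [400.]])
max_d = np.max(x, keepdims=True)
min_d = np.min(x, keepdims=True)
normed = (x - min_d) / (max_d - min_d)        # forward: normalized_data
recovered = normed * (max_d - min_d) + min_d  # inverse: recoverd_data
print(np.allclose(recovered, x))              # True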
Generating the training data
# Use the first 25 days as the training set and the last 5 days as the test set
divide_days = [25, 5]
time_step = 5  # time step (length of the input window)
batch_size = 48
train_data = LoadData(data, time_step, divide_days, "train")
test_data = LoadData(data, time_step, divide_days, "test")
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
For the test data, shuffle is left off. You can take a glance with train_data[0]: it returns the data of the first time_step steps together with the value at step time_step + 1. Note that this is the normalized data; to check that the data is correct against the raw values, skip the pre_process_data call in __init__ (use the commented-out line instead).
[Figure: the first sample, with normalization disabled]
One more thing: a DataLoader yields batches as items, so they have to be taken out in a loop.
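A minimal sketch of that loop with the loaders defined above; the shape comments assume batch_size = 48, time_step = 5 and a single feature:

for batch in train_loader:
    print(batch["flow_x"].shape)  # torch.Size([48, 5, 1]): B x T x D
    print(batch["flow_y"].shape)  # torch.Size([48, 1])
    break  # inspect only the first batch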
Building the LSTM model
PyTorch is more flexible here, and you can see clearly what happens inside the model. A model built with PyTorch generally consists of two parts: __init__ and forward.
__init__ defines the layers the model will use.
forward performs the computation.
# Build the LSTM network
class LSTM(nn.Module):
    def __init__(self, input_num, hid_num, layers_num, out_num, batch_first=True):
        super().__init__()
        self.l1 = nn.LSTM(
            input_size=input_num,
            hidden_size=hid_num,
            num_layers=layers_num,
            batch_first=batch_first
        )
        self.out = nn.Linear(hid_num, out_num)

    def forward(self, data):
        flow_x = data['flow_x']  # B x T x D
        l_out, (h_n, c_n) = self.l1(flow_x, None)  # None: the initial hidden state is all zeros
        # print(l_out[:, -1, :].shape)
        out = self.out(l_out[:, -1, :])
        return out
A difference between the LSTM in PyTorch and in TensorFlow: PyTorch's nn.LSTM can define multiple layers at once, so there is no need to keep stacking LSTM layers by hand. Each forward pass returns three parts: the last layer's output at every time step (l_out), the final hidden state (h_n) and the final cell state (c_n).
l_out collects the hidden-state output of every time step, so when returning we can slice the last time step out of l_out; alternatively h_n can be used directly. The sketch below shows the shapes.
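To make the shapes concrete, here is a small standalone check (sizes match the model defined below: one input feature, hidden size 50, 3 layers); the last time step of l_out is exactly the last layer's slice of h_n:

lstm_layer = nn.LSTM(input_size=1, hidden_size=50, num_layers=3, batch_first=True)
x = torch.randn(48, 5, 1)                        # B x T x D
l_out, (h_n, c_n) = lstm_layer(x, None)
print(l_out.shape)                               # torch.Size([48, 5, 50]): B x T x hidden
print(h_n.shape)                                 # torch.Size([3, 48, 50]): layers x B x hidden
print(torch.allclose(l_out[:, -1, :], h_n[-1]))  # True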
PyTorch also lets you use Sequential. If you want to use Sequential you have to modify the Dataset above, because as defined it returns a dictionary {"flow_x", "flow_y"}, while a Sequential model accepts only the "flow_x" tensor. Or, as with TensorFlow, do without the Dataset and split the data directly through a series of steps (same as the previous article).
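One caveat (my note, not from the original): nn.Sequential feeds each module's output straight into the next, and nn.LSTM returns a tuple, so a tiny adapter module is needed to drop the state and keep the last time step. A sketch, assuming the model is fed the bare flow_x tensor:

class TakeLastStep(nn.Module):
    def forward(self, lstm_output):
        out, _ = lstm_output  # drop the (h_n, c_n) state
        return out[:, -1, :]  # keep the last time step

seq_model = nn.Sequential(
    nn.LSTM(input_size=1, hidden_size=50, num_layers=3, batch_first=True),
    TakeLastStep(),
    nn.Linear(50, 1),
)
# seq_model(data_['flow_x']) would then replace lstm(data_) in the training loop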
Defining the model and the loss function
input_num = 1    # feature dimension of the input
hid_num = 50     # hidden size
layers_num = 3   # number of LSTM layers
out_num = 1      # output dimension
lstm = LSTM(input_num, hid_num, layers_num, out_num)
loss_func = nn.MSELoss()
optimizer = torch.optim.Adam(lstm.parameters())
Training the model
PyTorch does not report the loss as conveniently as TensorFlow, so the output has to be written by hand. You can also use TensorBoard to watch the loss change, but that is a bit of a hassle for beginners.
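For completeness, a minimal TensorBoard sketch (my addition; train_one_epoch is a placeholder for the epoch loop shown next): log the per-epoch loss with SummaryWriter and view it by running `tensorboard --logdir runs`.

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()  # writes event files under ./runs/ by default
for epoch in range(30):
    epoch_loss = train_one_epoch()  # placeholder: the actual loop is below
    writer.add_scalar("Loss/train", epoch_loss, epoch)
writer.close()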
lstm.train()
epoch_loss_change = []
for epoch in range(30):
    epoch_loss = 0.0
    start_time = time.time()
    for data_ in train_loader:
        lstm.zero_grad()
        predict = lstm(data_)
        loss = loss_func(predict, data_['flow_y'])
        epoch_loss += loss.item()
        loss.backward()
        optimizer.step()
    epoch_loss_change.append(1000 * epoch_loss / len(train_data))
    end_time = time.time()
    print("Epoch: {:04d}, Loss: {:02.4f}, Time: {:02.2f} mins".format(
        epoch, 1000 * epoch_loss / len(train_data), (end_time - start_time) / 60))
plt.plot(epoch_loss_change)
Evaluating the model
lstm.eval()
with torch.no_grad():  # turn off gradient tracking
    total_loss = 0.0
    pre_flow = np.zeros([batch_size, 1])  # [B, D], T=1: zero placeholder, stripped after the loop
    real_flow = np.zeros_like(pre_flow)
    for data_ in test_loader:
        pre_value = lstm(data_)
        loss = loss_func(pre_value, data_['flow_y'])
        total_loss += loss.item()
        # Denormalize
        pre_value = LoadData.recoverd_data(pre_value.detach().numpy(),
                                           test_data.flow_norm[0].squeeze(1),  # max_data
                                           test_data.flow_norm[1].squeeze(1),  # min_data
                                           )
        target_value = LoadData.recoverd_data(data_['flow_y'].detach().numpy(),
                                              test_data.flow_norm[0].squeeze(1),
                                              test_data.flow_norm[1].squeeze(1),
                                              )
        pre_flow = np.concatenate([pre_flow, pre_value])
        real_flow = np.concatenate([real_flow, target_value])
    pre_flow = pre_flow[batch_size:]
    real_flow = real_flow[batch_size:]

# Compute the errors
mse = mean_squared_error(pre_flow, real_flow)
rmse = math.sqrt(mean_squared_error(pre_flow, real_flow))
mae = mean_absolute_error(pre_flow, real_flow)
print('Mean squared error ---', mse)
print('Root mean squared error ---', rmse)
print('Mean absolute error ---', mae)

# Plot the prediction results
font_set = FontProperties(fname=r"C:\Windows\Fonts\simsun.ttc", size=15)  # SimSun font, size 15
plt.figure(figsize=(15, 10))
plt.plot(real_flow, label='Real_Flow', color='r')
plt.plot(pre_flow, label='Pre_Flow')
plt.xlabel('Test sequence', fontproperties=font_set)
plt.ylabel('Traffic flow / vehicles', fontproperties=font_set)
plt.legend()
# Save the prediction figure
# plt.savefig('...\Desktop\123.jpg')
Complete code
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as Data
from torchsummary import summary
import math
from torch.utils.data import DataLoader
from torch.utils.data import Dataset
import time
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, mean_absolute_error
from matplotlib.font_manager import FontProperties  # allows Chinese text in plots

# Load the data
f = pd.read_csv(r'..\Desktop\AE86.csv')

# Reset the column labels from row 2 of the raw file
def set_columns():
    columns = []
    for i in f.loc[2]:
        columns.append(i.strip())
    return columns

f.columns = set_columns()
f.drop([0, 1, 2], inplace=True)

# Read the flow data as an (N, 1) float array
data = f['Total Carriageway Flow'].astype(np.float64).values[:, np.newaxis]

class LoadData(Dataset):
    def __init__(self, data, time_step, divide_days, train_mode):
        self.train_mode = train_mode
        self.time_step = time_step
        self.train_days = divide_days[0]
        self.test_days = divide_days[1]
        self.one_day_length = int(24 * 4)  # 15-minute intervals: 96 samples per day
        # flow_norm = (max_data, min_data)
        self.flow_norm, self.flow_data = LoadData.pre_process_data(data)
        # Without normalization:
        # self.flow_data = data

    def __len__(self):
        if self.train_mode == "train":
            return self.train_days * self.one_day_length - self.time_step
        elif self.train_mode == "test":
            return self.test_days * self.one_day_length
        else:
            raise ValueError("train mode error")

    def __getitem__(self, index):
        if self.train_mode == "train":
            index = index
        elif self.train_mode == "test":
            index += self.train_days * self.one_day_length
        else:
            raise ValueError("train mode error")
        data_x, data_y = LoadData.slice_data(self.flow_data, self.time_step, index,
                                             self.train_mode)
        data_x = LoadData.to_tensor(data_x)
        data_y = LoadData.to_tensor(data_y)
        return {"flow_x": data_x, "flow_y": data_y}

    # Slice out one (input window, target) pair
    @staticmethod
    def slice_data(data, time_step, index, train_mode):
        if train_mode == "train":
            start_index = index
            end_index = index + time_step
        elif train_mode == "test":
            start_index = index - time_step
            end_index = index
        else:
            raise ValueError("train mode error")
        data_x = data[start_index: end_index, :]
        data_y = data[end_index]
        return data_x, data_y

    # Data preprocessing
    @staticmethod
    def pre_process_data(data):
        norm_base = LoadData.normalized_base(data)
        normalized_data = LoadData.normalized_data(data, norm_base[0], norm_base[1])
        return norm_base, normalized_data

    # Compute the maximum and minimum of the raw data
    @staticmethod
    def normalized_base(data):
        max_data = np.max(data, keepdims=True)  # keepdims keeps the dimensions unchanged
        min_data = np.min(data, keepdims=True)
        # max_data.shape ---> (1, 1)
        return max_data, min_data

    # Min-max normalize the data
    @staticmethod
    def normalized_data(data, max_data, min_data):
        data_base = max_data - min_data
        normalized_data = (data - min_data) / data_base
        return normalized_data

    # Denormalize; used for the evaluation metrics and for plotting
    @staticmethod
    def recoverd_data(data, max_data, min_data):
        data_base = max_data - min_data
        recoverd_data = data * data_base + min_data  # inverse of (x - min) / (max - min)
        return recoverd_data

    @staticmethod
    def to_tensor(data):
        return torch.tensor(data, dtype=torch.float)

# Divide the data: first 25 days for training, last 5 days for testing
divide_days = [25, 5]
time_step = 5
batch_size = 48
train_data = LoadData(data, time_step, divide_days, "train")
test_data = LoadData(data, time_step, divide_days, "test")
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

# Build the LSTM network
class LSTM(nn.Module):
    def __init__(self, input_num, hid_num, layers_num, out_num, batch_first=True):
        super().__init__()
        self.l1 = nn.LSTM(
            input_size=input_num,
            hidden_size=hid_num,
            num_layers=layers_num,
            batch_first=batch_first
        )
        self.out = nn.Linear(hid_num, out_num)

    def forward(self, data):
        flow_x = data['flow_x']  # B x T x D
        l_out, (h_n, c_n) = self.l1(flow_x, None)  # None: the initial hidden state is all zeros
        # print(l_out[:, -1, :].shape)
        out = self.out(l_out[:, -1, :])
        return out

# Define the model parameters
input_num = 1    # feature dimension of the input
hid_num = 50     # hidden size
layers_num = 3   # number of LSTM layers
out_num = 1      # output dimension
lstm = LSTM(input_num, hid_num, layers_num, out_num)
loss_func = nn.MSELoss()
optimizer = torch.optim.Adam(lstm.parameters())

# Train the model
lstm.train()
epoch_loss_change = []
for epoch in range(30):
    epoch_loss = 0.0
    start_time = time.time()
    for data_ in train_loader:
        lstm.zero_grad()
        predict = lstm(data_)
        loss = loss_func(predict, data_['flow_y'])
        epoch_loss += loss.item()
        loss.backward()
        optimizer.step()
    epoch_loss_change.append(1000 * epoch_loss / len(train_data))
    end_time = time.time()
    print("Epoch: {:04d}, Loss: {:02.4f}, Time: {:02.2f} mins".format(
        epoch, 1000 * epoch_loss / len(train_data), (end_time - start_time) / 60))
plt.plot(epoch_loss_change)

# Evaluate the model
lstm.eval()
with torch.no_grad():  # turn off gradient tracking
    total_loss = 0.0
    pre_flow = np.zeros([batch_size, 1])  # [B, D], T=1: zero placeholder, stripped after the loop
    real_flow = np.zeros_like(pre_flow)
    for data_ in test_loader:
        pre_value = lstm(data_)
        loss = loss_func(pre_value, data_['flow_y'])
        total_loss += loss.item()
        # Denormalize
        pre_value = LoadData.recoverd_data(pre_value.detach().numpy(),
                                           test_data.flow_norm[0].squeeze(1),  # max_data
                                           test_data.flow_norm[1].squeeze(1),  # min_data
                                           )
        target_value = LoadData.recoverd_data(data_['flow_y'].detach().numpy(),
                                              test_data.flow_norm[0].squeeze(1),
                                              test_data.flow_norm[1].squeeze(1),
                                              )
        pre_flow = np.concatenate([pre_flow, pre_value])
        real_flow = np.concatenate([real_flow, target_value])
    pre_flow = pre_flow[batch_size:]
    real_flow = real_flow[batch_size:]

# Compute the errors
mse = mean_squared_error(pre_flow, real_flow)
rmse = math.sqrt(mean_squared_error(pre_flow, real_flow))
mae = mean_absolute_error(pre_flow, real_flow)
print('Mean squared error ---', mse)
print('Root mean squared error ---', rmse)
print('Mean absolute error ---', mae)

# Plot the prediction results
font_set = FontProperties(fname=r"C:\Windows\Fonts\simsun.ttc", size=15)  # SimSun font, size 15
plt.figure(figsize=(15, 10))
plt.plot(real_flow, label='Real_Flow', color='r')
plt.plot(pre_flow, label='Pre_Flow')
plt.xlabel('Test sequence', fontproperties=font_set)
plt.ylabel('Traffic flow / vehicles', fontproperties=font_set)
plt.legend()
# Save the prediction figure
# plt.savefig('...\Desktop\123.jpg')