Pitfalls of traffic flow prediction (II): the simplest LSTM traffic flow prediction with TensorFlow 2
2022-07-29 04:48:00 【__ Meursault__】
When it comes to time series prediction, the first thing that comes to mind is the RNN, and right after that the LSTM. I won't go into how the LSTM works; there are plenty of articles about it online.
Implementing the prediction with TensorFlow 2.0
I have to say TensorFlow 2.0 is a pleasure to use. It is almost too simple, as long as you can type.
In TensorFlow you only need to call the LSTM layer that Keras already provides. Consider the following code:
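import tensorflow as tf
from tensorflow.keras.layers import Dense, LSTM, Dropout

model = tf.keras.Sequential([
    LSTM(80, return_sequences=True),  # pass the full sequence on to the next LSTM
    Dropout(0.2),
    LSTM(80),                         # return only the last time step
    Dropout(0.2),
    Dense(1)                          # one output: the next flow value
])
model.compile(optimizer='adam',
              loss='mse')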
This builds a 2-layer LSTM with 80 units per layer, adds Dropout layers to reduce overfitting, uses Adam as the optimizer, and uses mean squared error (MSE) as the loss. It really is super simple.
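To get a feel for how large this small network is, here is a rough parameter count. It is only a sketch: it assumes the standard LSTM layout with a bias term (four gates, each with an input kernel, a recurrent kernel, and a bias) and one input feature per time step. The helper names `lstm_param_count` and `dense_param_count` are made up for this illustration and are not part of TensorFlow.

```python
def lstm_param_count(units: int, input_dim: int) -> int:
    """Trainable parameters of a standard LSTM layer with bias.

    Four gates, each with an input kernel (input_dim x units),
    a recurrent kernel (units x units), and a bias vector (units).
    """
    return 4 * (units * (input_dim + units) + units)


def dense_param_count(units: int, input_dim: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return units * input_dim + units


# Assumption: one feature (the flow value) per time step.
first_lstm = lstm_param_count(units=80, input_dim=1)
# The second LSTM receives the 80 outputs of the first one.
second_lstm = lstm_param_count(units=80, input_dim=80)
head = dense_param_count(units=1, input_dim=80)
total = first_lstm + second_lstm + head

print(first_lstm, second_lstm, head, total)
```

Dropout layers have no trainable parameters, so they do not appear in the count. Once the model has been built on real input, `model.summary()` should report the same kind of breakdown.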
The main problem is data processing. For time series prediction, the idea is to use the previous n time steps to predict the next one, so each training sample has to be a window of n consecutive values paired with the value that follows it.
Getting the data into that form is the hard part.
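To make the window idea concrete, here is a minimal NumPy sketch. The function name `make_windows` and the toy `flow` values are invented for illustration; the real data comes later in the post.

```python
import numpy as np


def make_windows(series, time_step):
    """Turn a 1-D series into (window, next value) pairs.

    Row k of x holds series[k : k + time_step];
    y[k] is the value that immediately follows that window.
    """
    series = np.asarray(series, dtype=float)
    x = np.array([series[i - time_step:i]
                  for i in range(time_step, len(series))])
    y = series[time_step:]
    return x, y


# Toy flow values, not real measurements.
flow = [10, 12, 15, 11, 9, 14, 18]
x, y = make_windows(flow, time_step=3)

print(x.shape, y.shape)
print(x[0], '->', y[0])
```

A series of length N yields N - time_step samples, which is why the first `time_step` values never appear as labels.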
The data used below is the UK site data mentioned in the previous article. Other datasets can be handled in a similar way.
Baidu Netdisk: https://pan.baidu.com/s/19vKN2eZZPbOg36YEWts4aQ
Extraction code: 4uh7
When importing the data, for reasons I don't understand, an error is raised if the column highlighted in red is present, so I simply deleted that column. It has no effect on the prediction.
The following code then produces a table containing the date and the traffic flow:
import numpy as np
import pandas as pd

# Raw string so the backslashes in the Windows path are not read as escapes
f = pd.read_csv(r'..\Desktop\AE86.csv')

# Rebuild the column labels from row 2, which holds the header names
def set_columns():
    columns = []
    for i in f.loc[2]:
        columns.append(i.strip())
    return columns

f.columns = set_columns()
f.drop([0, 1, 2], inplace=True)
# data holds only the columns we work with
data = pd.DataFrame()
# Add any column you want to keep to data here
data['datetime'] = f['Local Date'] + ' ' + f['Local Time']
data['total_flow'] = f['Total Carriageway Flow']
# data['speed'] = f['Speed Value']  # speed is not used in this article
data['datetime'] = pd.to_datetime(data['datetime'])
data['month'] = data['datetime'].apply(lambda date: date.month)
data['day'] = data['datetime'].apply(lambda date: date.day)
data['hour'] = data['datetime'].apply(lambda date: date.hour)
data['minute'] = data['datetime'].apply(lambda date: date.minute)
# Convert the flow column to float
data['total_flow'] = np.array(data['total_flow']).astype(np.float64)
The processed data are as follows
Next, split the data into a training set and a test set, and normalize it:
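from sklearn.preprocessing import MinMaxScaler

# Index label of the first record on January 25.
# Note: this is a label, not a position. The labels start at 3 after the
# drop above, so used with iloc it lands 3 rows past the start of the day.
d25 = data.query('day==25').index[0]
# Training set: 2211 records, the first three weeks of January 2018
train_set = data.iloc[:d25, 1:2]
# Test set: 669 records, the last week of January 2018
test_set = data.iloc[d25:, 1:2]
# Normalize to [0, 1]; the scaler is fitted on the training set only
sc = MinMaxScaler(feature_range=(0, 1))
train_set_sc = sc.fit_transform(train_set)
test_set_sc = sc.transform(test_set)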
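For readers wondering what the scaler does under the hood, the following NumPy sketch reproduces min-max scaling to [0, 1] by hand. The numbers and the helper names `scale` and `unscale` are made up; the point is that the minimum and maximum come from the training set only, so test values can fall outside [0, 1].

```python
import numpy as np

# Toy flow values, one column, same layout as train_set / test_set.
train = np.array([[20.0], [60.0], [100.0]])
test = np.array([[40.0], [120.0]])

# "Fit": remember the training minimum and maximum per column.
lo = train.min(axis=0)
hi = train.max(axis=0)


def scale(a):
    """Map values so that lo -> 0 and hi -> 1."""
    return (a - lo) / (hi - lo)


def unscale(a):
    """Inverse of scale: map [0, 1] back to the original units."""
    return a * (hi - lo) + lo


train_sc = scale(train)
test_sc = scale(test)      # 120 is above the training max, so it maps above 1
restored = unscale(test_sc)

print(train_sc.ravel(), test_sc.ravel(), restored.ravel())
```

The same inverse mapping is what `sc.inverse_transform` performs later, when the predictions are converted back to vehicle counts.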
Next, create the input data for the LSTM. Here time_step=5 is the window length: the previous 5 time periods are used to predict the next one.
time_step = 5
# Split the series into windows of length time_step
x_train = []
y_train = []
x_test = []
y_test = []
for i in range(time_step, len(train_set_sc)):
    x_train.append(train_set_sc[i - time_step:i])
    y_train.append(train_set_sc[i:i + 1])
for i in range(time_step, len(test_set_sc)):
    x_test.append(test_set_sc[i - time_step:i])
    y_test.append(test_set_sc[i:i + 1])
# Shuffle the training samples (optional). Reseeding before each call
# applies the same permutation to x_train and y_train.
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)
# Convert to NumPy arrays
x_train, y_train = np.array(x_train), np.array(y_train)
x_test, y_test = np.array(x_test), np.array(y_test)
# Reshape to (samples, time_step, features), the input shape the LSTM expects
x_train = np.reshape(x_train, (x_train.shape[0], time_step, 1))
x_test = np.reshape(x_test, (x_test.shape[0], time_step, 1))
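The reseed-and-shuffle trick above works, but it is easy to get wrong. As an alternative sketch (not what the code above does), one can draw a single permutation and index both arrays with it. The toy arrays below are invented so that each label can be checked against its window.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy data: 6 windows of 2 steps with 1 feature each.
x = np.arange(12).reshape(6, 2, 1)
# Toy label tied to each window: last value of the window plus one.
y = x[:, -1, 0] + 1

# One permutation, applied to both arrays, keeps pairs aligned.
perm = rng.permutation(len(x))
x_shuf, y_shuf = x[perm], y[perm]

# Every shuffled label still belongs to its shuffled window.
print(np.array_equal(y_shuf, x_shuf[:, -1, 0] + 1))
```

This requires the data to be NumPy arrays already, so in the code above it would go after the conversion with `np.array` rather than before it.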
What follows is building the model, prediction, error analysis, visualization, and so on.
The complete code is as follows:
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Dense,LSTM,Dropout,Flatten
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
import math
from matplotlib.font_manager import FontProperties  # lets plots render Chinese text

# Raw string so the backslashes in the Windows path are not read as escapes
f = pd.read_csv(r'..\Desktop\AE86.csv')

# Rebuild the column labels from row 2, which holds the header names
def set_columns():
    columns = []
    for i in f.loc[2]:
        columns.append(i.strip())
    return columns

f.columns = set_columns()
f.drop([0, 1, 2], inplace=True)
# data holds only the columns we work with
data = pd.DataFrame()
data['datetime'] = f['Local Date']+' '+f['Local Time']
data['total_flow'] = f['Total Carriageway Flow']
# data['speed'] = f['Speed Value']
data['datetime'] = pd.to_datetime(data['datetime'])
data['month'] = data['datetime'].apply(lambda date: date.month)
data['day'] = data['datetime'].apply(lambda date: date.day)
data['hour'] = data['datetime'].apply(lambda date:date.hour)
data['minute'] = data['datetime'].apply(lambda date: date.minute)
# Convert the flow column to float
data['total_flow'] = np.array(data['total_flow']).astype(np.float64)
# Index label of the first record on January 25
d25 = data.query('day==25').index[0]
# Training set: 2211 records, the first three weeks of January 2018
train_set = data.iloc[:d25, 1:2]
# Test set: 669 records, the last week of January 2018
test_set = data.iloc[d25:, 1:2]
# Normalize to [0, 1]; the scaler is fitted on the training set only
sc = MinMaxScaler(feature_range=(0, 1))
train_set_sc = sc.fit_transform(train_set)
test_set_sc = sc.transform(test_set)
# Split the series into windows of length time_step
time_step = 5
x_train = []
y_train = []
x_test = []
y_test = []
for i in range(time_step, len(train_set_sc)):
    x_train.append(train_set_sc[i - time_step:i])
    y_train.append(train_set_sc[i:i + 1])
for i in range(time_step, len(test_set_sc)):
    x_test.append(test_set_sc[i - time_step:i])
    y_test.append(test_set_sc[i:i + 1])
# Shuffle the training samples (optional). Reseeding before each call
# applies the same permutation to x_train and y_train.
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)
# Convert to NumPy arrays
x_train, y_train = np.array(x_train), np.array(y_train)
x_test, y_test = np.array(x_test), np.array(y_test)
# Reshape to (samples, time_step, features), the input shape the LSTM expects
x_train = np.reshape(x_train, (x_train.shape[0], time_step, 1))
x_test = np.reshape(x_test, (x_test.shape[0], time_step, 1))
# LSTM model
model = tf.keras.Sequential([
    LSTM(80, return_sequences=True),
    Dropout(0.2),
    LSTM(80),
    Dropout(0.2),
    Dense(1)
])
model.compile(optimizer='adam',
              loss='mse')
# Train the model; epochs and batch_size can be changed as you like
history = model.fit(x_train, y_train,
                    epochs=5,
                    validation_data=(x_test, y_test))
# Predict with the model
pre_flow = model.predict(x_test)
# Undo the normalization
pre_flow = sc.inverse_transform(pre_flow)
real_flow = sc.inverse_transform(y_test.reshape(y_test.shape[0], 1))
# Compute the errors
mse = mean_squared_error(real_flow, pre_flow)
rmse = math.sqrt(mse)
mae = mean_absolute_error(real_flow, pre_flow)
print('Mean squared error ---', mse)
print('Root mean squared error ---', rmse)
print('Mean absolute error ---', mae)
# Plot the prediction results
font_set = FontProperties(fname=r"C:\Windows\Fonts\simsun.ttc", size=15)  # SimSun font, size 15
plt.figure(figsize=(15, 10))
plt.plot(real_flow, label='Real_Flow', color='r')
plt.plot(pre_flow, label='Pre_Flow')
plt.xlabel('Test sequence', fontproperties=font_set)
plt.ylabel('Traffic flow / vehicles', fontproperties=font_set)
plt.legend()
# Save the prediction plot
# plt.savefig(r'...\Desktop\123.jpg')
plt.show()
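The three error metrics printed by the code above are easy to check by hand. Here is a small NumPy sketch with made-up numbers (the `real` and `pred` arrays are not results from the model) that shows what each one measures.

```python
import numpy as np

# Invented values in vehicles per interval, for illustration only.
real = np.array([100.0, 150.0, 200.0])
pred = np.array([110.0, 140.0, 230.0])

err = pred - real                  # signed error per sample
mse = np.mean(err ** 2)            # mean squared error
rmse = np.sqrt(mse)                # same units as the flow itself
mae = np.mean(np.abs(err))         # mean absolute error

print(mse, rmse, mae)
```

RMSE and MAE are in the same units as the flow, which makes them easier to interpret than MSE. RMSE penalizes large misses more heavily, so the single 30-vehicle error pulls it above the MAE here.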
The code above is the simplest version: it uses only the flow values and predicts traffic at a single site.
You can also add speed, occupancy, and other information to the model to predict the flow. Doing this properly is hard, but if you just want to get by, here is one idea:
The other features can be split into windows with time_step=5 in the same way and fed directly into the model; just add a Flatten layer (which flattens all the data into one dimension) as the last layer of the model. Then you can write: "This paper considers traffic flow, speed, lane occupancy, and other factors, and shows significant improvement over previous work."
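As a rough sketch of that idea, here is how windows over several features might be built with NumPy. The function name `make_multifeature_windows`, the column order, and the toy values are all assumptions for illustration. In practice each column would be normalized first, as with the flow above.

```python
import numpy as np


def make_multifeature_windows(features, target_col, time_step):
    """Build windows over a (time, n_features) table.

    x has shape (samples, time_step, n_features);
    y holds the target column's value right after each window.
    """
    features = np.asarray(features, dtype=float)
    x = np.stack([features[i - time_step:i]
                  for i in range(time_step, len(features))])
    y = features[time_step:, target_col]
    return x, y


# Toy table with 10 time steps and 3 made-up columns:
# flow, speed, occupancy (not real measurements).
t = np.arange(10)
table = np.column_stack([t, t * 2, t * 3])

x, y = make_multifeature_windows(table, target_col=0, time_step=5)
print(x.shape, y)
```

An array of shape (samples, time_step, n_features) is the 3-D input that a Keras LSTM layer accepts, so the only change relative to the single-feature code is the size of the last dimension.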