
Deep reinforcement learning for intelligent transportation systems: a survey paper reading notes

2022-07-03 02:43:00 【strawberry47】

This paper is a survey of intelligent transportation systems. It focuses on how reinforcement learning (RL) can be applied to traffic signal control (TSC), i.e. RL + TSC.


1. Overview

Classification of AI-based transportation applications:
① management applications
② public transportation
③ autonomous vehicles

This part also introduces many basic RL concepts, such as target networks and experience replay. These are standard knowledge in reinforcement learning; see my other notes for details.
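As a reminder of what those two concepts look like in practice, here is a minimal sketch of an experience replay buffer and a hard target-network update. It is not taken from the survey: the class name `ReplayBuffer`, the function `sync_target`, and the representation of network parameters as a dict of float lists are all assumptions made for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def sync_target(online_params, target_params):
    """Hard update: copy the online parameters into the target parameters.

    Parameters are modelled as {name: list of floats} (an assumption).
    """
    target_params.clear()
    target_params.update({name: list(values) for name, values in online_params.items()})
```

In a DQN-style training loop, the agent would push one transition per simulation step, sample mini-batches from the buffer for learning, and call `sync_target` only every fixed number of steps so that the bootstrapped targets stay stable between updates.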
Traffic signal control:
State: queue length, vehicle position, vehicle speed
Goal: minimize congestion at the intersection

2. Formulating traffic signal control as a deep RL problem

2.1 State:

RGB images, used together with DQN: a snapshot of the intersection (vehicle speeds and positions)
Image-like representation, also called discrete traffic state encoding (DTSE). Advantage: it carries rich information, such as speed, position, signal state, and acceleration
Feature-based value vector. Examples: queue length, cumulative waiting time, average waiting time per lane, signal phase duration, number of vehicles per lane
More complete road information can also be taken into account
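To make the DTSE idea concrete, the sketch below divides each incoming lane into fixed-length cells and fills two grids: one marking whether a cell holds a vehicle and one holding its normalized speed. The function name `encode_dtse`, the `(lane, distance_m, speed_mps)` tuple layout, and all parameter names are assumptions for illustration, not the encoding of any specific paper in the survey.

```python
def encode_dtse(vehicles, num_lanes, lane_length_m, cell_length_m, max_speed_mps):
    """Build occupancy and speed grids for the lanes approaching an intersection.

    vehicles: iterable of (lane_index, distance_to_stop_line_m, speed_mps).
    Returns two num_lanes x num_cells nested lists.
    """
    num_cells = int(lane_length_m // cell_length_m)
    occupancy = [[0] * num_cells for _ in range(num_lanes)]
    speed = [[0.0] * num_cells for _ in range(num_lanes)]

    for lane, distance_m, speed_mps in vehicles:
        cell = int(distance_m // cell_length_m)
        # Vehicles outside the observed stretch of road are ignored.
        if 0 <= lane < num_lanes and 0 <= cell < num_cells:
            occupancy[lane][cell] = 1
            # Speeds are clipped to [0, 1] after dividing by the speed limit.
            speed[lane][cell] = min(speed_mps / max_speed_mps, 1.0)

    return occupancy, speed
```

The two grids can be stacked as channels and fed to a CNN, which is why this representation is described as image-like. Further channels (for example signal state or acceleration) would follow the same pattern.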


2.2 Action:

The intersection is usually a four-way crossroads, so both the direction and the duration of each phase need to be considered.
Four green phases: North-South Green (NSG) for north-south traffic, East-West Green (EWG) for east-west traffic, North-South Advance Left Green (NSLG) for left turns on the north-south road, and East-West Advance Left Green (EWLG) for left turns on the east-west road.

Select the green phase (choose one of the four green phases)
Binary action: keep the current phase or change direction
Update the duration of each phase

Q: Are only the green phases considered?
A: Some papers simplify the problem to two green phases, North-South Green and East-West Green, and ignore left turns.
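The first two action definitions above can be sketched as follows. The phase names come from the notes; the function names and the fixed cycle order used by the binary action are assumptions for illustration.

```python
# The four green phases named in the notes; the order is an assumed cycle.
PHASES = ["NSG", "EWG", "NSLG", "EWLG"]


def apply_phase_selection(action):
    """Phase-selection action space: the action index picks the green phase directly."""
    return PHASES[action]


def apply_binary_action(current_phase, action):
    """Binary action space: 0 keeps the current phase, 1 moves to the next phase in the cycle."""
    index = PHASES.index(current_phase)
    if action == 1:
        index = (index + 1) % len(PHASES)
    return PHASES[index]
```

With phase selection the agent has four discrete actions and may jump between phases in any order, while the binary variant has only two actions and relies on a predefined cycle. A simplified two-phase setup would keep only `"NSG"` and `"EWG"` in the list.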

2.3 Reward:

Waiting time
Cumulative delay
Queue length
Absolute value of the traffic data
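Two of these reward signals can be written down in a few lines. This is a sketch under assumed sign conventions (larger reward means less congestion); the function names are hypothetical and the survey does not prescribe these exact formulas.

```python
def queue_length_reward(queue_lengths):
    """Reward from the absolute traffic state: negative total queue length over all lanes."""
    return -float(sum(queue_lengths))


def delay_change_reward(prev_cumulative_delay_s, curr_cumulative_delay_s):
    """Reward from the change between two decision steps.

    Positive when the cumulative delay went down since the previous step.
    """
    return prev_cumulative_delay_s - curr_cumulative_delay_s
```

The first function uses the absolute value of a traffic measurement at the current step, while the second rewards the improvement between consecutive steps. The same two patterns apply to waiting time.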

2.4 Neural Network Structure:

MLP
CNN: used together with DQN
RNN: for sequential data
AutoEncoder

2.5 Simulation environment:

Early: the Java-based Green Light District (GLD)
Popular: Simulation of Urban MObility (SUMO)
Mature: VISSIM, AIMSUN (good interaction with MATLAB)

3. Applications of deep RL in traffic signal control


3.1 Standard RL Applications:

3.1.1 Single Agent Applications:

RL-based single intersection
The work is divided into single-intersection and multi-intersection traffic

Reference [57] uses queue length as the state and total delay as the reward. It is the first binary action model and is compared against fixed-time signal control.
Reference [60] is the first to consider a real intersection scenario, and proposes three state definitions ... four reward functions.
(This part is essentially related work.)

3.1.2 Multi-Agent Applications

Multiple intersections are controlled cooperatively

Four standard TSC algorithms (presumably the commonly used baselines): fixed-time control, random control, longest queue first, most cars first
Classic algorithms (proposed by Wiering): TC-1, TC-2, TC-3

The state consists of the traffic light configuration, vehicle positions, and vehicle destinations, taking both local and global characteristics into account (this is not practical, because such vehicle information is unknown in reality)
The objective is to reduce waiting time
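The four standard baselines listed above are simple enough to sketch directly. The function names, the per-phase dictionaries, and the step-based timing are assumptions for illustration; they are not the implementations used in the surveyed papers.

```python
import random


def fixed_time(step, phases, green_steps):
    """Fixed-time control: cycle through the phases, each held for green_steps steps."""
    return phases[(step // green_steps) % len(phases)]


def random_control(phases, rng=random):
    """Random control: pick any phase with equal probability."""
    return rng.choice(phases)


def longest_queue_first(queue_by_phase):
    """Give green to the phase whose lanes hold the longest queue of stopped vehicles."""
    return max(queue_by_phase, key=queue_by_phase.get)


def most_cars_first(cars_by_phase):
    """Give green to the phase whose lanes hold the most vehicles, stopped or moving."""
    return max(cars_by_phase, key=cars_by_phase.get)
```

For example, `longest_queue_first({"NSG": 3, "EWG": 7})` selects `"EWG"`. An RL controller is typically evaluated against rules like these on the same simulated traffic.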

Follow-up work improves on Wiering's approach:
① Add congestion information from other intersections
② Increase the state size (by adding congestion information)
③ Add a congestion factor (instead of increasing the state space)
④ Add congestion and accident information
⑤ Consider cooperation information
⑥ Multiple objectives: vehicle stops, average waiting time, and maximum queue length are targeted as objectives for low, medium, and high traffic volume, i.e. the reward function is designed differently for each scenario
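The multi-objective idea in ⑥ can be illustrated with a reward that switches its objective according to traffic volume. This is only a sketch: the function name, the volume thresholds (300 and 900 vehicles per hour), and the use of plain negated measurements are hypothetical and not taken from the cited work.

```python
def volume_dependent_reward(volume_veh_per_h, stops, avg_wait_s, max_queue,
                            low_threshold=300, high_threshold=900):
    """Pick the objective by traffic volume (threshold values are assumed).

    low volume    -> penalize vehicle stops
    medium volume -> penalize average waiting time
    high volume   -> penalize maximum queue length
    """
    if volume_veh_per_h < low_threshold:
        return -float(stops)
    if volume_veh_per_h < high_threshold:
        return -float(avg_wait_s)
    return -float(max_queue)
```

In practice the three quantities have different scales, so each would likely need normalization before a single agent can be trained across all three regimes.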
Khamis's work:
① Bayesian transition probability -> reward function
② More specific objectives
③ Seven objectives, combined with a cooperative exploration function
Related work:
① Hierarchical reinforcement learning
② R-Markov Average Reward
③ Consider cooperation information between regions