Deep reinforcement learning for intelligent transportation systems: a survey paper reading notes
2022-07-03 02:43:00 【strawberry47】
This paper surveys the field of intelligent transportation systems, focusing on how reinforcement learning (RL) can be applied to traffic signal control (TSC).
1. Overview
- Classification of AI-based transportation applications:
① management applications,
② public transportation,
③ autonomous vehicles
This part also introduces many basic RL concepts, such as target networks and experience replay. These are common knowledge in reinforcement learning; see my other notes for details.
Traffic signal control:
state: queue length, vehicle position, vehicle speed
Goal: minimize congestion at intersections
2. Formulating traffic signal control as a deep RL problem
2.1 state:
- RGB images, combined with DQN; a snapshot of the intersection (vehicle speeds and positions)
- Image-like representation / discrete traffic state encoding (DTSE). Advantage: it carries rich information such as speed, position, signal state, and acceleration
- Feature-based value vector, i.e. a vector representation, for example: queue length, cumulative waiting time, average waiting time per lane, signal phase duration, number of vehicles per lane
- Representations that take more complete road information into account
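To make the DTSE idea concrete, here is a minimal sketch of how an image-like state could be built: each incoming lane is cut into fixed-length cells, and each cell stores whether a vehicle is present and how fast it is moving. The function name `encode_dtse`, the vehicle tuple layout, and the cell size are assumptions for illustration only; they are not taken from the survey.

```python
from typing import List, Tuple

# (lane_index, distance_to_stop_line_m, speed_mps) -- hypothetical layout
Vehicle = Tuple[int, float, float]


def encode_dtse(
    vehicles: List[Vehicle],
    num_lanes: int,
    lane_length_m: float,
    cell_length_m: float,
) -> Tuple[List[List[int]], List[List[float]]]:
    """Return (presence, speed) grids of shape [num_lanes][num_cells]."""
    num_cells = int(lane_length_m // cell_length_m)
    presence = [[0] * num_cells for _ in range(num_lanes)]
    speed = [[0.0] * num_cells for _ in range(num_lanes)]
    for lane, dist, v in vehicles:
        cell = int(dist // cell_length_m)
        # Ignore vehicles outside the observed stretch of road.
        if 0 <= lane < num_lanes and 0 <= cell < num_cells:
            presence[lane][cell] = 1
            speed[lane][cell] = v
    return presence, speed


vehicles = [(0, 3.0, 0.0), (1, 12.5, 8.0), (0, 200.0, 5.0)]
presence, speed = encode_dtse(vehicles, num_lanes=2, lane_length_m=50.0, cell_length_m=5.0)
```

The two grids can be stacked as channels and fed to a CNN, which is one reason this encoding pairs naturally with DQN-style agents.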

2.2 action:
The setting is usually a four-way intersection, where both the direction and the duration of each green phase must be considered.
Four green phases: North-South Green (NSG), East-West Green (EWG), North-South Advance Left Green (NSLG, left turns in the north-south direction), and East-West Advance Left Green (EWLG, left turns in the east-west direction).
- Select a green phase (choose one of the four green phases)
- Binary action: keep the current phase or change direction
- Update the duration of each phase
Q: Do we only care about the green light?
A: Some papers simplify the problem to two green phases, North-South Green and East-West Green, and ignore left turns.
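The binary action can be sketched as a tiny state machine over the four green phases. This is an illustrative sketch, not a method from the survey: the cyclic phase order, the one-second step, and the `min_green_s` safeguard are all assumptions, and yellow/all-red intervals are left out for brevity.

```python
from typing import Tuple

PHASES = ["NSG", "EWG", "NSLG", "EWLG"]  # assumed cyclic order
KEEP, CHANGE = 0, 1


def apply_binary_action(
    phase_index: int, elapsed_s: int, action: int, min_green_s: int = 5
) -> Tuple[int, int]:
    """Advance the signal by one 1-second step.

    Returns (new_phase_index, new_elapsed_s). A CHANGE request is only
    honored once the current phase has been green for min_green_s seconds
    (a hypothetical safety constraint).
    """
    if action == CHANGE and elapsed_s >= min_green_s:
        return (phase_index + 1) % len(PHASES), 0
    return phase_index, elapsed_s + 1


state = apply_binary_action(0, 10, CHANGE)  # switch from NSG to EWG
```

With a "select a green phase" action space, the agent would instead output an index into `PHASES` directly, which is more flexible but gives up the fixed cycle.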
2.3 reward:
- Waiting time
- Cumulative delay
- Queue length
- Absolute value of the traffic data
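As a rough illustration of the queue-based and delay-based rewards listed above, the sketch below shows two common shapes: penalizing the total number of queued vehicles, and rewarding a reduction in cumulative delay between two decision steps. Both function names and the exact formulas are assumptions for illustration; individual papers define these quantities differently.

```python
from typing import Sequence


def queue_reward(queue_lengths: Sequence[int]) -> float:
    """Negative total queue length over all incoming lanes."""
    return -float(sum(queue_lengths))


def delay_change_reward(prev_cum_delay_s: float, cur_cum_delay_s: float) -> float:
    """Positive when cumulative delay dropped since the last decision step."""
    return prev_cum_delay_s - cur_cum_delay_s


r_queue = queue_reward([3, 0, 5, 2])
r_delay = delay_change_reward(120.0, 95.0)
```

Using the change in delay rather than its absolute value keeps the reward centered around zero, which tends to be easier for value-based agents to learn from.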
2.4 Neural Network Structure:
- MLP
- CNN: combined with DQN
- RNN: for sequential data
- AutoEncoder
2.5 Simulation environment:
- Early: the Java-based Green Light District (GLD)
- Popular: Simulation of Urban Mobility (SUMO)
- Mature: VISSIM, AIMSUN (good interoperability with MATLAB)
3. Applications of deep RL in traffic signal control

3.1 Standard RL Applications:
3.1.1 Single Agent Applications:
RL-based single-intersection control
The applications are divided into single-intersection and multi-intersection settings.
Reference [57] uses queue length as the state and total delay as the reward. It is the first binary-action model and is compared against fixed-time signal control.
Reference [60] is the first to propose a real intersection scenario; it proposes three state definitions ... and four reward functions.
(This part is equivalent to related work.)
3.1.2 Multi-Agent Applications
Multiple intersections are controlled cooperatively.
- Four standard TSC algorithms (presumably the commonly used baselines): fixed-time control, random control, longest queue first, most cars first
- Classic algorithms (proposed by Wiering): TC-1, TC-2, TC-3
- The state consists of the traffic light configuration, vehicle positions, and vehicle destinations, covering both local and global features (not practical, because the vehicle information is not known in practice)
- The goal is to reduce waiting time
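The four baseline controllers mentioned above are simple enough to sketch in a few lines. The version below is my own reading of their names, not code from the survey or from Wiering's work: `fixed_time` cycles phases on a timer, `random_control` picks a phase at random, `longest_queue_first` picks the phase serving the single longest queue, and `most_cars_first` picks the phase serving the most vehicles in total. The per-phase dictionaries and all function names are hypothetical, and each phase is assumed to serve at least one lane.

```python
import random
from typing import Dict, List


def fixed_time(step_s: int, num_phases: int, phase_duration_s: int) -> int:
    """Cycle through phases, holding each for phase_duration_s seconds."""
    return (step_s // phase_duration_s) % num_phases


def random_control(num_phases: int, rng: random.Random) -> int:
    """Pick a phase uniformly at random."""
    return rng.randrange(num_phases)


def longest_queue_first(queues: Dict[str, List[int]]) -> str:
    """Pick the phase whose served lanes contain the longest single queue."""
    return max(queues, key=lambda phase: max(queues[phase]))


def most_cars_first(cars: Dict[str, List[int]]) -> str:
    """Pick the phase whose served lanes hold the most vehicles in total."""
    return max(cars, key=lambda phase: sum(cars[phase]))


per_lane = {"NSG": [4, 1], "EWG": [3, 3]}
lqf_choice = longest_queue_first(per_lane)
mcf_choice = most_cars_first(per_lane)
```

In this example the two greedy rules disagree: the longest single queue is served by `NSG`, while `EWG` serves more vehicles overall.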
Follow-up work improves on Wiering's work:
① Add congestion information from other intersections
② Increase the state size (by adding congestion information)
③ Add a congestion factor (instead of increasing the state space)
④ Add congestion and unexpected-event information
⑤ Consider cooperation information
⑥ Multiple objectives: vehicle stops, average waiting time, and maximum queue length are targeted as objectives for low, medium, and high traffic volume; the reward function is designed differently for each scenario
Khamis's work:
① Bayesian transition probability -> reward function
② more specific objectives
③ seven objectives, combined with a cooperative exploration function
Related work:
① Hierarchical reinforcement learning
② R-Markov Average Reward
③ Consider cooperation information between regions
3.2 Deep RL Applications:
3.2.1 Single Agent Applications:
3.2.2 Multi-Agent Deep RL:
4. DEEP RL FOR OTHER ITS APPLICATIONS