Deep reinforcement learning for intelligent transportation systems: a survey paper reading notes
2022-07-03 02:43:00 【strawberry47】
These notes cover a survey of the intelligent transportation field, focusing on how reinforcement learning (RL) is used to solve traffic signal control (TSC), i.e. the control of traffic lights.
1. Overview
- Classification:
AI-based transportation applications:
① management applications,
② public transportation,
③ autonomous vehicles
This part also introduces many basic RL concepts, such as target networks and experience replay. These are common knowledge in reinforcement learning; see my other notes for details.
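As a quick reminder of the two concepts named above, here is a minimal sketch in plain Python. The class and function names (`ReplayBuffer`, `sync_target`) and the dictionary-of-lists parameter format are assumptions made for illustration; they are not taken from the survey.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def sync_target(online_params, target_params):
    """Hard update: copy the online network's parameters into the target network."""
    for name, values in online_params.items():
        target_params[name] = list(values)  # copy, so later online updates do not leak
```

In a DQN-style loop, the agent would push one transition per step, train on sampled mini-batches, and call `sync_target` only every few hundred steps so that the bootstrap targets stay stable.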
Traffic signal control:
state: queue length, vehicle position, vehicle speed
Goal: minimize congestion at intersections
2. Formulating traffic signal control as a deep RL problem
2.1 state:
- RGB images, combined with DQN: a snapshot of the intersection (vehicle speed and position)
- Image-like representation / discrete traffic state encoding (DTSE); advantage: it carries rich information such as speed, position, signal state, and acceleration
- Feature-based value vector (vector representation), e.g. queue length, cumulative waiting time, average waiting time per lane, signal phase duration, number of vehicles per lane
- Some work considers more complete road information
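To make the DTSE idea concrete, the sketch below splits each incoming lane into fixed-length cells and fills an occupancy grid and a normalised speed grid. The function name `encode_dtse`, its parameters, and the two-channel layout are assumptions for illustration; individual papers choose their own cell sizes and channels.

```python
import math


def encode_dtse(vehicles, num_lanes, lane_length_m, cell_length_m, max_speed):
    """Return (occupancy, speed_map), each a num_lanes x num_cells grid.

    vehicles: iterable of (lane_index, distance_to_stop_line_m, speed_m_per_s).
    occupancy holds 1.0 where a cell contains a vehicle; speed_map holds that
    vehicle's speed divided by max_speed.
    """
    num_cells = math.ceil(lane_length_m / cell_length_m)
    occupancy = [[0.0] * num_cells for _ in range(num_lanes)]
    speed_map = [[0.0] * num_cells for _ in range(num_lanes)]
    for lane, distance, speed in vehicles:
        if not 0 <= distance < lane_length_m:
            continue  # vehicle is outside the observed stretch of road
        cell = int(distance // cell_length_m)
        occupancy[lane][cell] = 1.0
        speed_map[lane][cell] = speed / max_speed
    return occupancy, speed_map
```

The two grids can be stacked as channels and fed to a CNN, which is why this representation is described as image-like.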
2.2 action:
The intersection is usually a four-way crossroads, so both the direction and the duration of each phase have to be considered.
Four green phases: North-South Green (NSG), for north-south through traffic; East-West Green (EWG), for east-west through traffic; North-South Advance Left Green (NSLG), for left turns in the north-south direction; East-West Advance Left Green (EWLG), for left turns in the east-west direction.
- Phase selection (choose one of the four green phases)
- Binary action: keep the current phase or change direction
- Update the duration of each phase
Q: Do we only care about the green light?
A: Some papers simplify the problem to two green phases, North-South Green and East-West Green, and ignore left turns.
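The first two action spaces can be sketched as follows. The `Phase` enum mirrors the four green phases named above; the function names and the default cycle order (NSG, EWG, NSLG, EWLG) are assumptions for illustration only.

```python
from enum import IntEnum


class Phase(IntEnum):
    NSG = 0   # North-South Green
    EWG = 1   # East-West Green
    NSLG = 2  # North-South Advance Left Green
    EWLG = 3  # East-West Advance Left Green


def apply_phase_selection(action):
    """Phase-selection action space: the action index is the next green phase."""
    return Phase(action)


def apply_binary_action(current_phase, action, cycle=tuple(Phase)):
    """Binary action space: 0 keeps the current phase, 1 moves to the next
    phase in a fixed cycle (the cycle order here is an assumed default)."""
    if action == 0:
        return current_phase
    index = cycle.index(current_phase)
    return cycle[(index + 1) % len(cycle)]
```

The two-phase simplification mentioned in the answer above corresponds to passing `cycle=(Phase.NSG, Phase.EWG)`, in which case the binary action simply toggles between the two directions.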
2.3 reward:
- Waiting time
- Cumulative delay
- Queue length
- Absolute value of the traffic data
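The reward signals listed above can be turned into scalar rewards in several ways. The sketch below shows three common shapes; the function names and the weights in `weighted_reward` are hypothetical and not taken from the survey.

```python
def queue_length_reward(queue_lengths):
    """Negative total queue length over all incoming lanes."""
    return -float(sum(queue_lengths))


def delay_change_reward(previous_cumulative_delay, current_cumulative_delay):
    """Positive when the cumulative delay shrank since the last decision step."""
    return float(previous_cumulative_delay - current_cumulative_delay)


def weighted_reward(queue_lengths, waiting_times, w_queue=1.0, w_wait=0.1):
    """Weighted sum of penalties; the default weights are illustrative only."""
    return -(w_queue * sum(queue_lengths) + w_wait * sum(waiting_times))
```

All three are written so that a larger reward means less congestion, which matches the goal of minimizing congestion at the intersection.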
2.4 Neural Network Structure:
- MLP
- CNN: combined with DQN
- RNN: for sequence data
- AutoEncoder
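For the simplest case in the list, an MLP that maps a feature-based state vector to one Q-value per phase, a forward pass with epsilon-greedy action selection might look like the sketch below. It uses plain Python lists instead of a deep learning framework; the parameter layout (`w1`, `b1`, `w2`, `b2`) and the function names are assumptions for illustration.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def q_values(state, params):
    """Two-layer MLP: state vector -> hidden ReLU layer -> one Q-value per action."""
    hidden = relu(dense(state, params["w1"], params["b1"]))
    return dense(hidden, params["w2"], params["b2"])


def epsilon_greedy(qs, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(qs))
    return max(range(len(qs)), key=qs.__getitem__)
```

A CNN would replace the first dense layer when the state is an image or a DTSE grid, and an RNN when the state is a sequence of observations.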
2.5 Simulation environment:
- Early: the Java-based Green Light District (GLD)
- Popular: Simulation of Urban MObility (SUMO)
- Mature: VISSIM, AIMSUN (interact well with MATLAB)
3. Applications of deep RL in traffic signal control
3.1 Standard RL Applications:
3.1.1 Single Agent Applications:
RL-based single intersection
The work is divided into single-intersection and multi-intersection traffic.
Reference [57] takes queue length as the state and total delay as the reward; it is the first binary-action model, and it is compared against fixed-time signal control.
Reference [60] is the first to consider a real intersection scenario; it proposes three state definitions ... four reward functions.
(This part is equivalent to related work.)
3.1.2 Multi-Agent Applications
Cooperative control of multiple intersections
- Four standard TSC algorithms (probably the commonly used baselines): fixed-time control, random control, longest queue first, most cars first
- Classic algorithms (proposed by Wiering): TC-1, TC-2, TC-3
- The state consists of the traffic light configuration, vehicle positions, and vehicle destinations, covering both local and global features (not practical, because the vehicle information is unknown in reality)
- The goal is to reduce waiting time
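The four baseline controllers in the first bullet are simple enough to sketch directly. The function names and the per-phase input format (one number per phase, already aggregated over the lanes that phase serves) are assumptions for illustration, not the exact definitions used in the cited work.

```python
import random


def _argmax(values):
    # Ties resolve to the lowest phase index.
    return max(range(len(values)), key=values.__getitem__)


def fixed_time_policy(step, num_phases, phase_duration):
    """Cycle through the phases, holding each for phase_duration steps."""
    return (step // phase_duration) % num_phases


def random_policy(num_phases, rng=random):
    """Pick the next phase uniformly at random."""
    return rng.randrange(num_phases)


def longest_queue_first(queue_per_phase):
    """Serve the phase whose lanes currently hold the longest queue."""
    return _argmax(queue_per_phase)


def most_cars_first(cars_per_phase):
    """Serve the phase whose lanes currently hold the most vehicles."""
    return _argmax(cars_per_phase)
```

An RL controller is typically evaluated against these by running all of them on the same simulated traffic and comparing average waiting time.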
Follow-up work improves on Wiering's work:
① Add congestion information from other intersections
② Increase the state size (by adding congestion information)
③ Add a congestion factor (instead of increasing the state space)
④ Add congestion and accident (unexpected event) information
⑤ Consider cooperation information
⑥ Multiple objectives: vehicle stops, average waiting time, and maximum queue length are targeted as objectives for low, medium, and high traffic volume; the reward function is designed differently for each scenario
Khamis's work:
① Bayesian transition probability -> reward function
② more specific objectives
③ seven objectives, combined with a cooperative exploration function
Related work:
① Hierarchical reinforcement learning
② R-Markov Average Reward
③ Consider cooperation information between regions
3.2 Deep RL Applications:
3.2.1 Single Agent Applications:
3.2.2 Multi-Agent Deep RL:
4. Deep RL for other ITS applications