5. Discrete Control and Continuous Control
2022-07-07 23:21:00 【C--G】
Discrete VS Continuous Control
Discrete
Continuous
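A DQN has one output dimension per action, so it cannot be applied directly to continuous control.
A policy network for discrete actions likewise has one output dimension per action, so it cannot be applied directly to continuous control either.
If DQN must be used for continuous control anyway, the continuous action space has to be discretized first.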
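
As a rough illustration of the discretization step, the sketch below lays a uniform grid over a box-shaped action space. The function name `discretize_action_space`, the two-joint example, and the bin count are assumptions made for illustration; they do not come from the original post.

```python
import itertools

def discretize_action_space(low, high, bins_per_dim):
    """Return every grid point of a uniform grid over the box [low, high]."""
    axes = []
    for lo, hi in zip(low, high):
        step = (hi - lo) / (bins_per_dim - 1)
        axes.append([lo + i * step for i in range(bins_per_dim)])
    # Cartesian product of the per-dimension grids = the discrete action set.
    return list(itertools.product(*axes))

# Hypothetical example: two joint angles, each in [0, 360] degrees.
actions = discretize_action_space(low=[0.0, 0.0], high=[360.0, 360.0], bins_per_dim=5)
print(len(actions))  # 5 ** 2 grid points
```

With `d` action dimensions and `k` bins per dimension the grid has `k ** d` points, so the output layer of the DQN grows exponentially with `d`. This is why discretization is only practical for low-dimensional action spaces.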

Better Approaches to Continuous Control
Deterministic policy network




Updating Value Network by TD
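
The original figures for this step are not available, so the following is only a minimal sketch of a standard TD update for the critic in an actor-critic setup with a deterministic policy. The linear critic, the linear actor, and the names `q_value`, `policy`, `td_update`, `gamma`, and `lr` are all assumptions chosen to keep the example self-contained.

```python
def q_value(w, s, a):
    # Hypothetical linear critic: q(s, a; w) = w[0]*s + w[1]*a + w[2]
    return w[0] * s + w[1] * a + w[2]

def policy(theta, s):
    # Hypothetical linear deterministic actor: a = pi(s; theta) = theta * s
    return theta * s

def td_update(w, theta, s, a, r, s_next, gamma=0.9, lr=0.1):
    """One TD step on the critic parameters w for a transition (s, a, r, s_next)."""
    a_next = policy(theta, s_next)               # action the actor would take next
    y = r + gamma * q_value(w, s_next, a_next)   # TD target, treated as a constant
    delta = q_value(w, s, a) - y                 # TD error
    grad = [s, a, 1.0]                           # dq/dw for the linear critic
    new_w = [wi - lr * delta * gi for wi, gi in zip(w, grad)]
    return new_w, delta

new_w, delta = td_update(w=[0.0, 0.0, 0.0], theta=1.0, s=1.0, a=1.0, r=1.0, s_next=1.0)
```

Note that the next action `a_next` is computed by the policy network and is used only to form the target; it is not executed in the environment.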

Updating Policy Network by DPG
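
The deterministic policy gradient follows from the chain rule: with a = π(s; θ), the gradient of q(s, π(s; θ); w) with respect to θ is (∂a/∂θ)·(∂q/∂a), and θ is moved by gradient ascent. The sketch below works this out for a scalar linear actor and critic; those model choices and the name `dpg_step` are assumptions for illustration, not the post's own code.

```python
def dpg_step(theta, w, s, lr=0.1):
    """One deterministic-policy-gradient ascent step on the actor parameter theta.

    Assumed models (hypothetical):
      actor:  a = theta * s
      critic: q(s, a; w) = w[0]*s + w[1]*a + w[2]
    """
    da_dtheta = s          # derivative of the action w.r.t. theta
    dq_da = w[1]           # derivative of the critic w.r.t. the action
    grad = da_dtheta * dq_da
    return theta + lr * grad   # ascent: make the critic's score larger

theta_new = dpg_step(theta=0.0, w=[0.0, 2.0, 0.0], s=1.5)
```

The critic parameters `w` are held fixed during this step; only the actor is updated.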



Improvement: Using Target Networks
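
Target networks are slowly moving copies of the policy and value networks that are used only to compute the TD target. One common way to keep them in sync is a weighted ("soft") update; the sketch below assumes that scheme, and the names `soft_update` and `tau` are illustrative assumptions.

```python
def soft_update(target_params, online_params, tau=0.01):
    """Move each target parameter a fraction tau toward its online counterpart."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]

# Hypothetical parameter vectors for illustration.
target = soft_update(target_params=[0.0, 1.0], online_params=[1.0, 1.0], tau=0.5)
```

A small `tau` makes the TD target change slowly, which reduces the bootstrapping feedback that arises when the same network produces both the prediction and its target.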





Improvement Methods

Stochastic Policy for Continuous Control



Policy Network
Univariate Normal Distribution
Multivariate Normal Distribution
Function Approximation
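
To make the Gaussian policy concrete, the sketch below assumes that the approximated quantities are a mean vector `mu` and a log-variance vector `rho` (with `rho[i] = ln(sigma_i ** 2)`), and that the action dimensions are independent, so the multivariate density is a product of univariate normals. In practice `mu` and `rho` would be outputs of a network given the state; here they are plain lists, and all names are assumptions.

```python
import math
import random

def sample_action(mu, rho, rng=random):
    """Sample each action dimension from N(mu_i, sigma_i^2), sigma_i = exp(rho_i / 2)."""
    return [rng.gauss(m, math.exp(0.5 * r)) for m, r in zip(mu, rho)]

def log_prob(a, mu, rho):
    """Log-density of action a under the diagonal Gaussian policy."""
    total = 0.0
    for ai, m, r in zip(a, mu, rho):
        var = math.exp(r)
        # ln N(ai; m, var) = -0.5 * (ln(2*pi) + ln(var) + (ai - m)^2 / var)
        total += -0.5 * (math.log(2.0 * math.pi) + r + (ai - m) ** 2 / var)
    return total

action = sample_action(mu=[0.0, 1.0], rho=[0.0, -2.0])
print(log_prob(action, mu=[0.0, 1.0], rho=[0.0, -2.0]))
```

Parameterizing the log-variance rather than the standard deviation keeps the variance positive for any real-valued output.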


Training Policy Network

Auxiliary Network









Policy Gradient Methods



