5. Discrete Control and Continuous Control
2022-07-07 23:21:00 【C--G】
Discrete VS Continuous Control
Discrete
Continuous
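DQN has one output dimension per action, so it cannot be applied directly to continuous control, where the number of possible actions is infinite.
A policy network likewise has one output dimension per action, so it cannot be applied directly to continuous control either.
To use DQN for continuous control anyway, the continuous action space must first be discretized.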
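
As a rough illustration of discretization, the sketch below lays a uniform grid over a box-shaped action space. The function name `discretize_action_space`, the bounds, and the bin count are assumptions chosen for this example, not part of the original notes.

```python
import itertools
import numpy as np

def discretize_action_space(low, high, bins_per_dim):
    """Return all grid points of a box-shaped continuous action space.

    low, high: per-dimension bounds (hypothetical values in the demo below).
    bins_per_dim: number of grid points along each dimension.
    """
    axes = [np.linspace(lo, hi, bins_per_dim) for lo, hi in zip(low, high)]
    # Cartesian product of the per-dimension grids.
    return np.array(list(itertools.product(*axes)))

# A 2-dimensional action space with 5 points per dimension gives 5**2 actions.
grid_2d = discretize_action_space([0.0, 0.0], [1.0, 1.0], bins_per_dim=5)
# A 3-dimensional action space with 4 points per dimension gives 4**3 actions.
grid_3d = discretize_action_space([0.0] * 3, [1.0] * 3, bins_per_dim=4)
print(grid_2d.shape, grid_3d.shape)
```

The number of discrete actions grows exponentially with the dimension of the action space, so this approach is only practical when the action space has few dimensions.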

Better Approaches to Continuous Control
Deterministic policy network
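
A deterministic policy network maps a state directly to a single action vector rather than to a probability distribution. The following is a minimal NumPy sketch under assumed choices: a two-layer network, `tanh` activations, the layer sizes, and the names `init_policy` and `policy` are all illustrative and not taken from the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_policy(state_dim, action_dim, hidden=16):
    # Small random weights; the sizes are arbitrary for this sketch.
    return {
        "W1": rng.normal(0.0, 0.1, (hidden, state_dim)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (action_dim, hidden)),
        "b2": np.zeros(action_dim),
    }

def policy(params, state, max_action=1.0):
    # The output is the action itself, squashed into [-max_action, max_action].
    h = np.tanh(params["W1"] @ state + params["b1"])
    return max_action * np.tanh(params["W2"] @ h + params["b2"])

params = init_policy(state_dim=3, action_dim=2)
s = np.array([0.1, -0.2, 0.3])
a = policy(params, s)
print(a)
```

The output dimension equals the dimension of the action space, and the same state always yields the same action.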




Updating Value Network by TD
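
The TD update can be sketched as follows for a critic q(s, a; w). To keep the gradient explicit, this example assumes a linear critic over the concatenated state and action; the linear form, the names `q_value` and `td_update`, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def q_value(w, s, a):
    # Assumed linear critic: q(s, a; w) = w . [s; a].
    return float(w @ np.concatenate([s, a]))

def td_update(w, s, a, r, s_next, a_next, gamma=0.9, lr=0.1):
    """One TD step; a_next is the action the policy picks at s_next."""
    features = np.concatenate([s, a])          # dq/dw for a linear critic
    td_target = r + gamma * q_value(w, s_next, a_next)  # treated as a constant
    td_error = q_value(w, s, a) - td_target
    # Gradient descent on 0.5 * td_error**2 with respect to w.
    return w - lr * td_error * features, td_error

w0 = np.zeros(3)
w1, delta = td_update(
    w0,
    s=np.array([1.0, 0.0]), a=np.array([1.0]), r=1.0,
    s_next=np.array([0.0, 1.0]), a_next=np.array([0.0]),
)
print(w1, delta)
```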

Updating Policy Network by DPG
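
The deterministic policy gradient applies the chain rule through the action: the gradient of q with respect to the policy parameters is the gradient of q with respect to the action, multiplied by the gradient of the action with respect to the parameters. The sketch below assumes a linear policy and a toy critic with a known optimum so the result is easy to check; both are illustrative stand-ins, not the networks from the original notes.

```python
import numpy as np

def policy(theta, s):
    # Assumed deterministic linear policy: a = theta @ s.
    return theta @ s

def critic_grad_wrt_action(a, best_action):
    # Gradient of the toy critic q(s, a) = -||a - best_action||^2 w.r.t. a.
    return -2.0 * (a - best_action)

def dpg_step(theta, s, best_action, lr=0.1):
    a = policy(theta, s)
    dq_da = critic_grad_wrt_action(a, best_action)
    # Chain rule: for a linear policy, dq/dtheta = outer(dq/da, s).
    return theta + lr * np.outer(dq_da, s)  # gradient ascent on q

theta = np.zeros((1, 2))
s = np.array([1.0, 0.5])
best_action = np.array([0.3])
for _ in range(200):
    theta = dpg_step(theta, s, best_action)
final_action = policy(theta, s)
print(final_action)
```

Only the policy parameters change here; the critic is held fixed, and the policy moves toward the action the critic scores highest.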



Improvement: Using Target Networks
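
Target networks are commonly kept as slowly moving copies of the online networks and used to compute the TD target. One common rule is a weighted average of the two parameter sets; the sketch below assumes that rule, with a hypothetical `soft_update` helper and an arbitrary `tau`.

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.01):
    """Move each target parameter a fraction tau toward the online parameter."""
    return {
        name: tau * online_params[name] + (1.0 - tau) * target_params[name]
        for name in target_params
    }

online = {"W": np.ones((2, 2)), "b": np.ones(2)}
target = {"W": np.zeros((2, 2)), "b": np.zeros(2)}
target = soft_update(target, online, tau=0.1)
print(target["W"])
```

A small `tau` makes the TD target change slowly, which is the usual motivation for using target networks.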





Other Improvements

Stochastic Policy for Continuous Control



Policy Network
Univariate Normal Distribution
Multivariate Normal Distribution
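
For a stochastic policy, each action dimension can be modeled as a univariate normal, and a multivariate normal with independent dimensions is then the product of those densities. The sketch below computes the log density and draws a sample, assuming independent dimensions (a diagonal covariance); the function name is hypothetical.

```python
import numpy as np

def diag_gaussian_log_prob(a, mu, sigma):
    """Log density of a normal distribution with independent dimensions."""
    a, mu, sigma = (np.asarray(x, dtype=float) for x in (a, mu, sigma))
    per_dim = (
        -0.5 * np.log(2.0 * np.pi)
        - np.log(sigma)
        - (a - mu) ** 2 / (2.0 * sigma ** 2)
    )
    # Independence turns the product of densities into a sum of logs.
    return float(np.sum(per_dim))

rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0])
sigma = np.array([1.0, 0.5])
action = rng.normal(mu, sigma)          # sample one action from the policy
log_p = diag_gaussian_log_prob(action, mu, sigma)
print(action, log_p)
```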
Function Approximation


Training Policy Network

Auxiliary Network
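
One common way to set up an auxiliary network for a Gaussian policy is to have the policy network output the mean `mu` and the log variance `rho`, and to define a function `f` that equals the log density up to an additive constant. The original figures are missing, so the formula below is an assumed formulation rather than a confirmed reproduction of them.

```python
import numpy as np

def auxiliary_f(a, mu, rho):
    """f = sum_i [ -rho_i / 2 - (a_i - mu_i)^2 / (2 * exp(rho_i)) ].

    rho is the log variance, rho_i = ln(sigma_i^2) (assumed parameterization).
    """
    a, mu, rho = (np.asarray(x, dtype=float) for x in (a, mu, rho))
    return float(np.sum(-0.5 * rho - (a - mu) ** 2 / (2.0 * np.exp(rho))))

def log_density(a, mu, rho):
    # Full log density of the same diagonal Gaussian, for comparison.
    d = np.asarray(a).size
    return auxiliary_f(a, mu, rho) - 0.5 * d * np.log(2.0 * np.pi)

a = np.array([0.2, -0.1])
mu = np.array([0.0, 0.0])
rho = np.array([0.0, np.log(0.25)])     # variances 1.0 and 0.25
f_value = auxiliary_f(a, mu, rho)
print(f_value, log_density(a, mu, rho))
```

Because `f` differs from the log density only by a constant, both have the same gradient with respect to the network parameters, which is what a policy gradient update needs.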









Policy Gradient Methods



