5. Discrete Control vs. Continuous Control
2022-07-07 23:21:00 【C--G】
Discrete vs. Continuous Control
Discrete
Continuous
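DQN has one output dimension per action (one Q-value for each discrete action), so it cannot be used for continuous control.
The policy network likewise has one output dimension per action (one probability for each discrete action), so it cannot be used for continuous control either.
To use DQN for continuous control anyway, the continuous action space must be discretized.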
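A minimal sketch of that discretization, assuming a box-shaped action space split into a uniform grid (the function name `discretize_box` and the bounds are hypothetical). Every grid point becomes one DQN output, so with `k` bins per dimension and `d` action dimensions there are `k**d` discrete actions, a count that grows exponentially with `d`.

```python
import itertools


def discretize_box(low, high, bins_per_dim):
    """Enumerate a uniform grid over [low_1, high_1] x ... x [low_d, high_d].

    Each grid point becomes one discrete action, i.e. one DQN output.
    Assumes bins_per_dim >= 2.
    """
    axes = []
    for lo, hi in zip(low, high):
        step = (hi - lo) / (bins_per_dim - 1)
        axes.append([lo + i * step for i in range(bins_per_dim)])
    # Cartesian product of the per-dimension grids
    return list(itertools.product(*axes))


# Hypothetical 2-D action space, e.g. two joint angles, each in [-1, 1]
actions = discretize_box([-1.0, -1.0], [1.0, 1.0], bins_per_dim=3)
print(len(actions))  # 9
print(actions[:3])   # [(-1.0, -1.0), (-1.0, 0.0), (-1.0, 1.0)]

# With 10 bins per dimension the action count is 10 ** d
for d in (1, 2, 3, 4):
    print(d, len(discretize_box([0.0] * d, [1.0] * d, bins_per_dim=10)))
```

The exponential growth in `d` is the usual argument against discretizing high-dimensional action spaces.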
Better Approaches to Continuous Control
Deterministic Policy Network
Updating Value Network by TD
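A minimal sketch of one TD step for the value network q(s, a; w), assuming (hypothetically) a linear critic and a linear deterministic policy on scalar states and actions so the gradient can be written by hand. The next action is a' = π(s'; θ) from the current policy, the TD target is y = r + γ·q(s', a'; w), and w moves against the TD error δ = q(s, a; w) - y.

```python
def q_value(w, s, a):
    """Hypothetical linear critic: q(s, a; w) = w[0]*s + w[1]*a + w[2]."""
    return w[0] * s + w[1] * a + w[2]


def q_grad_w(s, a):
    """Gradient of the linear critic with respect to w."""
    return [s, a, 1.0]


def policy(theta, s):
    """Hypothetical linear deterministic policy: a = pi(s; theta) = theta[0]*s + theta[1]."""
    return theta[0] * s + theta[1]


def td_update(w, theta, s, a, r, s_next, gamma=0.9, lr=0.1):
    """One TD step on the transition (s, a, r, s_next); returns (new_w, td_error)."""
    a_next = policy(theta, s_next)              # a' comes from the current policy
    y = r + gamma * q_value(w, s_next, a_next)  # TD target, treated as a constant
    delta = q_value(w, s, a) - y                # TD error
    new_w = [wi - lr * delta * gi for wi, gi in zip(w, q_grad_w(s, a))]
    return new_w, delta


w, delta = td_update(w=[0.0, 0.0, 0.0], theta=[1.0, 0.0],
                     s=1.0, a=0.5, r=1.0, s_next=2.0)
print(delta)  # -1.0
print(w)      # approximately [0.1, 0.05, 0.1]
```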
Updating Policy Network by DPG
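A sketch of the deterministic policy gradient (DPG) under stated assumptions: a hypothetical linear policy a = π(s; θ) = θ0·s + θ1 and a fixed, hand-written critic q(s, a) = -(a - 2s)² standing in for the trained value network. By the chain rule, g = ∂a/∂θ · ∂q(s, a)/∂a, and θ is updated by gradient ascent, θ ← θ + β·g.

```python
import random


def policy(theta, s):
    """Hypothetical linear deterministic policy: a = theta[0]*s + theta[1]."""
    return theta[0] * s + theta[1]


def dq_da(s, a):
    """Derivative w.r.t. a of a hypothetical fixed critic q(s, a) = -(a - 2s)^2."""
    return -2.0 * (a - 2.0 * s)


def dpg_step(theta, s, lr=0.05):
    """One DPG ascent step: g = da/dtheta * dq/da, theta <- theta + lr * g."""
    a = policy(theta, s)
    dq = dq_da(s, a)
    da_dtheta = [s, 1.0]  # gradient of the linear policy w.r.t. theta
    return [t + lr * dq * d for t, d in zip(theta, da_dtheta)]


rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(2000):
    s = rng.uniform(-1.0, 1.0)  # states, e.g. drawn from a replay buffer
    theta = dpg_step(theta, s)
print(theta)  # close to [2.0, 0.0], where this critic is maximized (a = 2s)
```

Because the toy critic peaks at a = 2s, θ approaches [2, 0]; with a learned critic, θ instead follows whatever q(s, a; w) currently prefers.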
Improvement: Using Target Networks
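A minimal sketch of target networks, assuming the common soft-update rule w⁻ ← τ·w + (1 - τ)·w⁻ (and likewise θ⁻ for the policy); the default τ = 0.01 and the helper names are illustrative assumptions, not taken from this post. The TD target is then computed entirely with the target networks: y = r + γ·q(s', π(s'; θ⁻); w⁻).

```python
def soft_update(target_params, online_params, tau=0.01):
    """Element-wise w_target <- tau * w + (1 - tau) * w_target."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]


def td_target(r, s_next, gamma, q_target, policy_target):
    """y = r + gamma * q(s', pi(s'; theta_target); w_target), using only target networks."""
    return r + gamma * q_target(s_next, policy_target(s_next))


# Hypothetical parameters: the target copy lags behind the online network
w_target = soft_update([0.0, 0.0], [1.0, 2.0], tau=0.5)
print(w_target)  # [0.5, 1.0]

# Hypothetical target networks written as plain functions
y = td_target(r=1.0, s_next=2.0, gamma=0.9,
              q_target=lambda s, a: s + a,
              policy_target=lambda s: 0.5 * s)
print(y)  # about 3.7
```

A small τ makes the target parameters change slowly, so the TD target does not shift with every update of the networks being trained.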
Improvement Methods
Stochastic Policy for Continuous Control
Policy Network
Univariate Normal Distribution
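For a one-dimensional action, the stochastic policy can be a normal distribution π(a | s) = N(μ, σ²) with density exp(-(a - μ)² / (2σ²)) / (√(2π)·σ). A minimal standard-library sketch of the density and of sampling an action; the function names are hypothetical.

```python
import math
import random


def normal_pdf(a, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at action a."""
    z = (a - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def sample_action(mu, sigma, rng):
    """Draw one continuous action a ~ N(mu, sigma^2)."""
    return rng.gauss(mu, sigma)


print(normal_pdf(0.0, 0.0, 1.0))  # about 0.3989, i.e. 1 / sqrt(2 * pi)
rng = random.Random(0)
samples = [sample_action(1.0, 0.1, rng) for _ in range(10000)]
print(sum(samples) / len(samples))  # close to mu = 1.0
```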
Multivariate Normal Distribution
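For a d-dimensional action, a common assumption (made here) is that the action dimensions are independent, so the covariance is diagonal and the density is a product of univariate normals; in log space that product becomes a sum. A sketch with a hypothetical function name:

```python
import math


def diag_normal_log_pdf(a, mu, sigma):
    """log N(a; mu, diag(sigma^2)): sum of independent univariate log-densities."""
    total = 0.0
    for ai, mi, si in zip(a, mu, sigma):
        total += -math.log(math.sqrt(2.0 * math.pi) * si) - (ai - mi) ** 2 / (2.0 * si ** 2)
    return total


# 2-D example: at the mean with unit variances, log p = -log(2*pi), about -1.8379
print(diag_normal_log_pdf([0.0, 0.0], [0.0, 0.0], [1.0, 1.0]))
```

Working with the log-density avoids multiplying many small numbers and is also the form that policy-gradient training uses.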
Function Approximation
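Since μ and σ are unknown, they are approximated by networks of the state. A sketch assuming (hypothetically) two linear heads, one for μ(s) and one for ρ(s), where ρ_i = ln σ_i² is predicted instead of σ_i so that σ_i = exp(ρ_i / 2) is always positive; the weights are made up for illustration. An action is then sampled as a_i = μ_i + σ_i·ε_i with ε_i ~ N(0, 1).

```python
import math
import random


def linear_head(rows, s):
    """Hypothetical linear layer: output_i = dot(rows[i], s)."""
    return [sum(w * x for w, x in zip(row, s)) for row in rows]


def sample_action(mu, rho, rng):
    """a_i = mu_i + sigma_i * eps_i, with sigma_i = exp(rho_i / 2) and eps_i ~ N(0, 1)."""
    return [m + math.exp(r / 2.0) * rng.gauss(0.0, 1.0) for m, r in zip(mu, rho)]


s = [1.0, 0.5]                           # hypothetical 2-D state
theta_mu = [[0.2, -0.4], [1.0, 0.0]]     # hypothetical weights, 2 action dimensions
theta_rho = [[-2.0, 0.0], [0.0, -2.0]]
mu = linear_head(theta_mu, s)            # mu(s; theta_mu)
rho = linear_head(theta_rho, s)          # rho(s; theta_rho), rho_i = ln sigma_i^2
print(mu)                                # [0.0, 1.0]
print([math.exp(r / 2.0) for r in rho])  # sigma = [exp(-1), exp(-0.5)]
print(sample_action(mu, rho, random.Random(0)))
```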
Training Policy Network
Auxiliary Network
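Writing σ_i² = exp(ρ_i), the log-density of the diagonal normal policy is ln π(a | s) = Σ_i [-ρ_i/2 - (a_i - μ_i)² / (2·exp(ρ_i))] - (d/2)·ln(2π). Dropping the constant gives an auxiliary function f(s, a; θ) with the same gradient as ln π, which is what training needs. A sketch (hypothetical names and values) of f, its closed-form derivatives with respect to μ and ρ, and a finite-difference check:

```python
import math


def aux_f(a, mu, rho):
    """f = sum_i [-rho_i/2 - (a_i - mu_i)^2 / (2 exp(rho_i))] = ln pi(a|s) + (d/2) ln(2 pi)."""
    return sum(-r / 2.0 - (x - m) ** 2 / (2.0 * math.exp(r)) for x, m, r in zip(a, mu, rho))


def aux_f_grad(a, mu, rho):
    """Closed-form partial derivatives of f with respect to mu_i and rho_i."""
    d_mu = [(x - m) / math.exp(r) for x, m, r in zip(a, mu, rho)]
    d_rho = [-0.5 + (x - m) ** 2 / (2.0 * math.exp(r)) for x, m, r in zip(a, mu, rho)]
    return d_mu, d_rho


a, mu, rho = [0.3, -1.0], [0.0, 0.5], [-1.0, 0.2]  # hypothetical action and network outputs
d_mu, d_rho = aux_f_grad(a, mu, rho)

# Central finite differences as a sanity check of the closed form
eps = 1e-6
num_mu0 = (aux_f(a, [mu[0] + eps, mu[1]], rho) - aux_f(a, [mu[0] - eps, mu[1]], rho)) / (2 * eps)
num_rho1 = (aux_f(a, mu, [rho[0], rho[1] + eps]) - aux_f(a, mu, [rho[0], rho[1] - eps])) / (2 * eps)
print(d_mu[0], num_mu0)    # should agree closely
print(d_rho[1], num_rho1)  # should agree closely
```

In a real policy network these partial derivatives would be back-propagated through the μ and ρ heads to reach θ.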
Policy Gradient Methods
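With the auxiliary function f, a stochastic policy gradient is g = Q_π(s, a)·∂f(s, a; θ)/∂θ, followed by gradient ascent θ ← θ + β·g. Q_π is unknown: REINFORCE replaces it with the observed discounted return u_t, while actor-critic replaces it with a value network q(s, a; w). This sketch uses the REINFORCE estimate and, as a hypothetical simplification, treats μ and ρ themselves as the trainable parameters θ instead of network outputs.

```python
import math


def discounted_return(rewards, gamma):
    """u_0 = r_0 + gamma * r_1 + gamma^2 * r_2 + ...; REINFORCE's estimate of Q_pi(s_0, a_0)."""
    u = 0.0
    for r in reversed(rewards):
        u = r + gamma * u
    return u


def policy_gradient_step(mu, rho, a, q_estimate, lr):
    """theta <- theta + lr * Q * df/dtheta, with theta = (mu, rho) as free parameters."""
    new_mu, new_rho = [], []
    for x, m, r in zip(a, mu, rho):
        var = math.exp(r)  # sigma^2 = exp(rho)
        new_mu.append(m + lr * q_estimate * (x - m) / var)
        new_rho.append(r + lr * q_estimate * (-0.5 + (x - m) ** 2 / (2.0 * var)))
    return new_mu, new_rho


u = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25 = 1.75
mu, rho = policy_gradient_step(mu=[0.0], rho=[0.0], a=[1.0], q_estimate=u, lr=0.1)
print(u)    # 1.75
print(mu)   # about [0.175]: mu moves toward the sampled action
print(rho)  # [0.0]: here (a - mu)^2 equals sigma^2, so the rho-gradient vanishes
```

With a positive return estimate, μ moves toward the sampled action; a negative estimate would push it away.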