5. Discrete Control and Continuous Control
2022-07-08 01:19:00 【C--G】
Discrete VS Continuous Control
Discrete
Continuous
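DQN has one output dimension per action, so it cannot be used directly for continuous control, where the number of actions is infinite.
A policy network likewise has one output dimension per action, so it cannot be used directly for continuous control either.
If you insist on using DQN for continuous control, you must first discretize the continuous action space.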
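A minimal sketch of what discretizing a continuous action space can look like, using only the Python standard library. The helper names `discretize` and `action_grid` and the two-joint arm are illustrative assumptions, not something taken from the original notes.

```python
import itertools

def discretize(low, high, bins):
    """Return `bins` evenly spaced values covering [low, high]."""
    step = (high - low) / (bins - 1)
    return [low + i * step for i in range(bins)]

def action_grid(bounds, bins):
    """Cartesian product of per-dimension grids: one discrete action per grid point."""
    axes = [discretize(lo, hi, bins) for lo, hi in bounds]
    return list(itertools.product(*axes))

# Hypothetical 2-joint arm: each joint angle lies in [0, 360] degrees.
grid = action_grid([(0.0, 360.0), (0.0, 360.0)], bins=5)
print(len(grid))  # 5 ** 2 grid points, one DQN output each
```

The number of grid points is `bins ** d` for a `d`-dimensional action, so it grows exponentially with the number of action dimensions. That is the usual reason discretization is only practical for low-dimensional action spaces.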

Better Approaches to Continuous Control
Deterministic policy network




Updating Value Network by TD
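As a hedged reconstruction of this step, the standard TD update for a value network can be written as follows. The notation is an assumption, since the original figures are not available: $q(s, a; \mathbf{w})$ is the value network, $\pi(s; \boldsymbol{\theta})$ the deterministic policy network, $\gamma$ the discount factor and $\alpha$ the learning rate.

```latex
\begin{aligned}
q_t &= q(s_t, a_t; \mathbf{w}), \\
a'_{t+1} &= \pi(s_{t+1}; \boldsymbol{\theta}), \qquad
q_{t+1} = q(s_{t+1}, a'_{t+1}; \mathbf{w}), \\
\delta_t &= q_t - \bigl(r_t + \gamma\, q_{t+1}\bigr), \\
\mathbf{w} &\leftarrow \mathbf{w} - \alpha\, \delta_t\,
  \frac{\partial q(s_t, a_t; \mathbf{w})}{\partial \mathbf{w}}.
\end{aligned}
```

Here $a'_{t+1}$ is only used to evaluate the TD target; it is not executed in the environment.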

Updating Policy Network by DPG
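A hedged sketch of the deterministic policy gradient (DPG) in the usual notation, which is assumed here rather than taken from the original figures: $\pi(s; \boldsymbol{\theta})$ is the deterministic policy network, $q(s, a; \mathbf{w})$ the value network and $\beta$ the learning rate.

```latex
\begin{aligned}
a &= \pi(s; \boldsymbol{\theta}), \\
\mathbf{g} &= \frac{\partial q\bigl(s, \pi(s; \boldsymbol{\theta}); \mathbf{w}\bigr)}{\partial \boldsymbol{\theta}}
  = \frac{\partial a}{\partial \boldsymbol{\theta}} \cdot
    \frac{\partial q(s, a; \mathbf{w})}{\partial a}, \\
\boldsymbol{\theta} &\leftarrow \boldsymbol{\theta} + \beta\, \mathbf{g}.
\end{aligned}
```

The update is gradient ascent: with $\mathbf{w}$ held fixed, $\boldsymbol{\theta}$ moves in the direction that increases the value the critic assigns to the policy's own action.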



Improvement: Using Target Networks
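One common way to maintain a target network is a soft (weighted-average) update of its parameters. The sketch below assumes that scheme; the function name `soft_update`, the flat parameter lists and the value of `tau` are illustrative choices, not details from the original notes.

```python
def soft_update(target, online, tau):
    """Move each target parameter a fraction tau toward the online parameter."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]

online_w = [1.0, 2.0, 3.0]   # parameters of the network being trained
target_w = [0.0, 0.0, 0.0]   # parameters of the slowly moving target copy

# tau = 0.5 is only for readability; practical values are typically much smaller.
target_w = soft_update(target_w, online_w, tau=0.5)
print(target_w)
```

The same update would be applied to both the target policy network and the target value network, and the TD target would then be computed with the target copies instead of the online networks.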





Other Improvements

Stochastic Policy for Continuous Control



Policy Network
Univariate Normal Distribution
Multivariate Normal Distribution
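A minimal sketch of a Gaussian policy for a multi-dimensional action, assuming the dimensions are independent (a diagonal covariance), so that the multivariate density factors into univariate normals. The values of `mu` and `sigma` are hypothetical stand-ins for what a policy network would output for a given state.

```python
import math
import random

def sample_action(mu, sigma, rng):
    """Draw each action dimension independently from N(mu_i, sigma_i^2)."""
    return [rng.gauss(m, s) for m, s in zip(mu, sigma)]

def log_prob(action, mu, sigma):
    """Log-density of a diagonal Gaussian: the sum of univariate log-densities."""
    total = 0.0
    for a, m, s in zip(action, mu, sigma):
        total += (-0.5 * math.log(2.0 * math.pi)
                  - math.log(s)
                  - (a - m) ** 2 / (2.0 * s ** 2))
    return total

rng = random.Random(0)                 # seeded for reproducibility
mu, sigma = [0.0, 1.0], [1.0, 0.5]     # hypothetical policy-network outputs
a = sample_action(mu, sigma, rng)
print(a, log_prob(a, mu, sigma))
```

Working with the log-density rather than the density keeps the computation as a sum over dimensions, which is the quantity policy gradient methods differentiate.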
Function Approximation


Training Policy Network

Auxiliary Network









Policy Gradient Methods



