2. TD Learning
2022-07-08 01:26:00 【C--G】
Discounted Return
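As a minimal sketch of the discounted return U_t = R_t + γ·R_{t+1} + γ²·R_{t+2} + ..., the function below accumulates rewards from the end of an episode using the recursion U_t = R_t + γ·U_{t+1}, the identity TD methods rely on. The function name, the reward list, and the value of gamma are illustrative assumptions, not taken from the original notes.

```python
def discounted_return(rewards, gamma=0.9):
    """Return U_t for rewards [R_t, R_{t+1}, ...] of a finite episode."""
    u = 0.0
    # Work backwards so each step applies U_t = R_t + gamma * U_{t+1}.
    for r in reversed(rewards):
        u = r + gamma * u
    return u


# Hypothetical three-step episode.
example_return = discounted_return([1.0, 2.0, 4.0], gamma=0.5)
```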
Sarsa
Sarsa is a TD algorithm used to learn the action-value function Q_π.
Sarsa: Tabular Version
Sarsa’s Name
Tabular Sarsa is suitable when there are few states and actions. As the number of states and actions grows, the table becomes too large to learn.
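A single tabular Sarsa step might look like the sketch below. It assumes the table is a dictionary keyed by (state, action) pairs; the function name, the step size alpha, and the discount gamma are illustrative choices. The update uses the tuple (s, a, r, s', a'), which is where the name Sarsa comes from.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Sarsa update to q[(s, a)] and return the TD error."""
    # TD target uses the action a_next that the policy actually chose in s_next.
    td_target = r + gamma * q[(s_next, a_next)]
    td_error = q[(s, a)] - td_target
    # Move the table entry toward the TD target.
    q[(s, a)] -= alpha * td_error
    return td_error


# Hypothetical table with one known entry; unseen entries default to 0.
q_table = defaultdict(float)
q_table[("s1", "right")] = 2.0
delta = sarsa_update(q_table, "s0", "left", r=1.0, s_next="s1", a_next="right")
```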
Sarsa: Neural Network Version
Q-Learning
Q-learning is a TD algorithm used to learn the optimal action-value function Q*.
Sarsa vs. Q-Learning
Derive TD Target
Q-Learning (Tabular Version)
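The tabular Q-learning update can be sketched as follows, assuming a dictionary-based table keyed by (state, action) and a known list of actions. Names and hyperparameters are illustrative. The only difference from Sarsa is that the TD target takes the maximum over next actions instead of using the action actually taken.

```python
from collections import defaultdict


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update to q[(s, a)] and return the TD error."""
    # TD target bootstraps from the best next action according to the table.
    td_target = r + gamma * max(q[(s_next, b)] for b in actions)
    td_error = q[(s, a)] - td_target
    q[(s, a)] -= alpha * td_error
    return td_error


# Hypothetical two-action problem.
actions = ["left", "right"]
q_star = defaultdict(float)
q_star[("s1", "left")] = 1.0
q_star[("s1", "right")] = 3.0
delta_q = q_learning_update(q_star, "s0", "left", 1.0, "s1", actions)
```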
Q-Learning (DQN Version)
Multi-Step TD Target
- Using One Reward
- Using Multiple Rewards
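The two cases above can be expressed with one function: an m-step TD target sums m discounted rewards and then bootstraps from a value estimate at step t+m. With a single reward it reduces to the usual one-step target. The sketch below assumes the bootstrap value (for example Q(s_{t+m}, a_{t+m}) for Sarsa, or the max over actions for Q-learning) is computed elsewhere; the function and argument names are hypothetical.

```python
def multi_step_td_target(rewards, bootstrap_value, gamma=0.9):
    """Return sum_i gamma^i * r_{t+i} + gamma^m * bootstrap_value, with m = len(rewards)."""
    target = 0.0
    for i, r in enumerate(rewards):
        target += (gamma ** i) * r
    # Bootstrap from the value estimate m steps ahead.
    return target + (gamma ** len(rewards)) * bootstrap_value


one_step = multi_step_td_target([1.0], bootstrap_value=2.0, gamma=0.5)
three_step = multi_step_td_target([1.0, 1.0, 1.0], bootstrap_value=4.0, gamma=0.5)
```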
Experience Replay (Revisiting DQN and TD Learning)
- Shortcoming 1: Waste of Experience
- Shortcoming 2: Correlated Updates
- Experience Replay
- History
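A replay buffer addresses both shortcomings listed above: transitions are stored and reused instead of discarded, and uniform random sampling breaks the correlation between consecutive updates. The class below is a minimal sketch using only the standard library; the class name, method names, and capacity are assumptions made for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Hypothetical usage: store five transitions in a buffer of capacity three.
replay = ReplayBuffer(capacity=3)
for t in range(5):
    replay.store(f"s{t}", "a", float(t), f"s{t + 1}")
batch = replay.sample(2)
```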
Prioritized Experience Replay
On the left is an ordinary Mario scene; on the right is a boss-level scene. The scene on the right is much rarer than the one on the left, so it should be given a larger weight. The larger a transition's TD error, the more important that transition is.
Because sampling is no longer uniform, the learning rate of stochastic gradient descent should be adjusted according to each transition's sampling probability.
The larger a transition's TD error, the higher its sampling probability and the smaller the learning rate applied to it.
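The relationship between TD error, sampling probability, and learning rate can be sketched as below. This assumes one common formulation: the sampling probability is proportional to |TD error| + eps, and the learning rate is scaled by (n · p)^(-beta), where n is the number of stored transitions. The function names and the values of eps, beta, and base_lr are illustrative assumptions, not taken from the original notes.

```python
import random


def sampling_probabilities(td_errors, eps=1e-3):
    """Sampling probability proportional to |TD error| + eps."""
    priorities = [abs(d) + eps for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def scaled_learning_rates(probs, base_lr=0.01, beta=0.5):
    """Scale the learning rate by (n * p)^(-beta) to offset non-uniform sampling."""
    n = len(probs)
    return [base_lr * (n * p) ** (-beta) for p in probs]


# Hypothetical TD errors for three stored transitions.
td_errors = [0.1, 2.0, 0.5]
probs = sampling_probabilities(td_errors)
lrs = scaled_learning_rates(probs)
# Draw one transition index according to the priorities.
index = random.choices(range(len(td_errors)), weights=probs, k=1)[0]
```

With uniform probabilities (p = 1/n) the scale factor is 1, so the scheme reduces to ordinary experience replay.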
Overestimation problem
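Bootstrapping: the term comes from the image of lifting yourself up by pulling on your own bootstraps.
It is like trying to rise into the air by stepping on your right foot with your left foot. This is impossible in reality, but it does happen in reinforcement learning, where an estimate is updated using another estimate.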
Problem of Overestimation
- Reason 1: Maximization
- Reason 2: Bootstrapping
- Why does overestimation happen
- Why overestimation is a shortcoming
- Solutions
Target Network
TD Learning with Target Network
Update Target Network
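Two common ways to update a target network are to copy the online parameters periodically, or to move the target parameters a small step toward the online ones. The sketch below represents parameters as plain lists of floats; the function names and the mixing coefficient tau are assumptions for illustration.

```python
def hard_update(online_params):
    """Periodic copy: target <- online."""
    return list(online_params)


def soft_update(online_params, target_params, tau=0.01):
    """Weighted average: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# Hypothetical two-parameter networks.
online = [1.0, 2.0]
target = [0.0, 0.0]
copied = hard_update(online)
blended = soft_update(online, target, tau=0.5)
```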
Comparisons
A target network reduces overestimation somewhat, but it does not eliminate the problem.
Double DQN
Naive Update
Using Target Network
Double DQN
Why does Double DQN work better
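The three TD targets in this section differ only in which network selects the next action and which network evaluates it. The sketch below takes the next-state Q-values of the online network and the target network as plain lists; all names and numbers are hypothetical. Because the target network's value at any chosen action cannot exceed its own maximum, the Double DQN target is never larger than the target-network target, which is why it overestimates less.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def naive_target(r, q_online_next, gamma=0.9):
    # Online network both selects and evaluates the action.
    return r + gamma * max(q_online_next)


def target_network_target(r, q_target_next, gamma=0.9):
    # Target network both selects and evaluates the action.
    return r + gamma * max(q_target_next)


def double_dqn_target(r, q_online_next, q_target_next, gamma=0.9):
    # Online network selects the action, target network evaluates it.
    a_star = argmax(q_online_next)
    return r + gamma * q_target_next[a_star]


# Hypothetical next-state Q-values from the two networks.
q_online_next = [1.0, 3.0, 2.0]
q_target_next = [2.5, 1.5, 2.0]
y_naive = naive_target(0.0, q_online_next, gamma=0.5)
y_target = target_network_target(0.0, q_target_next, gamma=0.5)
y_double = double_dqn_target(0.0, q_online_next, q_target_next, gamma=0.5)
```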
Dueling Network
Advantage Function
Value Functions
Optimal Value Functions
Properties of Advantage Function
Dueling Network
Revisiting DQN
Approximating Advantage Function
Approximating State-Value Function
Dueling Network: Formulation
In the figure, the blue output plus the red outputs, minus the maximum of the red outputs, gives the purple output, which is the final output of the dueling network.
Problem of Non-identifiability
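The dueling aggregation Q(s, a) = V(s) + A(s, a) - max_a A(s, a) can be sketched in a few lines, along with the reason for the max term. If Q were simply V + A, adding a constant to V and subtracting it from every A would leave Q unchanged, so V and A could not be identified from Q. With the max term, the same shift changes the output. The function names and numbers below are illustrative assumptions.

```python
def naive_q(v, advantages):
    """Q = V + A, which cannot distinguish (V, A) from (V + c, A - c)."""
    return [v + a for a in advantages]


def dueling_q(v, advantages):
    """Q = V + A - max(A), the dueling network output."""
    max_a = max(advantages)
    return [v + a - max_a for a in advantages]


# Hypothetical state value and advantages, then the same pair shifted by c = 1.
v, adv = 2.0, [1.0, 3.0, 0.0]
v_shifted, adv_shifted = 3.0, [0.0, 2.0, -1.0]

same_under_naive = naive_q(v, adv) == naive_q(v_shifted, adv_shifted)
same_under_dueling = dueling_q(v, adv) == dueling_q(v_shifted, adv_shifted)
```

Here `same_under_naive` is true while `same_under_dueling` is false, so the shifted pair is no longer an equivalent solution once the max term is included.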