2. TD Learning
2022-07-07 23:21:00 【C--G】
Discounted Return
Sarsa
Sarsa is a TD algorithm for learning the action-value function $Q_\pi$.
Sarsa:Tabular Version
Sarsa’s Name
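Tabular Sarsa suits problems with few states and actions. As the number of states and actions grows, the table grows with it and becomes hard to learn.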
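A minimal sketch of one tabular Sarsa update, assuming the standard rule in which the TD target is the reward plus the discounted table entry for the next state and the action actually taken there. The names `sarsa_update`, `alpha`, and `gamma`, and the string states, are illustrative and not taken from the post.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One tabular Sarsa step using the tuple (s, a, r, s', a')."""
    td_target = r + gamma * Q[(s_next, a_next)]  # bootstrap from the action actually taken
    td_error = Q[(s, a)] - td_target
    Q[(s, a)] -= alpha * td_error                # move the table entry toward the target
    return td_error

# The table holds one entry per (state, action) pair, so it grows with both sets.
Q = defaultdict(float)
Q[("s1", "right")] = 2.0
delta = sarsa_update(Q, "s0", "right", r=1.0, s_next="s1", a_next="right")
```

Because every (state, action) pair needs its own entry, the table size is the product of the two set sizes, which is why the tabular version stops being practical for large problems.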
Sarsa:Neural Network Version
Q-Learning
Q-Learning is a TD algorithm for learning the optimal action-value function $Q^*$.
Sarsa vs. Q-Learning
Derive TD Target
Q-Learning(tabular version)
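A minimal sketch of the tabular Q-Learning update, assuming the standard rule: the TD target takes the maximum table entry over the next state's actions instead of the entry for the action actually taken (as Sarsa does). The function name, the dict-based table, and the default `alpha` and `gamma` are illustrative assumptions.

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-Learning step; Q is a dict keyed by (state, action)."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)  # max over next actions
    td_target = r + gamma * best_next
    td_error = Q.get((s, a), 0.0) - td_target
    Q[(s, a)] = Q.get((s, a), 0.0) - alpha * td_error
    return td_error

actions = ["left", "right"]
Q = {("s1", "left"): 1.0, ("s1", "right"): 3.0}
delta = q_learning_update(Q, "s0", "left", r=1.0, s_next="s1", actions=actions)
```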
Q-Learning(DQN Version)
Multi-Step TD Target
- Using One Reward
- Using Multiple Rewards
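The two cases above can be sketched with one helper, assuming the usual m-step form: the target sums m discounted observed rewards and then bootstraps from a value estimate at step t+m. With one reward it reduces to the ordinary TD target. `multi_step_td_target` and `bootstrap_value` are hypothetical names; the bootstrap value would be $Q(s_{t+m}, a_{t+m})$ for Sarsa or $\max_a Q(s_{t+m}, a)$ for Q-Learning.

```python
def multi_step_td_target(rewards, bootstrap_value, gamma=0.9):
    """Return r_t + gamma*r_{t+1} + ... + gamma^m * bootstrap_value."""
    target = bootstrap_value
    for r in reversed(rewards):  # fold from the last reward back to the first
        target = r + gamma * target
    return target

one_step = multi_step_td_target([1.0], bootstrap_value=10.0)
three_step = multi_step_td_target([1.0, 2.0, 3.0], bootstrap_value=10.0)
```

The more observed rewards the target contains, the smaller the weight (`gamma ** m`) on the bootstrapped estimate.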
Experience Replay (Revisiting DQN and TD Learning)
- Shortcoming 1:Waste of Experience
- Shortcoming 2:Correlated Updates
- Experience Replay
- History
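A minimal sketch of a replay buffer that addresses both shortcomings listed above, assuming a fixed-capacity buffer with uniform sampling: transitions are stored for reuse instead of being discarded after one update, and random sampling breaks the correlation between consecutive transitions. The class name and the four-field transition layout are assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (s, a, r, s_next) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first

    def add(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(t, 0, 1.0, t + 1)
batch = buffer.sample(2)
```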
Prioritized Experience Replay
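The left image is a common Mario scene and the right image is a boss-level scene. The boss scene is much rarer than the common one, so it should be given more weight. The larger the TD error, the more important the scene.
The learning rate of stochastic gradient descent should be adjusted according to each sample's sampling probability.
The larger a transition's TD error, the higher its sampling probability and the smaller its learning rate should be.
The Overestimation Problem
Bootstrapping: lifting yourself up by pulling on your own bootstraps.
It is like trying to fly by stepping on your own foot. This is impossible in the real world, but it does happen in reinforcement learning.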
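A minimal sketch of that rule, assuming the proportional scheme commonly used for prioritized replay: sampling probability is proportional to |TD error| plus a small constant, and the learning-rate scale is $(n \cdot p)^{-\beta}$. The constants `eps` and `beta`, the normalization by the largest scale, and both function names are illustrative assumptions, not taken from the post.

```python
def sampling_probabilities(td_errors, eps=1e-3):
    """Probability of drawing each transition, proportional to |TD error| + eps."""
    priorities = [abs(d) + eps for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def learning_rate_scales(probs, beta=1.0):
    """Scale factor for each transition's learning rate: (n * p) ** (-beta)."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    largest = max(raw)
    return [w / largest for w in raw]  # normalize so the largest scale is 1

td_errors = [0.1, 0.1, 2.0]  # the third transition plays the role of the rare scene
probs = sampling_probabilities(td_errors)
scales = learning_rate_scales(probs)
```

Here the third transition is drawn most often but gets the smallest learning-rate scale, which offsets the fact that it is replayed more frequently.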
Problem of Overestimation
- Reason 1:Maximization
- Reason 2:Bootstrapping
- Why does overestimation happen
- Why overestimation is a shortcoming
- Solutions
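The first reason above can be checked numerically. The sketch below assumes every true action value is exactly 0 and that the estimates carry zero-mean Gaussian noise; the maximum of the noisy estimates is then positive on average, even though each individual estimate is unbiased. The function name and all parameters are hypothetical.

```python
import random

def mean_of_max(num_actions=5, noise_std=1.0, trials=10000, seed=0):
    """Average of max_a (true_q + noise) when every true_q is 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        noisy_q = [0.0 + rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        total += max(noisy_q)  # the true maximum is 0, so any surplus is overestimation
    return total / trials

bias = mean_of_max()
```

Bootstrapping then feeds this inflated maximum back into the TD target, so the overestimate propagates to other state-action pairs.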
Target Network
TD Learning with Target Network
Update Target Network
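A minimal sketch of two commonly used ways to refresh the target network's parameters, assuming the parameters are plain lists of floats: copy the online parameters outright, or blend them in with a small weight. The names `hard_update`, `soft_update`, and `tau` are illustrative, and the post may present the options differently.

```python
def hard_update(online_w, target_w):
    """Copy the online parameters into the target network in place."""
    target_w[:] = online_w

def soft_update(online_w, target_w, tau=0.01):
    """Blend in place: target <- tau * online + (1 - tau) * target."""
    for i, w in enumerate(online_w):
        target_w[i] = tau * w + (1.0 - tau) * target_w[i]

online_w = [1.0, 2.0]
target_w = [0.0, 0.0]
soft_update(online_w, target_w, tau=0.1)
after_soft = list(target_w)   # a small step toward the online parameters
hard_update(online_w, target_w)
after_hard = list(target_w)   # an exact copy of the online parameters
```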
Comparisons
A target network helps somewhat, but it still cannot eliminate the overestimation problem.
Double DQN
Naive Update
Using Target Network
Double DQN
Why does Double DQN work better
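A minimal sketch comparing the three TD targets in this section, assuming the standard definitions: the naive target selects and evaluates the next action with the online network, the target-network variant does both with the target network, and Double DQN selects with the online network but evaluates with the target network. The lists stand in for network outputs at the next state, and all names are illustrative.

```python
def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

def td_target_naive(r, q_online_next, gamma=0.9):
    return r + gamma * max(q_online_next)

def td_target_with_target_net(r, q_target_next, gamma=0.9):
    return r + gamma * max(q_target_next)

def td_target_double_dqn(r, q_online_next, q_target_next, gamma=0.9):
    a_star = argmax(q_online_next)            # selection: online network
    return r + gamma * q_target_next[a_star]  # evaluation: target network

q_online_next = [1.0, 3.0, 2.0]  # stand-in for online network outputs at s'
q_target_next = [2.5, 2.0, 1.5]  # stand-in for target network outputs at s'
y_naive = td_target_naive(1.0, q_online_next)
y_target = td_target_with_target_net(1.0, q_target_next)
y_double = td_target_double_dqn(1.0, q_online_next, q_target_next)
```

Since `q_target_next[a_star]` can never exceed `max(q_target_next)`, the Double DQN target is never larger than the target-network one, which is why it reduces overestimation.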
Dueling Network
Advantage Function
Value Functions
Optimal Value Functions
Properties of Advantage Function
Dueling Network
Revisiting DQN
Approximating Advantage Function
Approximating State-Value Function
Dueling Network:Formulation
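Adding the blue output to the red output and then subtracting the maximum of the red output gives the purple output, which is the final output of the Dueling Network.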
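A minimal sketch of that aggregation step, assuming the scalar is the state-value estimate and the vector holds one advantage estimate per action (which color corresponds to which is not recoverable from the text). `dueling_q` and its inputs are hypothetical stand-ins for the two network heads.

```python
def dueling_q(v, advantages):
    """Combine the two heads: Q(s, a) = V(s) + A(s, a) - max_a A(s, a)."""
    max_adv = max(advantages)
    return [v + a - max_adv for a in advantages]

q = dueling_q(v=5.0, advantages=[1.0, -2.0, 0.5])
# Move a constant from the advantage head to the value head.
q_shifted = dueling_q(v=15.0, advantages=[-9.0, -12.0, -9.5])
```

With the max term, the largest output always equals `v`. Without it, the two calls above would produce identical outputs from different head values; with it, they differ, so the two heads can be told apart.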
Problem of Non-identifiability