
4. Policy Learning

2022-07-08 01:14:00 【C--G】

Policy Gradient with Baseline

Policy Gradient
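For reference, the policy gradient this section builds on can be written as follows. The notation is assumed here because the original figure did not survive: π(a | s; θ) is the policy network, Q_π the action-value function and V_π the state-value function, which depends on θ through π.

```latex
\frac{\partial V_\pi(s)}{\partial \boldsymbol{\theta}}
  = \mathbb{E}_{A \sim \pi(\cdot \mid s;\, \boldsymbol{\theta})}\left[
      \frac{\partial \ln \pi(A \mid s;\, \boldsymbol{\theta})}{\partial \boldsymbol{\theta}}
      \, Q_\pi(s, A)
    \right]
```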

Baseline
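The reason a baseline b is allowed is that, as long as b does not depend on the action A, it contributes zero in expectation. A short derivation, assuming a discrete action space:

```latex
\begin{aligned}
\mathbb{E}_{A \sim \pi}\left[ b \, \frac{\partial \ln \pi(A \mid s;\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \right]
  &= b \sum_{a} \pi(a \mid s;\boldsymbol{\theta}) \, \frac{1}{\pi(a \mid s;\boldsymbol{\theta})}
     \frac{\partial \pi(a \mid s;\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}
   = b \, \frac{\partial}{\partial \boldsymbol{\theta}} \sum_{a} \pi(a \mid s;\boldsymbol{\theta})
   = b \, \frac{\partial 1}{\partial \boldsymbol{\theta}} = 0, \\
\frac{\partial V_\pi(s)}{\partial \boldsymbol{\theta}}
  &= \mathbb{E}_{A \sim \pi}\left[
       \frac{\partial \ln \pi(A \mid s;\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}
       \big( Q_\pi(s, A) - b \big) \right].
\end{aligned}
```

So subtracting b leaves the expected gradient unchanged; what it can change is the variance of the sampled gradient.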
Monte Carlo Approximation
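A minimal numerical sketch of the Monte Carlo approximation, assuming a single state, a softmax policy π = softmax(θ) over three actions (for which ∂ln π(a)/∂θ = onehot(a) − π) and made-up Q values. All names and numbers are hypothetical. It samples a ~ π, forms g(a) = ∂ln π(a)/∂θ · (Q(a) − b) for b = 0 and for b = V_π(s), and compares both sample means with the exact gradient.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def score(theta, a):
    """d ln pi(a) / d theta for pi = softmax(theta): onehot(a) - pi."""
    g = -softmax(theta)
    g[a] += 1.0
    return g

def mc_gradient_samples(theta, q, b, n, rng):
    """Draw n actions a ~ pi; return one stochastic gradient
    g(a) = score(a) * (Q(a) - b) per row."""
    pi = softmax(theta)
    actions = rng.choice(len(theta), size=n, p=pi)
    return np.stack([score(theta, a) * (q[a] - b) for a in actions])

theta = np.array([0.2, -0.1, 0.4])  # hypothetical policy parameters (one state)
q = np.array([10.0, 11.0, 12.0])    # hypothetical Q_pi(s, a) values
pi = softmax(theta)
v = float(pi @ q)                   # V_pi(s) = E_{A~pi}[Q_pi(s, A)]

# Exact gradient: sum over actions of pi(a) * score(a) * Q(a)
exact = sum(pi[a] * score(theta, a) * q[a] for a in range(len(q)))

rng = np.random.default_rng(0)
g0 = mc_gradient_samples(theta, q, 0.0, 20000, rng)  # Choice 1: b = 0
gv = mc_gradient_samples(theta, q, v, 20000, rng)    # Choice 2: b = V_pi(s)
print("exact   ", exact)
print("mean b=0", g0.mean(axis=0), "total variance", g0.var(axis=0).sum())
print("mean b=V", gv.mean(axis=0), "total variance", gv.var(axis=0).sum())
```

Both sample means should land near the exact gradient, while the b = V_π(s) estimates should show a much smaller total variance in this particular setup.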
Choices of Baselines
Choice 1: b=0

Choice 2: b is the state value
$b = V_\pi(s_t)$

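With b = V_π(s_t), the sampled gradient becomes the expression below. V_π(s_t) does not depend on a_t, so it is a valid baseline. Because it is the average of Q_π(s_t, ·) under π, it tends to be close to Q_π(s_t, a_t), which shrinks the factor multiplying the score and typically reduces variance. The difference Q_π − V_π is the advantage that gives A2C its name. In practice neither quantity is known exactly, and a value network v(s; w) is used to approximate V_π.

```latex
g(a_t) = \frac{\partial \ln \pi(a_t \mid s_t;\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}
         \big( Q_\pi(s_t, a_t) - V_\pi(s_t) \big),
\qquad
V_\pi(s_t) = \mathbb{E}_{A \sim \pi}\big[ Q_\pi(s_t, A) \big]
```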

Policy Network
Value Network
Parameter Sharing
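One way to picture parameter sharing is a single network with a shared feature layer feeding two heads: a softmax policy head π(· | s; θ) and a scalar value head v(s; w). The sketch below is a hypothetical NumPy forward pass only (class name, layer sizes and the ReLU feature layer are all assumptions), not the architecture from the original figures.

```python
import numpy as np

class SharedActorCritic:
    """Hypothetical policy + value network sharing one hidden layer."""

    def __init__(self, state_dim, num_actions, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # Shared feature extractor, used by both heads
        self.W_shared = rng.normal(scale=0.1, size=(hidden, state_dim))
        self.b_shared = np.zeros(hidden)
        # Policy head: one logit per action
        self.W_pi = rng.normal(scale=0.1, size=(num_actions, hidden))
        # Value head: a single scalar output
        self.w_v = rng.normal(scale=0.1, size=hidden)

    def features(self, s):
        # ReLU features shared by the policy and value heads
        return np.maximum(0.0, self.W_shared @ s + self.b_shared)

    def policy(self, s):
        """pi(.|s): softmax over the action logits."""
        logits = self.W_pi @ self.features(s)
        logits -= logits.max()  # numerical stability
        p = np.exp(logits)
        return p / p.sum()

    def value(self, s):
        """v(s): scalar estimate of V_pi(s)."""
        return float(self.w_v @ self.features(s))

net = SharedActorCritic(state_dim=4, num_actions=2)
s = np.array([0.1, -0.2, 0.05, 0.3])  # hypothetical state vector
print(net.policy(s), net.value(s))
```

Sharing the feature layer means gradients from both the policy loss and the value loss update W_shared; keeping two fully separate networks is the other common option.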

Reinforce with Baseline

Updating the policy network
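A minimal tabular sketch of the REINFORCE-with-baseline update after one finished episode, assuming a softmax policy with logits theta[s, a] and a lookup-table value v[s] standing in for the value network (both hypothetical parametrisations). Following the usual form, δ_t = v(s_t) − u_t, the policy moves by −β·δ_t·∂ln π/∂θ and the value by −α·δ_t·∂v/∂w. Like many course presentations, it omits the γ^t factor from the exact gradient.

```python
import numpy as np

def discounted_returns(rewards, gamma):
    """u_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... (computed backwards)."""
    u = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        u[t] = running
    return u

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_with_baseline_update(theta, v, episode, gamma=0.99, alpha=0.1, beta=0.1):
    """theta[s, a]: policy logits; v[s]: baseline; episode: list of (s, a, r)."""
    states, actions, rewards = zip(*episode)
    u = discounted_returns(rewards, gamma)
    for t, (s, a) in enumerate(zip(states, actions)):
        delta = v[s] - u[t]                      # baseline error at step t
        grad_log_pi = -softmax(theta[s])         # d ln pi(a|s) / d theta[s]
        grad_log_pi[a] += 1.0                    #   = onehot(a) - pi(.|s)
        theta[s] -= beta * delta * grad_log_pi   # policy network update
        v[s] -= alpha * delta                    # value network update (dv/dv[s] = 1)
    return theta, v
```

For example, with rewards [1, 1, 1] and γ = 0.5 the returns are u = [1.75, 1.5, 1.0].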

Advantage Actor-Critic (A2C)

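A minimal tabular sketch of one A2C update from a single transition (s_t, a_t, r_t, s_{t+1}), assuming softmax logits theta[s, a] for the actor and a lookup-table critic v[s] (hypothetical parametrisations). It uses the one-step TD target y_t = r_t + γ·v(s_{t+1}) and the TD error δ_t = v(s_t) − y_t for both updates; at a terminal s_{t+1} there is no bootstrap term.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def a2c_step(theta, v, s, a, r, s_next, done, gamma=0.99, alpha=0.1, beta=0.1):
    """One Advantage Actor-Critic update; theta is the actor, v the critic."""
    # One-step TD target; no bootstrap if s_next is terminal
    y = r if done else r + gamma * v[s_next]
    delta = v[s] - y                         # TD error
    grad_log_pi = -softmax(theta[s])
    grad_log_pi[a] += 1.0                    # d ln pi(a|s) / d theta[s] = onehot(a) - pi
    theta[s] -= beta * delta * grad_log_pi   # actor: -delta acts as the advantage estimate
    v[s] -= alpha * delta                    # critic: move v(s) toward the TD target
    return theta, v
```

Unlike the REINFORCE update, this can be applied after every step, without waiting for the episode to end.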

Reinforce versus A2C

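The two methods have almost the same network structure (a policy network plus a value network); the difference lies in the value network. In REINFORCE with baseline it only serves as a baseline fitted to the observed discounted return, while in A2C it acts as a critic trained with TD targets.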

A2C with Multi-Step TD Target

One step

Multi-step
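A small sketch of the m-step TD target, y_t = Σ_{i=0}^{m−1} γ^i·r_{t+i} + γ^m·v(s_{t+m}; w), assuming the trajectory ends in a terminal state after its last reward (so when t + m runs past the end, the sum is truncated and the bootstrap term is dropped). The function name and the lists of rewards and critic values are hypothetical.

```python
def m_step_td_target(rewards, values, t, m, gamma):
    """m-step TD target for step t.

    rewards[i] = r_i and values[i] = v(s_i; w) for i = 0..n-1; the episode is
    assumed to terminate after r_{n-1}, so v(terminal) = 0.
    """
    n = len(rewards)
    steps = min(m, n - t)  # truncate at the end of the episode
    y = sum(gamma ** i * rewards[t + i] for i in range(steps))
    if t + m < n:          # s_{t+m} is a non-terminal state: bootstrap from the critic
        y += gamma ** m * values[t + m]
    return y

rewards = [1.0, 2.0, 3.0, 4.0]     # hypothetical r_0..r_3
values = [10.0, 20.0, 30.0, 40.0]  # hypothetical v(s_0..s_3; w)
print(m_step_td_target(rewards, values, t=0, m=2, gamma=0.5))  # 1 + 0.5*2 + 0.25*30 = 9.5
```

With m = 1 this reduces to the one-step target; larger m relies more on observed rewards and less on the critic's estimate.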

Reinforce with Baseline


versus

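Written out, the comparison comes down to the target each method plugs into δ_t = v(s_t; w) − target. Assume the episode ends at step n, with last reward r_n followed by a terminal state whose value is 0. Then the multi-step TD target with m = n − t + 1 loses its bootstrap term and coincides with the discounted return u_t that REINFORCE with baseline uses. In this sense REINFORCE with baseline can be viewed as a special case of A2C with a multi-step TD target.

```latex
\begin{aligned}
y_t^{(m)} &= \sum_{i=0}^{m-1} \gamma^{i} r_{t+i} + \gamma^{m} v(s_{t+m}; \mathbf{w}), \\
m = n - t + 1: \quad y_t^{(m)} &= \sum_{i=t}^{n} \gamma^{i-t} r_i + \gamma^{m} \cdot 0 = u_t .
\end{aligned}
```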

Copyright notice
This article was written by [C--G]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/189/202207072320355586.html
