
4. Policy Learning

2022-07-08 01:14:00 C--G

Policy Gradient with Baseline

Policy Gradient
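For reference, the policy gradient that the baseline method starts from is commonly written as below. The notation is an assumption here: π(a|s;θ) is the policy network, Q_π(s,a) the action-value function, and V(s;θ) = E_A[Q_π(s,A)] the state-value function.

```latex
\frac{\partial V(s;\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}
= \mathbb{E}_{A \sim \pi(\cdot \mid s;\boldsymbol{\theta})}\!\left[
    \frac{\partial \ln \pi(A \mid s;\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}
    \, Q_\pi(s, A)
  \right]
```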

  • Baseline
  • Monte Carlo Approximation
  • Choices of Baselines
    Choice 1: b = 0, which is simply the standard policy gradient without a baseline.
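    Choice 2: b is the state value V_π(s_t). It does not depend on the action and equals the expected value of Q_π(s_t, A_t) over actions, which makes it a natural baseline.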
  • b = V_π(s_t)
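A baseline b can be subtracted from Q_π without changing the expected gradient, because whenever b does not depend on the action, E_{A~π}[b · ∂ln π(A|s;θ)/∂θ] = b · Σ_a ∂π(a|s;θ)/∂θ = b · ∂(Σ_a π(a|s;θ))/∂θ = b · ∂1/∂θ = 0. The single-sample Monte Carlo estimate g(a) = ∂ln π(a|s;θ)/∂θ · (Q_π(s,a) − b), with a drawn from π, is therefore unbiased for any such b, but its variance depends on b. The toy NumPy check below illustrates this for one state with three actions, a softmax policy and made-up Q values (all hypothetical), comparing b = 0 with b = V_π(s).

```python
import numpy as np

# Hypothetical toy setting: one state, three actions, a softmax policy
# pi = softmax(theta), and made-up action values Q_pi(s, a).
theta = np.zeros(3)
q_values = np.array([1.0, 2.0, 3.0])


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


pi = softmax(theta)


def grad_log_pi(a):
    # For a softmax policy, d ln pi(a|s) / d theta = one_hot(a) - pi.
    return np.eye(len(pi))[a] - pi


def sample_gradient(a, b):
    # Monte Carlo estimate from one sampled action a ~ pi:
    # g(a) = d ln pi(a|s) / d theta * (Q_pi(s, a) - b)
    return grad_log_pi(a) * (q_values[a] - b)


def mean_and_variance(b):
    # Exact mean and total variance (trace of the covariance) of g(A), A ~ pi.
    grads = np.array([sample_gradient(a, b) for a in range(len(pi))])
    mean = pi @ grads
    var = pi @ np.sum((grads - mean) ** 2, axis=1)
    return mean, var


v_pi = pi @ q_values  # V_pi(s) = E_A[Q_pi(s, A)], which is 2 for these numbers
mean_0, var_0 = mean_and_variance(b=0.0)
mean_v, var_v = mean_and_variance(b=v_pi)
print(np.allclose(mean_0, mean_v))  # True: the baseline does not bias the gradient
print(var_0, var_v)                 # about 2.889 vs 0.222
```

Both baselines give the same expected gradient, while b = V_π(s) cuts the variance of the single-sample estimate sharply in this example.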

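With b = V_π(s_t), the stochastic policy gradient still contains two unknowns, Q_π and V_π. A standard way to approximate them is shown below, assuming u_t denotes the observed discounted return from step t to the end of the episode (step n) and v(s;w) a value network with parameters w (this notation is assumed here).

```latex
\begin{aligned}
\mathbf{g}(a_t) &= \frac{\partial \ln \pi(a_t \mid s_t;\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}
  \bigl( Q_\pi(s_t, a_t) - V_\pi(s_t) \bigr) \\
Q_\pi(s_t, a_t) &\approx u_t = \sum_{k=t}^{n} \gamma^{k-t} r_k
  \qquad \text{(Monte Carlo return)} \\
V_\pi(s_t) &\approx v(s_t; \mathbf{w})
  \qquad \text{(value network)} \\
\Rightarrow\quad \mathbf{g}(a_t) &\approx
  \frac{\partial \ln \pi(a_t \mid s_t;\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}
  \bigl( u_t - v(s_t; \mathbf{w}) \bigr)
\end{aligned}
```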

  • Policy Network
  • Value Network
  • Parameter Sharing
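A minimal NumPy sketch of parameter sharing, assuming a small fully connected network. The layer sizes and the names W_shared, W_policy and w_value are hypothetical. The policy network and the value network share the feature layer and differ only in their output heads.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, hidden_dim, num_actions = 4, 16, 2

# Shared parameters: one feature layer used by both networks (hypothetical sizes).
W_shared = rng.normal(scale=0.1, size=(hidden_dim, state_dim))
# Policy head: maps features to one logit per action.
W_policy = rng.normal(scale=0.1, size=(num_actions, hidden_dim))
# Value head: maps features to a single scalar v(s; w).
w_value = rng.normal(scale=0.1, size=hidden_dim)


def features(s):
    """Shared feature extractor with a ReLU nonlinearity."""
    return np.maximum(W_shared @ s, 0.0)


def policy(s):
    """pi(. | s; theta): softmax over the policy head's logits."""
    logits = W_policy @ features(s)
    z = np.exp(logits - logits.max())
    return z / z.sum()


def value(s):
    """v(s; w): linear value head on the shared features."""
    return float(w_value @ features(s))


s = np.array([0.5, -1.0, 0.2, 0.0])
print(policy(s), value(s))
```

Because W_shared feeds both heads, a gradient step on either the policy loss or the value loss also changes the shared features.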

Reinforce with Baseline

  • Updating the policy network
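The updates for one finished episode can be sketched as follows. This is a minimal sketch under stated assumptions: a linear-softmax policy π(a|s) = softmax(θ s) and a linear value function v(s;w) = w·s stand in for the neural networks, and the step sizes alpha and beta are hypothetical. For each step t the observed return u_t is computed first; the policy then ascends (u_t − v(s_t;w)) · ∂ln π(a_t|s_t;θ)/∂θ, and the value network descends the squared error between v(s_t;w) and u_t.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    # u_t = r_t + gamma * r_{t+1} + ... + gamma^{n-t} * r_n, computed backwards.
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def reinforce_with_baseline_update(theta, w, states, actions, rewards,
                                   gamma=0.99, alpha=0.01, beta=0.01):
    """One update pass over a finished episode (hypothetical linear model).

    pi(a|s) = softmax(theta @ s), theta has shape (num_actions, state_dim);
    v(s; w) = w @ s, w has shape (state_dim,).
    """
    theta, w = theta.copy(), w.copy()
    returns = discounted_returns(rewards, gamma)
    for s, a, u in zip(states, actions, returns):
        v = w @ s
        delta = v - u                      # error of the baseline v(s_t; w)
        pi = softmax(theta @ s)
        # d ln pi(a|s) / d theta for a linear-softmax policy
        grad_log_pi = np.outer(np.eye(len(pi))[a] - pi, s)
        theta += beta * (u - v) * grad_log_pi  # policy: gradient ascent
        w -= alpha * delta * s                 # value: descent on (v - u)^2 / 2
    return theta, w
```

Both updates at step t use the same v(s_t;w), evaluated before either parameter vector changes.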

Advantage Actor-Critic (A2C)

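A single A2C update from one transition (s_t, a_t, r_t, s_{t+1}) can be sketched as follows, assuming a linear-softmax actor and a linear critic as hypothetical stand-ins for the networks. The critic forms the TD target y_t = r_t + γ v(s_{t+1};w) and the TD error δ_t = v(s_t;w) − y_t. Since y_t − v(s_t;w) = −δ_t estimates the advantage, the actor moves along −δ_t · ∂ln π(a_t|s_t;θ)/∂θ and the critic along −δ_t · ∂v(s_t;w)/∂w. The `done` flag, which sets the bootstrap value to 0 at a terminal state, is an added assumption.

```python
import numpy as np


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def a2c_step(theta, w, s, a, r, s_next, done,
             gamma=0.99, alpha=0.01, beta=0.01):
    """One A2C update (hypothetical linear model).

    Actor:  pi(a|s) = softmax(theta @ s), theta shape (num_actions, state_dim).
    Critic: v(s; w) = w @ s, w shape (state_dim,).
    """
    theta, w = theta.copy(), w.copy()
    v = w @ s
    v_next = 0.0 if done else w @ s_next  # no bootstrap past a terminal state
    y = r + gamma * v_next                 # TD target
    delta = v - y                          # TD error; -delta estimates the advantage
    pi = softmax(theta @ s)
    grad_log_pi = np.outer(np.eye(len(pi))[a] - pi, s)
    theta -= beta * delta * grad_log_pi    # actor: ascent on advantage * ln pi
    w -= alpha * delta * s                 # critic: descent on (v - y)^2 / 2
    return theta, w, delta
```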

Reinforce versus A2C

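The two methods use almost the same network structure, a policy network plus a value network; what differs is the role of the value network. In A2C the value network is a critic that evaluates the actor and is trained toward a TD target, whereas in REINFORCE with baseline it serves only as a baseline and is trained toward the observed return.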

A2C with Multi-Step TD Target

One-step TD target
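The one-step TD target and TD error are usually written as follows, assuming v(s;w) denotes the value network with parameters w and γ the discount factor (notation assumed here).

```latex
\begin{aligned}
y_t &= r_t + \gamma \, v(s_{t+1}; \mathbf{w}) \\
\delta_t &= v(s_t; \mathbf{w}) - y_t
\end{aligned}
```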
Multi-step TD target
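A minimal sketch of the m-step TD target, assuming the next m rewards have been observed and v(s_{t+m};w) comes from the value network; the function name is hypothetical. The target sums m discounted rewards and bootstraps from the value estimate m steps ahead; m = 1 recovers the one-step target.

```python
def multi_step_td_target(rewards, v_bootstrap, gamma):
    """m-step TD target y_t, where m = len(rewards).

    rewards:     [r_t, r_{t+1}, ..., r_{t+m-1}], the next m observed rewards
    v_bootstrap: v(s_{t+m}; w) from the value network (0.0 if s_{t+m} is terminal)
    y_t = sum_{i=0}^{m-1} gamma^i * r_{t+i} + gamma^m * v(s_{t+m}; w)
    """
    m = len(rewards)
    discounted_rewards = sum(gamma ** i * r for i, r in enumerate(rewards))
    return discounted_rewards + gamma ** m * v_bootstrap


# m = 1 recovers the one-step target r_t + gamma * v(s_{t+1}; w).
print(multi_step_td_target([1.0], v_bootstrap=2.0, gamma=0.5))            # 2.0
print(multi_step_td_target([1.0, 1.0, 1.0], v_bootstrap=8.0, gamma=0.5))  # 2.75
```

Larger m puts more weight on observed rewards and less on the value network's estimate.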

Reinforce with Baseline


versus A2C with Multi-Step TD Target

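The relationship between the two targets can be checked numerically. When the multi-step TD target uses all remaining rewards of the episode, the only state left to bootstrap from is terminal, with value 0, so the target equals the Monte Carlo return u_t. In this sense REINFORCE with baseline is a special case of A2C with a multi-step TD target. The sketch below assumes the episode ends after the listed rewards; the reward values and function names are hypothetical.

```python
def monte_carlo_return(rewards, gamma):
    # u_t = sum_{k=t}^{n} gamma^{k-t} r_k, with rewards = [r_t, ..., r_n]
    return sum(gamma ** k * r for k, r in enumerate(rewards))


def multi_step_td_target(rewards, v_bootstrap, gamma):
    # y_t = sum_{i=0}^{m-1} gamma^i r_{t+i} + gamma^m v(s_{t+m}; w), m = len(rewards)
    m = len(rewards)
    return sum(gamma ** i * r for i, r in enumerate(rewards)) + gamma ** m * v_bootstrap


episode_rewards = [1.0, 0.0, 2.0, 1.0]  # hypothetical rewards from step t to the end
gamma = 0.9
u_t = monte_carlo_return(episode_rewards, gamma)
# Using every remaining reward leaves only the terminal state, whose value is 0.
y_t = multi_step_td_target(episode_rewards, v_bootstrap=0.0, gamma=gamma)
print(u_t, y_t)  # both about 3.349
```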


Copyright notice
This article was written by [C--G]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/189/202207072320355586.html