
4. Policy Learning

2022-07-07 23:21:00 【C--G】

Policy Gradient with Baseline

Policy Gradient
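For reference, the policy gradient and its baseline form are usually written as below. This is a sketch in assumed notation: $\pi(a \mid s; \theta)$ is the policy network, $Q_\pi$ the action-value function, and $b$ any quantity that does not depend on the action $A$.

```latex
\frac{\partial V_\pi(s)}{\partial \theta}
  = \mathbb{E}_{A \sim \pi}\!\left[
      \frac{\partial \ln \pi(A \mid s; \theta)}{\partial \theta} \, Q_\pi(s, A)
    \right]
  = \mathbb{E}_{A \sim \pi}\!\left[
      \frac{\partial \ln \pi(A \mid s; \theta)}{\partial \theta} \,
      \bigl( Q_\pi(s, A) - b \bigr)
    \right]
```

The second equality holds because $\mathbb{E}_{A \sim \pi}\bigl[ b \, \partial \ln \pi(A \mid s; \theta) / \partial \theta \bigr] = b \, \partial \bigl( \sum_a \pi(a \mid s; \theta) \bigr) / \partial \theta = 0$.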

Baseline
Monte Carlo Approximation
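A sketch of the Monte Carlo step, in assumed notation: instead of computing the expectation over all actions, sample one action $a_t$ from the policy and use the resulting stochastic gradient, which is an unbiased estimate of the true gradient. Here $\beta$ denotes a learning rate.

```latex
a_t \sim \pi(\cdot \mid s_t; \theta), \qquad
g(a_t) = \frac{\partial \ln \pi(a_t \mid s_t; \theta)}{\partial \theta}
         \bigl( Q_\pi(s_t, a_t) - b \bigr), \qquad
\theta \leftarrow \theta + \beta \, g(a_t)
```

The baseline $b$ does not change $\mathbb{E}[g(A_t)]$, but it does change the variance of $g(a_t)$, which is why the choice of $b$ matters in practice.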
Choices of Baselines
Choice 1: b=0

Choice 2: b is the state value
$b = V_\pi(s_t)$
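The effect of the two baseline choices can be checked numerically. The following is a minimal sketch, assuming a softmax policy over three actions whose parameters are the logits themselves, and made-up action values. It computes the exact mean and second moment of the stochastic gradient for b = 0 and for b equal to the state value.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_softmax(probs, action):
    # d ln pi(a) / d logit_k = 1[k == a] - pi(k)
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]

def gradient_stats(probs, q_values, baseline):
    """Exact mean and second moment of g(a) = grad ln pi(a) * (Q(a) - b)."""
    mean = [0.0] * len(probs)
    second_moment = 0.0
    for a, p in enumerate(probs):
        g = [d * (q_values[a] - baseline) for d in grad_log_softmax(probs, a)]
        mean = [m + p * gi for m, gi in zip(mean, g)]
        second_moment += p * sum(gi * gi for gi in g)
    return mean, second_moment

probs = softmax([0.0, 0.0, 0.0])
q_values = [10.0, 11.0, 12.0]  # hypothetical action values, for illustration only
state_value = sum(p * q for p, q in zip(probs, q_values))  # V(s) = E_A[Q(s, A)]

mean_b0, m2_b0 = gradient_stats(probs, q_values, baseline=0.0)
mean_bv, m2_bv = gradient_stats(probs, q_values, baseline=state_value)
```

In this example both baselines give the same expected gradient, while the second moment (and therefore the variance) is much smaller when the baseline is the state value.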


Policy Network
Value Network
Parameter Sharing
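A minimal sketch of the three items above, assuming that "parameter sharing" means the policy network and the value network share their feature layer and differ only in the output head. The class name `SharedActorCritic`, the layer sizes, and the random initialization are all hypothetical.

```python
import math
import random

def linear(weights, bias, x):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, t) for t in v]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(rng, n_out, n_in):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class SharedActorCritic:
    def __init__(self, state_dim, hidden_dim, num_actions, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(rng, hidden_dim, state_dim)         # used by both heads
        self.policy_head = init_layer(rng, num_actions, hidden_dim)  # pi(a | s; theta)
        self.value_head = init_layer(rng, 1, hidden_dim)             # v(s; w)

    def forward(self, state):
        features = relu(linear(*self.shared, state))
        action_probs = softmax(linear(*self.policy_head, features))
        state_value = linear(*self.value_head, features)[0]
        return action_probs, state_value

net = SharedActorCritic(state_dim=4, hidden_dim=8, num_actions=2)
action_probs, state_value = net.forward([0.1, -0.2, 0.3, 0.0])
```

The policy head outputs a probability distribution over actions, and the value head outputs a single scalar estimate of the state value.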

Reinforce with Baseline

Updating the policy network
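The update can be sketched in tabular form. This assumes the usual formulation: after an episode finishes, compute the discounted return $u_t$, form the error $\delta_t = v(s_t) - u_t$, then update the policy parameters with $\theta \leftarrow \theta - \beta \, \delta_t \, \partial \ln \pi(a_t \mid s_t; \theta) / \partial \theta$ and the baseline with $w \leftarrow w - \alpha \, \delta_t \, \partial v(s_t; w) / \partial w$. The tables `logits` and `values` and the trajectory are hypothetical stand-ins for the two networks and real data.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def discounted_returns(rewards, gamma):
    """u_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def reinforce_baseline_update(logits, values, episode, gamma, beta, alpha):
    """Update both tables from a finished episode of (state, action, reward) triples."""
    returns = discounted_returns([r for _, _, r in episode], gamma)
    for (s, a, _), u in zip(episode, returns):
        delta = values[s] - u  # delta_t = v(s_t) - u_t
        probs = softmax(logits[s])
        for k, p in enumerate(probs):
            grad_log_pi = (1.0 if k == a else 0.0) - p
            logits[s][k] -= beta * delta * grad_log_pi  # policy update
        values[s] -= alpha * delta  # baseline update (table gradient is 1)

logits = {0: [0.0, 0.0], 1: [0.0, 0.0]}
values = {0: 0.0, 1: 0.0}
episode = [(0, 1, 1.0), (1, 0, 1.0), (0, 1, 1.0)]  # hypothetical trajectory
reinforce_baseline_update(logits, values, episode, gamma=0.5, beta=0.1, alpha=0.1)
```

Because every return in this trajectory exceeds the initial baseline of zero, the actions that were taken become more likely and the value estimates move up toward the observed returns.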

Advantage Actor-Critic (A2C)

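A minimal tabular sketch of one A2C update, assuming the usual one-step formulation: TD target $y_t = r_t + \gamma \, v(s_{t+1})$, TD error $\delta_t = v(s_t) - y_t$, and the same two update rules as REINFORCE with baseline but with $y_t$ in place of the return. The function name `a2c_step`, the `done` flag handling, and the numbers are assumptions for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def a2c_step(logits, values, transition, gamma, beta, alpha):
    """One A2C update from a single transition (s, a, r, s_next, done)."""
    s, a, r, s_next, done = transition
    next_value = 0.0 if done else values[s_next]  # assumption: terminal states have value 0
    td_target = r + gamma * next_value            # y_t = r_t + gamma * v(s_{t+1})
    td_error = values[s] - td_target              # delta_t = v(s_t) - y_t
    probs = softmax(logits[s])
    for k, p in enumerate(probs):
        grad_log_pi = (1.0 if k == a else 0.0) - p
        logits[s][k] -= beta * td_error * grad_log_pi  # actor (policy) update
    values[s] -= alpha * td_error                      # critic (value) update
    return td_target, td_error

logits = {0: [0.0, 0.0], 1: [0.0, 0.0]}
values = {0: 0.5, 1: 2.0}
td_target, td_error = a2c_step(
    logits, values, (0, 1, 1.0, 1, False), gamma=0.9, beta=0.1, alpha=0.1
)
```

Unlike REINFORCE, this update needs only one transition, so it can run at every step instead of waiting for the episode to end.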

Reinforce versus A2C

The two methods use almost identical network structures; the difference lies in the value network.

A2C with Multi-Step TD Target

One step

Multi step
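A minimal sketch of the multi-step TD target, assuming the standard definition $y_t = \sum_{i=0}^{m-1} \gamma^i r_{t+i} + \gamma^m v(s_{t+m})$. The helper name `multi_step_td_target` and the numbers are hypothetical; with a single reward it reduces to the one-step target.

```python
def multi_step_td_target(rewards, bootstrap_value, gamma):
    """m-step TD target, where m = len(rewards) and bootstrap_value = v(s_{t+m})."""
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target

one_step = multi_step_td_target([1.0], bootstrap_value=4.0, gamma=0.5)
three_step = multi_step_td_target([1.0, 1.0, 1.0], bootstrap_value=4.0, gamma=0.5)
```

Here `one_step` is 1 + 0.5 * 4 = 3.0 and `three_step` is 1 + 0.5 + 0.25 + 0.125 * 4 = 2.25. The more observed rewards the target contains, the less it depends on the value network's estimate.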

Reinforce with Baseline


versus

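Written side by side in assumed notation, REINFORCE with baseline and A2C with a multi-step TD target differ only in the target that is compared against $v(s_t; w)$. Here $n$ is the final step of the episode and $m$ is the number of steps used by A2C.

```latex
\begin{aligned}
\text{REINFORCE with baseline:} \quad
  & u_t = \sum_{i=t}^{n} \gamma^{\,i-t} \, r_i \\
\text{A2C, multi-step:} \quad
  & y_t = \sum_{i=0}^{m-1} \gamma^{\,i} \, r_{t+i} + \gamma^{\,m} \, v(s_{t+m}; w) \\
\text{A2C, one-step } (m = 1): \quad
  & y_t = r_t + \gamma \, v(s_{t+1}; w)
\end{aligned}
```

When the $m$-step window reaches the end of the episode, the bootstrap term drops out and $y_t = u_t$, so REINFORCE with baseline can be read as the limiting case of multi-step A2C that uses only observed rewards.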

Copyright notice
This article was written by [C--G]. Please include a link to the original when reposting. Thank you.
https://blog.csdn.net/weixin_50973728/article/details/125663884
