
4. Policy Learning

2022-07-08 01:14:00 【C--G】

Policy Gradient with Baseline

Policy Gradient
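For reference, here is the policy gradient that the rest of these notes build on. The notation follows the usual convention of this lecture series and is assumed, because the original formula survives only as an image: $\pi(a \mid s;\theta)$ is the policy network, $Q_\pi$ is the action-value function, and $V_\pi(s) = \mathbb{E}_{A\sim\pi}[Q_\pi(s,A)]$ is the state-value function. As in the usual simplified derivation, $Q_\pi$ is treated as independent of $\theta$.

```latex
\frac{\partial V_\pi(s)}{\partial \theta}
  = \mathbb{E}_{A \sim \pi(\cdot \mid s;\theta)}\!\left[
      \frac{\partial \ln \pi(A \mid s;\theta)}{\partial \theta}\; Q_\pi(s, A)
    \right]
```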

Baseline
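A baseline is allowed because any $b$ that does not depend on the action $A$ contributes zero in expectation. The following is a short derivation, assuming a discrete action space and the notation of the policy gradient above:

```latex
\begin{aligned}
\mathbb{E}_{A \sim \pi}\!\left[ b\,\frac{\partial \ln \pi(A \mid s;\theta)}{\partial \theta} \right]
  &= b \sum_a \pi(a \mid s;\theta)\,\frac{1}{\pi(a \mid s;\theta)}\,\frac{\partial \pi(a \mid s;\theta)}{\partial \theta}
   = b\,\frac{\partial}{\partial \theta} \sum_a \pi(a \mid s;\theta)
   = b\,\frac{\partial 1}{\partial \theta} = 0, \\
\frac{\partial V_\pi(s)}{\partial \theta}
  &= \mathbb{E}_{A \sim \pi}\!\left[ \frac{\partial \ln \pi(A \mid s;\theta)}{\partial \theta}\,\bigl(Q_\pi(s, A) - b\bigr) \right].
\end{aligned}
```

The baseline leaves the expected gradient unchanged. What it changes is the variance of the sampled estimate.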
Monte Carlo Approximation
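The following is a small numerical check of the Monte Carlo approximation. It is a sketch that assumes a single state, a softmax policy over logits $\theta$ (for which $\partial \ln \pi(a)/\partial \theta_j = \mathbb{1}[j=a] - \pi_j$), and hypothetical, known $Q$ values. Sampling $a \sim \pi$ and averaging $g(a) = \frac{\partial \ln \pi(a)}{\partial \theta}\,(Q(a) - b)$ should approach the exact expectation, and the exact expectation should not depend on $b$. All function names are illustrative.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_pi(theta, a):
    """d ln pi(a) / d theta for a softmax policy: one_hot(a) - pi."""
    pi = softmax(theta)
    return [(1.0 if j == a else 0.0) - pi[j] for j in range(len(theta))]


def sample_gradient(theta, q, b, a):
    """Single-sample estimate g(a) = d ln pi(a)/d theta * (Q(a) - b)."""
    return [gj * (q[a] - b) for gj in grad_log_pi(theta, a)]


def exact_gradient(theta, q, b):
    """E_{A ~ pi}[g(A)], computed by summing over all actions."""
    pi = softmax(theta)
    n = len(theta)
    total = [0.0] * n
    for a in range(n):
        g = sample_gradient(theta, q, b, a)
        for j in range(n):
            total[j] += pi[a] * g[j]
    return total


def monte_carlo_gradient(theta, q, b, num_samples, seed=0):
    """Average of g(a) over actions a sampled from pi (Monte Carlo approximation)."""
    rng = random.Random(seed)
    pi = softmax(theta)
    n = len(theta)
    total = [0.0] * n
    for _ in range(num_samples):
        a = rng.choices(range(n), weights=pi)[0]
        g = sample_gradient(theta, q, b, a)
        for j in range(n):
            total[j] += g[j]
    return [t / num_samples for t in total]


if __name__ == "__main__":
    theta = [0.0, 1.0, -1.0]  # hypothetical logits for one state
    q = [1.0, 2.0, 0.5]       # hypothetical Q_pi(s, a) values
    print(exact_gradient(theta, q, b=1.5))
    print(monte_carlo_gradient(theta, q, b=1.5, num_samples=50_000))
```

In a real agent, a single sampled action per step is used, and the averaging happens implicitly over many SGD updates.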
Choices of Baselines
Choice 1: b = 0 (the standard policy gradient, without a baseline)

Choice 2: b is the state value
$b = V_\pi(s_t)$

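To see why $b = V_\pi(s_t)$ is attractive, the sketch below computes the exact mean and total variance of the single-sample gradient estimate for both choices. It assumes a one-state softmax policy with hypothetical $Q$ values, and the function names are illustrative. $V_\pi(s)$ is close to $Q_\pi(s, a)$, so $Q_\pi - b$ is small and the variance tends to shrink. In general, $V_\pi$ is not guaranteed to be the variance-minimizing baseline.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def state_value(theta, q):
    """V_pi(s) = sum_a pi(a|s) * Q_pi(s, a)."""
    return sum(p * qa for p, qa in zip(softmax(theta), q))


def estimator_stats(theta, q, b):
    """Exact mean and total variance of g(A) = d ln pi(A)/d theta * (Q(A) - b), A ~ pi."""
    pi = softmax(theta)
    n = len(theta)
    # g(a) for every action; for softmax, d ln pi(a)/d theta_j = 1[j == a] - pi_j
    g = [[((1.0 if j == a else 0.0) - pi[j]) * (q[a] - b) for j in range(n)]
         for a in range(n)]
    mean = [sum(pi[a] * g[a][j] for a in range(n)) for j in range(n)]
    # total variance = E ||g(A) - E g(A)||^2
    var = sum(pi[a] * sum((g[a][j] - mean[j]) ** 2 for j in range(n))
              for a in range(n))
    return mean, var


if __name__ == "__main__":
    theta = [0.0, 0.0]   # two equally likely actions (hypothetical)
    q = [10.0, 11.0]     # hypothetical Q values
    for b in (0.0, state_value(theta, q)):
        print(b, estimator_stats(theta, q, b))
```

In this toy case, both baselines give the same mean gradient. With b = 0, the total variance is 55.125. With b = V_π(s) = 10.5, the variance drops to 0, because Q − V_π has the same magnitude for both actions.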

Policy Network
Value Network
Parameter Sharing
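The following is a toy sketch of parameter sharing between the policy network and the value network. It assumes a single shared tanh layer that feeds both a softmax policy head and a scalar value head. The layer sizes, the initialization and the class name `SharedActorCritic` are hypothetical. A real implementation would use a deep-learning framework with automatic differentiation.

```python
import math
import random


class SharedActorCritic:
    """Toy policy/value networks that share one hidden layer (hypothetical design).

    h = tanh(W s)            shared feature layer (the shared parameters W)
    pi(. | s) = softmax(P h)  policy head, one logit per action
    v(s) = u . h              value head, a single scalar
    """

    def __init__(self, state_dim, hidden_dim, num_actions, seed=0):
        rng = random.Random(seed)

        def init():
            return rng.uniform(-0.5, 0.5)

        self.W = [[init() for _ in range(state_dim)] for _ in range(hidden_dim)]
        self.P = [[init() for _ in range(hidden_dim)] for _ in range(num_actions)]
        self.u = [init() for _ in range(hidden_dim)]

    def features(self, s):
        """Shared representation used by both heads."""
        return [math.tanh(sum(w * x for w, x in zip(row, s))) for row in self.W]

    def policy(self, s):
        """Action probabilities pi(a | s) from the policy head."""
        h = self.features(s)
        logits = [sum(p * hj for p, hj in zip(row, h)) for row in self.P]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def value(self, s):
        """Scalar state-value estimate v(s) from the value head."""
        h = self.features(s)
        return sum(uj * hj for uj, hj in zip(self.u, h))
```

With this design, gradients from both the policy loss and the value loss would update the shared matrix `W`. Only `P` and `u` belong to a single head.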

Reinforce with Baseline

Updating the policy network
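The following is a minimal tabular sketch of one REINFORCE-with-baseline pass over a finished episode. It assumes the scheme commonly taught with this method. First, compute the discounted return $u_t$. Next, compute $\delta_t = v(s_t;w) - u_t$. Then update $\theta \leftarrow \theta - \beta\,\delta_t\,\partial \ln\pi/\partial\theta$ and $w \leftarrow w - \alpha\,\delta_t\,\partial v/\partial w$. The tables `theta` and `w`, the softmax parameterization and the function names are hypothetical stand-ins for real networks.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """u_t = sum_{i >= t} gamma^(i - t) * r_i, computed backwards in one pass."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_with_baseline_update(theta, w, trajectory, gamma, alpha, beta):
    """Apply one REINFORCE-with-baseline pass over a finished episode (tabular sketch).

    theta[s]:   list of action logits for state s (hypothetical policy table)
    w[s]:       value estimate v(s; w) for state s (hypothetical value table)
    trajectory: list of (state, action, reward) tuples in time order
    Updates are applied step by step, so later steps see the updated tables.
    """
    returns = discounted_returns([r for _, _, r in trajectory], gamma)
    for (s, a, _), u in zip(trajectory, returns):
        delta = w[s] - u                  # delta_t = v(s_t; w) - u_t
        pi = softmax(theta[s])            # policy at the current parameters
        for j in range(len(theta[s])):
            grad_log_pi = (1.0 if j == a else 0.0) - pi[j]
            theta[s][j] -= beta * delta * grad_log_pi   # theta <- theta - beta * delta * d ln pi / d theta
        w[s] -= alpha * delta             # w <- w - alpha * delta * d v / d w (d v / d w = 1 for a table entry)
```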

Advantage Actor-Critic (A2C)

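A matching tabular sketch of a single A2C step, assuming the usual one-step scheme. The TD target is $y_t = r_t + \gamma\,v(s_{t+1};w)$ and the TD error is $\delta_t = v(s_t;w) - y_t$. The actor update uses $-\delta_t = y_t - v(s_t;w)$ as its estimate of the advantage, which is where the name comes from. The tables, the `done` flag for terminal states and the function name are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def a2c_update(theta, w, s, a, r, s_next, gamma, alpha, beta, done=False):
    """One A2C step from a single transition (s, a, r, s_next), tabular sketch.

    theta[s]: action logits for state s; w[s]: value estimate v(s; w).
    done=True marks s_next as terminal, so nothing is bootstrapped (assumption).
    Returns the TD error delta_t.
    """
    v_next = 0.0 if done else w[s_next]
    y = r + gamma * v_next               # TD target y_t = r_t + gamma * v(s_{t+1}; w)
    delta = w[s] - y                     # TD error delta_t = v(s_t; w) - y_t
    pi = softmax(theta[s])
    for j in range(len(theta[s])):
        grad_log_pi = (1.0 if j == a else 0.0) - pi[j]
        theta[s][j] -= beta * delta * grad_log_pi   # actor: -delta_t acts as the advantage estimate
    w[s] -= alpha * delta                # critic: move v(s_t; w) toward y_t
    return delta
```

Unlike REINFORCE with baseline, this update can be applied after every transition instead of waiting for the episode to end.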

Reinforce versus A2C

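The two methods have almost the same network structure: a policy network plus a value network. They differ in the value network. In A2C it acts as a critic trained with TD targets. In REINFORCE with baseline it only serves as the baseline and is fitted to the observed return.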

A2C with Multi-Step TD Target

One-step TD target
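The following are the one-step TD target and TD error used by A2C. Here $v(s;w)$ denotes the value network and $\gamma$ the discount factor. This notation is assumed, because the slide survives only as an image.

```latex
\begin{aligned}
y_t &= r_t + \gamma\, v(s_{t+1}; w) && \text{(one-step TD target)} \\
\delta_t &= v(s_t; w) - y_t && \text{(TD error)}
\end{aligned}
```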

Multi-step TD target
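The multi-step target replaces the single reward with the first $m$ discounted rewards and bootstraps from the value network $m$ steps later: $y_t = \sum_{i=0}^{m-1} \gamma^i r_{t+i} + \gamma^m\, v(s_{t+m}; w)$. The following is a minimal sketch in which the caller supplies the $m$ rewards and the bootstrap value. The function name and the convention of passing 0.0 for a terminal state are assumptions.

```python
def multi_step_td_target(rewards, v_bootstrap, gamma):
    """m-step TD target y_t = sum_{i=0}^{m-1} gamma^i * r_{t+i} + gamma^m * v(s_{t+m}; w).

    rewards:     [r_t, r_{t+1}, ..., r_{t+m-1}]
    v_bootstrap: v(s_{t+m}; w), or 0.0 if s_{t+m} is terminal (assumed convention)
    """
    m = len(rewards)
    discounted = sum(gamma ** i * r for i, r in enumerate(rewards))
    return discounted + gamma ** m * v_bootstrap


if __name__ == "__main__":
    # m = 1 reduces to the one-step target r_t + gamma * v(s_{t+1}; w)
    print(multi_step_td_target([1.0], v_bootstrap=2.0, gamma=0.5))
    print(multi_step_td_target([1.0, 1.0, 1.0], v_bootstrap=8.0, gamma=0.5))
```

Larger $m$ relies more on observed rewards and less on the current value estimate.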

Reinforce with Baseline versus A2C with Multi-Step TD Target
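One way to read this comparison is sketched below. It assumes the episode ends after step $n$, so the terminal state has value 0. If the multi-step target uses all remaining rewards, it no longer bootstraps from the value network and becomes exactly the return $u_t$ used by REINFORCE with baseline. In both methods the value network then enters through the error $\delta_t = v(s_t; w) - (\text{target})$.

```latex
\begin{aligned}
u_t &= \sum_{i=t}^{n} \gamma^{\,i-t} r_i && \text{(REINFORCE with baseline: full return)} \\
y_t &= \sum_{i=0}^{m-1} \gamma^{\,i} r_{t+i} + \gamma^{\,m}\, v(s_{t+m}; w) && \text{(A2C: } m\text{-step TD target)} \\
m = n - t + 1,\ v(s_{n+1}; w) = 0 \ &\Longrightarrow\ y_t = \sum_{i=t}^{n} \gamma^{\,i-t} r_i = u_t
\end{aligned}
```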

Copyright notice
This article was written by [C--G]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/189/202207072320355586.html
