[Five-Minute Paper] Reinforcement Learning Based on a Parameterized Action Space

2022-07-26 15:21:00 【Little Mr He】

Paper title: Reinforcement Learning with Parameterized Actions

What problem does it solve?

Background

In a parameterized action space, each discrete action carries a vector of continuous parameters. At every decision step, the agent must decide which discrete action to perform and which parameter values to execute it with.
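To make the definition concrete, here is a minimal sketch of how a parameterized action could be represented. The action names (`kick`, `dash`, `turn`) and their parameter dimensions are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class ParameterizedAction:
    name: str                  # the discrete action a
    params: Tuple[float, ...]  # the continuous parameter vector x


# Hypothetical action set: each discrete action declares how many
# continuous parameters it takes.
PARAM_DIMS: Dict[str, int] = {"kick": 2, "dash": 1, "turn": 1}


def make_action(name: str, params: Sequence[float]) -> ParameterizedAction:
    """Build an action, checking that the parameter vector fits the discrete action."""
    if name not in PARAM_DIMS:
        raise ValueError(f"unknown discrete action: {name}")
    if len(params) != PARAM_DIMS[name]:
        raise ValueError(f"{name} expects {PARAM_DIMS[name]} parameter(s)")
    return ParameterizedAction(name, tuple(float(p) for p in params))


action = make_action("kick", [30.0, 0.5])
```

The point of the structure is that the agent's decision is a pair: a choice from a finite set plus a real-valued vector whose dimension depends on that choice.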

What method does it use?

The paper proposes the Q-PAMDP algorithm, which alternates between learning the discrete action selection and learning the continuous action parameters. The probability of selecting the parameterized action $(a, x)$ in state $s$ is written $\pi(a,x|s)$. The policy for choosing the discrete action is $\pi^{d}(a|s)$, the policy for choosing the parameters of action $a$ is $\pi^{a}(x|s)$, and the overall policy factorizes as:

$\pi(a,x|s) = \pi^{d}(a|s)\pi^{a}(x|s)$
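The factorization can be illustrated by sampling in two stages: first the discrete action, then its parameter. The sketch below assumes a softmax over action preferences for $\pi^{d}$ and a one-dimensional Gaussian for each $\pi^{a}$; both choices, and all function names, are assumptions made for illustration. The preferences and means are taken to be already computed for the current state $s$.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities pi^d(a|s)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def sample_action(prefs, means, std, rng):
    """Sample (a, x): a from pi^d(.|s), then x from pi^a(.|s)."""
    probs = softmax(prefs)
    a = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    x = rng.gauss(means[a], std)
    return a, x


def joint_density(a, x, prefs, means, std):
    """pi(a, x | s) = pi^d(a|s) * pi^a(x|s)."""
    return softmax(prefs)[a] * gaussian_pdf(x, means[a], std)


rng = random.Random(0)
a, x = sample_action([0.0, 1.0], [-1.0, 2.0], 0.5, rng)
density = joint_density(a, x, [0.0, 1.0], [-1.0, 2.0], 0.5)
```

Because the parameter distribution is indexed by the sampled action, each discrete action can have its own parameter policy.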

The discrete-action policy is parameterized by $\omega$ and written $\pi_{\omega}^{d}(a|s)$. The action-parameter policies are parameterized by $\theta$ and written $\pi_{\theta}^{a}(x|s)$, where $\theta = [\theta_{a_{1}}, \cdots, \theta_{a_{k}}]$ collects one parameter vector per discrete action.

There are two ways to optimize these parameters. The first is to optimize $\theta$ and $\omega$ directly, treating them as one joint parameter set $\Theta$ and maximizing the expected value of the initial state $s_{0}$ drawn from the start-state distribution $D$:

$J(\theta, \omega)=\mathbb{E}_{s_{0} \sim D}\left[V^{\pi_{\Theta}}\left(s_{0}\right)\right]$
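The objective above is an expectation over start states, so it can be approximated by averaging returns over sampled episodes. The following is a minimal sketch assuming a discounted return with factor `gamma`; `sample_start` and `rollout` are hypothetical callables standing in for the environment and the current policy.

```python
import random


def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t, accumulated from the last reward backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def estimate_J(sample_start, rollout, gamma, episodes, rng):
    """Monte Carlo estimate of E_{s0 ~ D}[V(s0)] under a fixed policy."""
    total = 0.0
    for _ in range(episodes):
        s0 = sample_start(rng)        # s0 ~ D
        rewards = rollout(s0, rng)    # rewards collected by following the policy
        total += discounted_return(rewards, gamma)
    return total / episodes


# Toy usage: every episode yields two rewards of 1.0 (hypothetical environment).
estimate = estimate_J(
    sample_start=lambda rng: rng.random(),
    rollout=lambda s0, rng: [1.0, 1.0],
    gamma=0.5,
    episodes=10,
    rng=random.Random(0),
)
```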

The second way is to update the two alternately. With $\theta$ fixed, optimize $\omega$:

$W(\theta)=\arg \max _{\omega} J(\theta, \omega)=\omega_{\theta}^{*}$

Then optimize $\theta$, either with $\omega$ held fixed ($J_{\omega}$) or with $\omega$ set to its optimum for each $\theta$ ($H$):

$\begin{aligned} J_{\omega}(\theta) &=J(\theta, \omega) \\ H(\theta) &=J(\theta, W(\theta)) \end{aligned}$
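The alternating scheme can be sketched on a toy problem. The objective below is a hypothetical closed-form stand-in for $J(\theta, \omega)$, not the paper's objective, and plain gradient ascent replaces the learning methods the algorithm would run on sampled episodes. For this toy objective, $W(\theta) = \theta$ and $H(\theta) = -(\theta - 1)^2$, so the alternation should converge to $\theta = \omega = 1$.

```python
def J(theta, omega):
    # Hypothetical toy objective, chosen so the optimum is known in closed form.
    return -(theta - 1.0) ** 2 - (omega - theta) ** 2


def grad_theta(theta, omega):
    return -2.0 * (theta - 1.0) + 2.0 * (omega - theta)


def grad_omega(theta, omega):
    return -2.0 * (omega - theta)


def optimize_omega(theta, omega, lr=0.25, steps=50):
    """Approximate W(theta) = argmax_omega J(theta, omega) with theta fixed."""
    for _ in range(steps):
        omega += lr * grad_omega(theta, omega)
    return omega


def alternate(theta=0.0, omega=0.0, lr=0.1, rounds=200):
    """Alternate: re-fit omega for the current theta, then take one step on theta."""
    for _ in range(rounds):
        omega = optimize_omega(theta, omega)     # omega is now close to W(theta)
        theta += lr * grad_theta(theta, omega)   # ascent step on J_omega(theta)
    return theta, optimize_omega(theta, omega)


theta_star, omega_star = alternate()
```

Re-fitting $\omega$ before every step on $\theta$ means each $\theta$ update is effectively an ascent step on $H(\theta)$ rather than on $J_{\omega}$ for a stale $\omega$.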

The paper gives pseudocode for the algorithm; the figure is not reproduced here.

The authors also provide a theoretical analysis with proofs. I will add it here later if it is needed.

What are the results?

Publication information? Author information?

Reference link

Related papers

Copyright notice
This article was written by [Little Mr He]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/207/202207261503038441.html