[Reinforcement Learning Notes] Common Symbols in Reinforcement Learning

2022-06-25 08:35:00 【Allenpandas】

Symbol	Meaning
$\doteq$	Equality that holds by definition
$\approx$	Approximately equal to
$\epsilon$	Probability of taking a random action under an $\epsilon$-greedy policy
$\gamma$	Discount factor
$\lambda$	Decay rate for eligibility traces
$\leftarrow$	Assignment
$s$, $s'$	States
$a$	An action
$r$	A reward
$t$	Discrete time step
$\pi$	Policy (decision-making rule)
$\pi(s)$	Action taken in state $s$ under the deterministic policy $\pi$
$\pi(a \mid s)$	Probability of taking action $a$ in state $s$ under the stochastic policy $\pi$
$A_{t}$	Action at time $t$
$S_{t}$	State at time $t$, typically determined stochastically by $S_{t-1}$ and $A_{t-1}$
$R_{t}$	Reward at time $t$, typically determined stochastically by $S_{t-1}$ and $A_{t-1}$
$G_t$	Return following time $t$ (cumulative, possibly discounted, reward); a random variable whose expectation defines the value functions
$p(s', r \mid s, a)$	Probability of transitioning to state $s'$ with reward $r$, starting from state $s$ and taking action $a$
$p(s' \mid s, a)$	Probability of transitioning to state $s'$, starting from state $s$ and taking action $a$
$r(s, a)$	Expected immediate reward after taking action $a$ in state $s$
$r(s, a, s')$	Expected immediate reward on the transition from state $s$ to state $s'$ under action $a$
$v_\pi(s)$	Value of state $s$ under policy $\pi$ (expected return)
$v_*(s)$	Value of state $s$ under the optimal policy
$q_\pi(s, a)$	Value of taking action $a$ in state $s$ under policy $\pi$
$q_*(s, a)$	Value of taking action $a$ in state $s$ under the optimal policy
$V$, $V_{t}$	State-value function estimates
$Q$, $Q_{t}$	Action-value function estimates
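
Several entries in the table are related to one another, and a small numerical sketch may make the relationships easier to see. The Python below is only an illustration under stated assumptions: the two-state MDP, its state and action names, and the helper function names are all hypothetical and do not come from the original notes. It also assumes the common convention that the return is $G_t = R_{t+1} + \gamma R_{t+2} + \dots$, which matches the table's statement that $R_t$ depends on $S_{t-1}$ and $A_{t-1}$.

```python
from collections import defaultdict

# Hypothetical toy dynamics: p[(s, a)] maps (s_next, r) -> p(s', r | s, a).
p = {
    ("s0", "go"): {("s1", 1.0): 0.8, ("s0", 0.0): 0.2},
    ("s1", "go"): {("s0", 0.0): 1.0},
}

def state_transition_prob(s, a):
    """p(s' | s, a): sum p(s', r | s, a) over all rewards r."""
    out = defaultdict(float)
    for (s_next, _r), prob in p[(s, a)].items():
        out[s_next] += prob
    return dict(out)

def expected_reward(s, a):
    """r(s, a): sum of r * p(s', r | s, a) over all (s', r) pairs."""
    return sum(r * prob for (_s_next, r), prob in p[(s, a)].items())

def discounted_return(rewards, gamma):
    """G_t for a finite list [R_{t+1}, R_{t+2}, ...] with discount gamma."""
    g = 0.0
    # Work backwards using G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

if __name__ == "__main__":
    print(state_transition_prob("s0", "go"))
    print(expected_reward("s0", "go"))
    print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))
```

Storing the joint distribution $p(s', r \mid s, a)$ and deriving $p(s' \mid s, a)$ and $r(s, a)$ from it is one possible design; the derived quantities could equally be stored directly.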

Copyright notice
This article was written by [Allenpandas]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/176/202206250713199080.html
