
Cross-entropy Method

2022-07-07 00:26:00 Evergreen AAS

CEM && RL

Note: the following content is quoted from the blog post "Evolutionary strategy optimization algorithm CEM (Cross Entropy Method)" [1].

CEM can also be used to solve Markov decision processes, that is, reinforcement learning problems. Reinforcement learning is also a dynamic programming process: selecting an action in a given state is like choosing a branch at a node, so the whole process is a path planning problem from the initial state to the final state, except that we want the path that maximizes the total reward. Seen this way, the problem can be modeled with CEM. We treat a complete path as one sample $x = (s_0, a_0, s_1, a_1, \ldots, s_n, a_n)$, and the total reward collected along the path is $S(x) = \sum_{i=0}^{n} r(s_i, a_i)$. The goal is to maximize $S(x)$.

So how are these samples drawn? We build a matrix $p$ whose rows represent states and whose columns represent actions, so that $p_{ij}$ is the probability of taking action $a_j$ in state $s_i$. Sampling from this matrix repeatedly yields multiple paths. The samples with the highest $S(x)$ are then used to update $p$, and the process is iterated until it settles on the best matrix $\hat{p}$.
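The loop described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the quoted blog's implementation: the toy chain environment (`step`), the sizes `N_STATES`, `N_ACTIONS`, `HORIZON`, the elite fraction, and the smoothing factor `alpha` are all hypothetical choices made for the example. The matrix is refit by counting state-action pairs in the elite paths, which is the maximum-likelihood (cross-entropy minimizing) fit for a table of categorical distributions.

```python
import numpy as np

N_STATES, N_ACTIONS, HORIZON = 5, 2, 10  # hypothetical toy problem sizes

def step(state, action):
    """Toy chain MDP (an assumption for illustration): action 1 moves right,
    action 0 moves left; reward 1 whenever the next state is the last one."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def rollout(p, rng):
    """Sample one path x from the matrix p; return its (s, a) pairs and S(x)."""
    state, pairs, total = 0, [], 0.0
    for _ in range(HORIZON):
        action = rng.choice(N_ACTIONS, p=p[state])  # draw a_j with prob p[s_i, j]
        pairs.append((state, action))
        state, reward = step(state, action)
        total += reward
    return pairs, total

def update(p, elite_paths, alpha=0.7):
    """Refit p to the elite paths by counting (state, action) pairs.
    Rows of states the elites never visited are left unchanged."""
    counts = np.zeros_like(p)
    for pairs in elite_paths:
        for s, a in pairs:
            counts[s, a] += 1
    new_p = p.copy()
    visited = counts.sum(axis=1) > 0
    freq = counts[visited] / counts[visited].sum(axis=1, keepdims=True)
    # Smoothed update: blend the old row with the elite action frequencies.
    new_p[visited] = (1 - alpha) * p[visited] + alpha * freq
    return new_p

def cem(n_iters=30, n_samples=200, elite_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    p = np.full((N_STATES, N_ACTIONS), 1.0 / N_ACTIONS)  # uniform start
    n_elite = int(n_samples * elite_frac)
    for _ in range(n_iters):
        samples = [rollout(p, rng) for _ in range(n_samples)]
        samples.sort(key=lambda item: item[1], reverse=True)  # highest S(x) first
        p = update(p, [pairs for pairs, _ in samples[:n_elite]])
    return p

if __name__ == "__main__":
    print(np.round(cem(), 3))
```

In this toy chain the always-right policy is optimal, so one would expect the second column of the returned matrix to grow over the iterations. The smoothing factor keeps a single batch of elites from collapsing a row to a deterministic choice too early.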

This is similar to policy iteration methods in reinforcement learning: the $p$ matrix gives the probability of each action in each state, which together form a decision policy, but the parameter update does not use gradients. From another angle, it can also be seen as resembling value iteration methods, with the $p$ matrix playing the role of the Q matrix in classic Q-learning. The difference is that the element in row $i$, column $j$ of the Q matrix is the expected future return of taking action $a_j$ in state $s_i$ and is updated via the Bellman equation, whereas the $p$ matrix holds probabilities and is updated via cross-entropy.
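To make the contrast concrete, here is a small side-by-side sketch of the two update rules for a single table entry or row. The function names, the learning rate `lr`, the discount `gamma`, and the smoothing factor `alpha` are assumptions introduced for illustration; the source does not specify them.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, lr=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q[s, a] toward the Bellman target
    r + gamma * max_a' Q[s_next, a']. Returns an updated copy of Q."""
    target = r + gamma * Q[s_next].max()
    Q = Q.copy()
    Q[s, a] += lr * (target - Q[s, a])
    return Q

def cem_row_update(p_row, elite_actions, alpha=0.7):
    """CEM-style step for one state: move the action probabilities toward
    the action frequencies observed in that state among the elite paths."""
    freq = np.bincount(elite_actions, minlength=len(p_row)) / len(elite_actions)
    return (1 - alpha) * p_row + alpha * freq

if __name__ == "__main__":
    Q = np.array([[0.0, 0.0], [1.0, 2.0]])
    print(q_learning_update(Q, s=0, a=1, r=1.0, s_next=1))
    print(cem_row_update(np.array([0.5, 0.5]), [1, 1, 1, 0]))
```

The Q-learning step uses a per-transition reward and bootstraps from the next state's values, while the CEM step only needs to know which actions appeared in the best-scoring complete paths.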

[1] Evolutionary strategy optimization algorithm CEM (Cross Entropy Method)


Copyright notice
This article was written by [Evergreen AAS]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202130959332838.html