Cross-entropy Method
2022-07-07 00:26:00 【Evergreen AAS】
Note: The following content is quoted from the blog post "Evolutionary strategy optimization algorithm CEM (Cross Entropy Method)" [1].
CEM can also be used to solve Markov decision processes, that is, reinforcement learning problems. Reinforcement learning can itself be viewed as a dynamic programming process: choosing an action in a given state is like choosing a branch at a node, and the whole episode is a path-planning problem from the initial state to a final state, except that we want the path that maximizes the return. Seen this way, the problem can be modeled with CEM. Let one complete path be a sample $x = (s_0, a_0, s_1, a_1, \ldots, s_n, a_n)$. The total return of the path is $S(x) = \sum_{i=0}^{n} r(s_i, a_i)$, and the goal is to maximize $S(x)$. How are these samples drawn? We can build a matrix $p$ whose rows correspond to states and whose columns correspond to actions, so that $p_{ij}$ is the probability of taking action $a_j$ in state $s_i$. Sampling from $p$ many times yields many paths; the samples with the highest $S(x)$ are then used to update $p$. Iterating this process eventually yields the best matrix $\hat{p}$.
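The sampling-and-update loop described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation from [1]: the environment (a hypothetical 5-state chain with actions "left"/"right", reward +10 on reaching the goal and -1 otherwise, episode ending at the goal), the helper names `step`, `sample_path`, and `cem`, and the hyperparameters (`n_samples`, `elite_frac`, and a `smoothing` factor that blends the new elite frequencies with the old row) are all illustrative choices. Each row of `p` is moved toward the action frequencies observed in the elite paths for that state.

```python
import random

# Hypothetical toy MDP (assumption for illustration, not from the original post):
# a chain of states 0..4, start state 0, goal state 4.
N_STATES = 5
N_ACTIONS = 2          # action 0 = move left, action 1 = move right
GOAL = N_STATES - 1
HORIZON = 8            # maximum number of (s, a) pairs in one path


def step(s, a):
    """Deterministic transition; r(s, a) = +10 on reaching the goal, -1 otherwise."""
    s_next = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    reward = 10.0 if s_next == GOAL else -1.0
    return s_next, reward


def sample_path(p, rng):
    """Sample one path x = (s0, a0, s1, a1, ...) from p; return its pairs and S(x)."""
    s, pairs, total = 0, [], 0.0
    for _ in range(HORIZON):
        # draw action a_j with probability p[s][j]
        a = rng.choices(range(N_ACTIONS), weights=p[s])[0]
        pairs.append((s, a))
        s, r = step(s, a)
        total += r
        if s == GOAL:  # the episode ends at the goal
            break
    return pairs, total


def cem(n_iters=20, n_samples=50, elite_frac=0.2, smoothing=0.7, seed=0):
    rng = random.Random(seed)
    # p[i][j]: probability of taking action a_j in state s_i; start from uniform rows
    p = [[1.0 / N_ACTIONS] * N_ACTIONS for _ in range(N_STATES)]
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        paths = [sample_path(p, rng) for _ in range(n_samples)]
        # keep the paths with the highest S(x)
        elite = sorted(paths, key=lambda item: item[1], reverse=True)[:n_elite]
        counts = [[0] * N_ACTIONS for _ in range(N_STATES)]
        for pairs, _ in elite:
            for s, a in pairs:
                counts[s][a] += 1
        for s in range(N_STATES):
            visits = sum(counts[s])
            if visits == 0:
                continue  # no elite path acted in this state: keep the old row
            for a in range(N_ACTIONS):
                target = counts[s][a] / visits  # elite frequency of a_j in s_i
                p[s][a] = smoothing * target + (1 - smoothing) * p[s][a]
    return p


p_hat = cem()
for s in range(GOAL):  # the goal is terminal, so its row is never updated
    print(s, [round(x, 3) for x in p_hat[s]])
```

In this toy setting the rows for states 0 to 3 are expected to shift toward action 1 (move right), the shortest route to the goal; the goal row stays uniform because no action is ever taken there. The smoothing step is a common practical addition that keeps a row from collapsing to zero probability after a single unlucky batch of elites.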
This resembles policy-iteration methods in reinforcement learning: the $p$ matrix gives the probability of each action in each state and thus forms a decision policy, but its parameters are updated without gradients. From another angle, it can also be viewed as resembling value iteration: here the $p$ matrix plays the role of the $Q$ matrix in classic Q-learning. The difference is that the element in row $i$, column $j$ of $Q$ is the expected future return of taking action $a_j$ in state $s_i$, and $Q$ values are updated with the Bellman equation, whereas $p_{ij}$ is a probability and is updated through the cross-entropy method.
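For comparison, the two update rules contrasted above can be written side by side. The CEM rule shown is the standard tabular form: $\hat{p}_{ij}$ becomes the frequency of action $a_j$ among the visits to state $s_i$ in the elite set $\mathcal{E}$ (the samples with the highest $S(x)$), which is the distribution that minimizes cross-entropy to those elite samples. The symbols $\mathcal{E}$, $T_k$ (the last time step of path $k$), $\alpha$ (learning rate), $\gamma$ (discount factor), and $s'$ (next state) are notation introduced here; [1] may use a smoothed or otherwise different variant.

```latex
% CEM: re-estimate each row of p from the elite paths
\hat{p}_{ij} = \frac{\sum_{k \in \mathcal{E}} \sum_{t=0}^{T_k} \mathbb{1}\left[ s_t^{(k)} = s_i,\ a_t^{(k)} = a_j \right]}
                    {\sum_{k \in \mathcal{E}} \sum_{t=0}^{T_k} \mathbb{1}\left[ s_t^{(k)} = s_i \right]}

% Q-learning: Bellman-based update of one entry after observing (s_i, a_j, r, s')
Q(s_i, a_j) \leftarrow Q(s_i, a_j) + \alpha \left[ r(s_i, a_j) + \gamma \max_{a'} Q(s', a') - Q(s_i, a_j) \right]
```

Each row of $\hat{p}$ sums to 1 by construction, whereas the entries of $Q$ are unnormalized estimates of return.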
[1] Evolutionary strategy optimization algorithm CEM (Cross Entropy Method)