Cross-entropy Method
2022-07-07 00:26:00 【Evergreen AAS】
CEM && RL
Note: The following content is quoted from the blog post "Evolutionary strategy optimization algorithm CEM (Cross Entropy Method)" [1].
CEM can also be used to solve Markov decision processes, that is, reinforcement learning problems. Reinforcement learning can be viewed as a dynamic programming process: selecting an action in a given state is like choosing a branch at a node, so the whole process is a path planning problem from the initial state to the final state, except that we want the path that maximizes the return. Under this view, the problem can be modeled with CEM. We treat a complete path as one sample $x = (s_0, a_0, s_1, a_1, \ldots, s_n, a_n)$, and the total return of that path is $S(x) = \sum_{i=0}^{n} r(s_i, a_i)$. The goal is to maximize $S(x)$. How do we draw these samples? We can build a matrix $p$ whose rows represent states and whose columns represent actions, so that $p_{ij}$ is the probability of taking action $a_j$ in state $s_i$. Sampling from $p$ repeatedly yields many paths; we then select the samples with higher $S(x)$, use them to update $p$, and iterate until we arrive at the best matrix $\hat{p}$.
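The loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the quoted blog's implementation: the five-state chain environment, the `step` function, the reward of -1 per step, the elite fraction, and the smoothing factor are all hypothetical choices made for the example.

```python
import random

# Hypothetical toy problem (an assumption for illustration): a chain of
# 5 states, start at state 0, goal at state 4, at most 20 steps per path.
N_STATES, N_ACTIONS, GOAL, HORIZON = 5, 2, 4, 20


def step(state, action):
    """Action 1 moves right, action 0 moves left; every step costs 1."""
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return next_state, -1.0


def sample_path(p, rng):
    """Sample one path x = (s_0, a_0, ..., s_n, a_n) from p; return (x, S(x))."""
    state, path, total = 0, [], 0.0
    for _ in range(HORIZON):
        action = rng.choices(range(N_ACTIONS), weights=p[state])[0]
        path.append((state, action))
        state, reward = step(state, action)
        total += reward
        if state == GOAL:
            break
    return path, total


def cem(n_iters=30, n_samples=200, elite_frac=0.2, smoothing=0.7, seed=0):
    rng = random.Random(seed)
    # Start from a uniform p matrix: rows are states, columns are actions.
    p = [[1.0 / N_ACTIONS] * N_ACTIONS for _ in range(N_STATES)]
    n_elite = int(n_samples * elite_frac)
    for _ in range(n_iters):
        # Sample many paths and keep the ones with the highest S(x).
        samples = [sample_path(p, rng) for _ in range(n_samples)]
        samples.sort(key=lambda item: item[1], reverse=True)
        elite = samples[:n_elite]
        # Count how often each (state, action) pair occurs in elite paths.
        counts = [[0] * N_ACTIONS for _ in range(N_STATES)]
        for path, _ in elite:
            for s, a in path:
                counts[s][a] += 1
        # Move each visited row of p toward the elite action frequencies.
        for s in range(N_STATES):
            visits = sum(counts[s])
            if visits == 0:
                continue  # state not seen in elite paths: leave row as is
            for a in range(N_ACTIONS):
                freq = counts[s][a] / visits
                p[s][a] = smoothing * freq + (1 - smoothing) * p[s][a]
    return p


if __name__ == "__main__":
    for state, row in enumerate(cem()):
        print(state, [round(prob, 3) for prob in row])
```

The smoothed update is one common way to keep probabilities from collapsing to zero too early; setting `smoothing=1.0` would replace each row with the raw elite frequencies.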
This CEM formulation is similar to policy iteration in reinforcement learning: the matrix $p$ gives the probability of each action in each state, which forms the decision policy, but the parameter update does not use gradients. From another angle, it can also be seen as resembling value iteration, with $p$ playing the role of the Q matrix in classic Q-learning. The difference is that the element in row $i$ and column $j$ of the Q matrix is the expected future return of taking action $a_j$ in state $s_i$, and Q values are updated with the Bellman equation, whereas $p$ holds probabilities and is updated through cross-entropy.
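To make the contrast concrete, the two update rules can be written side by side. The notation here is an assumption added for illustration and does not come from the quoted blog: $\alpha$ is a learning rate, $\gamma$ a discount factor, $s'$ the next state, $\mathcal{E}$ the set of elite paths, and $\mathbb{1}[\cdot]$ an indicator function. The first line is the standard tabular Q-learning update; the second is the count-based estimate that minimizes cross-entropy against the elite samples for a tabular $p$.

$$
\begin{aligned}
Q(s_i, a_j) &\leftarrow Q(s_i, a_j) + \alpha \left[ r(s_i, a_j) + \gamma \max_{a'} Q(s', a') - Q(s_i, a_j) \right] \\
p_{ij} &\leftarrow \frac{\sum_{x \in \mathcal{E}} \sum_{t} \mathbb{1}[s_t = s_i,\ a_t = a_j]}{\sum_{x \in \mathcal{E}} \sum_{t} \mathbb{1}[s_t = s_i]}
\end{aligned}
$$

The Q update bootstraps from the estimated value of the next state, while the $p$ update only counts how often each action was taken in each state among the best-scoring paths.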
[1] Evolutionary strategy optimization algorithm CEM (Cross Entropy Method)