Cross-Entropy Method
2022-07-07 00:26:00 【Evergreen AAS】
CEM && RL
Note: the following content is quoted from the blog post "Evolutionary strategy optimization algorithm CEM (Cross Entropy Method)" [1].
CEM can also be used to solve Markov decision processes, that is, reinforcement learning problems. Reinforcement learning can be viewed as a dynamic programming process: choosing an action in a given state is like choosing a branch at a node, so the whole process is a path-planning problem from the initial state to the final state, except that we want the path that maximizes the return. With this view, the problem can be modeled with CEM. We treat one complete path as a sample $x = (s_0, a_0, s_1, a_1, \dots, s_n, a_n)$, whose total return is $S(x) = \sum_{i=0}^{n} r(s_i, a_i)$, and the goal is to maximize $S(x)$. How do we draw these samples? We build a matrix $p$ whose rows correspond to states and whose columns correspond to actions, so that $p_{ij}$ is the probability of taking action $a_j$ in state $s_i$. Sampling from $p$ repeatedly yields many paths; the paths with higher $S(x)$ are then used to update $p$, and the procedure is iterated, eventually arriving at the best matrix $\hat{p}$.
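The sampling-and-update loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the quoted blog's implementation: the five-state chain environment (`step`), the reward values, the horizon, the elite fraction, and the smoothing factor are all assumptions made up for the example. The update simply re-estimates each row of `p` from the action frequencies observed in the highest-return paths.

```python
import random

# Hypothetical toy problem: a 5-state chain, start at state 0, goal at state 4.
N_STATES, N_ACTIONS, GOAL, HORIZON = 5, 2, 4, 10

def step(state, action):
    """Assumed dynamics: action 1 moves right, action 0 moves left (floor at 0)."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 10.0 if nxt == GOAL else -1.0  # assumed reward r(s, a)
    return nxt, reward, nxt == GOAL

def sample_path(p, rng):
    """Sample one path x = (s0, a0, ..., sn, an) from p and return it with S(x)."""
    state, path, total = 0, [], 0.0
    for _ in range(HORIZON):
        action = rng.choices(range(N_ACTIONS), weights=p[state])[0]
        path.append((state, action))
        state, reward, done = step(state, action)
        total += reward
        if done:
            break
    return path, total

def cem(n_iters=30, n_samples=50, elite_frac=0.2, smoothing=0.7, seed=0):
    rng = random.Random(seed)
    # Start from a uniform p matrix: rows are states, columns are actions.
    p = [[1.0 / N_ACTIONS] * N_ACTIONS for _ in range(N_STATES)]
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        samples = [sample_path(p, rng) for _ in range(n_samples)]
        # Keep the paths with the highest total return S(x).
        samples.sort(key=lambda s: s[1], reverse=True)
        elite = samples[:n_elite]
        # Count how often each action was taken in each state among elite paths.
        counts = [[0] * N_ACTIONS for _ in range(N_STATES)]
        for path, _ in elite:
            for s, a in path:
                counts[s][a] += 1
        # Move each visited row of p toward the elite action frequencies.
        for s in range(N_STATES):
            visits = sum(counts[s])
            if visits == 0:
                continue  # state not visited by any elite path: leave row as is
            for a in range(N_ACTIONS):
                freq = counts[s][a] / visits
                p[s][a] = smoothing * freq + (1 - smoothing) * p[s][a]
    return p

if __name__ == "__main__":
    for s, row in enumerate(cem()):
        print(s, [round(v, 3) for v in row])
```

The smoothing factor is a design choice in this sketch: blending the new frequencies with the old row keeps probabilities from collapsing to exactly zero after a single unlucky batch of samples.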
This resembles policy iteration in reinforcement learning: the matrix $p$ gives the probability of each action in each state and thus defines a policy, but the parameters are updated without gradients. From another angle, it can also be seen as resembling value iteration, with $p$ playing the role of the Q matrix in classic Q-learning. The difference is that the element in row $i$, column $j$ of the Q matrix is the expected future return of taking action $a_j$ in state $s_i$ and is updated with the Bellman equation, whereas $p$ holds probabilities and is updated through cross entropy.
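To make the contrast concrete, here is a minimal sketch that puts the two update rules side by side for a single matrix entry or row. The function names, the learning rate `alpha`, the discount `gamma`, and the `smoothing` factor are illustrative assumptions, not taken from the quoted blog.

```python
def q_learning_update(q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Bellman-style update: move Q[s][a] toward r + gamma * max over a' of Q[s'][a']."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])

def cem_row_update(p_row, elite_action_counts, smoothing=0.7):
    """Cross-entropy-style update: move the action probabilities of one state
    toward the action frequencies observed in the elite (high-return) paths."""
    total = sum(elite_action_counts)
    if total == 0:
        return list(p_row)  # state never visited by elite paths: unchanged
    return [
        smoothing * count / total + (1 - smoothing) * old
        for count, old in zip(elite_action_counts, p_row)
    ]

if __name__ == "__main__":
    q = [[0.0, 0.0], [0.0, 0.0]]
    q_learning_update(q, s=0, a=1, r=1.0, s_next=1, done=True)
    print(q[0])  # entries are value estimates, not probabilities

    print(cem_row_update([0.5, 0.5], elite_action_counts=[1, 3]))  # row still sums to 1
```

The Q update needs a reward and a successor state for every transition, while the probability update only needs to know which actions the best paths took.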
[1] Evolutionary strategy optimization algorithm CEM (Cross Entropy Method)