Reinforcement Learning: Chapter 2 Exercises
2022-07-23 06:41:00 【Infinite power】
Markov property: the next state depends only on the current state, not on the states that came before it.
Markov chain: a stochastic process that has the Markov property and is defined on a discrete index set and a discrete state space.
State transition matrix: each row gives the probabilities of moving from one state to every state, so each row sums to 1.
Markov reward process (MRP): a Markov chain with a reward function added.
Horizon: the length of one episode (one whole trajectory), given as a finite number of steps.
Return: the sum of the rewards along a trajectory, each discounted by how far in the future it is received: G_t = R_{t+1} + γ R_{t+2} + γ^2 R_{t+3} + ...
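Bellman equation: V(s) = R(s) + γ Σ_{s'} P(s'|s) V(s'), i.e. the value of a state is its immediate reward plus the discounted expected value of its successor states.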
Monte Carlo method: estimates the value function by sampling trajectories and averaging their returns.
Iterative algorithm (dynamic programming): repeatedly applies the Bellman equation until the values converge; updating stops when the newly updated values differ very little from the previous ones.
Q function: the action-value function.
The Bellman equation in matrix form, V = R + γPV, has the closed-form solution V = (I - γP)^{-1} R, but the matrix inversion costs O(N^3) for N states, so it is only practical for small MRPs.
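The matrix form of the Bellman equation and the iterative algorithm described above can be compared on a toy problem. The following is only a minimal sketch: the 3-state MRP, the names `P`, `R`, `gamma`, and both helper functions are assumptions made up for illustration, not taken from the original notes.

```python
import numpy as np

# Hypothetical 3-state MRP (illustrative numbers only).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.0, 1.0]])   # row s: probabilities of moving s -> s'
R = np.array([1.0, 2.0, 0.0])     # reward received in each state
gamma = 0.9                       # discount factor


def solve_analytic(P, R, gamma):
    """Solve V = R + gamma * P V as the linear system (I - gamma P) V = R."""
    n = len(R)
    return np.linalg.solve(np.eye(n) - gamma * P, R)


def solve_iterative(P, R, gamma, tol=1e-10, max_sweeps=100_000):
    """Apply the Bellman equation repeatedly until the values stop changing."""
    V = np.zeros(len(R))
    for _ in range(max_sweeps):
        V_new = R + gamma * P @ V
        # Stop once the update barely changes the previous values.
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


V_exact = solve_analytic(P, R, gamma)
V_iter = solve_iterative(P, R, gamma)
print("closed form:", V_exact)
print("iterative:  ", V_iter)
```

On a problem this small both routes agree; the iterative version avoids the matrix inversion, which is why it scales better as the number of states grows.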
Ways to solve the Bellman equation:
Monte Carlo: given an MRP, imagine placing a small boat in some state and letting it drift with the current, which generates a trajectory. Each trajectory yields a sequence of rewards; compute its discounted return directly and add it to a running total. After a certain number of trajectories have been accumulated, divide the total by the number of trajectories to get the value of the starting state.
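The Monte Carlo procedure just described could be sketched roughly as follows. The MRP, the function name `mc_value`, and the parameters `n_trajectories`, `horizon` and `seed` are all hypothetical choices for illustration; truncating each trajectory at a fixed horizon is an added assumption.

```python
import numpy as np

# Hypothetical 3-state MRP (illustrative numbers only).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.0, 1.0]])   # row s: probabilities of moving s -> s'
R = np.array([1.0, 2.0, 0.0])     # reward received in each state
gamma = 0.9                       # discount factor


def mc_value(P, R, gamma, start, n_trajectories=2000, horizon=50, seed=0):
    """Estimate V(start) by averaging discounted returns of sampled trajectories."""
    rng = np.random.default_rng(seed)
    n_states = len(R)
    total = 0.0
    for _ in range(n_trajectories):
        s, g, discount = start, 0.0, 1.0
        # Let the "boat" drift: follow the transition probabilities.
        for _ in range(horizon):
            g += discount * R[s]
            discount *= gamma
            s = rng.choice(n_states, p=P[s])
        total += g                      # accumulate this trajectory's return
    return total / n_trajectories       # average over all trajectories


v0_estimate = mc_value(P, R, gamma, start=0)
print("Monte Carlo estimate of V(0):", v0_estimate)
```

The estimate is noisy and slightly biased by the truncation; more trajectories and a longer horizon reduce both effects.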
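Dynamic programming: iterate the Bellman equation until the values converge.
A combination of the two: temporal-difference learning.
Difference between a Markov reward process (MRP) and a Markov decision process (MDP):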
MDP: adds a decision. There is an extra action, and the state transition is additionally conditioned on that action.
The two are related by a conversion: fixing a policy turns an MDP into an MRP (MDP + policy = MRP).
Finding the best policy:
Obtain the optimal value function, then maximize over the Q function to get the optimal policy: in each state, pick the action with the largest Q value, which yields the best policy directly.
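In symbols, the steps above might be written as follows. The notation ($V^*$, $Q^*$, $\pi^*$, $R(s,a)$, $P(s' \mid s,a)$) is the commonly used one and is an assumption here, since the notes themselves do not fix any symbols.

```latex
\begin{align}
V^*(s)   &= \max_{\pi} V^{\pi}(s) \\
Q^*(s,a) &= R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \\
\pi^*(s) &= \arg\max_{a} Q^*(s,a)
\end{align}
```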
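Methods:
Exhaustive search (generally not used)
Policy iteration: alternately evaluate the current policy to get its value function, compute the Q function from it, and improve the policy by taking the action that maximizes Q.
Value iteration: repeatedly apply the Bellman optimality equation until the values converge, then read off the policy.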
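Policy iteration and value iteration can be tried side by side on a small problem. This is a sketch under stated assumptions: the 3-state, 2-action MDP, the array layout `P[a, s, s']` and `R[s, a]`, and the function names are all made up for illustration. The evaluation step in `policy_iteration` uses the relation MDP + policy = MRP by building the MRP induced by the current policy and solving it exactly.

```python
import numpy as np

# Hypothetical MDP with 3 states and 2 actions (illustrative numbers only).
# P[a, s, s2] = probability of moving s -> s2 under action a.
P = np.array([
    [[0.9, 0.1, 0.0],
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]],   # action 0
    [[0.1, 0.9, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0]],   # action 1
])
R = np.array([[0.0, 1.0],
              [2.0, 0.0],
              [0.0, 0.0]])  # R[s, a] = reward for taking action a in state s
gamma = 0.9


def q_from_v(V):
    """Q[s, a] = R[s, a] + gamma * sum over s2 of P[a, s, s2] * V[s2]."""
    return R + gamma * np.einsum("ast,t->sa", P, V)


def policy_iteration():
    n_states = R.shape[0]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    while True:
        # MDP + policy = MRP: pick the row and reward chosen by the policy.
        P_pi = P[policy, states]
        R_pi = R[states, policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Improvement: act greedily with respect to Q.
        new_policy = np.argmax(q_from_v(V), axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy


def value_iteration(tol=1e-10):
    V = np.zeros(R.shape[0])
    while True:
        # Bellman optimality backup: V(s) <- max over a of Q(s, a).
        V_new = q_from_v(V).max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return np.argmax(q_from_v(V_new), axis=1), V_new
        V = V_new


pi_policy, pi_V = policy_iteration()
vi_policy, vi_V = value_iteration()
print("policy iteration:", pi_policy, pi_V)
print("value iteration: ", vi_policy, vi_V)
```

Both methods should arrive at the same policy here; policy iteration needs few but expensive iterations (a linear solve each time), while value iteration needs many cheap ones.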