Sarsa Notes
2022-06-30 10:05:00 【显哥无敌】
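First of all, Sarsa is also a TD-based (temporal-difference) algorithm. Its only difference from Q-learning is the target policy, that is, the way the Q-value is updated: Q-learning takes the next state reached after executing the action and uses the largest value stored for that state in the current Q-table.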
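Sarsa differs from Q-learning in one respect: it also picks the next action to execute with the epsilon-greedy policy, and plugs that action's Q-value into the formula that updates the Q-table.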
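Written as update rules, the two methods differ only in the bootstrapped term. The notation here (step size $\alpha$, discount factor $\gamma$, reward $R$, next state $S'$, next action $A'$) is the usual textbook convention and is an assumption, since the post itself gives no symbols:

```latex
\begin{aligned}
\text{Q-learning:}\quad & Q(S,A) \leftarrow Q(S,A) + \alpha\,\bigl[R + \gamma \max_{a} Q(S',a) - Q(S,A)\bigr] \\
\text{Sarsa:}\quad & Q(S,A) \leftarrow Q(S,A) + \alpha\,\bigl[R + \gamma\, Q(S',A') - Q(S,A)\bigr],
\qquad A' \sim \varepsilon\text{-greedy}\bigl(Q(S',\cdot)\bigr)
\end{aligned}
```

When the epsilon-greedy draw happens to pick the greedy action, the two targets coincide; they differ only on exploratory draws.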
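It is worth pointing out that, at the moment of the update, this action has not been executed yet: its Q-value is still an estimate read from the Q-table, and only the action that gets selected is different. (Sarsa does go on to take this same action in the next step, which is what makes it on-policy, but the update itself does not wait for that.) When I watched 莫凡's videos, he described Sarsa as the "action-taker", and I took that to mean Sarsa had already executed this action and was therefore an MC-based method. I later realized I was wrong; this was probably just a misconception of my own.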
Seen this way, the Sarsa algorithm flow could hardly be simpler.
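The flow can be sketched as a small tabular implementation. This is only a minimal sketch under stated assumptions: the five-cell corridor environment, the `step` and `epsilon_greedy` helpers, and the hyperparameter values are all hypothetical and chosen for illustration; none of them come from the original post.

```python
import random

N_STATES = 5            # hypothetical corridor: states 0..4, goal at state 4
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)


def step(state, action):
    """Toy environment: move one cell; reward 1 only when the goal is reached."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise pick a best action (random tie-break)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train_sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Sarsa; returns the Q-table as a list of [Q(s, LEFT), Q(s, RIGHT)]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            # Choose A' now. It has not been executed yet; only its Q estimate is used.
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            # The terminal state has no future value, so the target is just the reward.
            target = reward if done else reward + gamma * q[next_state][next_action]
            q[state][action] += alpha * (target - q[state][action])
            # Sarsa then actually takes A' on the next iteration (on-policy).
            state, action = next_state, next_action
    return q
```

Calling `train_sarsa()` returns the learned Q-table. Replacing `q[next_state][next_action]` in the target with `max(q[next_state])` would turn the update into the Q-learning one, which is the single difference the notes describe.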