Sarsa Notes
2022-06-30 10:05:00 【显哥无敌】
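First of all, Sarsa is also a TD-based algorithm. The only difference between it and Q-learning is the target policy, in other words, how the Q value used in the update is chosen. As covered earlier, Q-learning looks at the next state reached after executing the action and uses the largest value in the current Q table for that state.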
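For reference, the Q-learning update described above is usually written as follows. This is standard textbook notation (learning rate α, discount γ, next state s'); the symbols are not taken from the original post:

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$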
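Sarsa differs from Q-learning in that it also uses epsilon-greedy to select the next action (the action for the next state), and it plugs that action's Q value into the formula to update the Q table.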
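A minimal Python sketch of this difference, assuming a tabular Q stored as a list of action values per state. All function names here are hypothetical, chosen only for illustration. Sarsa's update has the same form as Q-learning's, except that the bootstrap term γ·max_a' Q(s', a') is replaced by γ·Q(s', a'), where a' is the action epsilon-greedy picks in s':

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Choose an action index from one row of a Q table using epsilon-greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))   # explore: uniformly random action
    return q_row.index(max(q_row))         # exploit: first action with the largest Q value

def q_learning_target(r, q_next_row, gamma):
    """Q-learning: bootstrap from the largest Q value in the next state s'."""
    return r + gamma * max(q_next_row)

def sarsa_target(r, q_next_row, a_next, gamma):
    """Sarsa: bootstrap from Q(s', a'), where a' is the action epsilon-greedy picked in s'."""
    return r + gamma * q_next_row[a_next]

q_next = [1.0, 5.0, 2.0]                    # made-up Q(s', .) for three actions
rng = random.Random(0)
a_next = epsilon_greedy(q_next, 0.1, rng)   # usually the greedy action, occasionally random
print(q_learning_target(0.0, q_next, 0.9))
print(sarsa_target(0.0, q_next, a_next, 0.9))
```

When epsilon-greedy happens to pick the greedy action, the two targets coincide; they differ only when it explores, so Sarsa's estimates reflect the epsilon-greedy policy it is actually following.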
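It is worth pointing out that at the moment of the update this action has not been executed yet: its value is still just the estimate stored in the Q table, and the only difference from Q-learning is which action gets picked. (In standard Sarsa this same action is then the one actually executed at the next step, which is what makes Sarsa on-policy; the update itself, however, does not wait to see its outcome.) When I watched Mofan's (莫烦Python) videos, he described Sarsa as the "man of action", and I took that to mean Sarsa executes this action first and is therefore a Monte Carlo (MC) based method. Later I realized I was wrong; this was probably a misconception of my own.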
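To make the TD vs. MC distinction concrete, here is a small sketch contrasting the two kinds of target. The helper names, the discount γ = 0.9, and the rewards and Q values are all made up for illustration. The MC target needs every reward until the episode ends, while Sarsa's target needs only one real reward plus the table's current estimate Q(s', a'):

```python
def mc_return(rewards, gamma):
    """Monte Carlo target: discounted sum of all rewards actually observed until the episode ends."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def sarsa_td_target(r, q_table, s_next, a_next, gamma, terminal=False):
    """Sarsa (TD) target: one observed reward plus the current *estimate* Q(s', a')."""
    if terminal:
        return r                            # no next state to bootstrap from
    return r + gamma * q_table[s_next][a_next]

# Made-up episode starting in state 0: rewards received after each of three steps.
rewards = [0.0, 0.0, 1.0]
# Made-up current Q estimates for state 1 (two actions); assume a' = 1 was picked there.
q_table = {1: [0.2, 0.4]}

print(mc_return(rewards, 0.9))                          # must wait for the whole episode
print(sarsa_td_target(rewards[0], q_table, 1, 1, 0.9))  # available after a single step
```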
Seen this way, the Sarsa algorithm flow could hardly be simpler:
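A minimal sketch of the standard tabular Sarsa loop, assuming a hypothetical toy environment: a 1-D corridor of 6 states, actions left/right, reward -1 per move, and the episode ending when the rightmost state is reached. The environment, the hyperparameters (alpha=0.1, gamma=0.9, epsilon=0.1) and all function names are illustrative assumptions, not part of the original notes:

```python
import random

N_STATES = 6          # hypothetical corridor: states 0..5, goal at state 5
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Toy environment (assumption): -1 reward per move, episode ends at the goal."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    return nxt, -1.0, nxt == N_STATES - 1

def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, otherwise the first action with max Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return q_row.index(max(q_row))

def sarsa(episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(q[s], epsilon, rng)                 # choose A from S
        done = False
        while not done:
            s_next, r, done = step(s, a)                        # take A, observe R and S'
            if done:
                target = r                                      # terminal: no bootstrap term
            else:
                a_next = epsilon_greedy(q[s_next], epsilon, rng)  # choose A' from S' (same policy)
                target = r + gamma * q[s_next][a_next]
            q[s][a] += alpha * (target - q[s][a])               # Sarsa update
            if not done:
                s, a = s_next, a_next                           # A' is the action executed next
    return q

q = sarsa()
greedy = [row.index(max(row)) for row in q[:-1]]   # greedy action in each non-goal state
print(greedy)
```

The a' chosen in s' is reused as the action executed on the next step; that reuse is what makes Sarsa on-policy, whereas a Q-learning loop would select its next action independently of the max used in its target.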