Sarsa notes
2022-06-30 12:34:00 【Show brother invincible】
The first thing to say is that SARSA is also a TD (temporal-difference) algorithm. Its only difference from Q-learning is the target policy, that is, the way the Q value is updated. Q-learning, covered earlier, updates with the largest value in the current Q table for the next state reached after the selected action.
SARSA differs from Q-learning in that it selects the next action to execute with the same epsilon-greedy policy, and puts that action's Q value into the formula that updates the Q table.
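Written out, the two update rules differ in a single term. The notation here is my own assumption rather than something from the original notes: $\alpha$ is the learning rate, $\gamma$ the discount factor, $r$ the reward, $s'$ the next state, and $a'$ the next action picked by epsilon-greedy.

$$
\text{Q-learning:}\quad Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
$$

$$
\text{SARSA:}\quad Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma\, Q(s',a') - Q(s,a) \right], \qquad a' \sim \epsilon\text{-greedy}\big(Q(s',\cdot)\big)
$$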
It's worth noting that at the moment of the update this next action has not been executed yet. Its value is still just an estimate read from the Q table; only the choice of action differs from Q-learning. (SARSA does then execute that same action in the next step.) When I watched the Mofan video it said that SARSA takes the action, so I thought SARSA actually carried the action out first and was therefore an MC (Monte Carlo) based method. It turned out I was wrong, though this may just have been my own misunderstanding.
Seen this way, the SARSA algorithm flow could hardly be simpler.
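To make the flow concrete, here is a minimal tabular sketch. Everything in it is an assumption for illustration and not taken from the original notes: the toy 5-state line environment (`step`), the helper names `epsilon_greedy`, `sarsa_update` and `train`, and the hyperparameter values.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is terminal (hypothetical toy task)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(q, state, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties randomly so the agent does not get stuck while all values are 0.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def sarsa_update(q, s, a, r, s_next, a_next, done, alpha, gamma):
    """One SARSA update: the target uses Q(s', a') of the action actually chosen."""
    target = r if done else r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(q, s, epsilon, rng)
        done = False
        while not done:
            s_next, r, done = step(s, a)
            # Choose the next action first; it has not been executed yet.
            a_next = epsilon_greedy(q, s_next, epsilon, rng)
            sarsa_update(q, s, a, r, s_next, a_next, done, alpha, gamma)
            # The chosen action is the one carried out in the next iteration.
            s, a = s_next, a_next
    return q
```

Calling `train()` returns the learned Q table as a list of `[left, right]` values per state. Replacing `q[s_next][a_next]` in the target with `max(q[s_next])` would turn the same loop into Q-learning.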