Sarsa notes
2022-06-30 12:34:00 【Show brother invincible】
First of all, Sarsa is also a TD-based algorithm. Its only difference from Q-learning is the target policy, that is, the way the Q value is updated. Q-learning updates with the largest Q value in the table for the next state reached after the chosen action is taken.
Sarsa, in contrast, selects the next action to execute with the same epsilon-greedy policy it uses for acting, and plugs that action's Q value into the formula that updates the Q-table.
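The two update rules can be written side by side to make the difference concrete. This is a sketch in standard textbook notation; the symbols $\alpha$ (learning rate), $\gamma$ (discount factor), and $r$ (reward) are assumptions that do not appear in the notes above.

```latex
% Q-learning: the target uses the largest Q value in the next state s'
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]

% Sarsa: the target uses the Q value of the action a' chosen by epsilon-greedy in s'
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \, Q(s',a') - Q(s,a) \right],
\qquad a' \sim \varepsilon\text{-greedy}\big(Q(s',\cdot)\big)
```

Apart from the term that follows $\gamma$, the two rules are identical.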
Note that at the moment of the update this next action has not yet been executed; its value is still only an estimate read from the Q-table, and the only difference is which action gets selected. When I watched the Mofan video, it said that Sarsa acts on the action it chooses, so I assumed Sarsa actually carried the action out before updating and was therefore based on MC methods. That turned out to be wrong, and it may simply have been my own misunderstanding.
Seen this way, the Sarsa algorithm flow could hardly be simpler.
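The flow can be sketched in a few lines of Python. This is a minimal tabular illustration and not code from the original notes: the toy chain environment `chain_step`, the helper names, and all hyperparameter values are hypothetical choices made for the example.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties between equally good actions at random.
    return rng.choice([a for a, v in enumerate(q_row) if v == best])

def sarsa_update(q, s, a, r, s_next, a_next, done, alpha, gamma):
    """Move Q(s, a) toward r + gamma * Q(s_next, a_next).

    a_next has only been selected, not executed; its value is a table estimate.
    """
    target = r if done else r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])

def chain_step(state, action, n_states):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.

    Reaching the last state ends the episode with reward 1.
    """
    s_next = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

def train_sarsa(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(q[s], epsilon, rng)           # choose the first action
        done = False
        while not done:
            s_next, r, done = chain_step(s, a, n_states)  # execute the current action
            a_next = epsilon_greedy(q[s_next], epsilon, rng)  # select, do not execute yet
            sarsa_update(q, s, a, r, s_next, a_next, done, alpha, gamma)
            s, a = s_next, a_next                         # the selected action runs next step
    return q
```

Calling `train_sarsa()` returns the learned Q-table as a list of rows, one per state. Replacing `q[s_next][a_next]` in `sarsa_update` with `max(q[s_next])` would turn this sketch into Q-learning, which is the whole difference discussed above.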