Double DQN notes
2022-06-30 12:34:00 【Show brother invincible】
Double DQN is one of the better-known variants of DQN. The main problem it addresses is that, under the original DQN formulation, the Q values estimated by the network are much higher than the true Q values. This is the well-known overestimation problem.
First, what is overestimation? Readers of Zhang Sijun's article will already be familiar with the explanation:
https://zhuanlan.zhihu.com/p/109498587
The most faithful Q value, that is, its definition, is the expectation over all possible states that can follow the step triggered by this action. In practice, however, we cannot wait until every state has been visited before estimating the Q value.
So Q-Learning instead updates the Q table with the maximum Q value of the next state. This is where the overestimation comes from: the maximum over noisy estimates tends to be higher than the true maximum.
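The effect of taking a maximum over noisy estimates can be illustrated with a small simulation. This is only a sketch under stated assumptions: the four actions, the Gaussian noise model, and the helper names `max_of_noisy_estimates` and `average_max_estimate` are hypothetical and do not come from the original post.

```python
import random

def max_of_noisy_estimates(true_q, noise_std, rng):
    """Max over actions of the true Q values corrupted by zero-mean noise."""
    return max(q + rng.gauss(0.0, noise_std) for q in true_q)

def average_max_estimate(true_q, noise_std=1.0, trials=10000, seed=0):
    """Average the noisy max over many trials (assumed noise model: Gaussian)."""
    rng = random.Random(seed)
    total = sum(max_of_noisy_estimates(true_q, noise_std, rng)
                for _ in range(trials))
    return total / trials

# Assumption for illustration: every action is truly worth 0,
# so the true maximum is exactly 0.
true_q = [0.0, 0.0, 0.0, 0.0]
bias = average_max_estimate(true_q) - max(true_q)
print(f"average overestimation: {bias:.3f}")  # clearly positive
```

Each individual estimate is unbiased, yet the max picks whichever action happened to receive the largest positive noise, so the average comes out above the true maximum.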
In other words, the overestimation is inherited from Q-Learning itself. Double DQN still uses the target network to compute the Q value, but the action is not chosen by the target network. Instead, the online network (the one being trained in real time) picks the action with the highest Q value, and the target network supplies the Q value for that action. The benefit is that, as training proceeds, the action the target network would rank highest and the action the online network selects are not necessarily the same, so the overestimation is reduced to some degree.
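The difference between the two targets can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the function names, the example Q values, and the `reward`/`gamma` settings are all assumptions made for the example.

```python
def dqn_target(reward, gamma, q_target_next, done=False):
    """Plain DQN: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    # Action with the highest Q value according to the online network.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Q value of that same action according to the target network.
    return reward + gamma * q_target_next[best_action]

# Hypothetical next-state Q values for three actions.
q_online_next = [1.0, 2.5, 2.0]   # online network prefers action 1
q_target_next = [1.2, 1.8, 3.0]   # target network prefers action 2
reward, gamma = 1.0, 0.9

y_dqn = dqn_target(reward, gamma, q_target_next)
y_double = double_dqn_target(reward, gamma, q_online_next, q_target_next)
print(f"DQN target:        {y_dqn:.2f}")     # 3.70
print(f"Double DQN target: {y_double:.2f}")  # 2.62
```

Because the evaluated action is a valid index into the target network's values, the Double DQN target can never exceed the plain DQN target for the same inputs; the two coincide when both networks agree on the best action.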