Double-DQN Notes
2022-06-30 10:05:00 【显哥无敌】
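Double DQN is one of the best-known variants of DQN. The main problem it addresses is that, under the original DQN update, the Q-values estimated by the network tend to be considerably higher than the true Q-values. This is the well-known overestimation problem.
First, what is overestimation? Anyone who has read this article by 张斯俊 will already know: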
https://zhuanlan.zhihu.com/p/109498587
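The most faithful value of Q, namely its definition, is the expectation over all possible successor states reached one step after taking the action. In practice, however, we cannot wait until the values of all those states have finished iterating before computing this Q-value.
So in Q-Learning we instead update the Q-table using the action with the largest Q-value in the next state, and this is where overestimation comes from: the Q-values are themselves noisy estimates, and taking a maximum over noisy estimates tends to pick up their positive errors.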
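The effect of taking a maximum over noisy estimates can be seen in a small simulation. This is only an illustrative sketch and not something from the original notes: it assumes every action's true Q-value is 0 and that each estimate is the true value plus zero-mean Gaussian noise. The function name `max_bias` and all of its parameters are hypothetical.

```python
import random


def max_bias(n_actions=5, noise_std=1.0, n_trials=10000, seed=0):
    """Average of max(estimates) when every true Q-value is 0.

    Assumption for illustration: each estimate is the true value (0)
    plus independent zero-mean Gaussian noise.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Noisy estimates of Q(s', a) for each action a; the true values are all 0.
        estimates = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        # The Q-Learning target uses the largest estimate.
        total += max(estimates)
    return total / n_trials


bias = max_bias()
print(f"true max Q = 0.0, average estimated max Q = {bias:.3f}")
```

Although no single estimate is biased, the average of the maximum comes out clearly above 0, which is the overestimation described above. With only one action (`n_actions=1`) there is nothing to maximize over and the average stays close to 0.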
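In other words, overestimation is inherited from Q-Learning. What Double DQN does is still use the target network to compute the Q-value, but the action is no longer chosen by the target network. Instead, the online network (the one being trained in real time) picks the action with the largest Q-value, and the target network supplies the Q-value for that action. The benefit is that, as training proceeds, the action picked by the online network need not be the one the target network would have picked, so the overestimation is reduced to some extent.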
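The difference between the two targets can be sketched for a single transition. This is a minimal illustration under stated assumptions, not the author's code: the Q-values are plain Python lists standing in for network outputs, and the function names, the discount `gamma`, and the example numbers are all made up for the demonstration.

```python
def dqn_target(reward, gamma, q_target_next, done):
    """Standard DQN target: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: online network selects, target network evaluates."""
    if done:
        return reward
    # Action with the largest Q-value according to the online network.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Q-value of that action according to the target network.
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for the next state, one entry per action.
q_online_next = [1.0, 2.5, 0.5]   # online network prefers action 1
q_target_next = [1.2, 1.8, 3.0]   # target network prefers action 2

y_dqn = dqn_target(1.0, 0.9, q_target_next, done=False)
y_ddqn = double_dqn_target(1.0, 0.9, q_online_next, q_target_next, done=False)
print(f"DQN target: {y_dqn:.2f}, Double DQN target: {y_ddqn:.2f}")
```

Here the two networks disagree on the best action, so the Double DQN target (2.62) is lower than the DQN target (3.70). When both targets read from the same target network, the Double DQN value cannot exceed the DQN value, because the target network's Q-value for any chosen action is at most its own maximum.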