DQN Notes
2022-06-30 10:05:00 【显哥无敌】

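DQN builds on Q-Learning, so DQN is likewise an off-policy algorithm. Its first concern is the explosion of the state space, which also means it does not address the problem of continuous actions.
Tip: what is a continuous action?
For example, a variable that can take any value in the interval (0, 1) is a continuous variable, and an action that contains continuous variables is called a continuous action.
The most basic idea is to use a neural network to approximate the Q-values that a Q-table would otherwise store.
Using a neural network raises the question of training: where does the training data come from? From interaction with the environment. The most basic DQN stores transitions (s, a, r, s') and uses them for training.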
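Training the network in DQN works like supervised learning: the objective is to minimize the difference between the "true" value and the estimated value. This difference is called the TD error. As a formula:
$$\delta = r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta)$$
The term $r + \gamma \max_{a'} Q(s', a'; \theta^-)$ stands for the true value, where $\gamma$ is the discount factor. Yes, the true value is itself an estimate: basic DQN uses target_network (parameters $\theta^-$) both to select the action and to compute the Q-value inside this true value, and then subtracts the Q-value estimated by the network being learned (parameters $\theta$). The result is the TD error.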
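As a minimal sketch of the TD error described above, the snippet below computes it from plain numbers. The function names, the `done` flag (which the basic (s, a, r, s') tuple does not include) and all numeric values are assumptions made for illustration only.

```python
def td_target(reward, next_q_values, gamma=0.99, done=False):
    """Label for one transition: r + gamma * max over a' of Q_target(s', a').

    next_q_values are the target network's Q-values for every action in s'.
    done is an assumed extra flag: at a terminal state the label is just r.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def td_error(q_sa, reward, next_q_values, gamma=0.99, done=False):
    """TD error: label minus the learning network's estimate Q(s, a)."""
    return td_target(reward, next_q_values, gamma, done) - q_sa


# Hypothetical numbers: three actions are available in s'.
next_q = [1.0, 3.0, 2.0]   # Q-values from the target network for s'
delta = td_error(q_sa=2.5, reward=1.0, next_q_values=next_q, gamma=0.9)
print(delta)
```

The greedy `max` is taken over the target network's outputs, so action selection and evaluation both come from the same frozen network, as in the basic DQN described here.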
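The other network, the one being updated, is the network that learns from experience: its parameters are updated by gradient descent so as to minimize the gap between the label above and its own estimate.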
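To make the gradient-descent update concrete, here is a small sketch that swaps the neural network for a linear function Q(s, a) = weights[a] · s, so the gradient can be written by hand. The linear model, the learning rate and the numbers are assumptions for illustration; a real DQN would use a deep network and an automatic-differentiation library.

```python
def q_values(weights, state):
    """Linear stand-in for the Q-network: one weight row per action."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def sgd_step(weights, state, action, target, lr=0.1):
    """One gradient-descent step on 0.5 * (target - Q(s, a))**2.

    The target is treated as a fixed label, so only the row of the action
    that was taken changes. Returns the error measured before the update.
    """
    error = target - q_values(weights, state)[action]
    weights[action] = [w + lr * error * x for w, x in zip(weights[action], state)]
    return error


weights = [[0.0, 0.0], [0.0, 0.0]]   # 2 actions, 2 state features (hypothetical)
state = [1.0, 2.0]
target = 1.0                          # label, e.g. produced by the target network

errors = [abs(sgd_step(weights, state, action=0, target=target)) for _ in range(20)]
print(errors[0], errors[-1])
```

Repeating the step on the same transition shrinks the error toward zero, which is the behaviour the update is meant to have when the label stays fixed.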
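target_network is a network that is updated only every C steps. It exists to hold a snapshot of the network being updated: the labels cannot keep shifting while the update is in progress. It does not learn on its own; every C steps, the network that learns from experience copies its parameters into it.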
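The every-C-steps copy can be sketched as follows. The dictionaries stand in for network parameters, and `C = 4` and the fake "gradient update" are assumed values chosen only to show the timing of the copy.

```python
import copy

C = 4                                   # hypothetical sync interval
online = {"w": [0.0]}                   # parameters of the learning network
target = copy.deepcopy(online)          # frozen snapshot used to build labels

sync_steps = []
for step in range(1, 11):
    online["w"][0] += 1.0               # stand-in for one gradient update
    if step % C == 0:
        # Copy the parameters; a deep copy keeps the two networks independent.
        target = copy.deepcopy(online)
        sync_steps.append(step)

print(sync_steps, online["w"], target["w"])
```

Between two copies the target parameters stay unchanged while the online parameters keep moving, which is what keeps the labels stable during learning.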
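One more point is the experience replay mechanism: why store the experience (s, a, r, s') and then sample from it at random to update the network?
Because within a single run, each step is strongly correlated with the next. Experience replay breaks these correlations, so that the records used for learning are approximately independent of one another.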
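A replay buffer along these lines might look like the sketch below. The class name `ReplayBuffer`, the capacity and the batch size are assumptions, not something the notes specify; the point is only that transitions are stored in order and sampled uniformly at random.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (s, a, r, s_next) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buf = ReplayBuffer(capacity=50)
for t in range(100):                    # one hypothetical run: s_t -> s_{t+1}
    buf.push(t, 0, 1.0, t + 1)

batch = buf.sample(8)
print(len(buf), len(batch))
```

Because the batch is drawn from the whole buffer, the sampled transitions usually come from far-apart time steps rather than from one consecutive stretch of the run.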