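University of Electronic Science and Technology of China | Clustering Experience Replay for the Effective Exploitation in Reinforcement Learning
2022-07-03 20:31:00 【BAAI Community (智源社区)】
【Title】Clustering Experience Replay for the Effective Exploitation in Reinforcement Learning
【Authors】Min Li, Tianyi Huang, William Zhu
【Publication date】2022.6.27
【Paper link】https://www.sciencedirect.com/science/article/pii/S0031320322003569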
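【Why we recommend it】Reinforcement learning trains an agent to make decisions by exploiting the experience contained in the transitions that different decisions produce. To exploit this experience, most reinforcement learning methods replay explored transitions by uniform sampling, but this makes it easy to overlook the most recently explored transitions. Another approach defines a priority for each transition from its estimation error during training and then replays transitions according to their priorities. However, it only updates the priorities of the transitions replayed at the current training time step, so transitions with low priority end up being overlooked. This paper proposes a clustering experience replay, called CER, that effectively exploits the experience hidden in all transitions explored so far in training. CER clusters and replays transitions through a divide-and-conquer framework based on time division. First, it divides the whole training process into several periods. Second, at the end of each period, it uses k-means to cluster the transitions explored in that period. Finally, it constructs a conditional probability density function to ensure that all kinds of transitions are sufficiently replayed in the current training.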
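The three steps described for CER (divide training into periods, cluster each period's transitions with k-means, replay across clusters) can be pictured with a rough sketch. This is not the authors' implementation. The class name `ClusteredReplayBuffer`, the parameters `stage_length` and `n_clusters`, and the representation of a transition as a flat numeric vector are all assumptions made for illustration. In particular, the summary does not give the form of the conditional probability density function, so the sketch substitutes a simple stand-in: pick a cluster uniformly, then pick a transition uniformly inside it.

```python
import numpy as np


def kmeans(points, k, n_iters=20, rng=None):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    k = min(k, len(points))
    # Initialise centers from k distinct data points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iters):
        # Distance from every point to every center, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


class ClusteredReplayBuffer:
    """Hypothetical buffer: clusters transitions at the end of each stage."""

    def __init__(self, stage_length, n_clusters, seed=0):
        self.stage_length = stage_length  # transitions per stage (assumed)
        self.n_clusters = n_clusters      # k for k-means (assumed)
        self.rng = np.random.default_rng(seed)
        self.current_stage = []           # transitions not yet clustered
        self.clusters = []                # one array of transitions per cluster

    def add(self, transition):
        self.current_stage.append(np.asarray(transition, dtype=float))
        if len(self.current_stage) == self.stage_length:
            self._close_stage()

    def _close_stage(self):
        # Cluster only the transitions explored during the stage that just ended.
        data = np.stack(self.current_stage)
        labels = kmeans(data, self.n_clusters, rng=self.rng)
        for j in np.unique(labels):
            self.clusters.append(data[labels == j])
        self.current_stage = []

    def sample(self, batch_size):
        # Stand-in sampling rule (an assumption, not the paper's density):
        # choose a cluster uniformly, then a transition uniformly within it,
        # so small clusters are not drowned out by large ones.
        if not self.clusters:
            raise ValueError("no completed stage to sample from yet")
        batch = []
        for _ in range(batch_size):
            cluster = self.clusters[self.rng.integers(len(self.clusters))]
            batch.append(cluster[self.rng.integers(len(cluster))])
        return np.stack(batch)
```

A usage sketch under the same assumptions: create `ClusteredReplayBuffer(stage_length=50, n_clusters=4)`, call `add` with each transition vector as the agent explores, and call `sample(32)` once at least one stage has completed. Sampling per cluster rather than per transition is what keeps rare kinds of transitions in the replay mix, which is the behaviour the summary attributes to CER's density function.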