University of Electronic Science and Technology | Clustering Experience Replay for Effective Exploitation in Reinforcement Learning
2022-07-03 20:42:00 【Zhiyuan Community】
【Title】Clustering Experience Replay for the Effective Exploitation in Reinforcement Learning
【Authors】Min Li, Tianyi Huang, William Zhu
【Publication date】2022.6.27
【Paper link】https://www.sciencedirect.com/science/article/pii/S0031320322003569
【Why we recommend it】Reinforcement learning trains an agent to make decisions by exploiting the experience contained in the transitions that result from its different decisions. To exploit this experience, most reinforcement learning methods replay the explored transitions by uniform sampling. With uniform sampling, however, the most recently explored transitions are easily neglected. Another way to exploit this experience is to assign each transition a priority based on its estimation error during training and then replay transitions according to those priorities. But this approach only updates the priorities of the transitions replayed at the current training time step, so transitions with lower priorities end up being ignored. This paper proposes clustering experience replay, called CER, to effectively exploit the experience hidden in all transitions explored during the current training. CER clusters and replays transitions within a divide-and-conquer framework based on time division. First, it divides the whole training process into several phases. Second, at the end of each phase, it uses k-means to cluster the transitions explored in that phase. Finally, it constructs a conditional probability density function to ensure that all kinds of transitions are sufficiently replayed in the current training.
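The three steps CER performs (time-divided phases, k-means inside each phase, cluster-aware replay) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `ClusteredReplayBuffer`, the fixed `phase_length`, the caller-supplied feature vectors used for clustering, and the sampling rule are all hypothetical. In particular, the paper's conditional probability density function is not reproduced here; the sketch substitutes a simple hierarchical rule that picks a phase, then a cluster within that phase, then a transition, each uniformly, so that small clusters are not crowded out by large ones.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(points: Sequence[Vector], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means on small point sets; returns one label per point."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


class ClusteredReplayBuffer:
    """Hypothetical CER-style buffer (illustrative only, not the authors' code).

    Training time is split into fixed-length phases; when a phase ends, its
    transitions are clustered with k-means on caller-supplied feature vectors.
    """

    def __init__(self, phase_length: int, k: int, seed: int = 0):
        self.phase_length = phase_length
        self.k = k
        self.rng = random.Random(seed)
        self.open_phase: List[Tuple[Vector, object]] = []  # (features, transition)
        self.phases: List[List[List[object]]] = []  # phase -> cluster -> transitions

    def add(self, features: Vector, transition: object) -> None:
        self.open_phase.append((features, transition))
        if len(self.open_phase) == self.phase_length:
            self._close_phase()

    def _close_phase(self) -> None:
        # End of a phase: cluster everything explored during it.
        feats = [f for f, _ in self.open_phase]
        labels = kmeans(feats, self.k, seed=self.rng.randrange(2**31))
        clusters: List[List[object]] = [[] for _ in range(max(labels) + 1)]
        for (_, t), lab in zip(self.open_phase, labels):
            clusters[lab].append(t)
        self.phases.append([c for c in clusters if c])
        self.open_phase = []

    def sample(self, batch_size: int) -> List[object]:
        # ASSUMPTION: stands in for the paper's conditional density. We factor it
        # as P(phase) * P(cluster | phase) * P(transition | cluster), each uniform.
        # The still-open phase is treated as one unclustered group so the most
        # recent transitions can also be replayed (another assumption).
        groups = list(self.phases)
        if self.open_phase:
            groups.append([[t for _, t in self.open_phase]])
        if not groups:
            raise ValueError("buffer is empty")
        batch = []
        for _ in range(batch_size):
            phase = self.rng.choice(groups)
            cluster = self.rng.choice(phase)
            batch.append(self.rng.choice(cluster))
        return batch
```

For example, if one phase holds 90 transitions from one region of feature space and 10 from another, uniform sampling would draw the rare kind about 10% of the time, whereas the hierarchical rule above draws it close to half the time once k-means separates the two groups. How closely this matches the balance CER actually achieves depends on the density function defined in the paper.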