University of Electronic Science and Technology | Clustering experience replay for effective exploitation in reinforcement learning
2022-07-03 20:42:00 【Zhiyuan community】
【Title】 Clustering Experience Replay for the Effective Exploitation in Reinforcement Learning
【Authors】 Min Li, Tianyi Huang, William Zhu
【Publication date】 2022.6.27
【Paper link】 https://www.sciencedirect.com/science/article/pii/S0031320322003569
【Why we recommend it】 Reinforcement learning trains an agent to make decisions using the experience contained in the transitions that different decisions generate. To exploit this experience, most reinforcement learning methods replay the explored transitions by uniform sampling. Sampled this way, however, the most recently explored transitions are easily overlooked. Another approach assigns each transition a priority based on its estimation error during training and then replays transitions according to their priorities. But it only updates the priorities of the transitions replayed at the current training time step, so transitions with low priority end up being ignored. This paper proposes clustering experience replay, called CER, which effectively exploits the experience hidden in all transitions explored so far in training. CER clusters and replays transitions through a divide-and-conquer framework based on time division. First, it divides the whole training process into several periods. Second, at the end of each period, it uses k-means to cluster the transitions explored in that period. Finally, it constructs a conditional probability density function to ensure that every kind of transition is sufficiently replayed in the current training.
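The three steps described above can be sketched roughly as follows. This is only an illustration written from the summary, not the authors' implementation: the class name `ClusteredReplayBuffer`, the parameters `stage_length` and `k`, and the flat-tuple encoding of a transition are all hypothetical. In particular, the summary does not give the form of the conditional probability density function, so the sketch substitutes a simple assumed rule (pick a cluster uniformly, then pick a transition uniformly inside it) to show how clustering can keep rare kinds of transitions from being drowned out.

```python
import random


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means on tuples of floats; returns one cluster label per point."""
    rng = rng or random.Random(0)
    k = min(k, len(points))
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: each non-empty cluster's center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


class ClusteredReplayBuffer:
    """Hypothetical buffer: time-based periods, k-means per period, cluster-aware sampling."""

    def __init__(self, stage_length, k, seed=0):
        self.stage_length = stage_length  # transitions per period (assumed fixed)
        self.k = k                        # clusters per period
        self.rng = random.Random(seed)
        self.current_stage = []           # transitions of the period still in progress
        self.clusters = []                # clusters from all finished periods

    def add(self, transition):
        # A transition is assumed to be a flat tuple of floats,
        # e.g. (state..., action..., reward, next_state...).
        self.current_stage.append(transition)
        if len(self.current_stage) == self.stage_length:
            self._close_stage()

    def _close_stage(self):
        # End of a period: cluster only the transitions explored in that period.
        labels = kmeans(self.current_stage, self.k, rng=self.rng)
        groups = {}
        for t, lab in zip(self.current_stage, labels):
            groups.setdefault(lab, []).append(t)
        self.clusters.extend(groups.values())
        self.current_stage = []

    def sample(self, batch_size):
        # Assumed stand-in for the paper's conditional density:
        # choose a cluster uniformly, then a transition uniformly within it.
        pools = self.clusters + ([self.current_stage] if self.current_stage else [])
        if not pools:
            raise ValueError("cannot sample from an empty buffer")
        return [self.rng.choice(self.rng.choice(pools)) for _ in range(batch_size)]
```

A training loop would call `add` after every environment step and `sample` whenever it needs a minibatch. Because each cluster is equally likely to be chosen regardless of its size, a small cluster of unusual transitions is replayed about as often as a large cluster of common ones, which is the kind of behaviour the summary attributes to CER.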