University of Electronic Science and Technology | Clustering experience replay for effective exploitation in reinforcement learning
2022-07-03 20:42:00 【Zhiyuan community】
【Title】Clustering Experience Replay for the Effective Exploitation in Reinforcement Learning
【Authors】Min Li, Tianyi Huang, William Zhu
【Publication date】2022.6.27
【Paper link】https://www.sciencedirect.com/science/article/pii/S0031320322003569
【Why we recommend it】Reinforcement learning trains agents to make decisions by using the transition experience generated by different decisions. To exploit this experience, most reinforcement learning methods replay the explored transitions through uniform sampling, but this way the most recently explored transitions are easily overlooked. Another way to use this experience is to assign each transition a priority based on its estimation error during training and then replay transitions according to their priorities. However, this only updates the priorities of the transitions replayed at the current training time step, so transitions with lower priorities end up being ignored. This paper proposes clustering experience replay (CER) to effectively exploit the experience hidden in all transitions explored so far in the current training. CER clusters and replays transitions through a divide-and-conquer framework based on time division. First, it divides the whole training process into several periods. Second, at the end of each period, it uses k-means to cluster the transitions explored in that period. Finally, it constructs a conditional probability density function to ensure that all kinds of transitions are sufficiently replayed in the current training.
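The three CER steps described above can be sketched as a small replay buffer. This is a minimal illustration under stated assumptions, not the authors' implementation: each transition is represented by a plain numeric feature tuple, every period has a fixed hypothetical length `period_length`, k-means is a plain Lloyd's loop, and the paper's conditional probability density function is replaced by an assumed two-level rule that first picks a cluster uniformly and then a transition uniformly inside it, i.e. P(t) = P(c) · P(t | c). Treating the unfinished period as one extra pool is also an assumption. The names `ClusteredReplayBuffer`, `add`, and `sample` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Assumption: a transition is summarized by a numeric feature tuple (e.g. its state).
Vector = Tuple[float, ...]


def kmeans(points: Sequence[Vector], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


class ClusteredReplayBuffer:
    """Hypothetical CER-style buffer: time-divided periods, k-means per period."""

    def __init__(self, period_length: int, k: int, seed: int = 0) -> None:
        self.period_length = period_length  # assumed fixed number of transitions per period
        self.k = k  # assumed number of k-means clusters per period
        self.rng = random.Random(seed)
        self.clusters: List[List[Vector]] = []  # clusters from finished periods
        self.current: List[Vector] = []  # transitions of the ongoing period

    def add(self, transition: Vector) -> None:
        self.current.append(transition)
        if len(self.current) == self.period_length:
            self._close_period()

    def _close_period(self) -> None:
        # End of a period: cluster only the transitions explored in this period.
        labels = kmeans(self.current, self.k, seed=self.rng.randrange(2**31))
        groups: Dict[int, List[Vector]] = {}
        for transition, label in zip(self.current, labels):
            groups.setdefault(label, []).append(transition)
        self.clusters.extend(groups.values())
        self.current = []

    def sample(self, batch_size: int) -> List[Vector]:
        # Assumed rule: P(t) = P(cluster) * P(t | cluster), both uniform.
        # The unfinished period counts as one extra pool so recent data is replayed.
        pools = self.clusters + ([self.current] if self.current else [])
        if not pools:
            raise ValueError("buffer is empty")
        return [self.rng.choice(self.rng.choice(pools)) for _ in range(batch_size)]


if __name__ == "__main__":
    data_rng = random.Random(42)
    buffer = ClusteredReplayBuffer(period_length=100, k=4, seed=1)
    for _ in range(250):
        buffer.add((data_rng.random(), data_rng.random()))
    batch = buffer.sample(32)
    print(len(buffer.clusters), "clusters;", len(buffer.current), "transitions in the open period")
```

Because every cluster is chosen with equal probability regardless of its size, transitions in small clusters are replayed far more often than under uniform sampling over the whole buffer. This illustrates one way a cluster-conditioned distribution can keep rare kinds of transitions from being ignored; the density function in the paper may weight clusters differently.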