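University of Electronic Science and Technology of China | Clustering Experience Replay for the Effective Exploitation in Reinforcement Learning
2022-07-03 20:31:00 【BAAI Community】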
【Title】Clustering Experience Replay for the Effective Exploitation in Reinforcement Learning
【Authors】Min Li, Tianyi Huang, William Zhu
【Publication date】2022.6.27
【Paper link】https://www.sciencedirect.com/science/article/pii/S0031320322003569
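【Why we recommend it】Reinforcement learning trains an agent to make decisions by exploiting the transition experience produced by different decisions. To exploit this experience, most reinforcement learning methods replay the explored transitions by uniform sampling. With uniform sampling, however, the most recently explored transitions are easily overlooked. Another approach assigns each transition a priority based on its estimation error during training and replays transitions according to those priorities. But it only updates the priorities of the transitions replayed at the current training time step, so transitions with low priority end up being ignored. This paper proposes a clustering experience replay, called CER, that effectively exploits the experience hidden in all transitions explored so far in the current training run. CER clusters and replays transitions through a divide-and-conquer framework based on time division. First, it divides the whole training process into several periods. Second, at the end of each period, it uses k-means to cluster the transitions explored in that period. Finally, it constructs a conditional probability density function to ensure that all kinds of transitions are sufficiently replayed in the current training.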
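The three steps above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the class name `ClusteredReplayBuffer`, the parameters `stage_length` and `n_clusters`, and the choice to cluster on the state vector alone are all assumptions made here. The summary does not give the form of the paper's conditional probability density function, so the sketch swaps in a simple two-level rule (pick a cluster uniformly, then a transition uniformly inside it) as a stand-in.

```python
import random
from collections import defaultdict


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns one label per point."""
    rng = rng or random.Random(0)
    k = min(k, len(points))
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


class ClusteredReplayBuffer:
    """Hypothetical buffer: time-based stages, k-means per stage, cluster-aware sampling."""

    def __init__(self, stage_length, n_clusters, seed=0):
        self.stage_length = stage_length  # transitions per stage (assumed fixed)
        self.n_clusters = n_clusters      # k for k-means (assumed fixed)
        self.rng = random.Random(seed)
        self.current_stage = []           # transitions of the unfinished stage
        self.clusters = []                # one list of transitions per cluster

    def add(self, transition):
        """transition = (state, action, reward, next_state); state is a tuple of floats."""
        self.current_stage.append(transition)
        if len(self.current_stage) == self.stage_length:
            self._close_stage()

    def _close_stage(self):
        # Assumption: cluster on the state only; the paper may use other features.
        features = [tuple(t[0]) for t in self.current_stage]
        labels = kmeans(features, self.n_clusters, rng=self.rng)
        groups = defaultdict(list)
        for t, lab in zip(self.current_stage, labels):
            groups[lab].append(t)
        self.clusters.extend(groups.values())
        self.current_stage = []

    def sample(self, batch_size):
        # Stand-in for the paper's conditional density: uniform over clusters,
        # then uniform within the chosen cluster, so small clusters still get replayed.
        pools = self.clusters + ([self.current_stage] if self.current_stage else [])
        if not pools:
            raise ValueError("buffer is empty")
        return [self.rng.choice(self.rng.choice(pools)) for _ in range(batch_size)]


buffer = ClusteredReplayBuffer(stage_length=10, n_clusters=2)
for step in range(20):
    state = (0.1 * step,) if step % 2 == 0 else (100.0 + 0.1 * step,)
    buffer.add((state, 0, 1.0, state))
batch = buffer.sample(5)
```

Sampling a cluster first, rather than a transition directly, is what keeps rare kinds of transitions from being drowned out by common ones; how closely this matches CER's actual sampling rule depends on details in the paper itself.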