
ASU & OSU | Model-Based Offline Meta-Reinforcement Learning with Regularization

2022-07-06 03:12:00 Zhiyuan community

【Title】Model-Based Offline Meta-Reinforcement Learning with Regularization

【Authors】Sen Lin, Jialin Wan, Tengyu Xu, Yingbin Liang, Junshan Zhang

【Publication date】2022.2.7

【Paper link】https://arxiv.org/pdf/2202.02929.pdf

【Why we recommend it】Existing offline reinforcement learning (RL) methods face several major challenges, in particular the distributional shift between the learned policy and the behavior policy. Offline meta-RL is emerging as a promising way to address these challenges: it aims to learn an informative meta-policy from a collection of tasks. However, as the paper's empirical study shows, offline meta-RL methods can be outperformed by offline single-task RL methods on tasks whose datasets are of good quality. Motivated by this, the paper explores model-based offline meta-RL with regularized policy optimization (MerPO), which learns a meta-model for efficient task structure inference and an informative meta-policy for safely exploring out-of-distribution state-actions. As the key building block of MerPO, the paper designs a new meta-regularized model-based actor-critic (RAC) method for within-task policy optimization, which combines conservative policy evaluation with regularized policy improvement. The inherent tradeoff is handled by striking the right balance between two regularizers, one based on the behavior policy and the other on the meta-policy. The paper proves theoretically that the learned policy offers guaranteed improvement over both the behavior policy and the meta-policy, which ensures performance improvement on new tasks via offline meta-RL. Experiments confirm that MerPO outperforms existing offline meta-RL methods.
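To make the balance between the two regularizers more concrete, here is a minimal toy sketch in Python. It is not the paper's RAC objective or implementation: it assumes a single state with a few discrete actions, uses KL divergence as the regularizer, and introduces two hypothetical knobs, `alpha` (the weight on the behavior policy versus the meta-policy) and `lam` (the overall regularization strength). Under those assumptions, the policy that maximizes expected Q-value minus the weighted KL penalties has a closed form, which the function computes.

```python
import math

def regularized_policy(q_values, behavior, meta, alpha, lam):
    """Toy illustration (assumed form, not taken from the paper).

    Returns the distribution pi over discrete actions that maximizes
        sum_a pi(a) * Q(a)
        - lam * (alpha * KL(pi || behavior) + (1 - alpha) * KL(pi || meta)).
    The maximizer is pi(a) proportional to
        behavior(a)**alpha * meta(a)**(1 - alpha) * exp(Q(a) / lam).
    All probabilities in `behavior` and `meta` must be strictly positive.
    """
    logits = [
        q / lam + alpha * math.log(b) + (1.0 - alpha) * math.log(m)
        for q, b, m in zip(q_values, behavior, meta)
    ]
    top = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical numbers for one state with three actions.
q = [1.0, 0.0, 0.5]
behavior = [0.7, 0.2, 0.1]  # policy that generated the offline data
meta = [0.2, 0.2, 0.6]      # meta-policy learned across tasks

# Strong regularization, all weight on the behavior policy.
near_behavior = regularized_policy(q, behavior, meta, alpha=1.0, lam=100.0)
# Strong regularization, all weight on the meta-policy.
near_meta = regularized_policy(q, behavior, meta, alpha=0.0, lam=100.0)
# Very weak regularization: the policy is nearly greedy in Q.
greedy = regularized_policy(q, behavior, meta, alpha=0.5, lam=0.01)
```

In this toy setting, `alpha` near 1 keeps the result close to the behavior policy (exploiting the offline data), `alpha` near 0 pulls it toward the meta-policy (exploring beyond the data), and a small `lam` lets the Q-values dominate.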


Copyright notice
This article was written by [Zhiyuan community]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202132331471079.html