
Princeton University, Peking University & UIUC | Offline Reinforcement Learning with Realizability and Single-policy Concentrability

2022-07-06 03:12:00 Zhiyuan community

【Title】Offline Reinforcement Learning with Realizability and Single-policy Concentrability

【Authors】Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee

【Publication date】2022.2.10

【Paper link】https://arxiv.org/pdf/2202.04634.pdf

【Reason for recommendation】Sample-efficiency guarantees for offline reinforcement learning (RL) typically rely on strong assumptions about both the function class (e.g., Bellman completeness) and the data coverage (e.g., all-policy concentrability). Despite recent efforts to relax these assumptions, existing work can relax only one of the two factors, leaving the strong assumption on the other intact. Whether sample-efficient offline RL is possible under weak assumptions on both factors is therefore an important open problem, and this paper answers it in the positive. The paper analyzes a simple algorithm based on the primal-dual formulation of MDPs, in which the dual variables (the discounted occupancy) are modeled by a density-ratio function against the offline data. With appropriate regularization, the algorithm is shown to achieve polynomial sample complexity under only realizability and single-policy concentrability. The study also provides alternative analyses based on different assumptions, to shed light on the nature of primal-dual algorithms for offline RL.
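【Algorithm sketch】For orientation, the following is a minimal, hedged sketch of the primal-dual (Lagrangian) objective that underlies this family of density-ratio methods. The notation here is an assumption in the DICE style and may not match the paper's exact symbols or regularizer: d^D is the offline data distribution, μ0 the initial-state distribution, w(s, a) ≈ d^π(s, a) / d^D(s, a) the density ratio (the dual variable), v the state-value function (the primal variable), and f a convex regularizer with weight α.

% A minimal, compilable sketch of a regularized primal-dual objective
% for offline RL with density ratios. Assumption: DICE-style notation;
% the paper's exact regularizer f and symbols may differ.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
\[
  \max_{w \ge 0}\ \min_{v}\quad
  L_{\alpha}(w, v)
  = (1-\gamma)\,\mathbb{E}_{s \sim \mu_0}\!\big[v(s)\big]
  + \mathbb{E}_{(s,a,r,s') \sim d^{D}}\!\Big[w(s,a)\,\big(r + \gamma\, v(s') - v(s)\big)\Big]
  - \alpha\,\mathbb{E}_{(s,a) \sim d^{D}}\!\big[f\big(w(s,a)\big)\big]
\]
\end{document}

The first two terms are the standard linear-programming (flow-constraint) form of an MDP rewritten against the offline data distribution through the ratio w, so every expectation can be estimated from the dataset; the last term is the regularization that, per the abstract, is what enables polynomial sample complexity under only realizability of the function classes and coverage of a single comparator policy rather than all policies.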


Copyright notice
This article was created by [Zhiyuan community]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202132331471141.html