
Microsoft Research, UIUC & Google Research | Adversarially Trained Actor Critic for Offline Reinforcement Learning

2022-07-06 03:12:00 Zhiyuan community

【Title】Adversarially Trained Actor Critic for Offline Reinforcement Learning

【Authors】Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal

【Publication date】2022.2.5

【Paper link】https://arxiv.org/pdf/2202.02446.pdf

【Why recommended】This paper proposes Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning under insufficient data coverage. It is based on a two-player Stackelberg-game framing of offline RL: the policy actor competes against an adversarially trained value critic, which searches for data-consistent scenarios in which the actor is worse than the data-collection behavior policy. The paper shows that when the actor plays no-regret in this two-player game, the policy produced by ATAC provably 1) outperforms the behavior policy across a wide range of hyperparameters, and 2) for suitably chosen hyperparameters, competes with the best policy covered by the data. Notably, compared with existing work, this framework not only provides theoretical guarantees under general function approximation but also yields a scalable RL implementation for complex environments and large datasets. On the D4RL benchmark, ATAC consistently outperforms state-of-the-art offline RL algorithms across a range of continuous control tasks.
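The Stackelberg structure described above can be illustrated with a toy brute-force sketch: over *finite* critic and policy classes, the inner player picks the critic that minimizes the actor's relative advantage over the data plus a Bellman-error penalty, and the outer player picks the policy that scores best against its own adversarial critic. The function name `atac_bruteforce`, the finite classes, and the weight `beta` are illustrative assumptions for exposition; they are not the paper's actual gradient-based implementation.

```python
import numpy as np

def atac_bruteforce(dataset, critics, policies, beta=1.0, gamma=0.9):
    """Toy Stackelberg solve of an ATAC-style objective over finite classes.

    dataset:  list of (s, a, r, s2) transitions from the behavior policy
    critics:  list of Q-tables, each a (n_states, n_actions) array
    policies: list of deterministic policies, each maps state index -> action
    """
    def adv(pi, f):
        # Relative pessimism term: advantage of pi over the actions in the data
        return np.mean([f[s, pi[s]] - f[s, a] for s, a, r, s2 in dataset])

    def bellman_err(pi, f):
        # Squared Bellman error of f on the data, bootstrapping with pi
        return np.mean([(f[s, a] - (r + gamma * f[s2, pi[s2]])) ** 2
                        for s, a, r, s2 in dataset])

    best_pi, best_val = None, -np.inf
    for pi in policies:
        # Inner (adversarial) player: critic most unfavorable to pi,
        # subject to staying approximately Bellman-consistent with the data
        f_pi = min(critics, key=lambda f: adv(pi, f) + beta * bellman_err(pi, f))
        # Outer player: actor evaluated against its adversarial critic
        val = adv(pi, f_pi)
        if val > best_val:
            best_pi, best_val = pi, val
    return best_pi, best_val
```

One consequence of the relative-pessimism objective is easy to see here: a policy that exactly mimics the data actions has advantage zero against every critic, so the maximizing actor's game value is never negative, which mirrors the paper's "no worse than the behavior policy" guarantee.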


Copyright notice
This article was created by [Zhiyuan community]; please include the original link when reposting.
https://yzsam.com/2022/02/202202132331471110.html