
Tsinghua-Bosch Joint ML Center, THBI Lab | Chengyang Ying: Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

2022-06-13 00:10:00 Zhiyuan community

【Title】Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

【Authors】Chengyang Ying, Xinning Zhou, Hang Su, Dong Yan, Ning Chen, Jun Zhu

【Publication date】2022.6.9

【Paper link】https://arxiv.org/pdf/2206.04436.pdf

【Recommended reasons】Although deep reinforcement learning (DRL) has achieved great success, it can still suffer catastrophic failures due to the inherent uncertainty in both transitions and observations. Most existing safe RL methods handle only transition disturbances or only observation disturbances, since the two affect different parts of the agent; moreover, the popular worst-case-reward objective can produce overly pessimistic policies. This paper first proves theoretically that the performance degradation under both transition and observation disturbances depends on a new metric, the value function range (VFR), which corresponds to the gap between the value function at the best state and at the worst state. Taking the conditional value-at-risk (CVaR) as the risk measure, the authors then propose CVaR Proximal Policy Optimization (CPPO), a novel reinforcement learning algorithm that formalizes a risk-sensitive constrained optimization problem by keeping the CVaR below a given threshold. Experimental results show that, on a series of continuous control tasks in MuJoCo, CPPO achieves higher cumulative reward and is more robust to both observation and transition disturbances.
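To make the risk measure concrete: CPPO constrains the CVaR of the return distribution, i.e. the expected return over the worst α-fraction of outcomes. Below is a minimal sketch of how an empirical CVaR estimate can be computed from sampled episode returns; the function name and setup are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def empirical_cvar(returns, alpha=0.1):
    """Estimate CVaR_alpha as the mean return over the worst alpha-fraction
    of sampled episode returns (lower tail, since low return = high risk)."""
    returns = np.sort(np.asarray(returns, dtype=float))  # ascending: worst first
    k = max(1, int(np.ceil(alpha * len(returns))))       # size of the tail
    return returns[:k].mean()

# Example: 10 sampled returns; CVaR_0.2 averages the worst two (2 and 3).
rets = [10, 12, 8, 15, 3, 9, 11, 14, 2, 13]
print(empirical_cvar(rets, alpha=0.2))  # → 2.5
```

A constraint of the form `empirical_cvar(returns, alpha) >= threshold` is the kind of risk-sensitive condition the paper's constrained optimization keeps satisfied during policy updates.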

 


Copyright notice
This article was created by [Zhiyuan community]; please include a link to the original when reposting.
https://yzsam.com/2022/163/202206122351177747.html