
[PPO attitude control] Simulink simulation of UAV attitude control trained with the Proximal Policy Optimization (PPO) reinforcement learning algorithm

2022-06-09 06:53:00 FPGA and MATLAB

1. Software version

MATLAB R2019b

2. Theoretical background of the algorithm

      The PPO algorithm was proposed by OpenAI. It is a policy gradient (Policy Gradient) method. Traditional policy gradient algorithms are highly sensitive to the step size, and an optimal step size is difficult to choose: if the new policy drifts too far from the old policy during training, the final learning result suffers. To address this, PPO introduces a new objective function that allows the policy to be updated in small mini-batches over multiple training steps, which resolves the step-size selection problem of traditional policy gradient methods, while its implementation complexity remains much lower than that of TRPO. PPO is commonly implemented in two ways: the first runs the simulation on the CPU, while the second uses GPU-accelerated simulation and runs more than three times faster than the first. Compared with conventional supervised-learning neural networks, the difficulty with reinforcement learning networks lies in computing the gradient and the loss function; PPO, however, strikes a good balance among algorithmic complexity, accuracy, and ease of implementation.
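The "new objective function" mentioned above is PPO's clipped surrogate objective, which limits how far the updated policy can move away from the old policy in a single step:

L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
\qquad r_t(\theta) = \frac{\pi_\theta(a_t\mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t\mid s_t)}

where \hat{A}_t is the advantage estimate at time step t and \epsilon is the clip range, a small constant (e.g. 0.2).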

As a result, the PPO algorithm is relatively simple to implement: its formulation is similar to TRPO's, but the restriction on the policy update is imposed through a simple parameter (the clip range applied to the policy ratio) rather than a hard constraint.
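As a minimal illustration of that clipping operation, the following plain MATLAB snippet evaluates the clipped surrogate objective for one mini-batch. The toy data and variable names are illustrative assumptions and are not taken from the original Simulink model.

% Minimal sketch of PPO's clipped surrogate objective in plain MATLAB.
% Toy data and variable names below are illustrative assumptions.
epsilon    = 0.2;                 % clip range
advantages = randn(64, 1);        % advantage estimates A_t (toy data)
logPiNew   = 0.1 * randn(64, 1);  % log pi_theta(a_t | s_t), new policy (toy data)
logPiOld   = zeros(64, 1);        % log pi_theta_old(a_t | s_t), old policy (toy data)

% Probability ratio r_t(theta), computed in log space for numerical stability
ratio = exp(logPiNew - logPiOld);

% Unclipped and clipped surrogate terms; PPO maximizes the mean of their
% element-wise minimum, which discourages overly large policy updates
unclipped = ratio .* advantages;
clipped   = min(max(ratio, 1 - epsilon), 1 + epsilon) .* advantages;
Lclip     = mean(min(unclipped, clipped));
fprintf('Clipped surrogate objective for this mini-batch: %.4f\n', Lclip);

In an actual training loop, this quantity (with its sign flipped) would serve as the loss minimized by the actor network's optimizer.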


Copyright notice
This article was created by [FPGA and MATLAB]. When reposting, please include a link to the original article; thank you.
https://yzsam.com/2022/160/202206090640047567.html