[PPO attitude control] Simulink simulation of UAV attitude control trained with the Proximal Policy Optimization (PPO) reinforcement learning algorithm
2022-06-09 06:53:00 【FPGA and MATLAB】
1. Software version
MATLAB 2019b
2. Theory behind the algorithm
The PPO algorithm was proposed by OpenAI and is a newer policy gradient algorithm. Traditional policy gradient algorithms are highly sensitive to the step size, and the optimal step size is hard to choose: if the new policy differs too much from the old one during training, the final learning result suffers. To address this, PPO introduces a new objective function that can be optimized over several training steps on small batches, which resolves the step-size selection problem of traditional policy gradient algorithms. At the same time, PPO is much simpler to implement than the TRPO (Trust Region Policy Optimization) algorithm.

There are two main implementations of PPO: the first is a CPU-based simulation, and the second is a GPU-accelerated simulation that runs more than three times as fast as the first. Compared with traditional neural network algorithms based on supervised learning, the difficulty of a reinforcement learning network lies in computing the gradient and the loss function. PPO strikes a good balance between algorithmic complexity, accuracy, and ease of implementation.
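The article does not spell out the objective function. For reference, the widely used clipped form of the PPO objective can be written as follows. The symbols here (the probability ratio r_t, the advantage estimate \hat{A}_t and the clipping range \epsilon) are standard PPO notation, not notation taken from this article, and this is a sketch of the common variant rather than necessarily the exact form used in the author's Simulink model.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\qquad
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[
  \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
\right]
```

Because the ratio is clipped to the interval [1 - \epsilon, 1 + \epsilon], a single update gains nothing from moving the new policy far from the old one, which is what makes the method less sensitive to the step size.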

As a result, PPO is relatively simple to implement: its formulation is similar to that of TRPO, but the policy update is restricted through a parameter.
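One common way to restrict the update through a parameter is the clipped surrogate objective. The sketch below is a minimal pure-Python illustration, not the author's MATLAB/Simulink code. The function name `clipped_surrogate`, its arguments and the default `clip_eps=0.2` are assumptions chosen for illustration.

```python
import math


def clipped_surrogate(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    new_log_probs / old_log_probs: log pi(a|s) under the new and old policy.
    advantages: advantage estimates for the same (state, action) samples.
    clip_eps: hypothetical clipping range that limits the policy change.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

With a positive advantage, a ratio above `1 + clip_eps` earns no extra objective value, so the optimizer has no incentive to push the policy further in that direction. With a negative advantage, the unclipped term is kept whenever it is the smaller one, so harmful changes are still penalized in full.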