RL reinforcement learning summary (1)
2022-08-05 05:03:00 【Times & Beliefs】
I recently summarized the key points of reinforcement learning after taking Dr. Tang Yudi's course. What follows is in my own words and reflects my own understanding!
1. Overview of Reinforcement Learning
Reinforcement Learning is usually abbreviated as RL.
Introduction
You have probably heard the news that AlphaGo beat the world Go champion. AlphaGo uses reinforcement learning, a branch of AI. After learning from a large number of Go game records, AlphaGo determines the best choice for each move (it selects the move with the largest reward value in the current state).
Main mechanism
In reinforcement learning, the model constantly interacts with the environment. Each time it makes a choice, it receives a reward if the outcome is good and a punishment if the outcome is bad. The model learns from these rewards and punishments: the next time it faces a choice, it prefers the option with the larger reward value, and in this way it keeps learning!
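To make the reward-and-punishment idea concrete, here is a minimal sketch in Python. The three choices and their reward values are made up for illustration, and the "try everything once, then mostly pick the best, occasionally explore" rule is just one simple way to do it, not the method from the course.

```python
import random

# Hypothetical feedback for three choices: positive = reward, negative = punishment.
TRUE_REWARD = {"left": -1.0, "straight": 0.2, "right": 1.0}

def learn_preferences(steps=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    choices = list(TRUE_REWARD)
    value = {c: 0.0 for c in choices}   # estimated reward value of each choice
    count = {c: 0 for c in choices}     # how many times each choice was tried
    for t in range(steps):
        if t < len(choices):
            choice = choices[t]                     # try every choice once
        elif rng.random() < epsilon:
            choice = rng.choice(choices)            # occasionally explore
        else:
            choice = max(choices, key=value.get)    # prefer the largest estimated reward
        reward = TRUE_REWARD[choice]                # feedback from the environment
        count[choice] += 1
        # Running average of the rewards seen for this choice.
        value[choice] += (reward - value[choice]) / count[choice]
    return value

values = learn_preferences()
best_choice = max(values, key=values.get)
print(values, best_choice)
```

After learning, the choice with the largest estimated reward is the one the model prefers from then on.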
2. Basic concepts of reinforcement learning
Basic Concepts
(1) agent: the entity that learns and acts in our model. For example: the car in self-driving.
(2) state: the situation the agent is currently in, including its surroundings. For example: when AlphaGo plays Lee Sedol, the positions of the black and white stones already on the board at the time of the move; for a self-driving car, where on the road the car is at this moment.
(3) action: the next step the agent takes in the current state. For example: where on the board AlphaGo places its next stone; what driving behavior the self-driving car takes at the next moment (go straight, turn left, turn right...).
(4) reward: the feedback the agent receives after taking an action. It can be positive (simply called a reward) or negative (called a punishment). For example: a self-driving car is rewarded as it gets closer and closer to the destination, and punished if it collides with surrounding buildings, vehicles, etc. Rewards and punishments are how we "teach" the agent to learn!
(5) policy: the strategy, that is, the series of actions the agent takes in order to achieve its ultimate goal.
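The five concepts map quite directly onto code. The sketch below is only an illustration under my own assumptions: the "road" is a hypothetical line of cells 0 to 4, the reward numbers are invented, and the policy is a trivial fixed rule.

```python
from typing import Callable, Tuple

State = int    # state: the car's position on a hypothetical 1-D road (cells 0..4)
Action = str   # action: "left" or "right"
GOAL = 4       # hypothetical destination cell

def environment_step(state: State, action: Action) -> Tuple[State, float]:
    """Return (next_state, reward) after the agent takes `action` in `state`."""
    move = 1 if action == "right" else -1
    next_state = min(max(state + move, 0), GOAL)    # the car cannot leave the road
    reward = 1.0 if next_state == GOAL else -0.1    # reward at the goal, small punishment otherwise
    return next_state, reward

# policy: a rule that maps each state to the action the agent takes there.
policy: Callable[[State], Action] = lambda state: "right"

state = 0                                            # the agent's current state
action = policy(state)                               # the agent picks an action
next_state, reward = environment_step(state, action) # the environment answers
print(state, action, next_state, reward)
```

Here the agent is whatever code calls `policy` and receives the reward; in a learning setup the policy would be updated from that feedback instead of being fixed.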
Reinforcement learning process
The agent observes before it acts. At the beginning it tries different choices. After interacting with the environment (being rewarded and punished), it learns to choose the action with the largest reward value.
Observe->Act->Observe
Keep looping...
As the figure above shows, in short: the agent constantly interacts with the environment, and the environment rewards or punishes the agent and changes the agent's state.
This loop repeats, pushing the agent's state to change in the direction with the larger reward value.
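The Observe->Act->Observe loop can be sketched in a few lines of Python. The post does not name a learning algorithm, so this example assumes tabular Q-learning with epsilon-greedy exploration as one common choice; the 1-D road, the reward values, and all hyperparameters are hypothetical.

```python
import random

GOAL = 4                     # hypothetical 1-D road with cells 0..4; the goal is cell 4
ACTIONS = ["left", "right"]

def step(state, action):
    """Environment: return (next_state, reward, done) for one action."""
    next_state = min(max(state + (1 if action == "right" else -1), 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else -0.1   # reward at the goal, small punishment per extra step
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Estimated value of taking each action in each state, all starting at 0.
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0                                   # observe the starting state
        for _ in range(100):                        # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)        # explore a random action
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # act greedily
            next_state, reward, done = step(state, action)          # act, then observe again
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q_table = train()
learned_policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
print(learned_policy)
```

After training, the learned policy picks, in every cell, the action whose estimated value is largest, which on this road means heading toward the goal.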
Example
The cart takes an action (moving left or right) and, guided by rewards, continuously adjusts its state (the angle and speed of the pole).