当前位置：网站首页>RL reinforcement learning summary (1)

RL reinforcement learning summary (1)

2022-08-05 05:03:00 【Times & Beliefs】

Recently, I summarized the knowledge points of reinforcement learning. I listened to Dr. Tang Yudi's course. I will express it in my own words and understanding!!!

1. Overview of Reinforcement Learning

Reinforcement learning, the full name in English is Reinforcement Learning, or RL for short.

Introduction

You must have heard the news that AlphaGo beat the world Go champion.The AlphaGo here uses the reinforcement learning in AI. By learning a lot of chess records in the world, AlphaGo will determine the best choice for each step in chess (select the step with the largest reward value in the current state).

Main mechanism

Reinforcement learning is constantly interacting with the environment. When faced with a choice, after the choice, if the effect is better, it will carry out reward; if the effect is not good, carry out Punishment.Let the model learn with rewards and penalties.When faced with a choice later, choose a choice with a large reward value first, so as to achieve the purpose of continuous learning!

2. Basic concepts of reinforcement learning

Basic Concepts

(1) agent: The Chinese translation is agent, which is the object that will be learned and operated in our model.For example: a car in self-driving.
(2) state: Translated as state at noon, it is the surrounding situation and state of the current agent.For example: when AlphaGo and Li Shishi are playing chess, the position and distribution of the black and white chess pieces that have already fallen on the chessboard at the time of the move; where on the road the self-driving car is at this time.
(3) action: The Chinese translation is action, which is the next step the agent will take in the current state.For example: where on the chessboard the AlphaGo will play; what kind of driving behavior will the self-driving car take at the next moment (go straight, turn left, turn right...)
(4) reward: Chinese translation is reward. Reward includes positive reward, also called reward for short, and negative reward, also called punishment.It is what kind of feedback the current agent will get after taking action.For example: a self-driving car, driving closer and closer to the destination, will be rewarded; if it collides with surrounding buildings, vehicles, etc., it will be punished.By rewarding and punishing, "teach" the agent to learn!!!
(5) policy: The Chinese translation is strategy, which is a series of actions to be taken in order to achieve my ultimate goal, which is called a strategy.

Reinforcement learning process

insert image description here
The agent observes before taking action.At the beginning, you will make different choices. After interacting with the environment (rewarding and punishing), learn to choose the one with the largest reward value.
Observe->Act->Observe
Keep looping...

As shown in the figure above, in short: the agent constantly interacts with the environment, and the environment rewards and punishes the intelligence, thereby changing the state of the agent.
Repeatedly loop, push the agent to move towards the state change (the direction with the larger reward value).
insert image description here