RL reinforcement learning summary (1)
2022-08-05 05:03:00 【Times & Beliefs】
I recently summarized the key points of reinforcement learning after following Dr. Tang Yudi's course, and I will explain them here in my own words and understanding!!!
1. Overview of Reinforcement Learning
Reinforcement learning is commonly abbreviated as RL.
Introduction
You have probably heard that AlphaGo beat a world Go champion. AlphaGo relies on reinforcement learning, a branch of AI: by learning from a large number of Go game records, it determines the best move at each step (choosing the move with the largest reward value in the current state).
Main mechanism
Reinforcement learning works by constantly interacting with the environment. Each time the model faces a choice and makes it, it receives a reward if the outcome is good and a punishment if the outcome is bad. The model learns from these rewards and punishments, so the next time it faces a similar choice it prefers the option with the larger reward value. In this way it keeps learning!
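To make the reward-and-punishment loop concrete, here is a minimal sketch built on a toy problem that is not from the course. The agent faces the same choice over and over, and each option has a hidden average reward (negative values act as punishment). It uses a standard epsilon-greedy rule: usually pick the option with the largest estimated reward, and occasionally try a random one. The function name `run_bandit` and all parameter values are hypothetical.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Learn which choice pays best purely from reward feedback (epsilon-greedy sketch)."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n  # learned average reward of each choice
    counts = [0] * n       # how often each choice was taken
    for _ in range(steps):
        # Mostly exploit the choice with the largest estimated reward, sometimes explore.
        if rng.random() < epsilon:
            choice = rng.randrange(n)
        else:
            choice = max(range(n), key=lambda i: estimates[i])
        # Feedback from the environment: noisy reward around the hidden mean
        # (a negative value plays the role of a punishment).
        reward = rng.gauss(true_means[choice], 1.0)
        counts[choice] += 1
        # Incremental running average of the rewards seen for this choice.
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
    return estimates, counts


estimates, counts = run_bandit([-1.0, 0.5, 2.0])
print(counts)  # the last choice (hidden mean 2.0) should end up being taken most often
```

The small `epsilon` is the design choice that matters here. Without occasional exploration, the agent could lock onto the first option that looks "good enough" and never discover that another option pays more.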
2. Basic concepts of reinforcement learning
Basic Concepts
(1) agent: the object in our model that learns and takes actions. For example, the car in self-driving.
(2) state: the current situation of the agent and its surroundings. For example, when AlphaGo played Lee Sedol, the positions of the black and white stones already on the board at the moment of a move; or, for a self-driving car, where on the road it currently is.
(3) action: the next step the agent takes in its current state. For example, where AlphaGo places its next stone, or what the self-driving car does at the next moment (go straight, turn left, turn right, ...).
(4) reward: the feedback the agent receives after taking an action. It can be positive (a reward in the narrow sense) or negative (a punishment). For example, a self-driving car is rewarded as it gets closer and closer to its destination, and punished if it collides with surrounding buildings, vehicles, and so on. Rewards and punishments are how we "teach" the agent to learn!!!
(5) policy: the agent's strategy, i.e., the rule it uses to decide which action to take in each state, so that the resulting series of actions achieves its ultimate goal.
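The five concepts can be mapped onto code with a tiny made-up environment. The sketch below is not from the course and assumes a hypothetical one-dimensional corridor: the state is the agent's cell, the actions are "left" (-1) and "right" (+1), reaching the last cell earns +1, and bumping into a wall is punished with -1. A policy is just a function from state to action. The names `Corridor`, `go_right`, and `run_episode` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Corridor:
    """Hypothetical toy environment: cells 0..length-1, goal in the last cell."""
    length: int = 5
    position: int = 0  # the agent's state: which cell it is in

    def step(self, action: int) -> tuple[int, float, bool]:
        """Apply an action (-1 = left, +1 = right) and return (state, reward, done)."""
        target = self.position + action
        if not 0 <= target < self.length:
            return self.position, -1.0, False  # hit a wall: punishment, state unchanged
        self.position = target
        if self.position == self.length - 1:
            return self.position, 1.0, True    # reached the goal: reward
        return self.position, 0.0, False       # ordinary move: no feedback


def go_right(state: int) -> int:
    """A policy: a rule that maps the current state to an action."""
    return +1


def run_episode(policy, max_steps: int = 100) -> float:
    """Let the agent follow a policy until it reaches the goal; return the total reward."""
    env = Corridor()
    state, total = env.position, 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent picks an action from its state
        state, reward, done = env.step(action)  # environment returns new state + reward
        total += reward
        if done:
            break
    return total


print(run_episode(go_right))  # 1.0: four moves right, then the goal reward
```

A policy that always moves left would only collect punishments here, which is exactly the signal that tells the agent its strategy is bad.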
Reinforcement learning process
The agent observes before it acts. At first it tries different choices; through interaction with the environment (rewards and punishments), it learns to pick the one with the largest reward value.
Observe->Act->Observe
Keep looping...
As shown in the figure above, in short: the agent constantly interacts with the environment. It takes an action, and the environment responds with a reward or punishment and a new state for the agent.
As this loop repeats, the agent is pushed to change its behavior in the direction of larger reward values.
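The observe, act, get feedback loop can be sketched in code. The course text does not name a specific algorithm, so this sketch swaps in one standard choice, tabular Q-learning. It keeps a learned value for every (state, action) pair and nudges it toward "reward now + discounted value of the best next action". The environment is the same kind of hypothetical corridor as before: goal at the right end (+1), walls punish (-1). The function name and all hyperparameters (`alpha`, `gamma`, `epsilon`) are assumptions chosen for illustration.

```python
import random


def q_learning_corridor(length=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical corridor (goal at the right end)."""
    rng = random.Random(seed)
    actions = (-1, +1)  # move left, move right
    # Learned value of taking each action in each state, all starting at 0.
    q = {(s, a): 0.0 for s in range(length) for a in actions}

    def best_action(state):
        # Greedy choice; ties are broken randomly so neither direction is favored.
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            # Observe the state, then act: usually greedy, sometimes explore.
            action = rng.choice(actions) if rng.random() < epsilon else best_action(state)
            # The environment responds with a new state and a reward or punishment.
            target = state + action
            if not 0 <= target < length:
                next_state, reward, done = state, -1.0, False
            elif target == length - 1:
                next_state, reward, done = target, 1.0, True
            else:
                next_state, reward, done = target, 0.0, False
            # Move the estimate toward reward + discounted value of the best next action.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


q = q_learning_corridor()
print({s: max((-1, +1), key=lambda a: q[(s, a)]) for s in range(4)})  # learned action per state
```

After enough loops, "move right" carries the larger value in every non-goal cell. That is the agent drifting toward the direction with the larger reward, as described above.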
Example
In this cart-pole example, the cart takes an action (moving left or right), which keeps changing its state (the angle and speed of the pole), and the rewards it receives guide it toward better actions.
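Below is a rough self-contained simulation of the cart-pole example. It assumes the classic cart-pole equations and constants; to my knowledge these match the ones used by the Gym/Gymnasium CartPole environment, but treat them as illustrative. The state is (cart position, cart speed, pole angle, pole angular speed). The action pushes the cart left (0) or right (1). The agent earns +1 for every step the pole stays within 12 degrees and the cart stays on the track. Both policies are hand-written for comparison and are not learned; an RL algorithm would use these rewards to learn a better one.

```python
import math
import random

# Assumed constants of the classic cart-pole model.
GRAVITY, CART_MASS, POLE_MASS = 9.8, 1.0, 0.1
HALF_POLE_LEN, FORCE, DT = 0.5, 10.0, 0.02
TOTAL_MASS = CART_MASS + POLE_MASS
ANGLE_LIMIT = 12 * math.pi / 180  # episode ends if the pole tilts more than 12 degrees
POSITION_LIMIT = 2.4              # ... or if the cart leaves the track


def step(state, action):
    """Push the cart left (0) or right (1); return (next_state, reward, done)."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS * HALF_POLE_LEN * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LEN * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS * HALF_POLE_LEN * theta_acc * cos_t / TOTAL_MASS
    # Euler integration: the action changes position/speed and pole angle/angular speed.
    x, x_dot = x + DT * x_dot, x_dot + DT * x_acc
    theta, theta_dot = theta + DT * theta_dot, theta_dot + DT * theta_acc
    done = abs(x) > POSITION_LIMIT or abs(theta) > ANGLE_LIMIT
    return (x, x_dot, theta, theta_dot), 1.0, done  # +1 reward for each step survived


def run_episode(policy, rng, max_steps=500):
    """Return the total reward (the number of steps survived) of one episode."""
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))  # small random start
    total = 0.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state, rng))
        total += reward
        if done:
            break
    return total


def random_policy(state, rng):
    return rng.randrange(2)  # ignore the state entirely


def lean_policy(state, rng):
    """Push the cart toward the side the pole is leaning (rng unused)."""
    return 1 if state[2] > 0 else 0


rng = random.Random(0)
random_avg = sum(run_episode(random_policy, rng) for _ in range(100)) / 100
lean_avg = sum(run_episode(lean_policy, rng) for _ in range(100)) / 100
print(random_avg, lean_avg)  # the state-aware policy should keep the pole up longer
```

The gap between the two averages is the reward signal an RL algorithm would exploit: actions that react to the pole's angle earn more reward than actions that ignore the state.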