RL reinforcement learning summary (1)
2022-08-05 05:03:00 【Times & Beliefs】
Recently I summarized the key points of reinforcement learning after taking Dr. Tang Yudi's course. Here I explain them in my own words, based on my own understanding!!!
1. Overview of Reinforcement Learning
Reinforcement learning is abbreviated as RL.
Introduction
You have probably heard the news that AlphaGo beat a world Go champion. AlphaGo relies on reinforcement learning, a branch of AI. By learning from a large number of recorded games (and from games played against itself), AlphaGo works out the best choice at each move: in the current state, it selects the move expected to lead to the largest reward.
Main mechanism
Reinforcement learning works by constantly interacting with the environment. Each time the agent makes a choice, it receives a reward if the outcome is good and a punishment if the outcome is bad, and the model learns from these rewards and punishments. When it faces a similar choice later, it prefers the option with the larger reward value, and in this way it keeps learning!
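To make the reward-and-punishment idea concrete, here is a minimal sketch in Python, assuming a toy setting with a fixed set of choices whose average rewards are hidden from the agent (a "multi-armed bandit"). The function name `learn_by_reward`, the reward values, and the epsilon-greedy rule (usually pick the choice with the largest estimated reward, occasionally try a random one) are illustrative assumptions, not something taken from the course.

```python
import random


def learn_by_reward(true_rewards, steps=2000, epsilon=0.1, seed=0):
    """Learn which choice pays off best using only reward feedback.

    true_rewards: the hidden average reward of each choice (toy assumption).
    Returns (estimated value of each choice, number of times each was tried).
    """
    rng = random.Random(seed)
    n = len(true_rewards)
    values = [0.0] * n  # running estimate of each choice's reward
    counts = [0] * n    # how many times each choice has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            choice = rng.randrange(n)  # explore: try a random choice
        else:
            # exploit: pick the choice with the largest estimated reward
            choice = max(range(n), key=lambda i: values[i])
        # Environment feedback: a noisy reward; positive = reward, negative = punishment
        reward = true_rewards[choice] + rng.gauss(0.0, 1.0)
        counts[choice] += 1
        values[choice] += (reward - values[choice]) / counts[choice]  # incremental mean
    return values, counts
```

For example, with `learn_by_reward([-1.0, 0.5, 2.0])` the agent should end up trying the third choice far more often than the others. Without the occasional random choice, it could get stuck on whichever option first happened to look good.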
2. Basic concepts of reinforcement learning
Basic Concepts
(1) agent: the object in our model that learns and takes actions. For example, the car in self-driving.
(2) state: the current situation of the agent and its surroundings. For example, when AlphaGo played Lee Sedol, the positions of the black and white stones already on the board at the moment of a move; or where on the road the self-driving car currently is.
(3) action: what the agent does next in the current state. For example, where on the board AlphaGo places its next stone, or what the self-driving car does at the next moment (go straight, turn left, turn right...).
(4) reward: the feedback the agent receives after taking an action. Rewards can be positive (simply called a reward) or negative (called a punishment). For example, a self-driving car is rewarded as it gets closer to its destination and punished if it collides with buildings, other vehicles, and so on. Rewards and punishments are how we "teach" the agent!!!
(5) policy: the rule the agent follows to decide which action to take in each state. Following a policy produces the series of actions the agent takes to reach its ultimate goal.
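The five concepts above can be mapped onto code. The sketch below assumes a hypothetical one-dimensional "road" instead of a real self-driving car: the state is the car's position plus its destination, the actions are `"left"` and `"right"`, the environment's `step` function returns a reward (+1 for getting closer, -1 for moving away, -10 for running off the road), and the policy is a simple rule from state to action. All names and reward values are made up for illustration.

```python
from dataclasses import dataclass

ACTIONS = ("left", "right")  # the actions available to the agent (hypothetical)


@dataclass
class State:
    position: int     # where the car is on a 1-D road
    destination: int  # where it wants to go


def step(state: State, action: str) -> tuple[State, float]:
    """Environment: apply the agent's action, return (new state, reward)."""
    new_pos = state.position + (-1 if action == "left" else 1)
    if new_pos < 0:  # ran off the road: treat it as a collision
        return State(0, state.destination), -10.0  # punishment
    old_dist = abs(state.destination - state.position)
    new_dist = abs(state.destination - new_pos)
    reward = 1.0 if new_dist < old_dist else -1.0  # closer: reward, farther: punishment
    return State(new_pos, state.destination), reward


def policy(state: State) -> str:
    """Policy: a rule that maps the current state to an action."""
    return "right" if state.position < state.destination else "left"
```

Starting from `State(0, 3)` and following `policy` for three steps, the agent reaches position 3 and collects a total reward of 3.0; choosing `"left"` at position 0 instead is punished with -10.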
Reinforcement learning process
The agent observes the environment before taking an action. At first it tries different choices; by interacting with the environment (receiving rewards and punishments), it learns to choose the one with the largest reward value.
Observe->Act->Observe
Keep looping...
As shown in the figure above, in short: the agent constantly interacts with the environment, the environment rewards or punishes the agent, and the agent's state changes as a result.
Repeating this loop pushes the agent's behavior in the direction of larger reward.
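The observe, act, observe loop, combined with learning from rewards, can be sketched with tabular Q-learning. Q-learning is a standard RL algorithm that the post itself does not name; it is used here only as one concrete example. The corridor environment, its rewards, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are hypothetical choices for illustration.

```python
import random

N_STATES = 6         # positions 0..5 on a corridor (hypothetical toy environment)
GOAL = N_STATES - 1  # reaching the right end ends the episode
ACTIONS = (-1, +1)   # move left or move right


def env_step(state, action):
    """Environment: return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), GOAL)
    if nxt == GOAL:
        return nxt, 1.0, True   # reward for reaching the goal
    return nxt, -0.01, False    # small punishment for every extra step


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Observe the state, then act (epsilon-greedy)
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[state][i])
            nxt, r, done = env_step(state, ACTIONS[a])
            # Learn: move q toward reward + discounted best value of the next state
            target = r if done else r + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt  # observe the new state and loop again
    return q
```

After training, taking the highest-valued action in each position, e.g. `[ACTIONS[max(range(2), key=lambda i: q[s][i])] for s in range(GOAL)]`, should give "move right" (+1) everywhere: the loop has pushed the agent toward the behavior with the larger reward.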
Example
The cart takes actions (pushing left or right) that change its state (such as the angle and angular velocity of the pole), and the rewards and punishments it receives teach it how to keep the pole balanced.
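The cart-pole example can be sketched as follows. The dynamics follow the classic cart-pole equations, with constants matching those used by Gym's `CartPole-v1`, but this is a simplified stand-alone re-implementation, not that library's code; the two policies compared afterwards are illustrative assumptions. The state is (cart position, cart velocity, pole angle, pole angular velocity), the action is a push left (0) or right (1), and the environment gives +1 reward for every step the pole stays up.

```python
import math

# Physical constants of the classic cart-pole task (same values as Gym's CartPole-v1)
GRAVITY, CART_MASS, POLE_MASS = 9.8, 1.0, 0.1
TOTAL_MASS = CART_MASS + POLE_MASS
HALF_POLE_LENGTH = 0.5
POLE_MASS_LENGTH = POLE_MASS * HALF_POLE_LENGTH
FORCE_MAG, TAU = 10.0, 0.02                        # push strength (N), time step (s)
X_LIMIT, THETA_LIMIT = 2.4, 12 * 2 * math.pi / 360  # episode ends past these limits


def cartpole_step(state, action):
    """Apply a push (0 = left, 1 = right); return (next_state, reward, done)."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / TOTAL_MASS))
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    # Euler integration of position and angle
    x, x_dot = x + TAU * x_dot, x_dot + TAU * x_acc
    theta, theta_dot = theta + TAU * theta_dot, theta_dot + TAU * theta_acc
    done = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    return (x, x_dot, theta, theta_dot), 1.0, done  # +1 for every step survived


def run_episode(policy, start=(0.0, 0.0, 0.05, 0.0), max_steps=500):
    """Run one episode with a policy (state -> action); return the total reward."""
    state, total = start, 0.0
    for _ in range(max_steps):
        state, reward, done = cartpole_step(state, policy(state))
        total += reward
        if done:
            break
    return total
```

For instance, `run_episode(lambda s: 0)` (always push left) ends within a handful of steps, while pushing toward the side the pole is leaning, `run_episode(lambda s: 1 if s[2] > 0 else 0)`, keeps the pole up longer. An RL agent's job is to discover such a policy from the reward signal alone.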