Reinforcement learning: from entry to pit to shit
2022-07-31 04:02:00 【The little reptile in the aviation world】
The material in this article comes from the tutorial video "Reinforcement Learning Method Summary (Reinforcement Learning)" on bilibili.
1. What is reinforcement learning
Reinforcement Learning (RL), which also goes by other names such as evaluation learning, is one of the paradigms and methodologies of machine learning. It describes and solves the problem of how an agent, while interacting with an environment, learns a strategy that maximizes its return or achieves a specific goal.
What we generally call reinforcement learning is actually deep reinforcement learning (Deep Reinforcement Learning, DRL). Deep reinforcement learning is the combination of reinforcement learning and deep learning: as the name implies, deep learning is used to carry out a certain part of traditional reinforcement learning.
The picture above is a classic reinforcement learning structure diagram. As it shows, the reinforcement learning process consists mainly of four parts: the agent, the observed state (observation/state), the reward, and the action.
While continuously interacting with the environment, the agent retains the experience it learned previously, and in the next round of interaction it chooses behaviors with greater rewards. In other words, it chooses the best behavior through decision-making.
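The interaction loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method from the video: the `TwoArmedBandit` environment, the `AveragingAgent` class, the reward values, and the `epsilon` exploration rate are all hypothetical names and numbers chosen for the example.

```python
import random


class TwoArmedBandit:
    """Hypothetical toy environment: a single state and two actions with fixed rewards."""

    REWARDS = {0: 0.0, 1: 1.0}  # assumed values, for illustration only

    def reset(self):
        return 0  # the only observable state

    def step(self, action):
        reward = self.REWARDS[action]
        next_state = 0
        return next_state, reward


class AveragingAgent:
    """Hypothetical agent whose 'experience' is a running average reward per action."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.values = [0.0] * n_actions
        self.counts = [0] * n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore occasionally; otherwise pick the action with the highest estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Incremental update of the average reward observed for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run(steps=200):
    env = TwoArmedBandit()
    agent = AveragingAgent(n_actions=2)
    state = env.reset()
    total_reward = 0.0
    for _ in range(steps):
        action = agent.act(state)          # agent chooses an action
        state, reward = env.step(action)   # environment returns state and reward
        agent.learn(action, reward)        # agent stores the experience
        total_reward += reward
    return agent, total_reward


if __name__ == "__main__":
    trained_agent, total = run()
    print(trained_agent.values, total)
```

The four parts from the diagram all appear here: the agent, the state returned by the environment, the reward, and the action. The small amount of random exploration is an assumption of this sketch; without it the agent would never try the second action and so could never discover that it pays more.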
Differences from other machine learning methods
The other machine learning methods meant here are mainly supervised learning and unsupervised learning, which are also what we most easily confuse with reinforcement learning when trying to understand it.
Supervised learning is the most researched method in the field of machine learning, and it is very mature. In the training set of supervised learning, each sample carries a label, and ideally this label is the correct result. The task of supervised learning is to let the system infer an appropriate mapping from the training set according to the label of each sample, and then to compute a result that is as accurate as possible on samples whose labels are unknown, as in the familiar classification and regression problems. In the interaction problems of reinforcement learning there is no such universally correct "label", and the agent can only learn from its own experience.
However, reinforcement learning is also not the same as unsupervised learning, even though both work without labels. Unsupervised learning discovers hidden structure in unlabeled data sets; a typical example is the clustering problem. The goal of reinforcement learning, by contrast, is to maximize the reward rather than to find hidden structure in the data. Although using unsupervised learning to find the internal structure of the data can help a reinforcement learning task, it does not by itself solve the problem of maximizing the reward.
Therefore, reinforcement learning is the third paradigm of machine learning besides supervised and unsupervised learning.
2. Classification method
Algorithms
Category 1
Probability-based methods select an action according to its probability, so the chosen action is not necessarily the one with the highest probability (applicable to continuous action values)
Value-based methods select the action with the highest value, which makes the decision more firm (not applicable to continuous action values)
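The difference between the two selection rules in Category 1 can be shown with a small Python sketch. It assumes a discrete set of actions for simplicity, and the function names `select_by_probability` and `select_by_value` are hypothetical; for continuous actions a probability-based method would typically sample from a continuous distribution instead, which is not shown here.

```python
import random


def select_by_probability(action_probs, rng=random):
    """Probability-based: sample an action in proportion to its probability.

    The most probable action is chosen most often, but not always.
    """
    actions = list(range(len(action_probs)))
    return rng.choices(actions, weights=action_probs, k=1)[0]


def select_by_value(action_values):
    """Value-based: always take the action with the highest estimated value."""
    return max(range(len(action_values)), key=lambda a: action_values[a])


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.3]    # assumed output of a probability-based method
    values = [1.0, 3.5, 2.0]   # assumed output of a value-based method
    print([select_by_probability(probs) for _ in range(10)])  # varies between runs
    print(select_by_value(values))
```

The value-based rule needs to compare the values of all candidate actions to find the maximum, which is why it does not carry over directly to continuous action values, where there are infinitely many candidates.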
Category 2
Category 3
Category 4