Reinforcement learning: from entry to pit to shit
2022-07-31 04:02:00 【The little reptile in the aviation world】
The material in this article comes from the tutorial video "Reinforcement Learning Method Summary (Reinforcement Learning)" on bilibili.
1. What is reinforcement learning
Reinforcement Learning (RL), also known as evaluative learning, is one of the paradigms and methodologies of machine learning. It describes and solves the problem of an agent learning a strategy, through its interaction with the environment, that maximizes return or achieves a specific goal.
What we generally call reinforcement learning is actually deep reinforcement learning (Deep Reinforcement Learning, DRL). Deep reinforcement learning is the combination of reinforcement learning and deep learning: as the name implies, it uses deep learning to carry out some part of traditional reinforcement learning.

The picture above is a classic reinforcement learning structure diagram. As it shows, the reinforcement learning process is mainly composed of four parts: the agent, the observed state (observation/state), the reward, and the action.
While continuously interacting with the environment, the agent retains the experience it learned in earlier rounds. When it interacts with the environment in the next round, it chooses the behavior that brings the greater reward; in other words, it "chooses the best behavior through decision-making".
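The interaction loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method from the video: `TwoActionEnv`, `AveragingAgent`, and the fixed rewards 0.2 and 1.0 are all hypothetical names and values invented for this example. The agent keeps the average reward it has seen for each action (its "experience") and, once every action has been tried, picks the action with the greater average reward.

```python
class TwoActionEnv:
    """Hypothetical toy environment: a single state and two actions with fixed rewards."""

    REWARDS = (0.2, 1.0)  # assumed values, chosen only for illustration

    def step(self, action):
        observation = 0  # there is only one (dummy) state
        reward = self.REWARDS[action]
        return observation, reward


class AveragingAgent:
    """Hypothetical agent that remembers the average reward of each action."""

    def __init__(self, n_actions):
        self.counts = [0] * n_actions    # how often each action was taken
        self.values = [0.0] * n_actions  # average reward observed per action

    def act(self, observation):
        # Try every action once before relying on experience.
        for action, count in enumerate(self.counts):
            if count == 0:
                return action
        # Afterwards choose the action with the greatest average reward.
        return max(range(len(self.values)), key=self.values.__getitem__)

    def learn(self, action, reward):
        # Incremental update of the running average for the chosen action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run(rounds=20):
    """Run the agent-environment loop and return the agent and its action history."""
    env = TwoActionEnv()
    agent = AveragingAgent(n_actions=2)
    observation = 0
    history = []
    for _ in range(rounds):
        action = agent.act(observation)         # agent chooses an action
        observation, reward = env.step(action)  # environment returns state and reward
        agent.learn(action, reward)             # agent updates its experience
        history.append(action)
    return agent, history


if __name__ == "__main__":
    agent, history = run()
    print(history)
    print(agent.values)
```

After trying each action once, this agent settles on the second action because its remembered reward is larger. Real problems have many states and noisy rewards, so practical agents also need some way to keep exploring.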
Differences from other machine learning methods
The other machine learning methods here are mainly supervised learning and unsupervised learning, which are also the ones most easily confused with reinforcement learning.
Supervised learning is the most studied method in machine learning and is very mature. In a supervised training set, each sample carries a label, and ideally that label is the correct result. The task of supervised learning is to let the system infer an appropriate mapping from the training samples and their labels, and then to compute results that are as accurate as possible on samples whose labels are unknown, as in the familiar classification and regression problems. In the interaction problems of reinforcement learning there is no such universally correct "label", and the agent can only learn from its own experience.
However, reinforcement learning is also not the same as unsupervised learning, which likewise has no labels. Unsupervised learning discovers hidden structure in unlabeled data sets; a typical example is clustering. The goal of reinforcement learning is to maximize reward rather than to find hidden structure in the data. Although using unsupervised learning to find the internal structure of the data can help a reinforcement learning task, it does not fundamentally solve the problem of maximizing reward.
Therefore, reinforcement learning is the third paradigm of machine learning besides supervised and unsupervised learning.

2. Classification method
Algorithms

Category 1

Probability-based methods select an action according to its probability, so the chosen action is not necessarily the one with the highest probability (applicable to continuous action values)
Value-based methods select the action with the highest value (more decisive decision-making) (not applicable to continuous action values)
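The difference between the two selection rules can be illustrated with a short Python sketch for a small discrete action set. It is only an example under assumptions: the helper names `softmax`, `sample_action`, and `greedy_action`, and the scores used below, are hypothetical and do not come from the video, and turning scores into probabilities with a softmax is just one common choice. The continuous-action case is not covered here.

```python
import math
import random


def softmax(preferences):
    """Turn arbitrary action scores into probabilities that sum to 1."""
    peak = max(preferences)  # subtract the maximum for numerical stability
    exps = [math.exp(p - peak) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probabilities, rng):
    """Probability-based selection: draw an action according to its probability."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


def greedy_action(action_values):
    """Value-based selection: always take the action with the highest value."""
    return max(range(len(action_values)), key=action_values.__getitem__)


if __name__ == "__main__":
    scores = [1.0, 2.0, 0.5]  # assumed scores for three actions
    rng = random.Random(0)
    probabilities = softmax(scores)
    print("probabilities:", probabilities)
    print("sampled actions:", [sample_action(probabilities, rng) for _ in range(10)])
    print("greedy action:", greedy_action(scores))
```

The sampled actions vary from draw to draw and can include lower-probability actions, while the greedy rule returns the same action every time for the same values.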

Category 2


Category 3


Category 4


