1. Reinforcement learning basics
2022-07-05 20:20:00 【C--G】
Probability theory background
Random Variable
A coin toss is a random event: heads and tails each come up with probability 0.5. An uppercase X usually denotes the random variable, and a lowercase x denotes an observed value.
Probability Density Function (PDF)
The probability density function describes how likely a random variable is to take a value near a given point.
Gaussian distribution 
Discrete probability distribution 
For a continuous random variable, the probability density function integrates to 1 over its domain. For a discrete random variable, the probabilities of all possible values sum to 1.
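The normalization property can be checked numerically. The sketch below is only an illustration: the standard Gaussian, the integration range [-10, 10], and the three-value discrete distribution are all assumed for the example.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

# Continuous case: [-10, 10] holds almost all the mass of a standard Gaussian
area = integrate(gaussian_pdf, -10.0, 10.0)

# Discrete case: a hypothetical distribution over the values 1, 3 and 7
pmf = {1: 0.2, 3: 0.5, 7: 0.3}
total = sum(pmf.values())

print(f"integral of the Gaussian PDF: {area:.6f}")
print(f"sum of the discrete probabilities: {total:.6f}")
```

Both numbers should come out very close to 1.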
Random Sampling
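Random sampling means drawing an observed value x of the random variable X according to its probability distribution.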
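As a small illustration of sampling, the coin toss from the Random Variable section can be simulated with Python's standard `random` module. The seed and the sample size are arbitrary choices for this sketch.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

outcomes = ["heads", "tails"]
probabilities = [0.5, 0.5]

# One observed value x of the random variable X
x = rng.choices(outcomes, weights=probabilities, k=1)[0]

# Many samples: the empirical frequency of heads should approach 0.5
samples = rng.choices(outcomes, weights=probabilities, k=10_000)
heads_frequency = samples.count("heads") / len(samples)

print("one observation:", x)
print("frequency of heads:", heads_frequency)
```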

Reinforcement learning fundamentals
Reinforcement learning terminology

state: the current situation of the environment
action: what the agent does in a given state
agent: the entity that takes actions
policy: the strategy π(a|s), a probability density function over actions
The policy gives the probability of each action in a state. A random policy is more realistic, because the agent's behavior is then hard to predict.
reward: the feedback the agent receives for an action
Rewards should be set according to the actual situation. For example: collecting a gold coin gives +1, winning the game gives +10000, Mario dying gives -10000, and nothing happening gives 0. The goal of reinforcement learning is to maximize the rewards obtained.
state transition: the move from the current state to a new state
State transitions are random. The state transition probability density function is known only to the environment; the player does not know it.
Overview

The agent takes an action, the environment's state changes and the environment returns a reward to the agent, and the agent learns from that reward.
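The interaction described here can be sketched as a loop. Everything in the block is hypothetical and made up for illustration: the `ToyEnv` class, its coin-guessing reward rule, the five-step episode, and the uniformly random `policy`. The learning step itself is left out; the sketch only shows how states, actions and rewards flow between the agent and the environment.

```python
import random

class ToyEnv:
    """Hypothetical environment: the state is a step counter, the episode ends after 5 steps."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # The environment tosses a hidden coin; guessing it right earns reward 1
        coin = self.rng.choice([0, 1])
        reward = 1 if action == coin else 0
        self.state += 1
        done = self.state >= 5
        return self.state, reward, done

def policy(state, rng):
    """Hypothetical policy: ignore the state and pick action 0 or 1 uniformly."""
    return rng.choice([0, 1])

env = ToyEnv(seed=0)
agent_rng = random.Random(1)

state = env.reset()
trajectory = []  # list of (state, action, reward) tuples
done = False
while not done:
    action = policy(state, agent_rng)            # the agent takes an action
    next_state, reward, done = env.step(action)  # the environment changes state and returns a reward
    trajectory.append((state, action, reward))
    state = next_state

total_reward = sum(r for _, _, r in trajectory)
print(trajectory, total_reward)
```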
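- Sources of randomness in reinforcement learning
Randomness of the action: the action is sampled from the policy function
Randomness of the state: the new state is sampled from the state transition function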
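A minimal sketch of the two sources of randomness, assuming a single state with three actions. The state names, action names and all probabilities are invented for the example.

```python
import random

rng = random.Random(0)

# Hypothetical policy pi(a | s): probability of each action in state "s1"
policy = {"s1": {"left": 0.2, "right": 0.1, "up": 0.7}}

# Hypothetical state transition function p(s' | s, a)
transition = {
    ("s1", "left"): {"s2": 0.9, "s3": 0.1},
    ("s1", "right"): {"s2": 0.5, "s3": 0.5},
    ("s1", "up"): {"s2": 0.8, "s3": 0.2},
}

def sample(distribution, rng):
    """Draw one key from a dict that maps outcomes to probabilities."""
    outcomes = list(distribution)
    weights = [distribution[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

state = "s1"
action = sample(policy[state], rng)                    # randomness of the action
next_state = sample(transition[(state, action)], rng)  # randomness of the state

print(action, next_state)
```

In a real game only the environment would hold the `transition` table; it is written out here so that the sampling step is visible.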
- How AI plays a game


The agent observes state s1 and uses the policy function to choose action a1. The environment generates a new state s2 and returns reward r1 to the agent. The agent then uses the policy function again to choose action a2, and the loop repeats.
- Rewards and Returns

- Return
return: the cumulative future reward
U_t is the sum of the rewards from R_t until the end of the game, R_n: U_t = R_t + R_{t+1} + ... + R_n. A current reward should carry more weight than a later one; for example, 80 yuan today can be worth more than 100 yuan tomorrow.
γ: the discount rate, between 0 and 1. The discounted return is U_t = R_t + γ·R_{t+1} + γ^2·R_{t+2} + ...
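A short sketch of how a discounted return could be computed from a list of rewards. The function name `discounted_return` and the value 0.7 for the discount rate are assumptions made for the example.

```python
def discounted_return(rewards, gamma):
    """Return R_t + gamma*R_{t+1} + gamma^2*R_{t+2} + ... for rewards listed from time t on."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma to the later rewards
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

u_equal = discounted_return([1, 1, 1], gamma=0.5)

# The 80-today versus 100-tomorrow comparison, with an assumed discount rate of 0.7
today = discounted_return([80], gamma=0.7)
tomorrow = discounted_return([0, 100], gamma=0.7)

print(u_equal, today, tomorrow)
```

With rewards [1, 1, 1] and a discount rate of 0.5 the return is 1 + 0.5 + 0.25 = 1.75. With the assumed rate of 0.7, 100 yuan tomorrow is worth about 70 today, which is less than 80.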
- Randomness of returns



The return at time t depends on the rewards from time t to time n. Each reward depends on the state and the action, so the return also depends on the states and actions.
- Value Function
Action-value function

For U_t, S_t and A_t are observed, while S_{t+1}, ..., S_n and A_{t+1}, ..., A_n are random variables.
The distribution of S_{t+1} depends on S_t and A_t; the distribution of A_{t+1} depends on S_{t+1}.
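Under the standard definition, the action value is the expected return given the current state and action, Q_π(s_t, a_t) = E[U_t | S_t = s_t, A_t = a_t]; the expectation removes the randomness of the future states and actions. The sketch below estimates it by averaging sampled returns. The toy setup is entirely assumed: the state is left implicit, an episode has three steps, action 1 pays reward 1, action 0 pays 0, and the policy picks either action with probability 0.5.

```python
import random

GAMMA = 0.9   # assumed discount rate
HORIZON = 3   # assumed episode length: rewards R_t, R_{t+1}, R_{t+2}

def sample_return(first_action, rng):
    """Sample one return U_t: the first action is fixed, later actions follow the policy."""
    action = first_action
    total = 0.0
    for k in range(HORIZON):
        reward = 1.0 if action == 1 else 0.0  # toy reward rule
        total += (GAMMA ** k) * reward
        action = rng.choice([0, 1])           # policy: each action with probability 0.5
    return total

def estimate_q(first_action, episodes=20_000, seed=0):
    """Monte Carlo estimate of the action value: the average of sampled returns."""
    rng = random.Random(seed)
    returns = [sample_return(first_action, rng) for _ in range(episodes)]
    return sum(returns) / episodes

q_action_1 = estimate_q(1)
q_action_0 = estimate_q(0)
print(q_action_1, q_action_0)
```

For this toy setup the exact values are 1 + 0.9·0.5 + 0.81·0.5 = 1.855 for action 1 and 0.855 for action 0, so the estimates should land close to those numbers.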
State-value function
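Under the standard definition, the state value averages the action values over the actions the policy might take; for discrete actions, V_π(s) = Σ_a π(a|s)·Q_π(s, a). A minimal sketch with made-up probabilities and action values for a single state:

```python
# Hypothetical policy pi(a | s) and action values Q_pi(s, a) for one state s
policy_probs = {"left": 0.2, "right": 0.1, "up": 0.7}
action_values = {"left": 1.0, "right": -2.0, "up": 3.0}

def state_value(policy_probs, action_values):
    """V_pi(s) = sum over a of pi(a | s) * Q_pi(s, a), for discrete actions."""
    return sum(policy_probs[a] * action_values[a] for a in policy_probs)

v = state_value(policy_probs, action_values)
print(v)
```

Here the result is 0.2·1 + 0.1·(-2) + 0.7·3 = 2.1, up to floating-point rounding.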



- How AI controls the agent

π(a|s) is the function learned in policy learning; it determines which action to take in a given state. Q*(s, a) scores each action in the state, and the agent chooses the action with the best score.
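The two ways of choosing an action can be sketched as follows. The function names and all numbers are hypothetical, and sampling from π (rather than always taking its most likely action) is one common choice assumed here.

```python
import random

def act_with_policy(policy_probs, rng):
    """Policy-based control: sample an action from pi(. | s)."""
    actions = list(policy_probs)
    weights = [policy_probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

def act_with_q_star(q_values):
    """Value-based control: pick the action with the highest Q*(s, a) score."""
    return max(q_values, key=q_values.get)

rng = random.Random(0)

# Hypothetical outputs of the two functions for the same state s
policy_probs = {"left": 0.2, "right": 0.1, "up": 0.7}
q_values = {"left": 370.0, "right": -21.0, "up": 610.0}

sampled_action = act_with_policy(policy_probs, rng)
greedy_action = act_with_q_star(q_values)
print(sampled_action, greedy_action)
```

The greedy choice is always "up" for these scores, while the sampled action can differ from run to run when the seed changes.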
Evaluating reinforcement learning
OpenAI Gym



summary
