1. Reinforcement Learning Fundamentals
2022-07-05 20:20:00 【C--G】
Probability theory refresher
Random Variable
Tossing a coin is a random event: heads and tails each come up with probability 0.5. A random variable is usually written as an uppercase X, and an observed value of it as a lowercase x.
Probability Density Function (PDF)
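The probability density function (PDF) describes how likely a random variable is to take values around each possible point. For a continuous variable it gives a relative likelihood (density), not the probability of one exact value.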
Gaussian distribution 
Discrete probability distribution 
For a continuous random variable, the PDF integrates to 1 over its whole domain; for a discrete random variable, the probabilities of all possible values sum to 1.
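A minimal sketch that checks both normalization rules numerically. It assumes a standard Gaussian for the continuous case and a fair six-sided die for the discrete case; the integral over (-∞, ∞) is approximated on [-10, 10] with the trapezoidal rule, since the tails beyond that range are negligible.

```python
import math

def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal distribution N(mu, sigma^2) at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def integrate(f, lo: float, hi: float, steps: int = 100_000) -> float:
    """Approximate the integral of f over [lo, hi] with the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

# Continuous case: the density integrates to (approximately) 1.
area = integrate(gaussian_pdf, -10.0, 10.0)

# Discrete case: the probabilities of all values sum to 1 (fair die).
die = {face: 1 / 6 for face in range(1, 7)}
total_prob = sum(die.values())

print(f"Gaussian integral ~ {area:.6f}")
print(f"die probabilities sum to {total_prob:.6f}")
```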
Random Sampling
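Random sampling means drawing values according to a probability distribution. The sketch below assumes a hypothetical box of balls (red 0.2, green 0.5, blue 0.3) and uses the standard-library `random.Random.choices`; with many draws, the observed frequencies should approach the true probabilities.

```python
import random
from collections import Counter

# Hypothetical discrete distribution over ball colors.
colors = ["red", "green", "blue"]
probs = [0.2, 0.5, 0.3]

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = rng.choices(colors, weights=probs, k=10_000)

# Empirical frequencies approach the true probabilities as k grows.
freq = {c: n / len(samples) for c, n in Counter(samples).items()}
print(freq)
```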

Reinforcement learning fundamentals
Key reinforcement learning terms

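state (s): the current situation of the environment, as observed by the agent
action (a): a move the agent makes
agent: the decision maker that takes actions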
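policy π(a|s): the strategy, a probability density function over actions given the current state
It gives the probability of taking each action. A random (stochastic) policy is more realistic, and it keeps the agent's behavior from following a pattern that is easy to spot.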
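A minimal sketch of acting with a stochastic policy: the policy returns a probability for every action, and the agent samples one at random. The action names and the fixed probabilities (left 0.2, right 0.1, up 0.7) are hypothetical; a real agent would compute them from the state, for example with a neural network.

```python
import random

def policy(state: str) -> dict[str, float]:
    """Hypothetical policy pi(a|s): a probability for every action in the given state."""
    # Fixed numbers for illustration only; they do not depend on the state here.
    return {"left": 0.2, "right": 0.1, "up": 0.7}

def sample_action(state: str, rng: random.Random) -> str:
    """Draw one action at random according to pi(.|state)."""
    probs = policy(state)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]

rng = random.Random(42)
print([sample_action("s1", rng) for _ in range(10)])
```

Because the action is sampled rather than fixed, the same state can lead to different actions on different visits.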
reward (R): the feedback the environment gives after an action
Rewards are designed for the task at hand. In Super Mario, for example: collecting a coin gives +1, winning the game gives +10000, Mario dying gives -10000, and nothing happening gives 0. The goal of reinforcement learning is to obtain as much reward as possible.
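The reward scheme above can be written as a simple lookup table. The event names are hypothetical; only the values come from the Mario example.

```python
# Hypothetical event names; the values follow the Super Mario example above.
REWARDS = {
    "collect_coin": 1,
    "win_game": 10_000,
    "mario_dies": -10_000,
    "nothing": 0,
}

def reward(event: str) -> int:
    """Return the reward for a game event; unknown events count as 'nothing'."""
    return REWARDS.get(event, REWARDS["nothing"])

episode_events = ["nothing", "collect_coin", "collect_coin", "nothing", "win_game"]
print(sum(reward(e) for e in episode_events))  # 10002
```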
state transition: moving from the current state to the next one
State transitions are random: the next state s' is drawn from p(s'|s, a). This transition probability density function is known only to the environment; the player does not know it.
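A minimal sketch of a random state transition, assuming a small hypothetical table for p(s'|s, a). The table lives inside the environment object; the caller only ever sees the sampled next state, not the probabilities.

```python
import random

# Hypothetical transition table p(s' | s, a), hidden inside the environment.
TRANSITIONS = {
    ("s1", "up"): {"s2": 0.8, "s3": 0.2},
    ("s1", "left"): {"s1": 1.0},
}

class Environment:
    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def next_state(self, state: str, action: str) -> str:
        """Sample s' ~ p(.|s, a); the caller never sees the probabilities."""
        dist = TRANSITIONS[(state, action)]
        states = list(dist)
        return self._rng.choices(states, weights=[dist[s] for s in states], k=1)[0]

env = Environment(seed=1)
print([env.next_state("s1", "up") for _ in range(5)])
```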
Overview

The agent takes an action; the environment's state changes and the environment returns a reward to the agent; the agent learns from the rewards it receives.
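The interaction loop can be sketched in a few lines. Everything here is a hypothetical toy: `LineWorld` has positions 0 to 4, a move succeeds with probability 0.9 (so transitions are random), reaching position 4 gives reward 1 and ends the episode, and the policy goes right with probability 0.7.

```python
import random

class LineWorld:
    """Hypothetical toy environment: positions 0..4, the episode ends at position 4."""

    GOAL = 4

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: str) -> tuple[int, float, bool]:
        # The intended move succeeds with probability 0.9, otherwise the agent stays put.
        if self.rng.random() < 0.9:
            move = 1 if action == "right" else -1
            self.state = min(max(self.state + move, 0), self.GOAL)
        done = self.state == self.GOAL
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def policy(state: int, rng: random.Random) -> str:
    """Hypothetical stochastic policy: go right with probability 0.7."""
    return "right" if rng.random() < 0.7 else "left"

env = LineWorld(seed=0)
agent_rng = random.Random(1)
s = env.reset()
trajectory = []
for _ in range(100):  # cap the episode length
    a = policy(s, agent_rng)        # agent observes s and picks a
    s_next, r, done = env.step(a)   # environment returns the new state and a reward
    trajectory.append((s, a, r))
    s = s_next
    if done:
        break
print(trajectory)
```

The list of (state, action, reward) tuples is the trajectory s1, a1, r1, s2, a2, r2, and so on.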
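- Sources of randomness in reinforcement learning
Randomness of actions: the action is sampled from the policy π(·|s)
Randomness of states: the next state is sampled from the transition distribution p(·|s, a)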
- How an AI plays a game


Observe state s1; the agent uses the policy function to choose action a1; the environment generates a new state s2 and returns reward r1 to the agent; the agent again uses the policy function to choose action a2, and the loop continues.
- Rewards and Returns

- Return
return: the cumulative future reward
U_t is the cumulative reward from R_t up to R_n at the end of the game. Rewards received now should carry more weight than rewards received later: for example, 80 yuan in hand today is more tangible than 100 yuan promised for tomorrow.
γ: the discount rate, between 0 and 1
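With discounting, the return is usually written as U_t = R_t + γR_{t+1} + γ²R_{t+2} + ⋯ + γ^{n−t}R_n. A minimal sketch that computes it from a list of rewards, working backwards with U_k = R_k + γU_{k+1}:

```python
def discounted_return(rewards: list[float], gamma: float) -> float:
    """U_t = R_t + gamma*R_{t+1} + gamma^2*R_{t+2} + ... for rewards = [R_t, ..., R_n]."""
    u = 0.0
    # Work backwards: U_k = R_k + gamma * U_{k+1}
    for r in reversed(rewards):
        u = r + gamma * u
    return u

print(discounted_return([1.0, 0.0, 2.0], gamma=0.9))
```

With γ = 1 this is the plain sum of rewards; with γ = 0 only the immediate reward counts.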
- Randomness of the return



The return at time t depends on the rewards from time t to time n. Each reward depends on the state and action, so the return also depends on the states and actions. Since future states and actions are random, U_t is a random variable.
- Value Function
action-value function Q_π(s_t, a_t) = E[U_t | S_t = s_t, A_t = a_t]

For U_t, S_t and A_t are observed, while S_{t+1}, …, S_n and A_{t+1}, …, A_n are random variables.
The distribution of S_{t+1} depends on S_t and A_t; the distribution of A_{t+1} depends on S_{t+1}.
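Because Q_π(s, a) is an expectation over those random future states and actions, it can be estimated by averaging many sampled returns (a Monte Carlo estimate). The sketch below is a hypothetical toy: positions 0 to 3, actions +1/-1 that succeed with probability 0.8, reward 1 on reaching position 3, γ = 0.9, and a policy that picks +1 with probability 0.6.

```python
import random

GOAL, GAMMA = 3, 0.9

def step(state: int, action: int, rng: random.Random) -> tuple[int, float, bool]:
    """Hypothetical dynamics: move by action (+1/-1) with prob 0.8, else stay."""
    if rng.random() < 0.8:
        state = min(max(state + action, 0), GOAL)
    done = state == GOAL
    return state, (1.0 if done else 0.0), done

def pi(state: int, rng: random.Random) -> int:
    """Hypothetical stochastic policy: +1 with prob 0.6, -1 with prob 0.4."""
    return 1 if rng.random() < 0.6 else -1

def rollout_return(s: int, a: int, rng: random.Random, max_steps: int = 50) -> float:
    """One sample of U_t given S_t = s, A_t = a; later actions come from pi."""
    u, discount = 0.0, 1.0
    for _ in range(max_steps):
        s, r, done = step(s, a, rng)  # S_{t+1} depends on S_t and A_t
        u += discount * r
        discount *= GAMMA
        if done:
            break
        a = pi(s, rng)                # A_{t+1} depends on S_{t+1}
    return u

def estimate_q(s: int, a: int, episodes: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of Q_pi(s, a) = E[U_t | S_t = s, A_t = a]."""
    rng = random.Random(seed)
    return sum(rollout_return(s, a, rng) for _ in range(episodes)) / episodes

print(estimate_q(2, +1), estimate_q(2, -1))
```

Moving toward the goal from position 2 should score higher than moving away from it.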
state-value function V_π(s_t) = E_A[Q_π(s_t, A)], with A ~ π(·|s_t)
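For discrete actions the expectation becomes a weighted sum, V_π(s) = Σ_a π(a|s)·Q_π(s, a). A minimal sketch with hypothetical policy probabilities and Q-values for a single state:

```python
def state_value(pi_s: dict[str, float], q_s: dict[str, float]) -> float:
    """V_pi(s) = sum over a of pi(a|s) * Q_pi(s, a) (discrete actions)."""
    return sum(p * q_s[a] for a, p in pi_s.items())

# Hypothetical numbers for a single state s
pi_s = {"left": 0.2, "right": 0.1, "up": 0.7}
q_s = {"left": -1.0, "right": 2.0, "up": 5.0}
print(state_value(pi_s, q_s))
```

V_π(s) therefore says how good state s is on average under policy π, whichever action ends up being taken.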



- How an AI controls the agent

Policy-based: learn the policy function π(a|s), which gives the best action to take in state s. Value-based: the optimal action-value function Q*(s, a) scores each action, and the agent chooses the action with the highest score.
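The value-based choice is an argmax over the action scores, a* = argmax_a Q*(s, a). A minimal sketch, assuming hypothetical Q* scores for the current state:

```python
def greedy_action(q_values: dict[str, float]) -> str:
    """a* = argmax_a Q*(s, a): pick the action with the highest score."""
    return max(q_values, key=q_values.get)

# Hypothetical Q* scores for the current state
q_star = {"left": 0.3, "right": 1.2, "up": 4.7}
print(greedy_action(q_star))  # up
```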
Evaluating reinforcement learning
OpenAI Gym



Summary
