1. Reinforcement learning: basic knowledge points
2022-07-05 20:20:00 【C--G】
Probability theory background
Random Variable
Tossing a coin is a random event: heads and tails each come up with probability 0.5. By convention, uppercase X denotes a random variable and lowercase x denotes an observed value.
Probability Density Function (PDF)
The probability density function describes how likely a random variable is to take a value near a given point.
Gaussian distribution 
Discrete probability distribution 
For a continuous distribution, the integral of the probability density function over all values is 1. For a discrete distribution, the probabilities of all values sum to 1.
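As a rough numerical check of this normalization property, the sketch below integrates a standard Gaussian density with a midpoint Riemann sum and adds up the probabilities of a fair coin. The function names `gaussian_pdf` and `integrate` and the integration range [-10, 10] are illustrative assumptions, not part of the original post.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, lo, hi, steps=100_000):
    """Midpoint Riemann sum of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

# Continuous case: the density integrates to (approximately) 1.
area = integrate(gaussian_pdf, -10.0, 10.0)

# Discrete case: the probabilities of all outcomes sum to 1.
coin = {"heads": 0.5, "tails": 0.5}
total = sum(coin.values())
```

The integral is only approximate because the range is truncated and the sum is finite, but the missing tail mass beyond ten standard deviations is negligible.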
Random Sampling
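Random sampling means drawing observed values x of a random variable X according to its distribution. A minimal sketch with the coin-toss example, assuming Python's standard `random` module and a fixed seed chosen only so the run is repeatable:

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Draw 10,000 coin tosses; each observed value x is one sample of X.
outcomes = ["heads", "tails"]
samples = rng.choices(outcomes, weights=[0.5, 0.5], k=10_000)

# The empirical frequency should be close to the true probability 0.5.
freq_heads = samples.count("heads") / len(samples)
```

With more samples the empirical frequency tends to get closer to 0.5.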

Reinforcement learning basics
Reinforcement learning terminology

state: the state of the environment
action: the action taken by the agent
agent: the entity that takes actions
policy: the policy, a probability density function over actions
The policy gives the probability of each action. A random (stochastic) policy is more realistic, and it makes the agent's behavior harder to predict.
reward: the reward
Rewards should be set according to the actual task, for example: collecting a gold coin +1, winning the game +10000, Mario dying -10000, nothing happening 0. The goal of reinforcement learning is to maximize the reward obtained.
state transition: the change from one state to the next
State transitions are random. The state transition probability density function is known only to the environment; the player does not know it.
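To make the idea of a policy as a probability distribution concrete, here is a minimal sketch of sampling an action from a stochastic policy. The action names and the probabilities 0.2, 0.1, and 0.7 are hypothetical values chosen for illustration; `policy` and `sample_action` are not names from the original post.

```python
import random

def policy(state):
    """Hypothetical policy pi(a | s): action probabilities for a state."""
    # The same distribution is returned for every state to keep the sketch short.
    return {"left": 0.2, "right": 0.1, "up": 0.7}

def sample_action(state, rng):
    """Draw one action at random according to pi(. | state)."""
    probs = policy(state)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]

rng = random.Random(0)
action = sample_action("s1", rng)
```

Because the action is sampled rather than fixed, repeated calls in the same state can return different actions.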
Brief introduction

The agent takes an action; the environment's state changes and the environment returns a reward to the agent; the agent learns from the reward.
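The interaction described above can be sketched as a loop. `ToyEnv` is a hypothetical five-position environment invented for this example, and `random_policy` is a hand-written stochastic policy; neither comes from the original post or from any library.

```python
import random

class ToyEnv:
    """Hypothetical environment: walk on positions 0..4, goal at 4."""

    def __init__(self, rng):
        self.rng = rng
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Random state transition: the move succeeds with probability 0.8.
        if self.rng.random() < 0.8:
            self.pos = max(0, min(4, self.pos + action))
        done = self.pos == 4
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

def random_policy(state, rng):
    """Stochastic policy: step right (+1) with probability 0.7, else left (-1)."""
    return 1 if rng.random() < 0.7 else -1

rng = random.Random(0)
env = ToyEnv(rng)
state = env.reset()
rewards = []
for t in range(1000):                        # cap the episode length
    action = random_policy(state, rng)       # agent takes an action
    state, reward, done = env.step(action)   # environment returns new state and reward
    rewards.append(reward)
    if done:
        break
```

No learning happens in this sketch; it only shows the order in which state, action, and reward are exchanged.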
- Sources of randomness in reinforcement learning
Randomness of the action: the action is sampled from the policy
Randomness of the state: the new state is sampled from the state transition function
- How an AI plays a game


The agent observes state s1 and uses the policy function to choose action a1. The environment generates a new state s2 and returns reward r1 to the agent. The agent then uses the policy function again to choose action a2, and the cycle repeats.
- Rewards and Returns

- Return
Return: the cumulative future reward.
U_t is the sum of the rewards from R_t up to R_n, the reward at the end of the game: U_t = R_t + R_{t+1} + ... + R_n. A reward received now should carry more weight than one received later; for example, 80 yuan today is more useful than 100 yuan tomorrow.
γ: the discount rate, a value between 0 and 1. The discounted return is U_t = R_t + γ R_{t+1} + γ^2 R_{t+2} + ...
- Randomness of the return
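A small sketch of the discounted return, assuming the standard definition U_t = R_t + γ R_{t+1} + γ^2 R_{t+2} + ..., where γ is the discount rate. The reward list is a made-up example and `discounted_return` is a hypothetical helper name.

```python
def discounted_return(rewards, gamma):
    """Return U_t = R_t + gamma*R_{t+1} + gamma^2*R_{t+2} + ... for t = 0."""
    total = 0.0
    # Work backwards so each step is one multiply and one add.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

rewards = [1.0, 0.0, 10.0]   # hypothetical rewards R_t, R_{t+1}, R_{t+2}
undiscounted = discounted_return(rewards, gamma=1.0)   # 1 + 0 + 10 = 11.0
discounted = discounted_return(rewards, gamma=0.5)     # 1 + 0 + 0.25 * 10 = 3.5
```

A smaller γ shrinks the contribution of later rewards, which matches the idea that a reward now is worth more than the same reward later.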



The return at time t depends on the rewards from time t to time n. Each reward depends on the state and the action, so the return also depends on the states and actions.
- Value Function
Action-value function

For U_t, S_t and A_t are observed, while S_{t+1}, ..., S_n and A_{t+1}, ..., A_n are random variables.
The distribution of S_{t+1} depends on S_t and A_t; the distribution of A_{t+1} depends on S_{t+1}.
State-value function
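Assuming the standard definitions, the action-value function Q_π(s, a) is the expected return given S_t = s and A_t = a, and the state-value function is its average over the actions the policy might take: V_π(s) = Σ_a π(a|s) Q_π(s, a). The sketch below evaluates that sum for a discrete action set; all numbers are hypothetical.

```python
def state_value(policy_probs, action_values):
    """V_pi(s) = sum over a of pi(a | s) * Q_pi(s, a) for one state s."""
    return sum(policy_probs[a] * action_values[a] for a in policy_probs)

# Hypothetical numbers for a single state s.
pi_s = {"left": 0.25, "right": 0.25, "up": 0.5}     # pi(a | s)
q_s = {"left": 4.0, "right": 8.0, "up": 20.0}       # Q_pi(s, a)
v_s = state_value(pi_s, q_s)    # 0.25*4 + 0.25*8 + 0.5*20 = 13.0
```

Read this way, the state value measures how good a state is under the current policy, independent of any one action.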



- How the AI controls the agent

π(a|s): the learned policy function, used to choose the action in state s. Q*(s,a): scores every action in state s; choose the action with the highest score.
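The two ways of choosing an action can be sketched side by side: sampling from a policy π(a|s), or taking the highest-scoring action under an action-value function. The probabilities and scores are hypothetical, and `act_from_policy` and `act_from_q` are illustrative names.

```python
import random

def act_from_policy(pi_s, rng):
    """Policy-based control: sample an action a ~ pi(. | s)."""
    actions = list(pi_s)
    return rng.choices(actions, weights=[pi_s[a] for a in actions], k=1)[0]

def act_from_q(q_s):
    """Value-based control: pick the action with the highest score Q(s, a)."""
    return max(q_s, key=q_s.get)

pi_s = {"left": 0.2, "right": 0.1, "up": 0.7}        # pi(a | s) for one state
q_s = {"left": 370.0, "right": -21.0, "up": 610.0}   # Q(s, a) for the same state
rng = random.Random(0)
sampled = act_from_policy(pi_s, rng)
greedy = act_from_q(q_s)    # "up", the highest-scoring action
```

The policy-based choice is random, while the value-based choice is deterministic for a given table of scores.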
Evaluating reinforcement learning
OpenAI Gym



Summary
