1. Reinforcement learning: basic knowledge points
2022-07-05 20:20:00 【C--G】
Background from probability theory
Random Variable
A coin toss is a random event: heads and tails each come up with probability 0.5. An uppercase X usually denotes a random variable, and a lowercase x denotes an observed value.
Probability Density Function (PDF)
The probability density function describes how likely a random variable is to take a value near a given point.
Gaussian distribution
Discrete probability distribution
For a continuous distribution, the integral of the probability density function over all values is 1. For a discrete distribution, the probabilities of all values sum to 1.
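The normalization property above can be checked numerically. The following is a minimal sketch, assuming a standard Gaussian for the continuous case and the fair coin for the discrete case; the helper names `gaussian_pdf` and `integrate` are made up for this illustration.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def integrate(f, lo, hi, steps=100_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

# Continuous case: the density integrates to (approximately) 1.
continuous_total = integrate(gaussian_pdf, -10.0, 10.0)

# Discrete case: the probabilities of all values sum to 1.
coin = {"heads": 0.5, "tails": 0.5}
discrete_total = sum(coin.values())

print(continuous_total, discrete_total)
```

The integration range of plus or minus 10 standard deviations is an arbitrary choice; the probability mass outside it is negligible.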
Random Sampling
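As a rough illustration of random sampling, the sketch below draws observed values x of the coin-toss random variable X using Python's standard `random` module. The seed and the sample size are arbitrary choices made for this example.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

outcomes = ["heads", "tails"]
probs = [0.5, 0.5]

# Draw 10,000 observed values x of the random variable X.
samples = rng.choices(outcomes, weights=probs, k=10_000)

# The empirical frequency of heads should be close to 0.5.
heads_freq = samples.count("heads") / len(samples)
print(heads_freq)
```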
Reinforcement learning fundamentals
Reinforcement learning terminology
state: the current situation of the environment
action: what the agent does in a given state
agent: the entity that takes actions
policy: the strategy, expressed as a probability density function
The policy gives the probability of each action. A random policy is more realistic, and it makes the agent's behavior harder for an opponent to predict.
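A random policy of this kind might be sketched as follows. The action names and probabilities are hypothetical, and this toy `policy` ignores its `state` argument, which a real policy would not.

```python
import random

def policy(state):
    """Hypothetical policy pi(a | s): returns a probability for each action."""
    # Toy numbers chosen for illustration; a real policy would depend on state.
    return {"left": 0.2, "right": 0.1, "up": 0.7}

def sample_action(state, rng):
    """Draw one action at random according to pi(. | state)."""
    probs = policy(state)
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {"left": 0, "right": 0, "up": 0}
for _ in range(10_000):
    counts[sample_action("s1", rng)] += 1
print(counts)
```

Over many draws, the action counts should be roughly proportional to the probabilities returned by the policy.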
reward: the reward
Rewards should be set according to the actual task. For example: collecting a gold coin gives +1, winning the game gives +10000, Mario dying gives -10000, and nothing happening gives 0. The goal of reinforcement learning is to increase the reward obtained.
state transition: the move from one state to the next
State transitions are random. The state transition probability density function is known only to the environment; the player does not know it.
Overview
The agent takes an action; the environment's state changes and the environment returns a reward to the agent; the agent learns from the reward.
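The interaction described above can be sketched as a simple loop. Everything here is a hypothetical toy: `ToyEnv`, its five positions, the 0.8 success probability, and the +10 goal reward are assumptions made for illustration, not part of any library or of the original post.

```python
import random

class ToyEnv:
    """Hypothetical environment: walk on positions 0..4, goal at position 4."""

    def __init__(self, rng):
        self.rng = rng
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Random state transition: the intended move succeeds with
        # probability 0.8, otherwise the agent stays where it is.
        if self.rng.random() < 0.8:
            self.state = min(4, max(0, self.state + action))
        done = self.state == 4
        reward = 10 if done else 0
        return self.state, reward, done

def policy(state, rng):
    """Random policy: move right (+1) with probability 0.7, left (-1) otherwise."""
    return 1 if rng.random() < 0.7 else -1

rng = random.Random(0)
env = ToyEnv(rng)
state = env.reset()
trajectory = []
for t in range(1000):  # cap the episode length
    action = policy(state, rng)
    next_state, reward, done = env.step(action)
    trajectory.append((state, action, reward))
    state = next_state
    if done:
        break
print(len(trajectory), trajectory[-1])
```

Both sources of randomness appear in the sketch: the action is drawn at random by the policy, and the next state is drawn at random by the environment.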
- Sources of randomness in reinforcement learning
Randomness of the action: the action is drawn from the policy function.
Randomness of the state: the new state is drawn from the state transition function.
- How an AI plays a game
The agent observes state s1 and uses the policy function to choose action a1. The environment generates a new state s2 and returns reward r1 to the agent. The agent then uses the policy function again to choose action a2, and the loop repeats.
- Rewards and Returns
- Return
return: the cumulative future reward
Ut is the sum of the rewards from Rt up to Rn, the reward at the end of the game. A reward received now should carry more weight than one received later; for example, 80 yuan today is more useful than 100 yuan tomorrow.
γ: the discount rate, a value between 0 and 1
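Written out, the discounted return is usually defined as Ut = Rt + γ·Rt+1 + γ²·Rt+2 + ... + γ^(n-t)·Rn. The post itself does not show this formula, so it is stated here as the standard definition. A minimal sketch of the computation, with a hypothetical reward list:

```python
def discounted_return(rewards, gamma):
    """Return U_t = R_t + gamma*R_{t+1} + gamma^2*R_{t+2} + ... for a finite episode."""
    total = 0.0
    # Work backwards so each step is U_k = R_k + gamma * U_{k+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

rewards = [1, 0, 0, 10]  # hypothetical rewards R_t, ..., R_n
undiscounted = discounted_return(rewards, gamma=1.0)
discounted = discounted_return(rewards, gamma=0.5)
print(undiscounted)  # 11.0
print(discounted)    # 2.25
```

With γ = 0.5 the final reward of 10 contributes only 10 × 0.5³ = 1.25, which is how the discount makes later rewards count for less.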
- Randomness of the return
The return at time t depends on the rewards from time t to time n. Each reward depends on the state and the action, so the return also depends on the states and actions.
- Value Function
action-value function
For Ut, St and At have already been observed, while St+1, ..., Sn and At+1, ..., An are random variables.
The distribution of St+1 depends on St and At, and the distribution of At+1 depends on St+1.
state-value function
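For reference, the two value functions are usually written as below. These are the standard textbook definitions rather than formulas taken from the post, and the summation form of the state-value function assumes a discrete set of actions.

```latex
\begin{align*}
Q_\pi(s_t, a_t) &= \mathbb{E}\left[\, U_t \mid S_t = s_t,\ A_t = a_t \,\right] \\
Q^*(s_t, a_t) &= \max_\pi Q_\pi(s_t, a_t) \\
V_\pi(s_t) &= \mathbb{E}_{A \sim \pi(\cdot \mid s_t)}\left[\, Q_\pi(s_t, A) \,\right]
            = \sum_a \pi(a \mid s_t)\, Q_\pi(s_t, a)
\end{align*}
```

The expectation in the first line averages out the random future states and actions, which is why the action-value function depends only on the observed state and action and on the policy.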
- How AI controls the agent
π(a|s): the policy function, which selects the action to take in a given state. Q(s,a): computes a score for each action, and the agent chooses the action with the highest score.
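Choosing the highest-scoring action might look like the sketch below. The scores in `q_values` are invented for illustration; in practice they would come from a learned action-value function.

```python
def greedy_action(q_values):
    """Pick the action with the highest score Q(s, a)."""
    return max(q_values, key=q_values.get)

# Hypothetical scores Q(s, a) for one state s.
q_values = {"left": 1.5, "right": -0.3, "up": 4.2}
best = greedy_action(q_values)
print(best)  # up
```

Controlling the agent with a policy function instead would mean sampling the action from π(·|s) rather than taking the maximum.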
Evaluating reinforcement learning
OpenAI Gym
Summary