1. Reinforcement Learning Basics
2022-07-05 20:20:00 【C--G】
Probability theory refresher
Random Variable
Tossing a coin is a random event: heads and tails each come up with probability 0.5. A random variable is usually written as an uppercase X, and an observed value of it as a lowercase x.
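A quick simulation of the coin-toss example, as a minimal sketch: each call to the hypothetical helper `toss_coin` produces one observed value x of the random variable X (1 for heads, 0 for tails), and the empirical frequency of heads over many tosses should land near 0.5.

```python
import random

def toss_coin(rng: random.Random) -> int:
    """Return one observation x of the random variable X: 1 = heads, 0 = tails."""
    return 1 if rng.random() < 0.5 else 0

rng = random.Random(0)  # fixed seed so the run is reproducible
observations = [toss_coin(rng) for _ in range(10_000)]
heads_freq = sum(observations) / len(observations)
print(f"empirical P(X = 1) = {heads_freq:.3f}")  # should be close to 0.5
```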
Probability Density Function (PDF)
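The probability density function describes how the probability of a random variable is distributed over its possible values: for a continuous variable it gives the density around each point, and for a discrete variable it gives the probability of each value directly.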
Gaussian distribution
Discrete probability distribution
For a continuous random variable, the probability density function integrates to 1 over its whole domain; for a discrete random variable, the probabilities of all possible values sum to 1.
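The normalization property can be checked numerically. This sketch uses a standard Gaussian density and a simple midpoint-rule integral over [-10, 10] (the tails beyond that are negligible); the three-value discrete distribution is a made-up example.

```python
import math

def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density p(x) of a Gaussian with mean mu and standard deviation sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def integrate(f, lo: float, hi: float, n: int = 100_000) -> float:
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

# Continuous case: the density integrates to (approximately) 1.
area = integrate(gaussian_pdf, -10.0, 10.0)

# Discrete case: a hypothetical distribution over three values sums to 1.
discrete_p = {1: 0.2, 2: 0.5, 3: 0.3}
total = sum(discrete_p.values())
print(area, total)
```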
Random Sampling
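Random sampling from a discrete distribution can be sketched with Python's standard `random.Random.choices`, which draws with replacement according to the given weights. The box of colored balls is a hypothetical example; with many draws, each color's frequency should approach its probability.

```python
import random
from collections import Counter

# Hypothetical box: 2 red, 5 green and 3 blue balls out of 10.
colors = ["red", "green", "blue"]
probs = [0.2, 0.5, 0.3]

rng = random.Random(42)
samples = rng.choices(colors, weights=probs, k=10_000)  # sampling with replacement
counts = Counter(samples)
freqs = {c: counts[c] / len(samples) for c in colors}
print(freqs)  # each frequency should be close to its probability
```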
Reinforcement learning basics
Reinforcement learning terminology
state: the state, denoted s
action: the action, denoted a
agent: the agent, i.e. the one that takes actions
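policy: the policy π(a|s), a probability density function over actions given the state
π(a|s) gives the probability of taking each action a in state s. With a stochastic policy, the action is sampled from this distribution; this is more realistic, and it makes the agent's behavior harder for an opponent to predict.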
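A stochastic policy can be sketched as a function that returns π(·|s) plus a sampler that draws the action from it. The action names and probabilities below are hypothetical; a real policy would compute the probabilities from the state, for example with a neural network.

```python
import random

ACTIONS = ["left", "right", "up"]

def policy(state) -> dict:
    """Hypothetical pi(a|s): a probability for each action.
    A real policy would depend on `state`; this toy one ignores it."""
    return {"left": 0.2, "right": 0.7, "up": 0.1}

def sample_action(state, rng: random.Random) -> str:
    """Draw a ~ pi(.|s) instead of always taking the most likely action."""
    probs = policy(state)
    return rng.choices(ACTIONS, weights=[probs[a] for a in ACTIONS], k=1)[0]

rng = random.Random(0)
actions = [sample_action(None, rng) for _ in range(1000)]
print({a: actions.count(a) for a in ACTIONS})
```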
reward: the reward, denoted R
Rewards should be designed for the specific task. In Super Mario, for example: collecting a coin gives +1, winning the game gives +10000, Mario being eliminated gives -10000, and nothing happening gives 0. The goal of reinforcement learning is to make the obtained rewards as large as possible.
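A reward scheme like the Super Mario one is essentially a lookup table from events to numbers. In this sketch the event names are hypothetical; only the reward values come from the example, and any unlisted event (or no event at all) earns 0.

```python
# Hypothetical event names; reward values follow the Super Mario example.
REWARDS = {
    "collect_coin": 1.0,
    "win_game": 10_000.0,
    "mario_dies": -10_000.0,
}

def reward(event) -> float:
    """Return the reward for an event; anything not listed (or None) earns 0."""
    return REWARDS.get(event, 0.0)

episode_events = ["collect_coin", None, "collect_coin", "win_game"]
total_reward = sum(reward(e) for e in episode_events)
print(total_reward)
```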
state transition: moving from the old state to a new state after an action
State transitions are random. The state-transition probability density function p(s'|s, a) is known only to the environment; the player does not know it.
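A minimal sketch of a random state transition, assuming a tiny hypothetical table for p(s'|s, a). Only the environment-side function reads the table; the agent would see nothing but the sampled next state s'.

```python
import random

# Hypothetical transition table p(s' | s, a) for a toy environment.
TRANSITIONS = {
    ("ground", "up"): {"air": 0.8, "ground": 0.2},
    ("ground", "right"): {"ground": 1.0},
    ("air", "right"): {"ground": 0.9, "air": 0.1},
}

def step_state(state: str, action: str, rng: random.Random) -> str:
    """Environment side: sample the next state s' ~ p(. | s, a)."""
    dist = TRANSITIONS[(state, action)]
    next_states = list(dist)
    return rng.choices(next_states, weights=[dist[s] for s in next_states], k=1)[0]

rng = random.Random(1)
print([step_state("ground", "up", rng) for _ in range(5)])
```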
Overview
The agent takes an action; the environment's state changes and the environment returns a reward to the agent; the agent learns from the rewards it receives.
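The interaction loop can be sketched with a toy environment. `CorridorEnv`, its reward rule and `random_policy` are hypothetical, made up for illustration. Environments in OpenAI Gym follow a similar reset/step pattern, though their exact return values differ from this sketch.

```python
import random

class CorridorEnv:
    """Hypothetical environment: the agent walks on positions 0..goal.
    Reaching `goal` gives reward +1 and ends the episode; other steps give 0."""

    def __init__(self, goal: int = 3):
        self.goal = goal
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: str):
        """Apply the action and return (new_state, reward, done)."""
        move = 1 if action == "right" else -1
        self.state = max(0, self.state + move)
        done = self.state == self.goal
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def random_policy(state: int, rng: random.Random) -> str:
    """Stochastic policy: "right" with probability 0.7, "left" with 0.3."""
    return "right" if rng.random() < 0.7 else "left"

def run_episode(env: CorridorEnv, rng: random.Random, max_steps: int = 100):
    """s1 -> a1 -> (s2, r1) -> a2 -> ... until the episode ends."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = random_policy(state, rng)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory

traj = run_episode(CorridorEnv(), random.Random(0))
for s, a, r in traj:
    print(s, a, r)
```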
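- Sources of randomness in reinforcement learning
Randomness of the action: the action is sampled from the policy π(a|s)
Randomness of the state: the next state is sampled from the state-transition function p(s'|s, a)
- How an AI plays a game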
Observe state s1; the agent uses the policy function to choose action a1; the environment produces a new state s2 and returns reward r1 to the agent; the agent again uses the policy function to choose action a2; and this loop repeats until the game ends.
- Rewards and Returns
- Return
return: the cumulative future reward
U_t is the cumulative reward from R_t up to R_n at the end of the game. A reward received now should count for more than the same reward received later: 80 yuan today may be worth more than 100 yuan tomorrow. Future rewards are therefore discounted: U_t = R_t + γR_{t+1} + γ²R_{t+2} + ... + γ^{n-t}R_n.
γ (gamma): the discount rate, a value between 0 and 1
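The discounted return can be computed in one backward pass using the recursion U_t = R_t + γU_{t+1}, which follows directly from the sum above. The reward lists here are hypothetical.

```python
def discounted_return(rewards, gamma: float) -> float:
    """U_t = R_t + gamma*R_{t+1} + gamma^2*R_{t+2} + ..., where rewards[0] is R_t."""
    u = 0.0
    # Walk backwards so that each step applies U_t = R_t + gamma * U_{t+1}.
    for r in reversed(rewards):
        u = r + gamma * u
    return u

rewards = [0.0, 0.0, 1.0]  # hypothetical R_t, R_{t+1}, R_{t+2}
print(discounted_return(rewards, gamma=0.9))  # approximately 0.81
```

With γ = 1 this reduces to the plain sum of rewards; smaller γ makes distant rewards count less.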
- Randomness of the return
The return at time t depends on the rewards from time t to time n. Each reward depends on the state and the action, so the return also depends on the states and actions; since these are random, U_t is a random variable.
- Value Function
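Action-value function Q_π(s_t, a_t)
For U_t, S_t and A_t are observed, while S_{t+1}, ..., S_n and A_{t+1}, ..., A_n are random variables. Taking the expectation over them gives Q_π(s_t, a_t) = E[U_t | S_t = s_t, A_t = a_t].
The distribution of S_{t+1} depends on S_t and A_t (through the state-transition function p), and the distribution of A_{t+1} depends on S_{t+1} (through the policy π).
State-value function V_π(s_t) = E_A[Q_π(s_t, A)], the expectation of the action-value function over actions A ~ π(·|s_t)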
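For discrete actions, the expectation in the state-value function is a probability-weighted sum: V_π(s) = Σ_a π(a|s) Q_π(s, a). A minimal sketch for a single state, with hypothetical values for π and Q:

```python
def state_value(pi_s: dict, q_s: dict) -> float:
    """V_pi(s) = sum over a of pi(a|s) * Q_pi(s, a), for one state s."""
    return sum(pi_s[a] * q_s[a] for a in pi_s)

# Hypothetical numbers for a single state s.
pi_s = {"left": 0.2, "right": 0.7, "up": 0.1}  # pi(a|s)
q_s = {"left": 1.0, "right": 5.0, "up": 2.0}   # Q_pi(s, a)
v_s = state_value(pi_s, q_s)  # 0.2*1 + 0.7*5 + 0.1*2, approximately 3.9
print(v_s)
```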
- How an AI controls the agent
Using the policy π(a|s): in state s_t, sample the action a_t ~ π(·|s_t). Using the optimal action-value function Q*(s, a): it scores every action in state s_t, and the agent chooses the best one, a_t = argmax_a Q*(s_t, a).
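The two ways of choosing an action can be sketched side by side, assuming the learned π(a|s) and Q*(s, a) for the current state are already available as small dictionaries (the numbers are hypothetical).

```python
import random

def act_with_policy(pi_s: dict, rng: random.Random) -> str:
    """Policy-based control: sample a_t ~ pi(.|s_t)."""
    actions = list(pi_s)
    return rng.choices(actions, weights=[pi_s[a] for a in actions], k=1)[0]

def act_with_q(q_s: dict) -> str:
    """Value-based control: a_t = argmax_a Q*(s_t, a)."""
    return max(q_s, key=q_s.get)

# Hypothetical outputs of a learned pi(a|s) and Q*(s, a) for one state s_t.
pi_s = {"left": 0.2, "right": 0.7, "up": 0.1}
q_s = {"left": 1.0, "right": 5.0, "up": 2.0}
print(act_with_policy(pi_s, random.Random(0)), act_with_q(q_s))
```

The policy-based choice stays random from step to step, while the argmax choice is deterministic for a given state.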
Evaluating reinforcement learning algorithms
OpenAI Gym
Summary