1. Reinforcement Learning Basics
2022-07-05 20:18:00 【C--G】
Probability theory refresher
Random Variable
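Tossing a coin is a random event: heads and tails each have probability 0.5. A random variable is usually written as uppercase X, and an observed value of it as lowercase x.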
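To make the X / x distinction concrete, here is a minimal Python sketch using only the standard library. The encoding 1 = heads, 0 = tails and the function name `coin_toss` are illustrative choices. `coin_toss` plays the role of the random variable X, and each value it returns is one observation x.

```python
import random

# Random variable X: the outcome of one fair coin toss, encoded as 1 (heads) or 0 (tails).
# Each call to coin_toss() produces one observed value x of X.
def coin_toss(rng: random.Random) -> int:
    return 1 if rng.random() < 0.5 else 0

rng = random.Random(0)  # fixed seed so the run is reproducible
observations = [coin_toss(rng) for _ in range(10_000)]  # x_1, ..., x_n
heads_ratio = sum(observations) / len(observations)
print(f"fraction of heads: {heads_ratio:.3f}")  # close to 0.5 for large n
```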
Probability Density Function (PDF)
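A probability density function describes how likely a random variable is to take values near a given point. For a continuous variable, the density value itself is not a probability; probabilities come from integrating the density over an interval.
Gaussian (normal) distribution: p(x) = 1/(√(2π)·σ) · exp(-(x-μ)²/(2σ²))
Discrete probability distribution
If the random variable is continuous, its PDF integrates to 1 over the whole domain (∫ p(x) dx = 1). If it is discrete, the probabilities of all possible values sum to 1 (Σ p(x) = 1).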
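The normalization rule can be checked numerically. The sketch below is a plain-Python illustration, not taken from the original notes. It integrates a standard Gaussian PDF with the trapezoidal rule over [-10, 10], where the tails outside this range are negligible, and sums a small discrete distribution. The values in `pmf` are made up for the example.

```python
import math

# Gaussian (normal) PDF: p(x) = 1/sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))
def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Continuous case: approximate the integral of f over [a, b] with the trapezoidal rule.
def integrate(f, a: float, b: float, n: int = 100_000) -> float:
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

area = integrate(gaussian_pdf, -10.0, 10.0)

# Discrete case: a probability mass function over a finite set of values (illustrative numbers).
pmf = {1: 0.2, 3: 0.5, 7: 0.3}
mass = sum(pmf.values())

print(f"integral of Gaussian PDF ~ {area:.6f}")
print(f"sum of discrete probabilities ~ {mass:.6f}")
```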
Random Sampling
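Random sampling means drawing concrete observations from a probability distribution. Values with higher probability are drawn more often.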
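As a minimal sketch, assuming Python's standard `random` module, sampling from a discrete distribution looks like this. The colour names and probabilities are hypothetical. With many draws, the empirical frequencies approach the given probabilities.

```python
import random
from collections import Counter

# Sample with replacement from a discrete distribution (illustrative values):
# each colour is drawn with a fixed probability.
colours = ["red", "green", "blue"]
probs = [0.2, 0.5, 0.3]

rng = random.Random(42)
samples = rng.choices(colours, weights=probs, k=10_000)  # 10,000 independent draws
freq = {c: n / len(samples) for c, n in Counter(samples).items()}
print(freq)  # empirical frequencies approach 0.2 / 0.5 / 0.3
```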
Reinforcement learning basics
Key terminology
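state: the current situation of the environment (s)
action: what the agent does (a)
agent: the decision maker, i.e. whoever takes the actions
policy: π(a|s), a probability density function over actions given the state
The policy gives a probability for each action, and the agent samples its action from it. A stochastic policy is more realistic and makes the agent's behaviour hard to predict, so an opponent cannot easily find a pattern to exploit.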
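A stochastic policy can be sketched as a function that returns a distribution over actions for a given state. The agent then samples from that distribution. Everything here is hypothetical: the state names, the Mario-style actions `left` / `right` / `up`, and the probabilities. A real policy would usually be a learned model such as a neural network, not a hand-written table.

```python
import random

# A stochastic policy pi(a|s): for each state, a probability for every action.
# States, actions and numbers are hypothetical (Super Mario-style actions).
ACTIONS = ["left", "right", "up"]

def policy(state: str) -> dict:
    if state == "enemy_ahead":
        return {"left": 0.2, "right": 0.1, "up": 0.7}
    return {"left": 0.2, "right": 0.7, "up": 0.1}

def sample_action(state: str, rng: random.Random) -> str:
    probs = policy(state)
    # Draw a ~ pi(.|s) instead of always taking the most likely action.
    return rng.choices(ACTIONS, weights=[probs[a] for a in ACTIONS], k=1)[0]

rng = random.Random(1)
print([sample_action("enemy_ahead", rng) for _ in range(10)])
```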
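reward: R, the feedback the agent receives
Rewards must be designed for the task at hand. For example: collecting a coin gives +1, clearing the level gives +10000, Mario dying (game over) gives -10000, and nothing happening gives 0. The goal of reinforcement learning is to maximize the rewards collected.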
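The reward scheme above maps directly onto a lookup table. The sketch below assumes each time step produces at most one event. The event names `coin`, `win` and `game_over` are hypothetical, and `None` stands for "nothing happened".

```python
from typing import Optional

# Hypothetical reward function following the scheme in the text:
# +1 for a coin, +10000 for clearing the level, -10000 when Mario dies, 0 otherwise.
REWARDS = {
    "coin": 1,
    "win": 10_000,
    "game_over": -10_000,
}

def reward(event: Optional[str]) -> int:
    # Any event not listed (including None, i.e. "nothing happened") yields 0.
    return REWARDS.get(event, 0)

episode_events = ["coin", None, "coin", None, "win"]
total = sum(reward(e) for e in episode_events)
print(total)  # 10002
```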
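state transition: moving from the current state to the next state
State transitions are random. Given the current state s and action a, the environment samples the next state s' from the state transition probability p(s'|s, a). Only the environment knows this distribution; the player does not.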
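One way to picture "only the environment knows p(s'|s, a)" is to keep the transition table private to an environment object. The agent only ever sees the sampled next state. The states, actions and probabilities below are hypothetical.

```python
import random

# Hypothetical transition table p(s' | s, a), known only to the environment.
# Keys are (state, action); values map next states to probabilities.
TRANSITIONS = {
    ("ground", "up"): {"air": 0.8, "ground": 0.2},
    ("ground", "right"): {"ground": 1.0},
    ("air", "up"): {"air": 0.3, "ground": 0.7},
    ("air", "right"): {"air": 0.4, "ground": 0.6},
}

class Environment:
    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)  # the transition randomness lives inside the environment

    def next_state(self, state: str, action: str) -> str:
        dist = TRANSITIONS[(state, action)]
        states = list(dist)
        # Sample s' ~ p(. | s, a); the agent only sees the sampled result.
        return self._rng.choices(states, weights=[dist[s] for s in states], k=1)[0]

env = Environment(seed=3)
print([env.next_state("ground", "up") for _ in range(5)])
```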
Overview
The agent takes an action. The environment updates its state and returns a reward to the agent, and the agent learns from these rewards.
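The interaction loop described above can be sketched in plain Python. The environment `LineWorld` is a made-up toy: positions 0 to 4, reaching 4 ends the episode with reward +1, and each move "slips" in the opposite direction 10% of the time. The policy is a hand-written stochastic rule, not a learned one. The `step` method returning `(state, reward, done)` loosely mirrors the style of Gym-like APIs but is not any library's actual interface.

```python
import random

# Toy environment (hypothetical): positions 0..4; reaching 4 ends the episode with reward +1.
class LineWorld:
    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action: int):
        # action is -1 (left) or +1 (right); the move "slips" 10% of the time (random transition).
        if self.rng.random() < 0.1:
            action = -action
        self.state = min(max(self.state + action, 0), 4)
        done = self.state == 4
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def policy(state: int, rng: random.Random) -> int:
    # Stochastic policy that ignores the state: move right with probability 0.8.
    return 1 if rng.random() < 0.8 else -1

env = LineWorld(seed=0)
agent_rng = random.Random(1)
state, trajectory = env.state, []
for t in range(100):  # cap the episode length
    action = policy(state, agent_rng)            # agent observes s_t and picks a_t
    next_state, reward, done = env.step(action)  # environment returns s_{t+1} and r_t
    trajectory.append((state, action, reward))
    state = next_state
    if done:
        break
print(f"steps: {len(trajectory)}, total reward: {sum(r for _, _, r in trajectory)}")
```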
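- Sources of randomness in reinforcement learning
Randomness in actions: the action is sampled from the policy π(a|s).
Randomness in states: the next state is sampled from the state transition p(s'|s, a).
- How AI plays a game
Observe state s1. The agent uses the policy function to choose action a1. The environment produces the new state s2 and returns reward r1 to the agent. The agent uses the policy again to choose action a2, and this loop repeats until the game ends.
- Rewards and Returns
- Return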
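return: the cumulative future reward, written U_t.
Without discounting, U_t is the sum of the rewards from R_t up to the end of the game, R_n. However, a reward now should weigh more than a reward later: 80 yuan today is more tangible than 100 yuan tomorrow.
γ (gamma): the discount rate, between 0 and 1. The discounted return is U_t = R_t + γR_{t+1} + γ²R_{t+2} + ... + γ^(n-t)R_n.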
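The discounted return can be computed in one backward pass. This uses the recursion U_t = R_t + γ·U_{t+1}, which expands to the same sum as the formula above. The reward values in the example are illustrative.

```python
# Discounted return U_t = R_t + gamma*R_{t+1} + gamma^2*R_{t+2} + ... + gamma^(n-t)*R_n
def discounted_return(rewards: list, gamma: float) -> float:
    # Iterate backwards using U_t = R_t + gamma * U_{t+1}, which avoids explicit powers.
    u = 0.0
    for r in reversed(rewards):
        u = r + gamma * u
    return u

rewards = [1.0, 0.0, 0.0, 10.0]  # R_t, ..., R_n (illustrative values)
print(discounted_return(rewards, gamma=0.9))  # approximately 8.29 = 1 + 0.9**3 * 10
```

With γ = 1 this reduces to the undiscounted sum. Smaller γ makes the agent more short-sighted.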
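- Randomness of the return
The return at time t depends on the rewards R_t, ..., R_n. Each reward depends on the state and action at that step, and the future states and actions are random, so U_t is itself a random variable.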
- Value Function
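Action-value function: Q_π(s_t, a_t) = E[U_t | S_t = s_t, A_t = a_t]
For U_t, S_t and A_t are observed, while S_{t+1}, ..., S_n and A_{t+1}, ..., A_n are random variables. Q_π takes the expectation over them.
The distribution of S_{t+1} depends on S_t and A_t (through the state transition), and the distribution of A_{t+1} depends on S_{t+1} (through the policy π).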
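Because Q_π is an expectation over the random future, it can be estimated by Monte Carlo: fix (s_t, a_t), repeatedly roll out the future by sampling transitions and then following π, and average the discounted returns. This is an illustrative technique, not the method in the original notes. The two-state MDP below (`P`, `R`, `PI`) is entirely hypothetical. The infinite discounted sum is truncated at a fixed horizon, which introduces a small bias.

```python
import random

# Tiny hypothetical MDP: states "A" and "B", actions 0 and 1.
# P[(s, a)] -> list of (next_state, probability); R[(s, a)] -> immediate reward.
P = {
    ("A", 0): [("A", 0.9), ("B", 0.1)],
    ("A", 1): [("B", 0.8), ("A", 0.2)],
    ("B", 0): [("A", 1.0)],
    ("B", 1): [("B", 1.0)],
}
R = {("A", 0): 0.0, ("A", 1): 1.0, ("B", 0): 0.0, ("B", 1): 2.0}
PI = {"A": [0.5, 0.5], "B": [0.3, 0.7]}  # pi(a|s) for a = 0, 1
GAMMA = 0.9

def sample_next(s, a, rng):
    states, probs = zip(*P[(s, a)])
    return rng.choices(states, weights=probs, k=1)[0]  # S_{t+1} ~ p(.|s, a)

def mc_q(s, a, rng, episodes=1000, horizon=60):
    """Monte Carlo estimate of Q_pi(s, a) = E[U_t | S_t = s, A_t = a]."""
    total = 0.0
    for _ in range(episodes):
        state, action, u, discount = s, a, 0.0, 1.0
        for _ in range(horizon):  # truncate the infinite discounted sum
            u += discount * R[(state, action)]
            discount *= GAMMA
            state = sample_next(state, action, rng)
            action = rng.choices([0, 1], weights=PI[state], k=1)[0]  # A_{t+1} ~ pi(.|S_{t+1})
        total += u
    return total / episodes

rng = random.Random(0)
print({(s, a): round(mc_q(s, a, rng), 2) for s in "AB" for a in (0, 1)})
```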
State-value function: V_π(s_t) = E_A[Q_π(s_t, A)] = Σ_a π(a|s_t)·Q_π(s_t, a), the expectation of Q_π over actions A ~ π(·|s_t)
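For a discrete action set, V_π(s) is a probability-weighted average of the action values. A minimal sketch for a single state follows. The policy probabilities and Q values are hypothetical numbers.

```python
# State value from action values: V_pi(s) = sum_a pi(a|s) * Q_pi(s, a)
# The policy and Q values below are hypothetical numbers for a single state s.
pi_s = {"left": 0.2, "right": 0.1, "up": 0.7}
q_s = {"left": 1.0, "right": 3.0, "up": 2.0}

def state_value(pi_s: dict, q_s: dict) -> float:
    return sum(pi_s[a] * q_s[a] for a in pi_s)

print(state_value(pi_s, q_s))  # approximately 1.9 = 0.2*1 + 0.1*3 + 0.7*2
```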
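- How AI controls the agent
Given the current state s, there are two options. Either use the policy function π(a|s) to pick the action, or use the optimal action-value function Q*(s, a) to score every action and choose the action with the highest score.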
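The two options can be contrasted in a few lines. In this sketch, `pi_s` and `q_star_s` are hypothetical numbers for one state s_t, standing in for a learned policy and a learned Q*. Policy-based control samples from π. Value-based control takes the argmax of Q*, so it always returns the same action for the same state.

```python
import random

# Two ways to let AI control the agent (all numbers hypothetical):
# (1) policy-based: sample a_t ~ pi(. | s_t)
# (2) value-based:  pick a_t = argmax_a Q*(s_t, a)
ACTIONS = ["left", "right", "up"]
pi_s = [0.2, 0.1, 0.7]       # pi(a | s_t) for the actions above
q_star_s = [-5.0, 3.0, 8.0]  # Q*(s_t, a) for the actions above

def act_with_policy(rng: random.Random) -> str:
    return rng.choices(ACTIONS, weights=pi_s, k=1)[0]

def act_greedy() -> str:
    best = max(range(len(ACTIONS)), key=lambda i: q_star_s[i])
    return ACTIONS[best]

rng = random.Random(0)
print(act_with_policy(rng), act_greedy())  # the greedy choice is always "up"
```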
Evaluating reinforcement learning
OpenAI Gym
Summary