1. Reinforcement learning: basic concepts

2022-07-05 20:20:00 【C--G】

Probability theory background

Random Variable

Tossing a coin is a random event: heads and tails each come up with probability 0.5. By convention, an uppercase X denotes the random variable and a lowercase x denotes an observed value.

Probability Density Function (PDF)

The probability density function describes how likely the random variable is to take a value near a given point.

Two common cases are the Gaussian distribution (continuous) and discrete probability distributions.

For a continuous distribution, the probability density function integrates to 1 over its domain. For a discrete distribution, the probabilities of all possible values sum to 1.
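The normalization property can be checked numerically. The following is a minimal sketch, assuming a standard Gaussian density and a made-up discrete distribution over the values 1, 3 and 7; the helper names `gaussian_pdf` and `integrate` are illustrative and do not come from the original post.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


# Continuous case: the density integrates to approximately 1.
# The range [-10, 10] is an assumption; the tails outside it are negligible.
continuous_total = integrate(gaussian_pdf, -10.0, 10.0)

# Discrete case: an illustrative distribution, value -> probability.
pmf = {1: 0.2, 3: 0.5, 7: 0.3}
discrete_total = sum(pmf.values())

print(round(continuous_total, 6), round(discrete_total, 6))
```

Both totals should come out at 1 up to numerical error.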

Random Sampling

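Random sampling means drawing an observed value x of the random variable X, so that each value comes up with its assigned probability.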
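As a small illustration of random sampling, the sketch below draws coin tosses with Python's standard `random.choices`. The seed, the sample size and the variable names are assumptions made for the example.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable

values = ["heads", "tails"]
probabilities = [0.5, 0.5]

# Draw a single observed value x of the random variable X.
x = rng.choices(values, weights=probabilities, k=1)[0]

# Draw many observations; the empirical frequencies approach the probabilities.
samples = rng.choices(values, weights=probabilities, k=10_000)
frequencies = {v: c / len(samples) for v, c in Counter(samples).items()}

print(x, frequencies)
```

With more samples, the frequencies get closer to 0.5 each.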

Reinforcement learning basics

Reinforcement learning terminology

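state: the current situation, for example the current game screen
action: what the agent does in a given state
agent: the entity that takes the actions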

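policy: π(a | s), a probability density function over actions given the state

The policy gives the probability of taking each action in the current state. The action is sampled at random from these probabilities. A random policy is more realistic, and it makes the agent's behavior harder for an opponent to predict.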
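Sampling an action from a random policy can be sketched as follows. The state name, the action names and the probabilities in the table are hypothetical, and `sample_action` is an illustrative helper, not something defined in the original post.

```python
import random


def sample_action(policy, state, rng):
    """Sample one action according to the probabilities stored in policy[state]."""
    actions = list(policy[state].keys())
    weights = list(policy[state].values())
    return rng.choices(actions, weights=weights, k=1)[0]


# Hypothetical policy table: state -> {action: probability}.
policy = {"enemy_ahead": {"left": 0.2, "right": 0.1, "up": 0.7}}

rng = random.Random(0)
action = sample_action(policy, "enemy_ahead", rng)
print(action)
```

Calling `sample_action` repeatedly in the same state returns different actions, with "up" chosen most often.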

reward: the reward
Rewards should be set according to the actual task. For example: collecting a gold coin gives +1, winning the game gives +10000, Mario being killed gives -10000, and nothing happening gives 0. The goal of reinforcement learning is to maximize the rewards obtained.
state transition: the move from the current state to a new state
State transitions are random. The state-transition probability density function is known only to the environment; the player does not know it.

Overview

The agent takes an action. The environment's state changes and the environment returns a reward to the agent. The agent learns from the reward.

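Sources of randomness in reinforcement learning
Randomness of the action: given the state, the action is sampled from the policy.

Randomness of the state: given the state and the action, the next state is sampled from the state-transition function.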
How an AI plays a game

The agent observes state s1 and uses the policy function to choose action a1. The environment generates a new state s2 and returns reward r1 to the agent. The agent then uses the policy function again to choose action a2, and so on in a loop.
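The observe, act, receive-reward loop can be sketched with a toy environment. Everything here is a hypothetical illustration: `CorridorEnv`, its 0.8 success probability, the reward of 10 and the fixed random policy are assumptions, not the game from the original post.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk from position 0 to position 3."""

    def __init__(self, rng):
        self.rng = rng
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Random state transition: the intended move succeeds with probability 0.8.
        move = 1 if action == "right" else -1
        if self.rng.random() < 0.8:
            self.position = max(0, self.position + move)
        done = self.position == 3
        reward = 10 if done else 0
        return self.position, reward, done


def policy(state, rng):
    """Fixed random policy: mostly move right, sometimes left."""
    return rng.choices(["right", "left"], weights=[0.9, 0.1], k=1)[0]


def run_episode(env, rng, max_steps=1000):
    """Run the loop and record (state, action, reward) for every step."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state, rng)                   # agent acts
        next_state, reward, done = env.step(action)   # environment responds
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


rng = random.Random(0)
trajectory = run_episode(CorridorEnv(rng), rng)
print(len(trajectory), trajectory[-1])
```

Both sources of randomness show up here: the action is sampled in `policy`, and the next state is sampled in `step`.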
Rewards and Returns
Return

return: the cumulative future reward
U_t is the cumulative reward from R_t up to R_n at the end of the game. A current reward should carry more weight than a later one; for example, 80 yuan today is more useful than 100 yuan tomorrow. Later rewards are therefore discounted:
U_t = R_t + γ·R_{t+1} + γ^2·R_{t+2} + ... + γ^(n-t)·R_n
γ: the discount rate, between 0 and 1
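A short sketch of computing the return, assuming the usual discounted form in which a reward k steps in the future is weighted by the discount rate raised to the power k. The function name `discounted_return` and the reward list are illustrative.

```python
def discounted_return(rewards, gamma):
    """Return U_t for rewards [R_t, R_{t+1}, ..., R_n] and discount rate gamma."""
    total = 0.0
    # Work backwards using U_k = R_k + gamma * U_{k+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rewards = [1, 0, 0, 10]
print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5**3 * 10 = 2.25
print(discounted_return(rewards, gamma=1.0))  # no discounting: 1 + 10 = 11.0
```

A smaller discount rate makes the agent care less about rewards that arrive late.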

Randomness of the return

The return at time t depends on the rewards from time t to time n. Each reward depends on the state and the action, so the return also depends on the states and actions, which are random.

Value Function

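action-value function: Q_π(s_t, a_t)
Q_π(s_t, a_t) = E[ U_t | S_t = s_t, A_t = a_t ]

For U_t, S_t and A_t are observed, while S_{t+1}, ..., S_n and A_{t+1}, ..., A_n are random variables. The expectation is taken over them.

The distribution of S_{t+1} depends on S_t and A_t (state transition). The distribution of A_{t+1} depends on S_{t+1} (policy).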
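Because the future states and actions are random, one way to approximate an action value is to average the returns of many sampled episodes that start from the same state and action. The sketch below does this for a hypothetical two-state game; the dynamics in `step`, the value of `GAMMA` and the uniform policy are all assumptions chosen so the result can be checked by hand.

```python
import random

GAMMA = 0.9  # discount rate (assumed value)


def policy(state, rng):
    """Fixed random policy, only consulted in state "mid"."""
    return rng.choice(["a", "b"])


def step(state, action, rng):
    """Toy dynamics; returns (next_state, reward, done)."""
    if state == "start":
        if action == "stop":
            return None, 1.0, True
        # Action "go": reach "mid" with probability 0.5, otherwise the game ends.
        if rng.random() < 0.5:
            return "mid", 0.0, False
        return None, 0.0, True
    # State "mid": action "a" pays 4, action "b" pays 0; the game ends either way.
    return None, (4.0 if action == "a" else 0.0), True


def sample_return(state, action, rng):
    """Play one episode from (state, action) and return its discounted return."""
    total, discount = 0.0, 1.0
    while True:
        state, reward, done = step(state, action, rng)
        total += discount * reward
        if done:
            return total
        discount *= GAMMA
        action = policy(state, rng)


def estimate_q(state, action, rng, episodes=20_000):
    """Monte Carlo estimate: average the sampled returns."""
    return sum(sample_return(state, action, rng) for _ in range(episodes)) / episodes


rng = random.Random(0)
q_go = estimate_q("start", "go", rng)
q_stop = estimate_q("start", "stop", rng)
print(round(q_go, 2), round(q_stop, 2))
```

By hand, the value of "go" is 0.5 × 0.5 × 0.9 × 4 = 0.9 and the value of "stop" is 1, so the estimates should land close to those numbers.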

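state-value function: V_π(s_t)

V_π(s_t) = E_A[ Q_π(s_t, A) ], where the action A is drawn from the policy π(· | s_t). It measures how good the state s_t is under the policy π.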
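For a discrete action set, the state value under the standard definition is the policy-weighted average of the action values in that state. A minimal sketch, with made-up probabilities and action values:

```python
def state_value(policy_probs, q_values):
    """V(s) = sum over actions a of pi(a | s) * Q(s, a), for discrete actions."""
    return sum(policy_probs[a] * q_values[a] for a in policy_probs)


# Illustrative numbers for a single state s.
policy_probs = {"left": 0.2, "right": 0.1, "up": 0.7}  # pi(. | s)
q_values = {"left": 1.0, "right": -2.0, "up": 3.0}     # Q(s, .)

v = state_value(policy_probs, q_values)  # 0.2 - 0.2 + 2.1, about 2.1
print(round(v, 6))
```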

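How the AI controls the agent

There are two ways. With a learned policy function π(a | s), the agent picks an action for the current state according to the policy. With a learned optimal action-value function Q*(s, a), it computes a score for every action and picks the action with the highest score.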
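The value-based choice amounts to taking the highest-scoring action. A minimal sketch, assuming the scores for the current state are already available in a dictionary; the numbers and the name `greedy_action` are illustrative.

```python
def greedy_action(q_values):
    """Return the action with the highest score in q_values."""
    return max(q_values, key=q_values.get)


# Hypothetical scores Q*(s, a) for one state s.
q_values = {"left": 130.0, "right": -50.0, "up": 370.0}

print(greedy_action(q_values))  # up
```

Unlike sampling from a policy, this choice is deterministic for a given state.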