Statistical learning notes: maximum likelihood estimation (MLE) and maximum a posteriori estimation (MAP)

Likelihood function and probability function

The word "likelihood" means something very close to "probability". Consider the following expression:

P(x | θ)

It has two inputs: x denotes a specific data point, and θ denotes the parameters of the model.

If θ is known and fixed while x is the variable, the function is called a probability function: it describes, for different sample points x, the probability that x occurs.

If x is known and fixed while θ is the variable, the function is called a likelihood function: it describes, for different model parameters θ, the probability of observing this particular sample point x.
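
A minimal sketch of this distinction, assuming a simple coin-flip (binomial) model; the function name p_x_given_theta and the specific numbers are illustrative only:

```python
import math

def p_x_given_theta(x: int, n: int, theta: float) -> float:
    """P(x | theta): probability of x heads in n coin flips when the head probability is theta."""
    return math.comb(n, x) * theta ** x * (1 - theta) ** (n - x)

n = 10

# Probability function: theta is known and fixed, x ranges over the possible sample points.
theta_fixed = 0.5
probs = [p_x_given_theta(x, n, theta_fixed) for x in range(n + 1)]
print(sum(probs))  # sums to 1 over all x, as a probability distribution should

# Likelihood function: x is known and fixed (say we observed 7 heads), theta varies.
x_obs = 7
for theta in (0.3, 0.5, 0.7, 0.9):
    print(theta, round(p_x_given_theta(x_obs, n, theta), 4))
# The likelihood does not need to sum (or integrate) to 1 over theta.
```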
 

Maximum likelihood estimation

Maximum likelihood estimation, in plain terms, uses the known sample results to infer the model parameter values that are most likely (i.e., with the highest probability) to have produced those results.

Characteristics: simple and widely applicable; it usually converges well as the number of training samples increases; it is based on the assumption that all samples are independent and identically distributed (i.i.d.).

Under the i.i.d. assumption, the probability of the observed samples X = {x_1, ..., x_n} given the parameter θ is

L(θ) = P(X | θ) = P(x_1 | θ) · P(x_2 | θ) · ... · P(x_n | θ)

The goal is to find a θ that maximizes this probability. To simplify the calculation, take the logarithm:

log L(θ) = Σ_i log P(x_i | θ)

To maximize L, take the derivative of log L(θ) with respect to θ, set it to 0, and solve.
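
As a concrete illustration (not part of the original derivation), here is a minimal sketch for a coin-flip model, where setting the derivative of the log-likelihood to zero gives the closed form θ = heads / n; the grid search is included only as a numerical sanity check:

```python
import math

def log_likelihood(theta: float, heads: int, n: int) -> float:
    """log L(theta) = heads*log(theta) + (n - heads)*log(1 - theta) for i.i.d. coin flips."""
    return heads * math.log(theta) + (n - heads) * math.log(1 - theta)

heads, n = 7, 10

# Setting d/dtheta log L(theta) = heads/theta - (n - heads)/(1 - theta) = 0 gives:
theta_mle = heads / n

# Numerical sanity check: maximize the log-likelihood over a grid of theta values.
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, heads, n))

print(theta_mle, theta_grid)  # 0.7 and 0.7
```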

Maximum a posteriori estimation

Maximum a posteriori estimation (MAP, maximum a posteriori): find the parameter vector θ that maximizes p(D | θ) · p(θ). Maximum likelihood estimation can be understood as a MAP estimate whose prior p(θ) is the uniform distribution. (A drawback of MAP: if we apply an arbitrary nonlinear transformation to the parameter space, such as a rotation, the probability density p(θ) changes and the estimate is no longer valid.) MAP gives a point estimate of an unobserved quantity based on empirical data. It is similar to MLE, but the biggest difference is that MAP incorporates a prior distribution over the quantity to be estimated, so it can be viewed as a regularized maximum likelihood estimation.

By Bayes' formula:

P(θ | X) = P(X | θ) P(θ) / P(X)

P(X) depends only on the data that have already been observed, so it is a fixed value; maximizing the posterior probability P(θ | X) therefore amounts to maximizing P(X | θ) P(θ).

The objective function is solved the same way as in maximum likelihood estimation: take the derivative of the objective function, set it to 0, and solve.
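
A minimal sketch of the same illustrative coin-flip example, now with a Beta(a, b) prior on θ (the prior and its parameters are my own choice for demonstration); setting the derivative of log[P(X | θ) P(θ)] to zero gives a closed-form MAP estimate:

```python
import math

def log_posterior_unnorm(theta: float, heads: int, n: int, a: float, b: float) -> float:
    """log of P(X | theta) * P(theta) with a Beta(a, b) prior; terms constant in theta are dropped."""
    return (heads + a - 1) * math.log(theta) + (n - heads + b - 1) * math.log(1 - theta)

heads, n = 7, 10
a, b = 2.0, 2.0  # Beta(2, 2) prior, centered on 0.5

# Setting the derivative to zero yields the closed-form MAP estimate:
theta_map = (heads + a - 1) / (n + a + b - 2)
print(theta_map)  # ~0.667: the MLE 0.7 is pulled toward the prior mean 0.5
```

With a uniform prior, a = b = 1, the formula reduces to heads / n, i.e., the MLE, which matches the statement above that MLE is MAP with a uniform prior.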

The difference between MAP and MLE

The biggest difference between MAP and MLE is that MAP adds the probability distribution of the model parameters themselves; in other words, MLE assumes that this distribution is uniform, i.e., a fixed value. MAP lets us inject prior knowledge into the estimate, which is useful when there are very few samples, because with few samples the observed results are likely to be biased; the prior knowledge then "pulls" the estimate toward the prior, and the resulting estimate forms a peak around the prior result. By adjusting the parameters of the prior distribution, for example those of a Beta distribution, we can also adjust how strongly the estimate is "pulled" toward the prior: the stronger the prior, the sharper the peak. Such parameters are what we call the model's "hyperparameters".
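
Continuing the illustrative coin-flip sketch, varying the Beta prior's hyperparameters shows how strongly the estimate is pulled toward the prior mean of 0.5 (all numbers are assumptions for demonstration only):

```python
heads, n = 7, 10  # the MLE alone would give 0.7

# Larger a + b means a stronger (sharper) prior, which pulls the MAP estimate harder toward 0.5.
for a, b in [(1, 1), (5, 5), (50, 50)]:
    theta_map = (heads + a - 1) / (n + a + b - 2)
    print(f"Beta({a},{b}) prior -> MAP estimate {theta_map:.3f}")
```

With Beta(1, 1) (uniform) the MAP estimate equals the MLE of 0.7; as the prior gets stronger, the estimate moves toward 0.5 and the posterior peak becomes sharper.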
 

 

 
