Understanding of maximum likelihood estimation, gradient descent, linear regression and logistic regression
2022-07-28 07:12:00 【The most beautiful wish must be the craziest】
Maximum likelihood
Understanding maximum likelihood estimation of a conditional probability (posterior probability) and of a prior probability: suppose an experiment has two possible outcomes, A or B.
In 50 trials in total, A occurs 20 times and B occurs 30 times. We want to find the probability p of A.
The question is: how do we find a reasonable value for p?
Let L denote the probability of observing exactly this outcome when A occurs with probability p and 50 trials are conducted.
L = p^x1 · (1-p)^(1-x1) · p^x2 · (1-p)^(1-x2) · ... · p^x50 · (1-p)^(1-x50)

This is easy to understand: let xk = 1 if trial k produces A and xk = 0 if it produces B. With 20 occurrences of A and 30 of B, x1 + x2 + ... + x50 = 20.
Therefore:

L = p^20 · (1-p)^30. Maximizing ln L = 20·ln p + 30·ln(1-p) by setting d(ln L)/dp = 20/p - 30/(1-p) = 0 gives p = 20/50 = 2/5.
When L is maximal, the observed samples are the most likely, and the corresponding value of p for A is the most reasonable. If L is very small, the observed samples would be a very unlikely outcome, so that value of p for A would be very unreasonable.
When the number of trials becomes N and the number of occurrences of A becomes m, the maximum likelihood estimate of the prior probability of A is m/N.
Similarly, when the experiment has several categories A, B, C, ..., we can set the probability of A to p and the combined probability of all other categories to (1-p); the same derivation shows that the prior probability of A is again m/N.
Estimating a conditional probability by maximum likelihood works in exactly the same way, except that N is the total number of samples under the given condition and m is the number of times A occurs under that condition.
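To make the example concrete, here is a minimal Python sketch (my own illustration; the grid search and function name are not from the original) that evaluates L(p) for the 50-trial, 20-occurrence example above and checks that it peaks at m/N:

```python
import numpy as np

# Numbers from the example above: 50 trials, A occurs 20 times, B occurs 30 times.
N, m = 50, 20

def likelihood(p):
    """L(p) = p^m * (1-p)^(N-m): probability of this outcome if P(A) = p."""
    return p**m * (1 - p)**(N - m)

# Evaluate L on a grid of candidate probabilities and pick the maximizer.
ps = np.linspace(0.01, 0.99, 99)
best_p = ps[np.argmax(likelihood(ps))]

print(f"p that maximizes L : {best_p:.2f}")   # ~0.40
print(f"closed-form MLE m/N: {m / N:.2f}")    # 0.40
```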
Gradient descent
First, a quick note on the relationship between matrices and vectors:
A vector is just a special matrix with n rows and 1 column.

Let's look at the definition of the gradient from a mathematical point of view. In calculus, we take the partial derivative of a multivariate function with respect to each of its parameters and write these partial derivatives as a vector; that vector is the gradient (the "vector expression" simply means writing them in matrix form). For example, for the function f(x,y), taking partial derivatives with respect to x and y gives the gradient vector (∂f/∂x, ∂f/∂y)T, abbreviated grad f(x,y) or ▽f(x,y).

Geometrically, the gradient points in the direction in which the function increases fastest. Specifically, for the function f(x,y), at a point (x0,y0) the direction of the gradient vector (∂f/∂x0, ∂f/∂y0)T is the direction in which f(x,y) increases fastest, so following it makes it easier to find a maximum of the function. Conversely, moving in the opposite direction of the gradient, i.e. along -(∂f/∂x0, ∂f/∂y0)T, the function decreases fastest, which makes it easier to find a minimum. (Just remember this sentence and don't ask why, it will only add to your troubles; it is short anyway: the opposite direction of the gradient is the direction of fastest descent.)
Solving with gradient descent
Starting from some initial parameter values, we repeatedly move the parameters a small step against the gradient of the objective (loss) function, θ := θ - α·▽J(θ), where α is the learning rate (step size), and stop when the updates become negligible; the point reached is a (local) minimum of J(θ).
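The original figure with the concrete solution steps is not reproduced above. As a minimal sketch of the idea, assuming a toy objective f(x, y) = x² + 2y² of my own choosing, the loop below simply keeps stepping against the gradient until it vanishes:

```python
import numpy as np

# Toy objective (my own example): f(x, y) = x^2 + 2*y^2, minimum at (0, 0).
def grad_f(w):
    x, y = w
    return np.array([2 * x, 4 * y])   # (df/dx, df/dy)

w = np.array([3.0, -2.0])   # arbitrary starting point
alpha = 0.1                 # learning rate (step size)

# Keep stepping in the opposite direction of the gradient until it vanishes.
for step in range(1000):
    g = grad_f(w)
    if np.linalg.norm(g) < 1e-8:
        break
    w = w - alpha * g

print(w)   # very close to the minimum (0, 0)
```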
Linear regression
Looking at gradient descent from the perspective of linear regression:
Suppose there are n samples X1 = {x11, x12, x13, ..., x1i : y1}, X2 = {x21, x22, x23, ..., x2i : y2}, ..., Xn = {xn1, xn2, xn3, ..., xni : yn}, each with i features and a label y.
For such i-dimensional feature vectors, the linear regression model is
h(X) = θ0 + θ1·x1 + θ2·x2 + ... + θi·xi
and the parameters θ are found by minimizing the mean squared error J(θ) = (1/2n)·Σk (h(Xk) - yk)^2 with gradient descent, i.e. each θj is repeatedly updated as θj := θj - α·∂J(θ)/∂θj.
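As a rough illustration (the synthetic data, true weights, and learning rate below are my own assumptions, not from the original), this sketch fits the linear model above by repeatedly stepping against the gradient of the squared-error loss:

```python
import numpy as np

# Synthetic data (my own example): n samples with i = 3 features, generated
# from known weights so the result of gradient descent can be checked.
rng = np.random.default_rng(0)
n, i = 200, 3
X = rng.normal(size=(n, i))
true_theta, true_b = np.array([2.0, -1.0, 0.5]), 1.5
y = X @ true_theta + true_b + 0.01 * rng.normal(size=n)

theta = np.zeros(i)   # weights θ1..θi
b = 0.0               # intercept θ0
alpha = 0.1           # learning rate

for epoch in range(2000):
    y_hat = X @ theta + b              # model prediction h(X)
    err = y_hat - y
    # Gradient of the loss J(θ) = (1/2n) Σ (h(Xk) - yk)^2
    theta -= alpha * (X.T @ err / n)   # step against the gradient
    b -= alpha * err.mean()

print(theta, b)   # ≈ [2.0, -1.0, 0.5] and 1.5
```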
Logistic regression
The difference between logistic regression and linear regression is that in linear regression the dependent variable y is a continuous value and the prediction is also continuous (e.g. house price, age, temperature), whereas logistic regression deals with 0-1 classification problems (e.g. telling male from female) and is most often used for binary classification.
Suppose our samples are {Xn : yn}, where yn is 0 or 1 and Xn is an i-dimensional feature vector:
X1 = {x11, x12, x13, ..., x1i : y1}, X2 = {x21, x22, x23, ..., x2i : y2}, ..., Xn = {xn1, xn2, xn3, ..., xni : yn}
Logistic regression passes the linear combination through the sigmoid function, h(X) = 1 / (1 + e^-(θ0 + θ1·x1 + ... + θi·xi)), and interprets h(X) as P(y = 1 | X). By the maximum likelihood idea above, the likelihood of the samples is L(θ) = Πk h(Xk)^yk · (1 - h(Xk))^(1-yk); maximizing its logarithm (equivalently, minimizing the negative log-likelihood by gradient descent) gives the parameters θ.
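Tying the three ideas together, here is a minimal sketch of logistic regression trained by gradient descent on the negative log-likelihood; the synthetic data, true weights, and hyperparameters are my own assumptions, not from the original:

```python
import numpy as np

# Synthetic binary labels (my own example): P(y = 1 | X) follows a sigmoid of
# a linear function of the features, so logistic regression can recover it.
rng = np.random.default_rng(1)
n, i = 500, 2
X = rng.normal(size=(n, i))
true_theta, true_b = np.array([1.5, -2.0]), 0.3

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = (rng.random(n) < sigmoid(X @ true_theta + true_b)).astype(float)  # yn is 0 or 1

theta, b, alpha = np.zeros(i), 0.0, 0.1

for epoch in range(5000):
    h = sigmoid(X @ theta + b)         # predicted P(y = 1 | X)
    # Gradient of the average negative log-likelihood
    # -(1/n) Σ [ yk·ln h(Xk) + (1 - yk)·ln(1 - h(Xk)) ]
    err = h - y
    theta -= alpha * (X.T @ err / n)
    b -= alpha * err.mean()

print(theta, b)   # roughly recovers [1.5, -2.0] and 0.3
```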