Detailed explanation of linear regression in machine learning
2022-07-01 10:24:00 【HUIM_ Wang】
Linear regression algorithm: the focus here is the gradient descent algorithm.
How do we evaluate the model? With a loss function.
The simplest and most common loss function is the mean squared error (MSE).
The formula is as follows :
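With $n$ samples, true values $y_i$ and predicted values $\hat{y}_i$, the standard definition is:

$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2$$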
Take the house-price data and assume the trivial model y = 1. The resulting MSE of 60132 has no practical meaning on its own, but it is useful for comparison: the smaller the value, the better. Ideally the MSE would be as close to 0 as possible, but with real data samples it cannot actually equal 0.
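A minimal sketch of this comparison in Python; the area/price numbers below are made up for illustration (the article's actual dataset is not shown), so the printed MSE will not be exactly 60132:

```python
import numpy as np

# Hypothetical data: area (feature) and price (target); values invented for illustration.
areas = np.array([50, 80, 100, 120, 150], dtype=float)
prices = np.array([150, 230, 290, 360, 450], dtype=float)

def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return np.mean((y_true - y_pred) ** 2)

# Trivial model y = 1: predict the constant 1 for every sample.
predictions = np.ones_like(prices)
print(mse(prices, predictions))  # a large number, useful only for comparing models
```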
Suppose we are predicting house prices: the feature is the area and the target is the price. We need to fit a line, i.e. compute the weight m and the intercept b.
Step one: assume m = 0, so that y = b and b is the only adjustable parameter. Using the mean squared error formula, compute the MSE for candidate values of b and fit an optimal parameter b in the process.

From the figure, the "optimal" value of b should be near 241, where MSE = 612; that value best matches the house prices. (This process requires checking the MSE for each candidate b one by one, which is tedious. In the figure the search for b starts from 1, and since the right value of b is not known in advance, it would be even more troublesome if b happened to be negative.)
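Continuing with the made-up data and the `mse` helper from the sketch above, this brute-force search could look like the following (the search range of 1 to 500 is an assumption):

```python
# Brute-force search: with m = 0 the model is y = b, so try many values of b
# and keep the one that gives the smallest MSE. Simple but tedious.
best_b, best_mse = None, float("inf")
for b in range(1, 500):  # start the search from 1, as in the article's figure
    current = mse(prices, np.full_like(prices, float(b)))
    if current < best_mse:
        best_b, best_mse = b, current
print(best_b, best_mse)
```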
To find the most suitable b quickly and accurately, a new concept needs to be introduced: the learning rate.
Let's look at a set of figures first:
We need to find the point b corresponding to the smallest MSE.



No matter what value b starts from, the computer needs to find the b with the minimum MSE dynamically, by following the trend of the curve, i.e. by using the derivative.

According to the figure above, the smallest derivative obtained is -8, corresponding to the minimum mean squared error, at b = 241.
Even so, this b is still just the lowest point we can spot with the naked eye, and the changes in b were entered by hand; the computer cannot work that way, so we need some signal to adjust how b changes.
If the curve is steep, as in figure b, the slope is a large negative number, and the next guess should be to the right of the current point, not to the left; in the situation of figure a, the slope is a large positive number and b should be shifted to the left. In other words, if the slope is negative the next guess should move to the right, and if it is positive it should move to the left, until the curve flattens out and the derivative is close to 0.
We mentioned the learning rate above: using the derivative of the MSE with respect to b, we can find the most suitable b as quickly as possible. The more negative the derivative, the larger the change in b should be; the closer the derivative is to 0, the smaller the change should be.
The learning rate is a value, for example 0.0001, and it is used like this: new b = previous b - previous derivative × learning rate. Repeating this loop quickly finds the b whose derivative is closest to 0.
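A minimal sketch of this update loop in Python, continuing with the made-up data above (the starting value, learning rate, and iteration count are arbitrary choices):

```python
# Gradient descent for the one-parameter model y = b.
# For MSE = mean((b - y_i)^2), the derivative is d(MSE)/db = 2 * mean(b - y_i).
b = 1.0                    # arbitrary starting guess
learning_rate = 0.0001
for _ in range(100_000):
    derivative = 2.0 * np.mean(b - prices)
    b = b - learning_rate * derivative
print(b)  # settles near the mean of the prices, which is the best b for y = b
```

Rerunning this with a larger learning rate such as 0.1 reaches almost the same b in far fewer iterations, which is exactly the trade-off described next.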
If the learning rate is smaller, for example 0.000001, the value of b changes more slowly; if it is larger, for example 0.1 or 0.8, b changes faster. The slower the change, the more iterations are needed and the larger the amount of computation, but the more accurate the final b; the faster the change, the less computation, but the resulting b may not be the best one.
Here are the figures for learning rates of 0.00001, 0.001, 0.01, and 0.1:




When the learning rate is 0.2, b slowly approaches 245 and the derivative slowly approaches 0.
The learning rate cannot be too large, otherwise b will only move farther and farther away from the right point.
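A quick way to see this with the one-parameter loop sketched earlier; the value 1.5 is simply a rate large enough to diverge for this toy data:

```python
# Re-run the y = b loop with an overly large learning rate: each step
# overshoots the minimum by more than it corrects, so b keeps getting worse.
b, learning_rate = 1.0, 1.5
for step in range(5):
    derivative = 2.0 * np.mean(b - prices)
    b = b - learning_rate * derivative
    print(step, b)  # b swings to the other side of the minimum and grows each time
```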
To simplify the model, the discussion above covered y = mx + b with m = 0, finding the value of the single adjustable parameter (weight) b. (That is an extreme case; normally m ≠ 0.)
So in the general case we need to take the derivative with respect to m and b separately:
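For the model $\hat{y}_i = m x_i + b$, the standard partial derivative of the MSE with respect to $m$ is:

$$\frac{\partial\,\mathrm{MSE}}{\partial m}=\frac{2}{n}\sum_{i=1}^{n}\left(m x_i + b - y_i\right)x_i$$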
Similarly, for b we obtain:
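$$\frac{\partial\,\mathrm{MSE}}{\partial b}=\frac{2}{n}\sum_{i=1}^{n}\left(m x_i + b - y_i\right)$$

A minimal sketch of gradient descent on both parameters with these derivatives, still using the made-up data above (the learning rate and iteration count are arbitrary, chosen small and large respectively because the area feature is not scaled):

```python
# Gradient descent for y = m*x + b using the two partial derivatives above.
m, b = 0.0, 0.0
learning_rate = 0.00008   # must be small here because the area feature is unscaled
for _ in range(300_000):
    predictions = m * areas + b
    dm = 2.0 * np.mean((predictions - prices) * areas)  # d(MSE)/dm
    db = 2.0 * np.mean(predictions - prices)            # d(MSE)/db
    m -= learning_rate * dm
    b -= learning_rate * db
print(m, b, mse(prices, m * areas + b))
```

With these toy numbers the loop drifts toward the least-squares line; in practice the feature is usually scaled first, so that a larger learning rate and far fewer iterations are enough.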