Computing square roots with gradient descent and Newton's method
2022-07-27 02:04:00 【Adenialzz】
This article shows how to compute $\sqrt{x}$ using only addition, subtraction, multiplication, and division, without calling any built-in square-root routine. It introduces gradient descent and Newton's method, and gives C++ implementations of both.
Gradient descent method
Idea / Steps
- Transform the problem: instead of computing $\sqrt{x}$ directly, minimize an objective function $L(t) = (t^2 - x)^2$ (spelled out just below); when $L$ approaches 0, $t$ is the value we want.
- Iteratively search for a $t$ that makes $L$ smaller.
- Once $L$ is small enough, i.e. $L \rightarrow 0$, the current $t$ is the result.
- Solving for the minimum of $L$ means finding the point where its derivative is 0.
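To see why this works, note that the squared objective (the form used in the code below) is non-negative and vanishes exactly at the square roots of $x$:

$$
L(t) = (t^2 - x)^2 \ge 0, \qquad L(t) = 0 \iff t^2 = x \iff t = \pm\sqrt{x}
$$

Starting the iteration from a positive $t$, in practice we land on the positive root.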
How to iterate
OK, the question now is how to iterate so that $L$ gets as small as possible. In other words, at every iteration we want the change in $t$ to move $L$ in the decreasing direction by a suitable step.

Determining how to iterate boils down to choosing, at each iteration, a direction and a step size.
The most natural idea: move $t$ a small step in each of the two directions (positive and negative), then compare which one gives a smaller $L$, and move in that direction. That is:
- If $L(t+\Delta t) < L(t)$, then $t_1 = t + \Delta t$;
- If $L(t-\Delta t) < L(t)$, then $t_1 = t - \Delta t$.
Note that $\Delta t$ should be an infinitesimally small positive number, i.e. $0^+$.

Next, let's rewrite the two conditions slightly:
- If $L(t+\Delta t) - L(t) < 0$, then $t_1 = t + \Delta t$;
- If $L(t) - L(t-\Delta t) > 0$, then $t_1 = t - \Delta t$.
The two cases can be combined into a single formula:
$$
t_1 = t - \frac{L(t+\Delta t)-L(t)}{|L(t+\Delta t)-L(t)|}\cdot \Delta t
$$
Here the factor $\frac{L(t+\Delta t)-L(t)}{|L(t+\Delta t)-L(t)|}$ only carries the sign information. Transforming a little further:
$$
\begin{aligned}
t_1 &= t-\frac{L(t+\Delta t)-L(t)}{|L(t+\Delta t)-L(t)|}\cdot \Delta t\\
&= t-\frac{\frac{L(t+\Delta t)-L(t)}{\Delta t}}{\left|\frac{L(t+\Delta t)-L(t)}{\Delta t}\right|}\cdot \Delta t\\
&= t-\frac{L(t+\Delta t)-L(t)}{\Delta t}\cdot\frac{\Delta t}{\left|\frac{L(t+\Delta t)-L(t)}{\Delta t}\right|}\\
&= t-\alpha L'(t), \qquad \alpha=\frac{\Delta t}{\left|\frac{L(t+\Delta t)-L(t)}{\Delta t}\right|}\rightarrow 0^+, \qquad L'(t)=\frac{L(t+\Delta t)-L(t)}{\Delta t}
\end{aligned}
$$
When $\alpha$ is infinitesimally small, $L$ is guaranteed to decrease, but convergence is far too slow.
For many functions met in practice, a relatively large step such as $\alpha = 0.01$ is acceptable. The reason is that even if the step is too large and overshoots the target, making $L(t_1) > L(t_0)$, the next iteration may still jump back so that $L(t_2) < L(t_1)$.
A large step size does not guarantee convergence, but most of the time it works well. The step size $\alpha$ is therefore called the learning rate; it is usually set to a fairly small number, though not too small.
In short, the learning rate generally has to be tuned by hand for each model and task.
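Applying the general update $t_1 = t - \alpha L'(t)$ to our objective gives the concrete rule used in the code below:

$$
L(t) = (t^2 - x)^2, \qquad L'(t) = 2(t^2 - x)\cdot 2t = 4t(t^2 - x), \qquad t_1 = t - \alpha \cdot 4t(t^2 - x)
$$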
Code
```cpp
#include <cstdio>

// Gradient descent on L(t) = (t^2 - x)^2 with a fixed learning rate.
float sqrt_grad_decent(float x) {
    float t = x / 2;                              // initial guess
    float L = (t * t - x) * (t * t - x);
    float alpha = 0.001;                          // learning rate
    while (L > 1e-5) {
        float delta = 2 * (t * t - x) * 2 * t;    // L'(t) = 4t(t^2 - x)
        t = t - alpha * delta;                    // gradient step
        L = (t * t - x) * (t * t - x);
        printf("t=%f\n", t);
    }
    return t;
}
```
Summary
- Gradient descent decides how to adjust by looking at the function locally. If the function has several extrema, it may get stuck in a local one, so the choice of the starting point directly affects where it converges.
- A large step size can sometimes jump over a local extremum, but it can also cause oscillation and prevent convergence.
- The step size has to be chosen according to the characteristics of the function: where the derivative is very large, use a smaller step; where the derivative is small, a larger step is fine. Otherwise convergence problems arise easily.
- There is a class of algorithms that search for a suitable step size within a range at every iteration, which makes each iteration more stable; a sketch of one such method follows below.
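One common example of such an algorithm is backtracking line search: start each iteration with a relatively large step and halve it until the objective actually decreases. Below is a minimal sketch for our objective; the function name, the starting step of 1.0, the halving factor, and the tolerances are illustrative choices, not something from the original post.

```cpp
// Gradient descent on L(t) = (t^2 - x)^2 with backtracking line search.
// The starting step, halving factor and tolerances are illustrative choices.
float sqrt_grad_backtracking(float x) {
    float t = x / 2;                               // positive initial guess
    float L = (t * t - x) * (t * t - x);
    while (L > 1e-5f) {
        float g = 2 * (t * t - x) * 2 * t;         // L'(t) = 4t(t^2 - x)
        // start with a large step and halve it until L really decreases
        for (float step = 1.0f; step > 1e-8f; step *= 0.5f) {
            float t_new = t - step * g;
            float L_new = (t_new * t_new - x) * (t_new * t_new - x);
            if (L_new < L && t_new > 0) {          // accept: objective decreased
                t = t_new;                         // and iterate stays positive
                L = L_new;
                break;
            }
        }
    }
    return t;
}
```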
Newton's method 1
Gradient descent is typically used to find a function's minimum; Newton's method is typically used to find a function's zero, i.e. the root of the equation $L = 0$.
Idea / Steps
- Transform the problem: convert solving $\sqrt{x}$ into solving $L(t) = t^2 - x = 0$, i.e. finding the zero of this function.
- Iteratively search for $t$.
How to iterate
Take the intersection of the tangent to the curve at $t_0$ with the $x$ axis as $t_1$, and repeat to approach the zero of the function. (Figure: Newton's method.)

The slope of the tangent line is given by the derivative.
Consider two coordinate systems: the original system $o_1$ and a new system $o_2$ whose origin is the point $(x_1, f(x_1))$ of $o_1$. In the $o_2$ system, the red tangent line in the figure can be written as:
$$
f_{o_2}(x) = f'(x_1)\,x
$$
Where the tangent crosses the $x$ axis of the original system, at $x_2$, its $o_2$-coordinate is $x_2 - x_1$ and its height is $-f(x_1)$:

$$
f_{o_2}(x_2 - x_1) = f'(x_1)(x_2 - x_1) = -f(x_1)
$$
Hence:
$$
x_2 - x_1 = -\frac{f(x_1)}{f'(x_1)}, \qquad x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}
$$

Code
From the derivation in the previous subsection, we already know the iteration rule:
$$
t_1 = t - \frac{L(t)}{L'(t)}
$$
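Substituting $L(t) = t^2 - x$ and $L'(t) = 2t$ (a simplification we add here for context) turns this update into the classic Babylonian, or Heron's, iteration for square roots:

$$
t_1 = t - \frac{t^2 - x}{2t} = \frac{1}{2}\left(t + \frac{x}{t}\right)
$$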
In C++:
```cpp
#include <cmath>

// Newton's method on L(t) = t^2 - x (find the zero of L).
float sqrt_newton1(float x) {
    float t = x / 2;                  // initial guess
    float L = t * t - x;
    while (std::fabs(L) > 1e-5) {
        float dL = 2 * t;             // L'(t) = 2t
        t = t - L / dL;               // Newton step: t <- t - L(t)/L'(t)
        L = t * t - x;
    }
    return t;
}
```
Newton's method 2
Idea
Since Newton's method finds the zero of a function, can we use it to find the zero of the function's derivative? That would give us an extremum of the function.
The objective function is the same as in gradient descent, $L(t) = (t^2 - x)^2$; the difference is the update rule, $t_1 = t - \frac{L'(t)}{L''(t)}$, in which the step size (learning rate) is effectively 1.
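For reference, the derivatives that appear in the code below follow directly from the squared objective:

$$
L'(t) = 4t(t^2 - x), \qquad L''(t) = 12t^2 - 4x, \qquad t_1 = t - \frac{4t(t^2 - x)}{12t^2 - 4x}
$$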
Code
```cpp
// Newton's method applied to L'(t), i.e. minimizing L(t) = (t^2 - x)^2.
float sqrt_newton2(float x) {
    float t = x / 2;                              // initial guess
    float L = (t * t - x) * (t * t - x);
    while (L > 1e-5) {
        float dL = 2 * (t * t - x) * 2 * t;       // L'(t)  = 4t(t^2 - x)
        float d2L = 12 * t * t - 4 * x;           // L''(t) = 12t^2 - 4x
        t = t - dL / d2L;                         // Newton step on L'
        L = (t * t - x) * (t * t - x);
    }
    return t;
}
```
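A minimal sketch of how the three routines might be exercised and checked against the library square root; the `main` function and the test values are illustrative assumptions, not part of the original post (note that `sqrt_grad_decent` also prints its iterates):

```cpp
#include <cmath>
#include <cstdio>

// Assumes sqrt_grad_decent, sqrt_newton1 and sqrt_newton2 from above
// are defined in the same translation unit.
int main() {
    float xs[] = {2.0f, 10.0f, 20.0f};
    for (float x : xs) {
        printf("x=%6.2f  grad=%f  newton1=%f  newton2=%f  std::sqrt=%f\n",
               x, sqrt_grad_decent(x), sqrt_newton1(x), sqrt_newton2(x),
               std::sqrt(x));
    }
    return 0;
}
```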