[Foundations of deep learning] Neural network learning (3)
2022-06-11 17:36:00 【Programmer Xiao Li】
From the previous articles we know how to compute partial derivatives of a function of several variables. Neural network learning is the process of continually searching for the optimal parameters, and computing partial derivatives and gradients gives us the fastest route toward those optimal parameters.
What is a gradient
A gradient is the vector formed by the partial derivatives of a multivariable function with respect to each of its variables. Take, for example, the following function:

f(x0, x1) = x0² + x1²
The partial derivative with respect to x0 is:

∂f/∂x0 = 2x0
The partial derivative with respect to x1 is:

∂f/∂x1 = 2x1
So the gradient is the vector of the two partial derivatives:

(∂f/∂x0, ∂f/∂x1) = (2x0, 2x1)
As we saw earlier, partial derivatives can be approximated with a central difference: we perturb one variable at a time while treating the variables of the other dimensions as constants:

import numpy as np

def numerical_gradient(f, x):
    # Small perturbation used by the central difference
    h = 1e-4  # 0.0001
    # Gradient array with the same shape as x
    grad = np.zeros_like(x)
    for idx in range(x.size):
        # Save the current value of this dimension
        tmp_val = x[idx]
        # Compute f(x+h)
        x[idx] = tmp_val + h
        fxh1 = f(x)
        # Compute f(x-h)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        # Central difference
        grad[idx] = (fxh1 - fxh2) / (2*h)
        # Restore the original value
        x[idx] = tmp_val
    return grad

With this function, computing the gradient of a function at any given point is straightforward:
def function_2(x):
    return x[0]**2 + x[1]**2

>>> numerical_gradient(function_2, np.array([3.0, 4.0]))
array([ 6.,  8.])
>>> numerical_gradient(function_2, np.array([0.0, 2.0]))
array([ 0.,  4.])
>>> numerical_gradient(function_2, np.array([3.0, 0.0]))
array([ 6.,  0.])

The gradient is different at different points. So what do these partial derivatives and gradients actually mean?

If we draw the negative gradient as an arrow at each point, we get a vector-field plot in which every arrow points toward the minimum of the function, just as in the 3-D surface plots we saw earlier everything points toward the bottom of the valley.

In fact, the direction of the negative gradient is the direction in which the function decreases fastest at that point.
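This steepest-descent property can be checked numerically. The snippet below is my own illustration (not from the original post): it reuses function_2 from above and compares how much the function decreases after a tiny step along the normalized negative gradient with steps of the same length along a few random directions; no random direction should beat the negative gradient.

x = np.array([3.0, 4.0])
grad = np.array([2 * x[0], 2 * x[1]])          # analytic gradient of function_2 at (3.0, 4.0)
step = 1e-3                                    # length of each trial step

# Decrease obtained by stepping along the normalized negative gradient
d_neg = -grad / np.linalg.norm(grad)
best_decrease = function_2(x) - function_2(x + step * d_neg)

# Decrease obtained along a few random unit directions: never larger
rng = np.random.default_rng(0)
for _ in range(5):
    d = rng.standard_normal(2)
    d /= np.linalg.norm(d)
    decrease = function_2(x) - function_2(x + step * d)
    print(decrease <= best_decrease + 1e-12)   # expected to print True every time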
Gradient descent method
Since the negative gradient points in the direction of fastest decrease at each position, we can exploit this: adjust the parameters a little at a time until they become optimal, that is, until the cost function reaches its lowest point.

x0 ← x0 − η ∂f/∂x0
x1 ← x1 − η ∂f/∂x1
In other words, at every step we compute the gradient at the current position and then move a small amount in the direction in which the gradient descends; step by step this brings us closer to the goal.
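For example, starting from (x0, x1) = (3.0, 4.0) with η = 0.1, a single update gives

x0 ← 3.0 − 0.1 · 2 · 3.0 = 2.4
x1 ← 4.0 − 0.1 · 2 · 4.0 = 3.2

so the point moves a little closer to the minimum at (0, 0).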
Here η is the learning rate, which controls the size of each step. It should not be too large, or the updates will bounce back and forth and never settle on the optimum; it should not be too small either, or learning becomes very slow and inefficient.
(In practice the learning rate is often adjusted dynamically: the early steps use a larger rate to move quickly, while the later steps use a smaller rate, which is slower but avoids overshooting the optimum.)
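As a rough sketch of such a schedule (my own addition; the names lr0, decay_rate and decay_steps are made up for illustration), a simple step decay could look like this:

def step_decay_lr(step, lr0=0.1, decay_rate=0.5, decay_steps=30):
    # Halve the learning rate every decay_steps updates
    return lr0 * (decay_rate ** (step // decay_steps))

for step in (0, 30, 60, 90):
    print(step, step_decay_lr(step))   # the rate halves every 30 steps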
First, let's implement the position update used by gradient descent:
def gradient_descent(f, init_x, lr=0.01, step_num=100):
    x = init_x
    # Iteratively update the position
    for i in range(step_num):
        # Compute the gradient at the current position
        grad = numerical_gradient(f, x)
        # Move a small step in the direction of the negative gradient
        x -= lr * grad
    return x

Let's use it to look for the minimum:
>>> def function_2(x):
...     return x[0]**2 + x[1]**2
...
>>> init_x = np.array([-3.0, 4.0])
>>> gradient_descent(function_2, init_x=init_x, lr=0.1, step_num=100)
array([ -6.11110793e-10,   8.14814391e-10])

The result is very close to (0, 0), which shows that the lowest point of the function has indeed been found.
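To see why the learning rate matters, we can rerun the same experiment with an extreme value in each direction (my own illustration of the discussion above; the exact numbers printed depend on floating-point behavior):

# Learning rate too large: the updates overshoot and blow up,
# ending far away from (0, 0)
init_x = np.array([-3.0, 4.0])
print(gradient_descent(function_2, init_x=init_x, lr=10.0, step_num=100))

# Learning rate too small: after 100 steps the point has barely
# moved from the starting position (-3.0, 4.0)
init_x = np.array([-3.0, 4.0])
print(gradient_descent(function_2, init_x=init_x, lr=1e-10, step_num=100))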

And in fact the variable really does move toward the lowest point step by step along the direction of the negative gradient.
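To make this concrete, gradient_descent can be extended to record every position it visits (a small sketch of mine that reuses numerical_gradient and function_2 from above; the name x_history is made up). Plotting the stored points would trace out the step-by-step path toward the origin:

def gradient_descent_with_history(f, init_x, lr=0.01, step_num=100):
    x = init_x
    x_history = []                    # positions visited during the descent
    for i in range(step_num):
        x_history.append(x.copy())    # remember the current position
        grad = numerical_gradient(f, x)
        x -= lr * grad
    return x, np.array(x_history)

x, history = gradient_descent_with_history(
    function_2, np.array([-3.0, 4.0]), lr=0.1, step_num=100)
print(history[0])     # the starting point [-3.,  4.]
print(history[-1])    # already very close to the origin
print(x)              # the final result, essentially (0, 0)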
Gradients in neural networks
In neural network learning, the gradient is the collection of partial derivatives of the loss function with respect to the weights. For a weight matrix W it has the same shape as W:

∂L/∂W, where the element in row i and column j is ∂L/∂Wij
Weight update

W ← W − η ∂L/∂W
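As a rough sketch of how this looks in code (my own illustration in the spirit of a common single-layer example; softmax, cross_entropy_error and loss below are helper names I made up, not code from this article), numerical_gradient from above can be reused to differentiate the loss with respect to every element of W:

def softmax(a):
    a = a - np.max(a)                 # subtract the max for numerical stability
    exp_a = np.exp(a)
    return exp_a / np.sum(exp_a)

def cross_entropy_error(y, t):
    # t is a one-hot label vector
    return -np.sum(t * np.log(y + 1e-7))

W = np.random.randn(2, 3)             # weights: 2 inputs, 3 classes
x = np.array([0.6, 0.9])              # one input sample
t = np.array([0.0, 0.0, 1.0])         # its one-hot label

def loss(W_flat):
    # numerical_gradient above iterates over a 1-D array, so W is passed flattened
    y = softmax(x @ W_flat.reshape(2, 3))
    return cross_entropy_error(y, t)

# Gradient of the loss with respect to every element of W, reshaped to match W
dW = numerical_gradient(loss, W.flatten()).reshape(2, 3)

# One gradient-descent update of the weights
eta = 0.1
W -= eta * dW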