CNN convolutional neural network learning process (weight update)
2022-07-28 20:22:00 【LifeBackwards】
A convolutional neural network learns its parameters with the BP (back-propagation) algorithm, which updates the network parameters by gradient descent. In a convolutional neural network, the parameters to be optimized are the convolution kernels $k$, the multiplicative weights $\beta$ of the down-sampling layers, the weights $W$ of the fully connected layers, and the bias $b$ of each layer. We take the mean squared error between the desired output and the actual output of the network as the cost function; the goal is to minimize this cost so that the network output accurately predicts the label of each input. The cost function is

$$E = \frac{1}{2}\sum_{n=1}^{N}\left\| t^{n} - y^{n} \right\|_{2}^{2}$$

where $N$ is the number of training samples, $t^{n}$ is the true category label of the $n$-th training sample, and $y^{n}$ is the category prediction produced by the convolutional neural network for that sample.
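A minimal NumPy sketch of this cost, assuming the labels $t^{n}$ and the network outputs $y^{n}$ are stored as rows of two arrays (the names `t`, `y` and the toy values are illustrative):

```python
import numpy as np

def mse_cost(t, y):
    """Mean-squared-error cost over N training samples.

    t, y: arrays of shape (N, c) -- true label vectors and network outputs.
    Returns 0.5 * sum of squared differences, matching the formula above.
    """
    return 0.5 * np.sum((t - y) ** 2)

# toy usage: 3 samples, 4 output classes (one-hot targets)
t = np.eye(4)[[0, 2, 1]]
y = np.array([[0.9, 0.05, 0.03, 0.02],
              [0.1, 0.20, 0.60, 0.10],
              [0.2, 0.70, 0.05, 0.05]])
print(mse_cost(t, y))
```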
Convolution layer gradient calculation
In general, each convolution layer $l$ is followed by a down-sampling layer $l+1$. According to back-propagation, to obtain the gradients of the weights of convolution layer $l$ we need the residual $\delta$ of each of its neural nodes, i.e. the partial derivative of the cost function with respect to that node. To compute this residual, we sum the residuals of the nodes in the next layer that are connected to the current node, weighted by the corresponding connection weights $W$, and then multiply by the derivative of the activation function $f$ evaluated at the node's input $u$ in layer $l$; this gives the residual of each neural node in layer $l$. Because of down-sampling, each neural node of the down-sampling layer corresponds to a region of sampling-window size in the output feature map of the previous layer, so each node of a feature map in layer $l$ is connected to exactly one node of the corresponding feature map in layer $l+1$. Computing the residual of every pixel of a feature map yields its residual map. To obtain the layer-$l$ residuals, the residual map of the down-sampling layer $l+1$ is first upsampled back to the size of the layer-$l$ feature map, multiplied element-wise with the derivative of the activation of that feature map, and finally scaled by the weight $\beta$, which gives the residual of convolution layer $l$. The residual of feature map $j$ in the convolution layer is computed as

$$\delta_{j}^{l} = \beta_{j}^{l+1}\left( f'\!\left(u_{j}^{l}\right) \circ \mathrm{up}\!\left(\delta_{j}^{l+1}\right) \right)$$

where the symbol $\circ$ denotes element-wise multiplication and $\mathrm{up}(\cdot)$ denotes the upsampling operation. If the down-sampling factor is $n$, then $\mathrm{up}(\cdot)$ copies each element $n$ times horizontally and vertically; it can be implemented as the Kronecker product with an all-ones matrix, $\mathrm{up}(x) = x \otimes 1_{n \times n}$.
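A minimal NumPy sketch of this $\mathrm{up}(\cdot)$ operation (the function name `up` and the sample values are illustrative):

```python
import numpy as np

def up(delta, n):
    """Upsample a residual map by copying each element n times horizontally
    and vertically, via the Kronecker product with an all-ones n x n matrix."""
    return np.kron(delta, np.ones((n, n)))

# residual map of down-sampling layer l+1, upsampled with factor n = 2
delta_next = np.array([[1.0, 2.0],
                       [3.0, 4.0]])
print(up(delta_next, 2))   # each entry becomes a 2 x 2 block
```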
From this residual map, the gradients of the bias and of the convolution kernel associated with the feature map can be computed:

$$\frac{\partial E}{\partial b_{j}} = \sum_{u,v}\left(\delta_{j}^{l}\right)_{uv}$$

$$\frac{\partial E}{\partial k_{ij}^{l}} = \sum_{u,v}\left(\delta_{j}^{l}\right)_{uv}\left(p_{i}^{l-1}\right)_{uv}$$

where $(u, v)$ are the pixel coordinates in the feature map, and $\left(p_{i}^{l-1}\right)_{uv}$ is the patch of the input feature map $x_{i}^{l-1}$ that was multiplied element-wise by the kernel $k_{ij}^{l}$ when computing the element at position $(u, v)$ of the output feature map during the forward convolution.
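A short NumPy sketch of these two gradients for a single (input map $i$, output map $j$) pair, assuming the forward pass was a 'valid' sliding-window operation so that the patch at $(u, v)$ is simply `x_prev[u:u+kh, v:v+kw]`; the function and variable names are illustrative:

```python
import numpy as np

def conv_layer_grads(delta_l, x_prev, kh, kw):
    """Bias and kernel gradients for a convolution layer, per the formulas above.

    delta_l : residual map of the output feature map, shape (H_out, W_out)
    x_prev  : input feature map x_i^{l-1},            shape (H_in,  W_in)
    kh, kw  : kernel height/width (H_in = H_out + kh - 1 for a 'valid' conv)
    """
    grad_b = delta_l.sum()                       # dE/db_j: sum over all (u, v)
    grad_k = np.zeros((kh, kw))
    H_out, W_out = delta_l.shape
    for u in range(H_out):
        for v in range(W_out):
            patch = x_prev[u:u + kh, v:v + kw]   # (p_i^{l-1})_{uv}
            grad_k += delta_l[u, v] * patch      # accumulate element-wise product
    return grad_b, grad_k

# toy usage: 5x5 input map, 3x3 kernel -> 3x3 output/residual map
rng = np.random.default_rng(0)
x_prev  = rng.standard_normal((5, 5))
delta_l = rng.standard_normal((3, 3))
grad_b, grad_k = conv_layer_grads(delta_l, x_prev, 3, 3)
print(grad_b, grad_k.shape)   # scalar and (3, 3)
```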
Gradient calculation of the down-sampling layer
The parameters involved in the forward pass of the down-sampling layer are a multiplicative factor $\beta$ and an additive bias $b$ for each feature map. To obtain the gradients of down-sampling layer $l$, we first find, for each position in the current layer's residual map, the corresponding region in the residual map of the next layer, and propagate the residual backwards. In addition, the result must be multiplied by the weights connecting the input and output feature maps, and these weights are exactly the convolution kernel parameters. The formula is

$$\delta_{j}^{l} = f'\!\left(u_{j}^{l}\right) \circ \mathrm{conv2}\!\left(\delta_{j}^{l+1},\ \mathrm{rot180}\!\left(k_{j}^{l+1}\right),\ \text{'full'}\right)$$

where $\mathrm{rot180}(\cdot)$ rotates the kernel by 180 degrees and 'full' denotes a full (zero-padded) 2-D convolution.
The gradients of the multiplicative factor $\beta$ and the bias $b$ are computed as

$$\frac{\partial E}{\partial b_{j}} = \sum_{u,v}\left(\delta_{j}^{l}\right)_{uv}, \qquad \frac{\partial E}{\partial \beta_{j}} = \sum_{u,v}\left(\delta_{j}^{l} \circ d_{j}^{l}\right)_{uv}$$

where $d_{j}^{l} = \mathrm{down}\!\left(x_{j}^{l-1}\right)$ is the down-sampled input feature map saved during the forward pass.
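A NumPy sketch of this back-propagation step, assuming the down-sampling layer's output feeds a 'valid' convolution in layer $l+1$ and that the down-sampling layer uses a sigmoid activation (these choices and all names are illustrative):

```python
import numpy as np

def backprop_residual(delta_next, kernel):
    """conv2(delta_next, rot180(kernel), 'full'): propagate the residual of
    convolution layer l+1 back to the down-sampling layer l.  A full
    convolution with the 180-degree-rotated kernel is equivalent to sliding
    the un-rotated kernel over the zero-padded residual map."""
    kh, kw = kernel.shape
    padded = np.pad(delta_next, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    H = padded.shape[0] - kh + 1
    W = padded.shape[1] - kw + 1
    out = np.zeros((H, W))
    for u in range(H):
        for v in range(W):
            out[u, v] = np.sum(padded[u:u + kh, v:v + kw] * kernel)
    return out

def downsampling_grads(delta_l, d_l):
    """Gradients of the bias b_j and multiplicative factor beta_j, where
    d_l = down(x_j^{l-1}) is saved from the forward pass."""
    return delta_l.sum(), np.sum(delta_l * d_l)

# toy usage: layer l+1 uses a 3x3 kernel, so a 4x4 residual map of layer l+1
# back-propagates to a 6x6 residual map in the down-sampling layer l
rng = np.random.default_rng(0)
delta_next = rng.standard_normal((4, 4))
kernel     = rng.standard_normal((3, 3))
u_l        = rng.standard_normal((6, 6))   # pre-activation of layer l
sigmoid    = lambda z: 1.0 / (1.0 + np.exp(-z))
f_prime    = lambda z: sigmoid(z) * (1.0 - sigmoid(z))
delta_l    = f_prime(u_l) * backprop_residual(delta_next, kernel)
print(downsampling_grads(delta_l, rng.standard_normal((6, 6))))
```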
Gradient calculation of the fully connected layer
The computation for the fully connected layer is similar to that of the down-sampling layer. The residual is computed as

$$\delta^{l} = \left(W^{l+1}\right)^{T}\delta^{l+1} \circ f'\!\left(u^{l}\right)$$

The partial derivative of the cost function with respect to the bias is

$$\frac{\partial E}{\partial b^{l}} = \delta^{l}$$

and the gradient of the fully connected layer's weights is

$$\frac{\partial E}{\partial W^{l}} = \delta^{l}\left(x^{l-1}\right)^{T}$$

where $x^{l-1}$ is the output (activation) vector of the previous layer.
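A minimal NumPy sketch of these three formulas for one fully connected layer (the sigmoid activation, the shapes, and all variable names are illustrative assumptions):

```python
import numpy as np

def fc_layer_grads(delta_next, W_next, u_l, x_prev, f_prime):
    """Residual and gradients of a fully connected layer l.

    delta_next : residual vector of layer l+1,    shape (n_{l+1},)
    W_next     : weight matrix of layer l+1,      shape (n_{l+1}, n_l)
    u_l        : pre-activation input of layer l, shape (n_l,)
    x_prev     : output of layer l-1,             shape (n_{l-1},)
    f_prime    : derivative of the activation function
    """
    delta_l = (W_next.T @ delta_next) * f_prime(u_l)   # residual of layer l
    grad_b = delta_l                                   # dE/db^l
    grad_W = np.outer(delta_l, x_prev)                 # dE/dW^l = delta^l (x^{l-1})^T
    return delta_l, grad_b, grad_W

# toy usage with a sigmoid activation
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
f_prime = lambda z: sigmoid(z) * (1.0 - sigmoid(z))

rng = np.random.default_rng(0)
delta_next = rng.standard_normal(3)
W_next     = rng.standard_normal((3, 4))
u_l        = rng.standard_normal(4)
x_prev     = rng.standard_normal(5)
print(fc_layer_grads(delta_next, W_next, u_l, x_prev, f_prime)[2].shape)  # (4, 5)
```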