CNN convolutional neural network learning process (weight update)
2022-07-28 20:22:00 【LifeBackwards】
A convolutional neural network learns its parameters with the BP (back-propagation) algorithm, which updates the network parameters by gradient descent. In a convolutional neural network, the parameters to be optimized are the convolution kernels $k$, the multiplicative weights $\beta$ of the down-sampling layers, the weights $W$ of the fully connected layers, and the bias $b$ of each layer. We take the mean squared error between the desired output and the actual output of the network as the cost function; the goal is to minimize this cost so that the network output accurately predicts the label of each input. The cost function is

$$E = \frac{1}{2}\sum_{n=1}^{N}\left\| t^{n} - y^{n} \right\|_{2}^{2}$$

where $N$ is the number of training samples, $t^{n}$ is the true category label of the $n$-th training sample, and $y^{n}$ is the category prediction produced by the convolutional neural network for that sample.
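A minimal NumPy sketch of this cost, assuming the labels $t^{n}$ and the network outputs $y^{n}$ are stored as rows of two arrays (the names `t`, `y` and the toy values are illustrative):

```python
import numpy as np

def mse_cost(t, y):
    """Mean-squared-error cost over N training samples.

    t, y: arrays of shape (N, c) -- true label vectors and network outputs.
    Returns 0.5 * sum of squared differences, matching the formula above.
    """
    return 0.5 * np.sum((t - y) ** 2)

# toy usage: 3 samples, 4 output classes (one-hot targets)
t = np.eye(4)[[0, 2, 1]]
y = np.array([[0.9, 0.05, 0.03, 0.02],
              [0.1, 0.20, 0.60, 0.10],
              [0.2, 0.70, 0.05, 0.05]])
print(mse_cost(t, y))
```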
Convolution layer gradient calculation
In general, each convolution layer $l$ is followed by a down-sampling layer $l+1$. According to back-propagation, to obtain the gradients of the weights of convolution layer $l$ we need the residual $\delta$ of each of its neural nodes, i.e. the partial derivative of the cost function with respect to that node. To compute this residual, we sum the residuals of the nodes in the next layer that are connected to the current node, weighted by the corresponding connection weights $W$, and then multiply by the derivative of the activation function $f$ evaluated at the node's input $u$ in layer $l$; this gives the residual of each neural node in layer $l$. Because of down-sampling, each neural node of the down-sampling layer corresponds to a region of sampling-window size in the output feature map of the previous layer, so each node of a feature map in layer $l$ is connected to exactly one node of the corresponding feature map in layer $l+1$. Computing the residual of every pixel of a feature map yields its residual map. To obtain the layer-$l$ residuals, the residual map of the down-sampling layer $l+1$ is first upsampled back to the size of the layer-$l$ feature map, multiplied element-wise with the derivative of the activation of that feature map, and finally scaled by the weight $\beta$, which gives the residual of convolution layer $l$. The residual of feature map $j$ in the convolution layer is computed as

$$\delta_{j}^{l} = \beta_{j}^{l+1}\left( f'\!\left(u_{j}^{l}\right) \circ \mathrm{up}\!\left(\delta_{j}^{l+1}\right) \right)$$

where the symbol $\circ$ denotes element-wise multiplication and $\mathrm{up}(\cdot)$ denotes the upsampling operation. If the down-sampling factor is $n$, then $\mathrm{up}(\cdot)$ copies each element $n$ times horizontally and vertically; it can be implemented as the Kronecker product with an all-ones matrix, $\mathrm{up}(x) = x \otimes 1_{n \times n}$.
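A minimal NumPy sketch of this $\mathrm{up}(\cdot)$ operation (the function name `up` and the sample values are illustrative):

```python
import numpy as np

def up(delta, n):
    """Upsample a residual map by copying each element n times horizontally
    and vertically, via the Kronecker product with an all-ones n x n matrix."""
    return np.kron(delta, np.ones((n, n)))

# residual map of down-sampling layer l+1, upsampled with factor n = 2
delta_next = np.array([[1.0, 2.0],
                       [3.0, 4.0]])
print(up(delta_next, 2))   # each entry becomes a 2 x 2 block
```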
From this residual map, the gradients of the bias and of the convolution kernel associated with the feature map can be computed:

$$\frac{\partial E}{\partial b_{j}} = \sum_{u,v}\left(\delta_{j}^{l}\right)_{uv}$$

$$\frac{\partial E}{\partial k_{ij}^{l}} = \sum_{u,v}\left(\delta_{j}^{l}\right)_{uv}\left(p_{i}^{l-1}\right)_{uv}$$

where $(u, v)$ are the pixel coordinates in the feature map, and $\left(p_{i}^{l-1}\right)_{uv}$ is the patch of the input feature map $x_{i}^{l-1}$ that was multiplied element-wise by the kernel $k_{ij}^{l}$ when computing the element at position $(u, v)$ of the output feature map during the forward convolution.
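A short NumPy sketch of these two gradients for a single (input map $i$, output map $j$) pair, assuming the forward pass was a 'valid' sliding-window operation so that the patch at $(u, v)$ is simply `x_prev[u:u+kh, v:v+kw]`; the function and variable names are illustrative:

```python
import numpy as np

def conv_layer_grads(delta_l, x_prev, kh, kw):
    """Bias and kernel gradients for a convolution layer, per the formulas above.

    delta_l : residual map of the output feature map, shape (H_out, W_out)
    x_prev  : input feature map x_i^{l-1},            shape (H_in,  W_in)
    kh, kw  : kernel height/width (H_in = H_out + kh - 1 for a 'valid' conv)
    """
    grad_b = delta_l.sum()                       # dE/db_j: sum over all (u, v)
    grad_k = np.zeros((kh, kw))
    H_out, W_out = delta_l.shape
    for u in range(H_out):
        for v in range(W_out):
            patch = x_prev[u:u + kh, v:v + kw]   # (p_i^{l-1})_{uv}
            grad_k += delta_l[u, v] * patch      # accumulate element-wise product
    return grad_b, grad_k

# toy usage: 5x5 input map, 3x3 kernel -> 3x3 output/residual map
rng = np.random.default_rng(0)
x_prev  = rng.standard_normal((5, 5))
delta_l = rng.standard_normal((3, 3))
grad_b, grad_k = conv_layer_grads(delta_l, x_prev, 3, 3)
print(grad_b, grad_k.shape)   # scalar and (3, 3)
```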
Gradient calculation of the down-sampling layer
The parameters involved in the forward pass of the down-sampling layer are a multiplicative factor $\beta$ and an additive bias $b$ for each feature map. To obtain the gradients of down-sampling layer $l$, we first find, for each position in the current layer's residual map, the corresponding region in the residual map of the next layer, and propagate the residual backwards. In addition, the result must be multiplied by the weights connecting the input and output feature maps, and these weights are exactly the convolution kernel parameters. The formula is

$$\delta_{j}^{l} = f'\!\left(u_{j}^{l}\right) \circ \mathrm{conv2}\!\left(\delta_{j}^{l+1},\ \mathrm{rot180}\!\left(k_{j}^{l+1}\right),\ \text{'full'}\right)$$

where $\mathrm{rot180}(\cdot)$ rotates the kernel by 180 degrees and 'full' denotes a full (zero-padded) 2-D convolution.
The gradients of the multiplicative factor $\beta$ and the bias $b$ are computed as

$$\frac{\partial E}{\partial b_{j}} = \sum_{u,v}\left(\delta_{j}^{l}\right)_{uv}, \qquad \frac{\partial E}{\partial \beta_{j}} = \sum_{u,v}\left(\delta_{j}^{l} \circ d_{j}^{l}\right)_{uv}$$

where $d_{j}^{l} = \mathrm{down}\!\left(x_{j}^{l-1}\right)$ is the down-sampled input feature map saved during the forward pass.
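A NumPy sketch of this back-propagation step, assuming the down-sampling layer's output feeds a 'valid' convolution in layer $l+1$ and that the down-sampling layer uses a sigmoid activation (these choices and all names are illustrative):

```python
import numpy as np

def backprop_residual(delta_next, kernel):
    """conv2(delta_next, rot180(kernel), 'full'): propagate the residual of
    convolution layer l+1 back to the down-sampling layer l.  A full
    convolution with the 180-degree-rotated kernel is equivalent to sliding
    the un-rotated kernel over the zero-padded residual map."""
    kh, kw = kernel.shape
    padded = np.pad(delta_next, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    H = padded.shape[0] - kh + 1
    W = padded.shape[1] - kw + 1
    out = np.zeros((H, W))
    for u in range(H):
        for v in range(W):
            out[u, v] = np.sum(padded[u:u + kh, v:v + kw] * kernel)
    return out

def downsampling_grads(delta_l, d_l):
    """Gradients of the bias b_j and multiplicative factor beta_j, where
    d_l = down(x_j^{l-1}) is saved from the forward pass."""
    return delta_l.sum(), np.sum(delta_l * d_l)

# toy usage: layer l+1 uses a 3x3 kernel, so a 4x4 residual map of layer l+1
# back-propagates to a 6x6 residual map in the down-sampling layer l
rng = np.random.default_rng(0)
delta_next = rng.standard_normal((4, 4))
kernel     = rng.standard_normal((3, 3))
u_l        = rng.standard_normal((6, 6))   # pre-activation of layer l
sigmoid    = lambda z: 1.0 / (1.0 + np.exp(-z))
f_prime    = lambda z: sigmoid(z) * (1.0 - sigmoid(z))
delta_l    = f_prime(u_l) * backprop_residual(delta_next, kernel)
print(downsampling_grads(delta_l, rng.standard_normal((6, 6))))
```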
Gradient calculation of the fully connected layer
The computation for the fully connected layer is similar to that of the down-sampling layer. The residual is computed as

$$\delta^{l} = \left(W^{l+1}\right)^{T}\delta^{l+1} \circ f'\!\left(u^{l}\right)$$

The partial derivative of the cost function with respect to the bias is

$$\frac{\partial E}{\partial b^{l}} = \delta^{l}$$

and the gradient of the fully connected layer's weights is

$$\frac{\partial E}{\partial W^{l}} = \delta^{l}\left(x^{l-1}\right)^{T}$$

where $x^{l-1}$ is the output (activation) vector of the previous layer.
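A minimal NumPy sketch of these three formulas for one fully connected layer (the sigmoid activation, the shapes, and all variable names are illustrative assumptions):

```python
import numpy as np

def fc_layer_grads(delta_next, W_next, u_l, x_prev, f_prime):
    """Residual and gradients of a fully connected layer l.

    delta_next : residual vector of layer l+1,    shape (n_{l+1},)
    W_next     : weight matrix of layer l+1,      shape (n_{l+1}, n_l)
    u_l        : pre-activation input of layer l, shape (n_l,)
    x_prev     : output of layer l-1,             shape (n_{l-1},)
    f_prime    : derivative of the activation function
    """
    delta_l = (W_next.T @ delta_next) * f_prime(u_l)   # residual of layer l
    grad_b = delta_l                                   # dE/db^l
    grad_W = np.outer(delta_l, x_prev)                 # dE/dW^l = delta^l (x^{l-1})^T
    return delta_l, grad_b, grad_W

# toy usage with a sigmoid activation
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
f_prime = lambda z: sigmoid(z) * (1.0 - sigmoid(z))

rng = np.random.default_rng(0)
delta_next = rng.standard_normal(3)
W_next     = rng.standard_normal((3, 4))
u_l        = rng.standard_normal(4)
x_prev     = rng.standard_normal(5)
print(fc_layer_grads(delta_next, W_next, u_l, x_prev, f_prime)[2].shape)  # (4, 5)
```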