Summary 1: Deep Learning Basics
2022-07-25 22:39:00 【III】
- 【Notes】The difference between downsampling and pooling:
An intuitive explanation of pooling:
Pooling = rising water.
The process of pooling = raising the water level (each cell of the matrix grid comes to cover a larger area).
The purpose of pooling is to capture the outline of an object. Imagine water trying to understand the three-dimensional shape of a mountain: when the water level is low you get the shape of the foot of the mountain, at a medium level the shape of the hillside, and at a high level the shape of the summit. With these three contours you can roughly sketch the mountain. Convolution, by contrast, is the process of distinguishing where the water is and where the mountain is. In terms of network structure, an upper layer looking at a lower layer's post-pooling feature map is like looking down at the earth from space: all you see are ridges and snow peaks. This is what further abstraction of features means at the macro level.
Pooling methods:
1. max-pooling: take the maximum of the feature points in the neighborhood;
2. mean-pooling: take the average of the feature points in the neighborhood.
The role of pooling:
1. Dimensionality reduction, so the network has fewer parameters to learn;
2. Preventing overfitting;
3. Enlarging the receptive field;
4. Providing (approximate) invariance to translation, rotation, and scale.
- The difference between convolution and pooling: pooling (pooling), i.e. downsampling (subsample), reduces the size of the data.
Let the input be $n \times n$, the window (pooling or convolution kernel) be $k \times k$, and the stride be $s$. The size $o$ of the output matrix is then:
- Convolution (with padding $p$): $o = \lfloor (n - k + 2p)/s \rfloor + 1$
- Pooling (no padding): $o = \lfloor (n - k)/s \rfloor + 1$
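To make the formulas concrete, here is a minimal NumPy sketch of max- and mean-pooling over a single-channel matrix (the function name and sizes are illustrative, not from the original post):

```python
import numpy as np

def pool2d(x, k=2, s=2, mode="max"):
    # Output size follows o = floor((n - k) / s) + 1 from the formula above.
    n_h = (x.shape[0] - k) // s + 1
    n_w = (x.shape[1] - k) // s + 1
    out = np.empty((n_h, n_w))
    for i in range(n_h):
        for j in range(n_w):
            window = x[i * s:i * s + k, j * s:j * s + k]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, k=2, s=2, mode="max"))    # 4x4 input -> 2x2 output
print(pool2d(x, k=2, s=2, mode="mean"))
```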
- The role of the ReLU activation: in a word, ReLU introduces non-linearity between the layers of a neural network. Without an activation function there would be a purely linear relationship between layers; each layer would be equivalent to a matrix multiplication, and a stack of such layers could never accomplish the complex tasks we need neural networks for. The effectiveness of ReLU shows in two aspects: 1. more efficient gradient descent and backpropagation, avoiding the exploding and vanishing gradient problems; 2. a simpler computation, which speeds up training.
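A minimal NumPy sketch of the point made above: without an activation, stacked layers collapse into one matrix multiplication, while ReLU breaks that linearity (all names and sizes here are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Two linear layers with no activation equal a single linear map:
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))      # True
# Inserting ReLU between them breaks this equivalence:
print(np.allclose(W2 @ relu(W1 @ x), (W2 @ W1) @ x))  # False (in general)
```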
- Core dissection of the deep learning VGG model; for AlexNet and VGGNet, see that article. First, a look at CNN networks.
- An introduction to CNNs (convolutional neural networks).
- CNN basics: convolution (Convolution), padding (Padding), and stride (Stride).
What deep learning calls the convolution operation (Convolution) is actually the cross-correlation (cross-correlation) operation: treat the image as a matrix, and from left to right, top to bottom, take a patch the same size as the filter; multiply each value in the patch by the corresponding value in the filter and sum. The results form the output matrix. The kernel is never flipped.
Take a grayscale image as an example: start from a small weight matrix, the convolution kernel (kernel), and let it "scan" step by step over the two-dimensional input. As the kernel "slides", compute the product of the weight matrix and the data it currently covers, then sum the result into one output pixel. (The original post illustrates this with a figure.)
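A minimal NumPy sketch of the "valid" cross-correlation described above (illustrative names; no kernel flipping, matching the text):

```python
import numpy as np

def cross_correlate2d(x, k):
    # Slide the kernel over the input, left to right and top to bottom;
    # at each position, multiply elementwise and sum into one output pixel.
    kh, kw = k.shape
    n_h = x.shape[0] - kh + 1
    n_w = x.shape[1] - kw + 1
    out = np.empty((n_h, n_w))
    for i in range(n_h):
        for j in range(n_w):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

x = np.arange(9, dtype=float).reshape(3, 3)
k = np.array([[1.0, 0.0], [0.0, -1.0]])
print(cross_correlate2d(x, k))   # 3x3 input, 2x2 kernel -> 2x2 output
```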
Sometimes we want the input and output to be the same size. To achieve this, before the convolution we pad (Padding) the original matrix, i.e. fill in values along its boundary to enlarge it, usually with "0".
During convolution, padding is sometimes needed to avoid losing border information, and sometimes a stride (Stride) is set during the convolution to compress information, or to make the output smaller than the input.
The role of stride: it shrinks the output by roughly the stride factor. For example, with a stride of 2 the output is about 1/2 the size of the input; with a stride of 3, about 1/3; and so on. (The original post illustrates this with a figure.)
- 【Convolution kernels are usually odd × odd】 1*1, 3*3, 5*5, and 7*7 are the most common sizes. Why is that? Why no even × even kernels?
(1) Padding is easier.
In convolution we sometimes need the output to be the same size as the input, which requires padding. Suppose the image (the object being convolved) is n*n and the kernel is k*k. If the padding is set to (k-1)/2, the output after convolution is (n-k+2*((k-1)/2))/1+1 = n, i.e. the output is n*n, so the size is preserved. But if k is even, (k-1)/2 is not an integer.
(2) The convolution anchor is easier to locate.
In a CNN, the convolution slides relative to a reference position of the kernel, usually the kernel's center. If the kernel size is odd, the anchor is easy to find: it is simply the center element. If the kernel size is even, there is no natural center, and no element is an obviously good choice of anchor. (A quick numeric check of point (1) follows.)
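A quick check of point (1), a sketch assuming the output-size formula given in the next section:

```python
# "Same" padding p = (k - 1) / 2 is an integer only when k is odd.
def out_size(n, k, p, s=1):
    return (n - k + 2 * p) // s + 1

for k in (1, 3, 5, 7):                       # odd kernel sizes
    print(k, out_size(32, k, (k - 1) // 2))  # 32 every time: size preserved
print(4, out_size(32, 4, (4 - 1) // 2))      # 31: an even kernel cannot keep n*n
```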
【The convolution output-size formula】
Input image size: usually written $W \times W$. Convolution kernel size: usually written $F \times F$. Padding (Padding): usually written $P$. Stride (Stride): usually written $S$. Output image size: usually written $N \times N$.
Given $W$, $F$, $P$ and $S$, $N$ can be computed with the following formula:
$N = \lfloor (W - F + 2P)/S \rfloor + 1$
where "$\lfloor \cdot \rfloor$" is the floor symbol, used to round down when the result is not an integer.
【Multichannel convolution】
The examples above contain only one input channel. In practice, most input images have 3 channels (RGB).
This brings up the difference between the terms "convolution kernel" and "filter". With a single channel, "kernel" is equivalent to "filter" and the two terms are interchangeable. In general, however, they are two different concepts: each "filter" is in fact a collection of "kernels". In the current layer, each input channel corresponds to one kernel of the filter, and that kernel is unique to the channel.
The computation of a multichannel convolution: convolve each input channel with the corresponding kernel of the filter, sum the per-channel results into a single-channel output, then add the bias term to get the final single-channel output. With multiple filters, these final single-channel outputs are stacked into the overall output. (A sketch of this computation follows the channel-count notes below.)
We still need to pay attention to two points: the number of channels of a filter, and the number of channels of the output feature map.
The number of channels of a filter = the number of channels of the previous layer's feature map. In the original figure, the input is an RGB image, so the filter must also have three channels. The number of channels of a layer's output feature map = the number of filters in that layer. With only one filter, the output feature map has 1 channel; with 2 filters, the output feature map has 2 channels.
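As promised above, a minimal NumPy sketch of multichannel convolution (shapes and names are illustrative): each filter holds one kernel per input channel, the per-channel results are summed, a bias is added, and each filter yields one output channel:

```python
import numpy as np

def conv_multichannel(x, filters, bias):
    # x: (C_in, H, W); filters: (C_out, C_in, k, k); bias: (C_out,)
    c_out, c_in, k, _ = filters.shape
    n = x.shape[1] - k + 1
    out = np.zeros((c_out, n, n))
    for o in range(c_out):                    # one output channel per filter
        for c in range(c_in):                 # one kernel per input channel
            for i in range(n):
                for j in range(n):
                    out[o, i, j] += (x[c, i:i + k, j:j + k] * filters[o, c]).sum()
        out[o] += bias[o]                     # one bias per filter
    return out

x = np.random.randn(3, 5, 5)                  # 3-channel (RGB-like) input
f = np.random.randn(2, 3, 3, 3)               # 2 filters, 3 kernels each
print(conv_multichannel(x, f, np.zeros(2)).shape)  # (2, 3, 3): 2 output channels
```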
) The number of channels is 2.Classification data set : fenleiCIFAR-10 Data set description 、 Description link 2
CIFAR-10 is a small dataset for recognizing common objects, collected by Hinton's students Alex Krizhevsky and Ilya Sutskever. It contains RGB color images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The images are 32×32, and the dataset holds 50,000 training images and 10,000 test images. (The original post shows sample CIFAR-10 images.)
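For reference, a minimal sketch of loading CIFAR-10 with torchvision (assuming torchvision is installed; the root path is illustrative):

```python
import torchvision
import torchvision.transforms as transforms

transform = transforms.ToTensor()            # HWC uint8 -> CHW float in [0, 1]
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=transform)

print(len(train_set), len(test_set))         # 50000 10000
img, label = train_set[0]
print(img.shape, train_set.classes[label])   # torch.Size([3, 32, 32]) + class name
```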
Deep neural networks (Deep Neural Networks, DNN); reference article: 《Deep Neural Networks (DNN)》.
Although a DNN looks complicated, each small local unit is still just like a perceptron: a linear map
$z = \sum_i w_i x_i + b$
plus an activation function
$a = \sigma(z)$
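A minimal NumPy sketch of that local unit, a linear map followed by an elementwise activation (names and sizes illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # input vector
W = rng.normal(size=(4, 3))     # weights of a 3 -> 4 layer
b = rng.normal(size=4)          # biases

z = W @ x + b                   # the linear (perceptron-like) part
a = sigmoid(z)                  # the activation, applied elementwise
print(z.shape, a.shape)         # (4,) (4,)
```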
For an introduction to CNN networks, refer to the good article 《Introduction to CNN Networks》.
For an introduction to CNNs (convolutional neural networks), see the reference article 《Introduction to CNNs (Convolutional Neural Networks)》.
《The role of the fully connected layer》
What is the fully connected layer good for? Let me make three points.
1. The fully connected layer (fully connected layers, FC) acts as the "classifier" of the whole convolutional neural network. If the convolution, pooling, and activation layers map the raw data into a hidden feature space, the fully connected layer maps the learned "distributed feature representation" into the sample label space. In practice, a fully connected layer can be implemented by a convolution: an FC layer whose preceding layer is also fully connected becomes a convolution with a 1x1 kernel; an FC layer whose preceding layer is a convolution layer becomes a global convolution with an h x w kernel, where h and w are the height and width of the preceding layer's convolution output (note 1). (A sketch of this equivalence follows.)
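A minimal PyTorch sketch of the equivalence claimed above: an FC layer over a flattened feature map equals a global convolution whose kernel covers the whole map (all sizes here are hypothetical):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 512, 7, 7)             # hypothetical final feature map

fc = nn.Linear(512 * 7 * 7, 10)           # FC "classifier" over flattened input
conv = nn.Conv2d(512, 10, kernel_size=7)  # global h x w convolution (h = w = 7)

# Reuse the FC weights, reshaped to the conv layout (out, in, h, w):
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(10, 512, 7, 7))
    conv.bias.copy_(fc.bias)

out_fc = fc(x.flatten(1))                 # shape (1, 10)
out_conv = conv(x).flatten(1)             # shape (1, 10)
print(torch.allclose(out_fc, out_conv, atol=1e-5))  # True
```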
2. Because the fully connected layer's parameters are redundant (FC parameters alone can account for about 80% of all network parameters), recent high-performing models such as ResNet and GoogLeNet use global average pooling (global average pooling, GAP) instead of FC to aggregate the learned deep features, and still use a loss function such as softmax as the objective to guide learning. It is worth pointing out that networks using GAP instead of FC usually have better prediction performance. (A minimal GAP example follows.)
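A minimal sketch of GAP, which collapses each channel's spatial map to a single average with no learned parameters (sizes hypothetical):

```python
import torch

x = torch.randn(8, 512, 7, 7)   # a batch of final feature maps
gap = x.mean(dim=(2, 3))        # (8, 512): one number per channel
print(gap.shape)

# For comparison, the FC alternative would flatten to (8, 512 * 7 * 7)
# and need a large weight matrix; GAP needs no parameters at all.
```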
3. At a time when FC is increasingly out of favor, our recent study (In Defense of Fully Connected Layers in Visual Representation Transfer) finds that FC can act as a "firewall" when transferring a model's representational ability. Concretely, suppose we have a model pre-trained on ImageNet; then ImageNet can be regarded as the source domain (source domain in transfer learning). Fine-tuning (fine tuning) is the most commonly used transfer learning technique in deep learning. When fine-tuning, if the images in the target domain (target domain) differ greatly from those in the source domain (for example, unlike ImageNet, the target images are not object-centric but scenery; the original post shows examples), a network without FC fine-tunes to worse results than one with FC. FC can therefore be regarded as a "firewall" for the model's representational ability: especially when the source and target domains differ greatly, FC preserves enough model capacity to ensure the representation transfers. (Redundant parameters are not useless.)