
Neural Network Learning (IV): A Brief Summary of Each Layer of a Neural Network

2022-06-09 03:10:00 Red date oats

**Convolution layer**

Why does the number of channels increase after convolution?

Answer: because of the number of convolution kernels. Each kernel produces one output channel, and the outputs of all the kernels are stacked.
The convolution layers of a neural network extract features, but what features exactly?

I am not entirely clear on this question myself and can only offer my own tentative view; if you know better, I'd love to hear from you in the comments so we can learn from each other.

Personally, I think feature extraction by convolution is different from hand-crafted image feature extraction; it is better understood as a convolution operation over pixels. For example, our eyes immediately understand the content of an RGB image, but the computer only sees three channels of pixel values: red, green, and blue. Convolution applies kernels to the pixels of these three channels, and by continuously adjusting the kernel parameters during training, the network arrives at results that capture the features contained in the image.

There are many kinds of convolution operations; here is one example.

For a 4×4 image, we use two 2×2 convolution kernels with a stride of 1, i.e. the 2×2 window slides one unit to the right at each step.

Feature map size formula: [(image size − kernel size) / stride] + 1

Then the question follows: why can convolution extract features?

The feature_map computed by the first convolution kernel is a 3×3 matrix whose third column has the largest absolute values, indicating a vertical feature (a large change in pixel values) at the corresponding location in the original image. In the feature_map computed by the second kernel, the third column is 0 and the second row has the largest absolute values, indicating a horizontal feature at the corresponding location.
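The size formula and the 4×4 example can be sketched in plain Python (the pixel and kernel values below are illustrative, not the ones from the missing figure):

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Feature-map side length: [(n + 2*padding - k) / stride] + 1."""
    return (n + 2 * padding - k) // stride + 1

def conv2d(image, kernel, stride=1):
    """Valid cross-correlation of a square 2-D image with a 2-D kernel (no padding)."""
    n, k = len(image), len(kernel)
    out = conv_output_size(n, k, stride)
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            # Multiply the window under the kernel element-wise and sum.
            s = sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                    for a in range(k) for b in range(k))
            row.append(s)
        result.append(row)
    return result

# 4x4 image, 2x2 kernel, stride 1 -> 3x3 feature map, matching [(4-2)/1]+1 = 3
image = [[1, 0, 1, 0],
         [0, 1, 0, 1],
         [1, 0, 1, 0],
         [0, 1, 0, 1]]
vertical_kernel = [[1, -1],
                   [1, -1]]
fm = conv2d(image, vertical_kernel)
print(len(fm), len(fm[0]))  # 3 3
```

Large absolute values in the output mark positions where the pixel pattern matches the kernel's orientation, which is exactly the "vertical/horizontal feature" reading above.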

**Receptive field**
In deep neural networks for machine vision there is a concept called the receptive field, which denotes the region of the original image that a neuron at a given position in the network responds to. Neurons cannot perceive the whole original image because convolution and pooling layers are locally connected (via sliding filters). The larger a neuron's receptive field, the larger the region of the original image it can reach, and the more global and semantically high-level its features may be; the smaller the receptive field, the more local and detailed its features. The receptive field can therefore serve as a rough measure of each layer's level of abstraction.
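As a rough illustration, the receptive field can be computed layer by layer with the standard recurrence r_l = r_{l-1} + (k_l − 1)·j_{l-1}, where j is the cumulative stride (the layer configuration below is an arbitrary example, not from the text):

```python
def receptive_fields(layers):
    """layers: list of (kernel_size, stride) tuples.
    Returns the receptive field of one output unit after each layer, using
    r_l = r_{l-1} + (k_l - 1) * j_{l-1},   j_l = j_{l-1} * s_l."""
    r, j = 1, 1          # a raw pixel sees exactly itself
    out = []
    for k, s in layers:
        r = r + (k - 1) * j
        j = j * s
        out.append(r)
    return out

# Two 3x3 stride-1 convs, a 2x2 stride-2 pooling, then another 3x3 conv:
print(receptive_fields([(3, 1), (3, 1), (2, 2), (3, 1)]))  # [3, 5, 6, 10]
```

Note how the pooling layer's stride makes every later layer's receptive field grow faster, which is why deeper layers see more global context.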

**Network degradation**
For example, suppose the optimal network structure for a task is 18 layers. When we design a network we do not know how many layers would be optimal, and suppose we design a 34-layer network instead. The extra 16 layers are redundant; ideally, training would make these 16 layers learn the identity mapping, i.e. their output exactly equals their input. In practice, however, it is hard for the model to learn the identity-mapping parameters of these 16 layers correctly, so the 34-layer network ends up performing no better than the optimal 18-layer one. This is degradation: as network depth increases, the model's performance degrades. It is not caused by overfitting, but by the redundant layers failing to learn the identity mapping.

Solution to network degradation: use a residual network (ResNet), briefly introduced below (F(x) is the residual):

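A minimal sketch of the residual idea, where the `transform` callable stands in for the convolution layers of a real residual block:

```python
def residual_block(x, transform):
    """y = F(x) + x: the skip connection adds the input back, so the block
    only has to learn the residual F(x). If the optimal mapping is the
    identity, the layers can simply drive F(x) toward zero."""
    fx = transform(x)
    return [a + b for a, b in zip(fx, x)]

# If the learned transform is (near) zero, the block passes x through unchanged,
# which is exactly the identity mapping the redundant layers need.
x = [1.0, 2.0, 3.0]
print(residual_block(x, lambda v: [0.0] * len(v)))  # [1.0, 2.0, 3.0]
```

This is why a deep ResNet does not degrade below a shallower optimum: learning "output zero" is far easier for the layers than learning "output exactly the input."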
Note: in a CNN, the convolution layers in the early part of the network hold a small fraction of the parameters but account for most of the computation; the fully connected layers at the end are the opposite. Most CNN architectures share this characteristic. Therefore, when optimizing for computational speed, focus on the convolutions; when optimizing parameters (e.g. weight pruning), focus on the fully connected layers.
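A back-of-the-envelope comparison illustrating this point (the layer sizes below are my own assumptions in the spirit of VGG-style networks, not taken from the text):

```python
def conv_stats(c_in, c_out, k, h_out, w_out):
    """Parameters and multiply-accumulates for one conv layer (bias ignored)."""
    params = c_in * c_out * k * k
    macs = params * h_out * w_out   # the same weights are reused at every position
    return params, macs

def fc_stats(n_in, n_out):
    """Parameters and multiply-accumulates for one fully connected layer."""
    params = n_in * n_out
    macs = params                   # every weight is used exactly once
    return params, macs

# A 3x3 conv, 64 -> 64 channels, on a 56x56 output map vs. a 4096 -> 4096 FC layer
conv_p, conv_m = conv_stats(64, 64, 3, 56, 56)
fc_p, fc_m = fc_stats(4096, 4096)
print(conv_p, conv_m)  # few parameters, many operations
print(fc_p, fc_m)      # many parameters, comparatively few operations
```

The weight sharing of convolution is what decouples parameter count from computation, producing exactly the asymmetry described above.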

**Pooling layer** (also called subsampling or downsampling; it reduces the dimensionality of the features)

Pooling comes in two common forms: max pooling takes the maximum value in each window, and average pooling takes the mean value in each window.

Purpose: to compress the input feature map. On the one hand, a smaller feature map simplifies the network's computational complexity; on the other, the compression extracts the main features.
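A minimal sketch of both pooling modes in plain Python (non-overlapping windows, i.e. stride equal to window size, is an assumption for simplicity):

```python
def pool2d(fmap, size, mode="max"):
    """Non-overlapping pooling over a square feature map."""
    n = len(fmap)
    out = []
    for i in range(0, n - size + 1, size):
        row = []
        for j in range(0, n - size + 1, size):
            window = [fmap[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

fmap = [[1, 3, 2, 4],
        [5, 7, 6, 8],
        [1, 2, 3, 4],
        [5, 6, 7, 8]]
print(pool2d(fmap, 2, "max"))   # [[7, 8], [6, 8]]
print(pool2d(fmap, 2, "avg"))   # [[4.0, 5.0], [3.5, 5.5]]
```

Either way the 4×4 map shrinks to 2×2, which is the compression the section describes.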


**Zero padding layer**
If convolution layers keep stacking, the feature map keeps shrinking; to counter this, a zero padding layer (Zero Padding) adds a border of zeros around the input.
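A minimal sketch (the helper name `zero_pad` is mine). With a 3×3 kernel and stride 1, padding of 1 keeps the spatial size unchanged, since [(n + 2·1 − 3)/1] + 1 = n:

```python
def zero_pad(image, p):
    """Surround a square 2-D image with a border of p zeros on every side."""
    n = len(image)
    width = n + 2 * p
    padded = [[0] * width for _ in range(width)]
    for i in range(n):
        for j in range(n):
            padded[i + p][j + p] = image[i][j]
    return padded

img = [[1, 2],
       [3, 4]]
padded = zero_pad(img, 1)
print(len(padded))  # 4  (2x2 grew to 4x4; a 3x3 stride-1 conv now gives 2x2 back)
```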

**Fully connected layer**

It connects all the features and sends the output values to a classifier (commonly a softmax classifier, which computes the classification probabilities).
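A minimal sketch of the softmax step (the logits below are arbitrary example scores):

```python
import math

def softmax(logits):
    """Turn the fully connected layer's output scores into class probabilities.
    Subtracting the max is a standard trick for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)  # probabilities sum to 1; the highest score gets the highest probability
```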

General structure of a convolutional network: several stacked [convolution → activation → pooling] blocks, followed by one or more fully connected layers and a softmax output.
**Feature fusion**
Many works improve detection and segmentation performance by fusing features from multiple layers. According to the order of fusion and prediction, the methods are classified as early fusion and late fusion.

Early fusion: first fuse the features of multiple layers, then train a predictor on the fused features (prediction happens only once, after full fusion). Methods of this kind are also called skip connections and use concat or add operations. Representatives of this idea are Inside-Outside Net (ION) and HyperNet.

Two classic feature fusion operations:
(1) concat: series fusion, directly concatenating the two features. If the input features x and y have dimensions p and q, the output feature z has dimension p + q;
(2) add: a parallel strategy, sometimes described as combining the two feature vectors into a complex vector z = x + iy (where i is the imaginary unit); in practice it is an element-wise sum, so x and y must have the same dimension, and the output keeps that dimension.
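In code, the two operations amount to the following (a plain-Python sketch on toy vectors):

```python
def concat_fuse(x, y):
    """Series fusion: stack the two feature vectors; dimensions add up (p + q)."""
    return x + y

def add_fuse(x, y):
    """Parallel fusion: element-wise sum; x and y must have the same dimension."""
    assert len(x) == len(y), "add fusion requires equal feature dimensions"
    return [a + b for a, b in zip(x, y)]

x, y = [1.0, 2.0, 3.0], [0.5, 0.5, 0.5]
print(concat_fuse(x, y))  # 6 dimensions: p + q
print(add_fuse(x, y))     # 3 dimensions: [1.5, 2.5, 3.5]
```

concat preserves both sets of features at the cost of a wider output; add keeps the dimension fixed but mixes the two feature maps into one.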

Late fusion: improve detection performance by combining the detection results of different layers (detection starts at partially fused layers before the final fusion, so there are multiple layers of detections, and the multiple results are fused at the end). Research of this kind follows two ideas:
(1) no feature fusion: predict separately from multi-scale features, then combine the predictions, e.g. Single Shot MultiBox Detector (SSD) and Multi-scale CNN (MS-CNN);
(2) pyramidal feature fusion: fuse first, then predict, e.g. Feature Pyramid Network (FPN).

Original site

Copyright notice
This article was written by [Red date oats]; when reposting, please include a link to the original. Thanks.
https://yzsam.com/2022/160/202206090307228234.html