
Neural Network Learning (IV): A Brief Summary of Each Layer of a Neural Network

2022-06-09 03:10:00 Red date oats

**Convolution layer**

Why does the number of channels increase after convolution?

Answer: because of the number of convolution kernels. Each kernel produces one output channel, and the outputs of all the kernels are stacked.
The convolution layers of a neural network extract features, but what features exactly?

I am not entirely clear on this question myself and can only offer my own tentative view; if you know better, I'd love to hear from you in the comments so we can learn from each other.

Personally, I think feature extraction by convolution is different from hand-crafted image feature extraction; it is better understood as a convolution operation over pixels. For example, our eyes immediately understand the content of an RGB image, but the computer only sees three channels of pixel values: red, green, and blue. Convolution applies kernels to the pixels of these three channels, and by continuously adjusting the kernel parameters during training, the network arrives at results that capture the features contained in the image.

There are many kinds of convolution operations; here is one example.

For a 4×4 image, we use two 2×2 convolution kernels with a stride of 1, i.e. the 2×2 window slides one unit to the right at each step.

Feature map size formula: [(image size − kernel size) / stride] + 1

Then the question follows: why can convolution extract features?

The feature_map computed by the first convolution kernel is a 3×3 matrix whose third column has the largest absolute values, indicating a vertical feature (a large change in pixel values) at the corresponding location in the original image. In the feature_map computed by the second kernel, the third column is 0 and the second row has the largest absolute values, indicating a horizontal feature at the corresponding location.
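The size formula and the 4×4 example can be sketched in plain Python (the pixel and kernel values below are illustrative, not the ones from the missing figure):

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Feature-map side length: [(n + 2*padding - k) / stride] + 1."""
    return (n + 2 * padding - k) // stride + 1

def conv2d(image, kernel, stride=1):
    """Valid cross-correlation of a square 2-D image with a 2-D kernel (no padding)."""
    n, k = len(image), len(kernel)
    out = conv_output_size(n, k, stride)
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            # Multiply the window under the kernel element-wise and sum.
            s = sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                    for a in range(k) for b in range(k))
            row.append(s)
        result.append(row)
    return result

# 4x4 image, 2x2 kernel, stride 1 -> 3x3 feature map, matching [(4-2)/1]+1 = 3
image = [[1, 0, 1, 0],
         [0, 1, 0, 1],
         [1, 0, 1, 0],
         [0, 1, 0, 1]]
vertical_kernel = [[1, -1],
                   [1, -1]]
fm = conv2d(image, vertical_kernel)
print(len(fm), len(fm[0]))  # 3 3
```

Large absolute values in the output mark positions where the pixel pattern matches the kernel's orientation, which is exactly the "vertical/horizontal feature" reading above.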

**Receptive field**
In deep neural networks for machine vision there is a concept called the receptive field, which denotes the region of the original image that a neuron at a given position in the network responds to. Neurons cannot perceive the whole original image because convolution and pooling layers are locally connected (via sliding filters). The larger a neuron's receptive field, the larger the region of the original image it can reach, and the more global and semantically high-level its features may be; the smaller the receptive field, the more local and detailed its features. The receptive field can therefore serve as a rough measure of each layer's level of abstraction.
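As a rough illustration, the receptive field can be computed layer by layer with the standard recurrence r_l = r_{l-1} + (k_l − 1)·j_{l-1}, where j is the cumulative stride (the layer configuration below is an arbitrary example, not from the text):

```python
def receptive_fields(layers):
    """layers: list of (kernel_size, stride) tuples.
    Returns the receptive field of one output unit after each layer, using
    r_l = r_{l-1} + (k_l - 1) * j_{l-1},   j_l = j_{l-1} * s_l."""
    r, j = 1, 1          # a raw pixel sees exactly itself
    out = []
    for k, s in layers:
        r = r + (k - 1) * j
        j = j * s
        out.append(r)
    return out

# Two 3x3 stride-1 convs, a 2x2 stride-2 pooling, then another 3x3 conv:
print(receptive_fields([(3, 1), (3, 1), (2, 2), (3, 1)]))  # [3, 5, 6, 10]
```

Note how the pooling layer's stride makes every later layer's receptive field grow faster, which is why deeper layers see more global context.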

**Network degradation**
For example, suppose the optimal network structure for a task is 18 layers. When we design a network we do not know how many layers would be optimal, and suppose we design a 34-layer network instead. The extra 16 layers are redundant; ideally, training would make these 16 layers learn the identity mapping, i.e. their output exactly equals their input. In practice, however, it is hard for the model to learn the identity-mapping parameters of these 16 layers correctly, so the 34-layer network ends up performing no better than the optimal 18-layer one. This is degradation: as network depth increases, the model's performance degrades. It is not caused by overfitting, but by the redundant layers failing to learn the identity mapping.

Solution to network degradation: use a residual network (ResNet), briefly introduced below (F(x) is the residual):

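A minimal sketch of the residual idea, where the `transform` callable stands in for the convolution layers of a real residual block:

```python
def residual_block(x, transform):
    """y = F(x) + x: the skip connection adds the input back, so the block
    only has to learn the residual F(x). If the optimal mapping is the
    identity, the layers can simply drive F(x) toward zero."""
    fx = transform(x)
    return [a + b for a, b in zip(fx, x)]

# If the learned transform is (near) zero, the block passes x through unchanged,
# which is exactly the identity mapping the redundant layers need.
x = [1.0, 2.0, 3.0]
print(residual_block(x, lambda v: [0.0] * len(v)))  # [1.0, 2.0, 3.0]
```

This is why a deep ResNet does not degrade below a shallower optimum: learning "output zero" is far easier for the layers than learning "output exactly the input."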
Note: in a CNN, the convolution layers in the early part of the network hold a small fraction of the parameters but account for most of the computation; the fully connected layers at the end are the opposite. Most CNN architectures share this characteristic. Therefore, when optimizing for computational speed, focus on the convolutions; when optimizing parameters (e.g. weight pruning), focus on the fully connected layers.
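A back-of-the-envelope comparison illustrating this point (the layer sizes below are my own assumptions in the spirit of VGG-style networks, not taken from the text):

```python
def conv_stats(c_in, c_out, k, h_out, w_out):
    """Parameters and multiply-accumulates for one conv layer (bias ignored)."""
    params = c_in * c_out * k * k
    macs = params * h_out * w_out   # the same weights are reused at every position
    return params, macs

def fc_stats(n_in, n_out):
    """Parameters and multiply-accumulates for one fully connected layer."""
    params = n_in * n_out
    macs = params                   # every weight is used exactly once
    return params, macs

# A 3x3 conv, 64 -> 64 channels, on a 56x56 output map vs. a 4096 -> 4096 FC layer
conv_p, conv_m = conv_stats(64, 64, 3, 56, 56)
fc_p, fc_m = fc_stats(4096, 4096)
print(conv_p, conv_m)  # few parameters, many operations
print(fc_p, fc_m)      # many parameters, comparatively few operations
```

The weight sharing of convolution is what decouples parameter count from computation, producing exactly the asymmetry described above.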

**Pooling layer** (also called subsampling or downsampling; it reduces the dimensionality of the features)

Pooling comes in two common forms: max pooling takes the maximum value in each window, and average pooling takes the mean value in each window.

Purpose: to compress the input feature map. On the one hand, a smaller feature map simplifies the network's computational complexity; on the other, the compression extracts the main features.
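A minimal sketch of both pooling modes in plain Python (non-overlapping windows, i.e. stride equal to window size, is an assumption for simplicity):

```python
def pool2d(fmap, size, mode="max"):
    """Non-overlapping pooling over a square feature map."""
    n = len(fmap)
    out = []
    for i in range(0, n - size + 1, size):
        row = []
        for j in range(0, n - size + 1, size):
            window = [fmap[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

fmap = [[1, 3, 2, 4],
        [5, 7, 6, 8],
        [1, 2, 3, 4],
        [5, 6, 7, 8]]
print(pool2d(fmap, 2, "max"))   # [[7, 8], [6, 8]]
print(pool2d(fmap, 2, "avg"))   # [[4.0, 5.0], [3.5, 5.5]]
```

Either way the 4×4 map shrinks to 2×2, which is the compression the section describes.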


**Zero padding layer**
If convolution layers keep stacking, the feature map keeps shrinking; to counter this, a zero padding layer (Zero Padding) adds a border of zeros around the input.
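A minimal sketch (the helper name `zero_pad` is mine). With a 3×3 kernel and stride 1, padding of 1 keeps the spatial size unchanged, since [(n + 2·1 − 3)/1] + 1 = n:

```python
def zero_pad(image, p):
    """Surround a square 2-D image with a border of p zeros on every side."""
    n = len(image)
    width = n + 2 * p
    padded = [[0] * width for _ in range(width)]
    for i in range(n):
        for j in range(n):
            padded[i + p][j + p] = image[i][j]
    return padded

img = [[1, 2],
       [3, 4]]
padded = zero_pad(img, 1)
print(len(padded))  # 4  (2x2 grew to 4x4; a 3x3 stride-1 conv now gives 2x2 back)
```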

**Fully connected layer**

It connects all the features and sends the output values to a classifier (commonly a softmax classifier, which computes the classification probabilities).
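A minimal sketch of the softmax step (the logits below are arbitrary example scores):

```python
import math

def softmax(logits):
    """Turn the fully connected layer's output scores into class probabilities.
    Subtracting the max is a standard trick for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)  # probabilities sum to 1; the highest score gets the highest probability
```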

General structure of a convolutional network: several stacked [convolution → activation → pooling] blocks, followed by one or more fully connected layers and a softmax output.
**Feature fusion**
Many works improve detection and segmentation performance by fusing features from multiple layers. According to the order of fusion and prediction, the methods are classified as early fusion and late fusion.

Early fusion: first fuse the features of multiple layers, then train a predictor on the fused features (prediction happens only once, after full fusion). Methods of this kind are also called skip connections and use concat or add operations. Representatives of this idea are Inside-Outside Net (ION) and HyperNet.

Two classic feature fusion operations:
(1) concat: series fusion, directly concatenating the two features. If the input features x and y have dimensions p and q, the output feature z has dimension p + q;
(2) add: a parallel strategy, sometimes described as combining the two feature vectors into a complex vector z = x + iy (where i is the imaginary unit); in practice it is an element-wise sum, so x and y must have the same dimension, and the output keeps that dimension.
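In code, the two operations amount to the following (a plain-Python sketch on toy vectors):

```python
def concat_fuse(x, y):
    """Series fusion: stack the two feature vectors; dimensions add up (p + q)."""
    return x + y

def add_fuse(x, y):
    """Parallel fusion: element-wise sum; x and y must have the same dimension."""
    assert len(x) == len(y), "add fusion requires equal feature dimensions"
    return [a + b for a, b in zip(x, y)]

x, y = [1.0, 2.0, 3.0], [0.5, 0.5, 0.5]
print(concat_fuse(x, y))  # 6 dimensions: p + q
print(add_fuse(x, y))     # 3 dimensions: [1.5, 2.5, 3.5]
```

concat preserves both sets of features at the cost of a wider output; add keeps the dimension fixed but mixes the two feature maps into one.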

Late fusion: improve detection performance by combining the detection results of different layers (detection starts at partially fused layers before the final fusion, so there are multiple layers of detections, and the multiple results are fused at the end). Research of this kind follows two ideas:
(1) no feature fusion: predict separately from multi-scale features, then combine the predictions, e.g. Single Shot MultiBox Detector (SSD) and Multi-scale CNN (MS-CNN);
(2) pyramidal feature fusion: fuse first, then predict, e.g. Feature Pyramid Network (FPN).

Original site

Copyright notice
This article was written by [Red date oats]; when reposting, please include a link to the original. Thanks.
https://yzsam.com/2022/160/202206090307228234.html