
[Reading Papers] A Brief Review Comparing the DeepLab v1-v3 Series

2022-06-13 02:24:00 Shameful child

  • The DeepLab series targets the semantic segmentation task. About semantic segmentation:

    • Semantic segmentation is the dense variant of image segmentation: every pixel is assigned to a specified category;
    • the image is divided into several meaningful objects;
    • each object is given its specified category label.
  • Traditional semantic segmentation faces three challenges:

    • In a conventional classification CNN, repeated pooling and downsampling reduce spatial resolution.
      • Because semantic segmentation is pixel-level classification, highly abstract spatial features do not suit this low-level task, so the feature map's resolution and spatial invariance must be considered.
      • Feature maps shrink because of strides: stride > 1 is used to enlarge the receptive field. With stride = 1, keeping the same receptive field would require a larger kernel, so the DeepLab v1 paper uses the "hole" (à trous) algorithm, i.e. atrous convolution, to enlarge the effective kernel size and obtain the same receptive field.
      • Feeding an image through a CNN is a process of gradual abstraction: the original positional information decays, or even disappears, with depth. The conditional random field (CRF) is a smoothing method from traditional image processing: when determining a pixel's value, it considers the values of the surrounding neighbors and thereby removes some noise.
      • The concrete operation: remove the last two pooling layers of the original network and sample with rate = 2 atrous convolution instead.
    • Objects appear at multiple scales; rescaling and aggregating feature maps handles this, but at a large computational cost.
    • Object-centric classification demands invariance to spatial transformations, which works against precise localization.
  • Reference: https://www.jianshu.com/p/9184455a4bd3
  • ASPP structure

    • SPP: Spatial Pyramid Pooling (Kaiming He)

      • Requirement: R-CNN needs the input image to have a fixed size, because the fully connected layers after the convolution layers have a fixed structure.
      • Reality: actual input images rarely match the required size. The usual techniques are cropping (crop) and stretching (warp), but neither is good: both distort the original features.
      • Solution: the SPP layer divides the candidate region's feature map into grids of several different sizes and max-pools within each grid cell, so the following fully connected layers still receive a fixed-size input.
      • Application: the ROI pooling layer in Fast R-CNN is really a special case of spatial pyramid pooling; the idea is the same, except that ROI pooling pools over a single grid size while spatial pyramid pooling uses several grid sizes at once.
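
A minimal PyTorch sketch of the idea (the grid sizes here are illustrative, not any paper's exact configuration): whatever the input's spatial size, pooling to fixed grids yields a fixed-length vector.

```python
import torch
import torch.nn as nn

class SPPLayer(nn.Module):
    """Spatial pyramid pooling: max-pool a feature map to several fixed
    grid sizes and concatenate, giving a fixed-length vector regardless
    of the input's spatial size."""
    def __init__(self, grid_sizes=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.AdaptiveMaxPool2d(g) for g in grid_sizes])

    def forward(self, x):                  # x: (N, C, H, W), any H, W
        n = x.size(0)
        feats = [pool(x).view(n, -1) for pool in self.pools]
        return torch.cat(feats, dim=1)     # (N, C * sum(g * g))

# Two inputs of different spatial sizes produce the same output length.
spp = SPPLayer()
print(spp(torch.randn(1, 256, 13, 13)).shape)  # torch.Size([1, 5376])
print(spp(torch.randn(1, 256, 24, 17)).shape)  # torch.Size([1, 5376])
```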
    • PPM: Pyramid Pooling Module

      • Origin: a module proposed in PSPNet, a network for semantic segmentation.
      • Problem: generally, the deeper the network, the larger the receptive field, but there is a gap between the theoretical receptive field and the one realized in practice (the actual receptive field is smaller than the theoretical one), so the network cannot effectively fuse global feature information.
      • Finding: GAP (Global Average Pooling) can fuse global context effectively, but its capacity for fusing and extracting information is limited, and naively compressing everything with GAP easily loses much useful information; fusing features from different receptive fields and sub-regions therefore strengthens the feature representation.
      • Solution: the PPM module splits the feature map extracted by the backbone into two branches. One branch is divided into several sub-regions and GAP is applied to each (similar to the structure in the PSP module); a 1×1 convolution then adjusts the channel count, bilinear interpolation restores the pre-pooling size, and finally the two branches are merged.
      • Application: the PSP module aggregates the context of different regions to obtain global context.
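
A minimal PyTorch sketch of this module (the bin sizes 1, 2, 3, 6 follow the PSPNet paper; the channel counts are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PPM(nn.Module):
    """Pyramid Pooling Module sketch: GAP over several grids of
    sub-regions, 1x1 conv to shrink channels, bilinear upsampling back
    to the pre-pooling size, then concatenation with the input."""
    def __init__(self, in_ch=2048, bins=(1, 2, 3, 6)):
        super().__init__()
        out_ch = in_ch // len(bins)        # 512 per branch here
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),   # GAP over a b x b grid
                nn.Conv2d(in_ch, out_ch, 1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ) for b in bins])

    def forward(self, x):                  # x: (N, in_ch, H, W)
        h, w = x.shape[2:]
        outs = [x]                         # identity branch
        for branch in self.branches:
            y = branch(x)
            # bilinear interpolation restores the pre-pooling size
            outs.append(F.interpolate(y, size=(h, w), mode="bilinear",
                                      align_corners=False))
        return torch.cat(outs, dim=1)      # (N, in_ch + 4 * out_ch, H, W)

# eval(): the 1x1 bin gives BN only one value per channel in training mode
ppm = PPM().eval()
with torch.no_grad():
    print(ppm(torch.randn(1, 2048, 60, 60)).shape)  # torch.Size([1, 4096, 60, 60])
```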
    • Atrous convolution (Atrous/Dilated Convolution)

      • We want the features extracted from the image to have a large receptive field, without too much computation.

      • We also want the feature map's resolution not to drop too much (losing too much resolution loses a great deal of detail about object boundaries).

      • In practice these two wishes conflict: obtaining a larger receptive field requires either a larger convolution kernel or a larger pooling stride; the former costs too much computation, the latter loses resolution.

      • Atrous convolution resolves this conflict: it obtains a larger receptive field without losing too much resolution.

      • Its advantage is that it enlarges the receptive field without the information loss of pooling, so each convolution output covers a wide range of context.

      • In image segmentation, when an image is fed into a CNN (a typical network being FCN), FCN first convolves and pools the image like a conventional CNN, shrinking the feature map and enlarging the receptive field. But because segmentation makes a pixel-wise prediction, the smaller post-pooling feature map must be upsampled back to the original image size for prediction; the earlier pooling is what lets each predicted pixel see a large receptive field.

      • So FCN-style segmentation has two key steps: pooling, which shrinks the image and enlarges the receptive field, and upsampling, which restores the image size.

      • Shrinking and then enlarging the size inevitably loses some information. Can we design a new operation that sees a larger receptive field and more information without pooling? The answer is the dilated conv.

      • Reference: https://www.zhihu.com/question/54149221/answer/192025860
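
A short PyTorch illustration of both properties (a sketch using the standard nn.Conv2d dilation argument): a 3×3 kernel with dilation r has effective size k + (k-1)(r-1), so the receptive field grows with r at constant parameter count, and with padding = dilation the feature map size is preserved.

```python
import torch
import torch.nn as nn

# Effective kernel size of a 3x3 kernel is k + (k-1)(r-1) = 3, 5, 7
# for r = 1, 2, 3: the receptive field grows while the parameter
# count stays that of a 3x3 kernel. Setting padding = dilation keeps
# the feature map's spatial size unchanged.
x = torch.randn(1, 64, 65, 65)
for r in (1, 2, 3):
    conv = nn.Conv2d(64, 64, kernel_size=3, dilation=r, padding=r)
    print(r, conv(x).shape)  # every r prints torch.Size([1, 64, 65, 65])
```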
        
    • When performing a segmentation task, images present a multi-scale problem: objects come both large and small.

    • The common treatment is the image pyramid: resize the original image to several scales, feed each into the same network to obtain different feature maps, and then fuse them. This does improve accuracy, but it brings another problem: it is far too slow.

    • To solve this, DeepLab v2 draws on SPP, PPM, etc. and introduces the ASPP (atrous spatial pyramid pooling) module: the feature map is passed through parallel atrous convolution layers with different dilation rates to capture multi-scale information, and the outputs are fused to obtain the image's segmentation result.

    • DeepLab v3 improves ASPP: it uses one 1×1 convolution and three 3×3 atrous convolutions, each convolution having 256 kernels, all followed by a BN layer.

      • The 1×1 convolution is equivalent to an atrous convolution with a very large rate: the larger the rate, the fewer effective parameters the kernel retains, so a 1×1 kernel is equivalent to just the center parameter of a large-rate kernel.
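A minimal PyTorch sketch of the v3-style ASPP just described (the rates 6, 12, 18 and the image-level pooling branch follow the v3 design; other details such as channel counts are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """ASPP sketch: one 1x1 conv and three 3x3 atrous convs with
    different rates in parallel, each with 256 filters and BN, plus an
    image-level pooling branch; all outputs are fused by a 1x1 conv."""
    def __init__(self, in_ch=2048, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        def branch(k, r):
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, padding=(k // 2) * r,
                          dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.branches = nn.ModuleList(
            [branch(1, 1)] + [branch(3, r) for r in rates])
        self.image_pool = nn.Sequential(          # global-context branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.project = nn.Sequential(             # fuse the 5 branches
            nn.Conv2d(out_ch * 5, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        h, w = x.shape[2:]
        outs = [b(x) for b in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        outs.append(pooled)
        return self.project(torch.cat(outs, dim=1))

# eval(): BN on the 1x1 pooled branch needs no batch statistics here
aspp = ASPP().eval()
with torch.no_grad():
    print(aspp(torch.randn(1, 2048, 33, 33)).shape)  # torch.Size([1, 256, 33, 33])
```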
  • DeepLab v1

    • The backbone is a modified VGG-16.

    • Problems, causes, and solutions:

      | Problem | Reason | Solution |
      | --- | --- | --- |
      | Downsampling loses information | maxpool shrinks the feature map and drops detail. Classification relies on "high-level" features, but segmentation classifies every pixel and needs that detail. | Atrous convolution |
      | Spatial invariance | The model is insensitive to the spatial position of the input image: however the image is rotated or translated, it is still recognized. For classification this is fine; for segmentation it is not, since after the image is rotated, each pixel's class of course changes. | Conditional random field (CRF) |

    • The point of maxpool: on one hand, it shrinks the feature map; on the other, it enlarges the receptive field of each element in the feature map.

    • FCN's approach: after obtaining the feature map, use deconvolution (deconv) to upsample the feature map back to the original image size. But this process of first downsampling with max pool and then upsampling with deconv is bound to lose some information.

    • How DeepLab fixes FCN's problem: DeepLab proposes dilated conv (atrous convolution) to replace max pool while preserving the receptive field (a sketch follows this list).

      • Dilated conv enlarges the receptive field without pooling's information loss, so each convolution output covers a wide range of context; for tasks like image segmentation that are sensitive to spatial location and detail, atrous convolution is a good fit.
    • Because CNNs are naturally spatially invariant, the features they extract are not fine enough for the segmentation task. So after feature extraction, a CRF is appended to achieve more refined results.
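
A minimal sketch of that replacement under illustrative layer sizes (not the paper's exact VGG-16 configuration): the stride-2 pooling becomes stride-1, and the convolutions after it use rate = 2 so their receptive field is unchanged.

```python
import torch
import torch.nn as nn

# Illustrative VGG-style tail: instead of a stride-2 pool followed by
# ordinary 3x3 convs, keep the resolution (stride-1 pool) and give the
# following convs dilation=2, so their receptive field is unchanged.
dense_tail = nn.Sequential(
    nn.MaxPool2d(kernel_size=3, stride=1, padding=1),  # was stride 2
    nn.Conv2d(512, 512, 3, padding=2, dilation=2),     # rate-2 atrous conv
    nn.ReLU(inplace=True),
    nn.Conv2d(512, 512, 3, padding=2, dilation=2),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 512, 45, 45)
print(dense_tail(x).shape)  # torch.Size([1, 512, 45, 45]): size preserved
```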

  • DeepLab v2

    • Improves on v1.
    • The backbone is changed to ResNet.
    • ASPP: the atrous spatial pyramid, introduced to solve the multi-scale problem, i.e. the same object is recognized accurately whether it appears large or small in the image.
  • DeepLab v3

    • Removes the CRF.
    • Modifies ResNet: applies atrous convolution and the atrous pyramid inside ResNet.
      • Atrous convolution replaces downsampling, so the receptive field is kept while the feature map size stays constant.
        • Shrinking the feature map amounts to losing spatial information.
        • Image segmentation is very sensitive to spatial information, so keeping the feature map's size is very important. The design idea throughout the DeepLab series is the same: preserve the receptive field while keeping the feature map size unchanged.
        • Atrous convolution maintains the receptive field without shrinking the feature map.
      • ASPP is used inside the residual block to ensure multi-scale sensitivity (see the sketch after this list).
        • Inside the block, atrous convolutions with different rates run in parallel, and the resulting features are fused.
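
A minimal sketch of this parallel multi-rate idea (the rates, the use of summation as the fusion, and the channel count are all illustrative assumptions, not the paper's exact block):

```python
import torch
import torch.nn as nn

class MultiRateBlock(nn.Module):
    """Parallel 3x3 atrous convs with different rates inside one block;
    the multi-rate features are fused (here by elementwise summation)."""
    def __init__(self, ch=256, rates=(1, 2, 4)):
        super().__init__()
        self.paths = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates])

    def forward(self, x):
        # each path sees a different receptive field; fuse the results
        return sum(path(x) for path in self.paths)

block = MultiRateBlock()
print(block(torch.randn(1, 256, 33, 33)).shape)  # torch.Size([1, 256, 33, 33])
```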

Copyright notice
This article was created by [Shameful child]; when reposting, please include the original link. Thanks.
https://yzsam.com/2022/02/202202280543148903.html