
[Reading Papers] A Brief Review Comparing the DeepLab v1-v3 Series

2022-06-13 02:24:00 Shameful child

  • The DeepLab series targets the semantic segmentation task. About semantic segmentation:

    • Semantic segmentation is the dense variant of image segmentation: every pixel is assigned to a specified category;
    • the image is divided into several meaningful objects;
    • each object is given its specified category label.
  • Traditional semantic segmentation faces three challenges:

    • In a conventional classification CNN, repeated pooling and downsampling reduce spatial resolution.
      • Because semantic segmentation is pixel-level classification, highly abstract spatial features do not suit this low-level task, so the feature map's resolution and spatial invariance must be considered.
      • Feature maps shrink because of strides: stride > 1 is used to enlarge the receptive field. With stride = 1, keeping the same receptive field would require a larger kernel, so the DeepLab v1 paper uses the "hole" (à trous) algorithm, i.e. atrous convolution, to enlarge the effective kernel size and obtain the same receptive field.
      • Feeding an image through a CNN is a process of gradual abstraction: the original positional information decays, or even disappears, with depth. The conditional random field (CRF) is a smoothing method from traditional image processing: when determining a pixel's value, it considers the values of the surrounding neighbors and thereby removes some noise.
      • The concrete operation: remove the last two pooling layers of the original network and sample with rate = 2 atrous convolution instead.
    • Objects appear at multiple scales; rescaling and aggregating feature maps handles this, but at a large computational cost.
    • Object-centric classification demands invariance to spatial transformations, which works against precise localization.
  • Reference: https://www.jianshu.com/p/9184455a4bd3
  • ASPP structure

    • SPP: Spatial Pyramid Pooling (Kaiming He)

      • Requirement: R-CNN needs the input image to have a fixed size, because the fully connected layers after the convolution layers have a fixed structure.
      • Reality: actual input images rarely match the required size. The usual techniques are cropping (crop) and stretching (warp), but neither is good: both distort the original features.
      • Solution: the SPP layer divides the candidate region's feature map into grids of several different sizes and max-pools within each grid cell, so the following fully connected layers still receive a fixed-size input.
      • Application: the ROI pooling layer in Fast R-CNN is really a special case of spatial pyramid pooling; the idea is the same, except that ROI pooling pools over a single grid size while spatial pyramid pooling uses several grid sizes at once.
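
A minimal PyTorch sketch of the idea (the grid sizes here are illustrative, not any paper's exact configuration): whatever the input's spatial size, pooling to fixed grids yields a fixed-length vector.

```python
import torch
import torch.nn as nn

class SPPLayer(nn.Module):
    """Spatial pyramid pooling: max-pool a feature map to several fixed
    grid sizes and concatenate, giving a fixed-length vector regardless
    of the input's spatial size."""
    def __init__(self, grid_sizes=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.AdaptiveMaxPool2d(g) for g in grid_sizes])

    def forward(self, x):                  # x: (N, C, H, W), any H, W
        n = x.size(0)
        feats = [pool(x).view(n, -1) for pool in self.pools]
        return torch.cat(feats, dim=1)     # (N, C * sum(g * g))

# Two inputs of different spatial sizes produce the same output length.
spp = SPPLayer()
print(spp(torch.randn(1, 256, 13, 13)).shape)  # torch.Size([1, 5376])
print(spp(torch.randn(1, 256, 24, 17)).shape)  # torch.Size([1, 5376])
```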
    • PPM: Pyramid Pooling Module

      • Origin: a module proposed in PSPNet, a network for semantic segmentation.
      • Problem: generally, the deeper the network, the larger the receptive field, but there is a gap between the theoretical receptive field and the one realized in practice (the actual receptive field is smaller than the theoretical one), so the network cannot effectively fuse global feature information.
      • Finding: GAP (Global Average Pooling) can fuse global context effectively, but its capacity for fusing and extracting information is limited, and naively compressing everything with GAP easily loses much useful information; fusing features from different receptive fields and sub-regions therefore strengthens the feature representation.
      • Solution: the PPM module splits the feature map extracted by the backbone into two branches. One branch is divided into several sub-regions and GAP is applied to each (similar to the structure in the PSP module); a 1×1 convolution then adjusts the channel count, bilinear interpolation restores the pre-pooling size, and finally the two branches are merged.
      • Application: the PSP module aggregates the context of different regions to obtain global context.
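
A minimal PyTorch sketch of this module (the bin sizes 1, 2, 3, 6 follow the PSPNet paper; the channel counts are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PPM(nn.Module):
    """Pyramid Pooling Module sketch: GAP over several grids of
    sub-regions, 1x1 conv to shrink channels, bilinear upsampling back
    to the pre-pooling size, then concatenation with the input."""
    def __init__(self, in_ch=2048, bins=(1, 2, 3, 6)):
        super().__init__()
        out_ch = in_ch // len(bins)        # 512 per branch here
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),   # GAP over a b x b grid
                nn.Conv2d(in_ch, out_ch, 1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ) for b in bins])

    def forward(self, x):                  # x: (N, in_ch, H, W)
        h, w = x.shape[2:]
        outs = [x]                         # identity branch
        for branch in self.branches:
            y = branch(x)
            # bilinear interpolation restores the pre-pooling size
            outs.append(F.interpolate(y, size=(h, w), mode="bilinear",
                                      align_corners=False))
        return torch.cat(outs, dim=1)      # (N, in_ch + 4 * out_ch, H, W)

# eval(): the 1x1 bin gives BN only one value per channel in training mode
ppm = PPM().eval()
with torch.no_grad():
    print(ppm(torch.randn(1, 2048, 60, 60)).shape)  # torch.Size([1, 4096, 60, 60])
```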
    • Atrous convolution (Atrous/Dilated Convolution)

      • We want the features extracted from the image to have a large receptive field, without too much computation.

      • We also want the feature map's resolution not to drop too much (losing too much resolution loses a great deal of detail about object boundaries).

      • In practice these two wishes conflict: obtaining a larger receptive field requires either a larger convolution kernel or a larger pooling stride; the former costs too much computation, the latter loses resolution.

      • Atrous convolution resolves this conflict: it obtains a larger receptive field without losing too much resolution.

      • Its advantage is that it enlarges the receptive field without the information loss of pooling, so each convolution output covers a wide range of context.

      • In image segmentation, when an image is fed into a CNN (a typical network being FCN), FCN first convolves and pools the image like a conventional CNN, shrinking the feature map and enlarging the receptive field. But because segmentation makes a pixel-wise prediction, the smaller post-pooling feature map must be upsampled back to the original image size for prediction; the earlier pooling is what lets each predicted pixel see a large receptive field.

      • So FCN-style segmentation has two key steps: pooling, which shrinks the image and enlarges the receptive field, and upsampling, which restores the image size.

      • Shrinking and then enlarging the size inevitably loses some information. Can we design a new operation that sees a larger receptive field and more information without pooling? The answer is the dilated conv.

      • Reference: https://www.zhihu.com/question/54149221/answer/192025860
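
A short PyTorch illustration of both properties (a sketch using the standard nn.Conv2d dilation argument): a 3×3 kernel with dilation r has effective size k + (k-1)(r-1), so the receptive field grows with r at constant parameter count, and with padding = dilation the feature map size is preserved.

```python
import torch
import torch.nn as nn

# Effective kernel size of a 3x3 kernel is k + (k-1)(r-1) = 3, 5, 7
# for r = 1, 2, 3: the receptive field grows while the parameter
# count stays that of a 3x3 kernel. Setting padding = dilation keeps
# the feature map's spatial size unchanged.
x = torch.randn(1, 64, 65, 65)
for r in (1, 2, 3):
    conv = nn.Conv2d(64, 64, kernel_size=3, dilation=r, padding=r)
    print(r, conv(x).shape)  # every r prints torch.Size([1, 64, 65, 65])
```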
        
    • When performing a segmentation task, images present a multi-scale problem: objects come both large and small.

    • The common treatment is the image pyramid: resize the original image to several scales, feed each into the same network to obtain different feature maps, and then fuse them. This does improve accuracy, but it brings another problem: it is far too slow.

    • To solve this, DeepLab v2 draws on SPP, PPM, etc. and introduces the ASPP (atrous spatial pyramid pooling) module: the feature map is passed through parallel atrous convolution layers with different dilation rates to capture multi-scale information, and the outputs are fused to obtain the image's segmentation result.

    • DeepLab v3 improves ASPP: it uses one 1×1 convolution and three 3×3 atrous convolutions, each convolution having 256 kernels, all followed by a BN layer.

      • The 1×1 convolution is equivalent to an atrous convolution with a very large rate: the larger the rate, the fewer effective parameters the kernel retains, so a 1×1 kernel is equivalent to just the center parameter of a large-rate kernel.
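A minimal PyTorch sketch of the v3-style ASPP just described (the rates 6, 12, 18 and the image-level pooling branch follow the v3 design; other details such as channel counts are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """ASPP sketch: one 1x1 conv and three 3x3 atrous convs with
    different rates in parallel, each with 256 filters and BN, plus an
    image-level pooling branch; all outputs are fused by a 1x1 conv."""
    def __init__(self, in_ch=2048, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        def branch(k, r):
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, padding=(k // 2) * r,
                          dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.branches = nn.ModuleList(
            [branch(1, 1)] + [branch(3, r) for r in rates])
        self.image_pool = nn.Sequential(          # global-context branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.project = nn.Sequential(             # fuse the 5 branches
            nn.Conv2d(out_ch * 5, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        h, w = x.shape[2:]
        outs = [b(x) for b in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        outs.append(pooled)
        return self.project(torch.cat(outs, dim=1))

# eval(): BN on the 1x1 pooled branch needs no batch statistics here
aspp = ASPP().eval()
with torch.no_grad():
    print(aspp(torch.randn(1, 2048, 33, 33)).shape)  # torch.Size([1, 256, 33, 33])
```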
  • DeepLab v1

    • The backbone is a modified VGG-16.

    • Problems, causes, and solutions:

      | Problem | Reason | Solution |
      | --- | --- | --- |
      | Downsampling loses information | maxpool shrinks the feature map and drops detail. Classification relies on "high-level" features, but segmentation classifies every pixel and needs that detail. | Atrous convolution |
      | Spatial invariance | The model is insensitive to the spatial position of the input image: however the image is rotated or translated, it is still recognized. For classification this is fine; for segmentation it is not, since after the image is rotated, each pixel's class of course changes. | Conditional random field (CRF) |

    • The point of maxpool: on one hand, it shrinks the feature map; on the other, it enlarges the receptive field of each element in the feature map.

    • FCN's approach: after obtaining the feature map, use deconvolution (deconv) to upsample the feature map back to the original image size. But this process of first downsampling with max pool and then upsampling with deconv is bound to lose some information.

    • How DeepLab fixes FCN's problem: DeepLab proposes dilated conv (atrous convolution) to replace max pool while preserving the receptive field (a sketch follows this list).

      • Dilated conv enlarges the receptive field without pooling's information loss, so each convolution output covers a wide range of context; for tasks like image segmentation that are sensitive to spatial location and detail, atrous convolution is a good fit.
    • Because CNNs are naturally spatially invariant, the features they extract are not fine enough for the segmentation task. So after feature extraction, a CRF is appended to achieve more refined results.
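
A minimal sketch of that replacement under illustrative layer sizes (not the paper's exact VGG-16 configuration): the stride-2 pooling becomes stride-1, and the convolutions after it use rate = 2 so their receptive field is unchanged.

```python
import torch
import torch.nn as nn

# Illustrative VGG-style tail: instead of a stride-2 pool followed by
# ordinary 3x3 convs, keep the resolution (stride-1 pool) and give the
# following convs dilation=2, so their receptive field is unchanged.
dense_tail = nn.Sequential(
    nn.MaxPool2d(kernel_size=3, stride=1, padding=1),  # was stride 2
    nn.Conv2d(512, 512, 3, padding=2, dilation=2),     # rate-2 atrous conv
    nn.ReLU(inplace=True),
    nn.Conv2d(512, 512, 3, padding=2, dilation=2),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 512, 45, 45)
print(dense_tail(x).shape)  # torch.Size([1, 512, 45, 45]): size preserved
```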

  • DeepLab v2

    • Improves on v1.
    • The backbone is changed to ResNet.
    • ASPP: the atrous spatial pyramid, introduced to solve the multi-scale problem, i.e. the same object is recognized accurately whether it appears large or small in the image.
  • DeepLab v3

    • Removes the CRF.
    • Modifies ResNet: applies atrous convolution and the atrous pyramid inside ResNet.
      • Atrous convolution replaces downsampling, so the receptive field is kept while the feature map size stays constant.
        • Shrinking the feature map amounts to losing spatial information.
        • Image segmentation is very sensitive to spatial information, so keeping the feature map's size is very important. The design idea throughout the DeepLab series is the same: preserve the receptive field while keeping the feature map size unchanged.
        • Atrous convolution maintains the receptive field without shrinking the feature map.
      • ASPP is used inside the residual block to ensure multi-scale sensitivity (see the sketch after this list).
        • Inside the block, atrous convolutions with different rates run in parallel, and the resulting features are fused.
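
A minimal sketch of this parallel multi-rate idea (the rates, the use of summation as the fusion, and the channel count are all illustrative assumptions, not the paper's exact block):

```python
import torch
import torch.nn as nn

class MultiRateBlock(nn.Module):
    """Parallel 3x3 atrous convs with different rates inside one block;
    the multi-rate features are fused (here by elementwise summation)."""
    def __init__(self, ch=256, rates=(1, 2, 4)):
        super().__init__()
        self.paths = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates])

    def forward(self, x):
        # each path sees a different receptive field; fuse the results
        return sum(path(x) for path in self.paths)

block = MultiRateBlock()
print(block(torch.randn(1, 256, 33, 33)).shape)  # torch.Size([1, 256, 33, 33])
```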

Copyright notice
This article was created by [Shameful child]; when reposting, please include the original link. Thanks.
https://yzsam.com/2022/02/202202280543148903.html