
[Paper Reading] DeepLabv3: Rethinking Atrous Convolution for Semantic Image Segmentation (ASPP)

2022-06-13 02:20:00 Shameful child

Rethinking Atrous Convolution for Semantic Image Segmentation

  • Atrous convolution modules with multiple rates, arranged in cascade or in parallel, capture multi-scale context and address the problem of segmenting objects at multiple scales
  • Atrous Spatial Pyramid Pooling (ASPP) module
  • The module probes convolutional features at multiple scales and encodes global context with image-level features, further improving performance
tips: The proposed "DeepLabv3" system significantly improves over the previous DeepLab versions without DenseCRF post-processing
  • Two challenges for deep convolutional neural networks in the semantic segmentation task

    • Feature resolution is reduced by consecutive pooling operations or convolution striding, which lets the DCNN learn increasingly abstract feature representations
    • This invariance to local image transformations can hinder dense prediction tasks, which require detailed spatial information
    • Objects exist at multiple scales (four alternative architectures for capturing multi-scale context)

      • Image pyramid: the DCNN is applied to an image pyramid, extracting features from the input at each scale; objects of different scales become prominent on different feature maps

    - The same model, usually with shared weights, is applied to the multi-scale inputs
    
      - Large-scale inputs preserve the details of small objects
    
      - > Example
        >
        > - The input image is transformed into a Laplacian pyramid, the input at each scale is fed into the DCNN, and the feature maps from all scales are merged
        > - Multi-scale inputs are used from coarse to fine
        > - The inputs are directly resized to multiple scales and the features from all scales are fused
    
    • Encoder-decoder structure: multi-scale features are exploited in the encoder part, and the spatial resolution is recovered in the decoder part

    - (a) In the encoder, **the spatial dimension of the feature maps gradually decreases**, so longer-range information is more easily captured in the deeper encoder outputs; (b) in the decoder, object details and spatial dimensions are gradually recovered.
    
    • Context module: additional modules are cascaded on top of the original network to capture long-range information

      • DenseCRF encodes pixel-level pairwise similarities, and several additional convolutional layers are developed in cascade to gradually capture long-range context.
    - The model contains additional modules arranged in a cascading fashion to encode the long-range context
      - An effective way is to add DenseCRF (combined with an efficient high-dimensional filtering algorithm) to the DCNN
      - **Joint training of the CRF and DCNN components** has been proposed
      - Several additional convolutional layers are used on top of the DCNN belief maps (**a belief map is the final DCNN feature map whose output channels equal the number of predicted classes**) to capture context information
    
      • Spatial pyramid pooling probes the incoming feature map with filters or pooling operations at multiple rates and multiple effective fields of view, thereby capturing objects at multiple scales.

    - Four parallel atrous convolutions with different atrous rates are applied on the feature map
    
    - Within both the cascaded-module and the `SPP` frameworks, `Atrous Convolution` is used to enlarge the filter's receptive field and fuse multi-scale contextual information.
    
    - This effectively resamples features at different scales, so regions of arbitrary scale can be classified accurately and efficiently; ASPP with different rates effectively captures multi-scale information
    
      - As the sampling rate increases, the number of effective filter weights (i.e., the weights applied to the valid feature region rather than to the padded zeros) becomes smaller
      - ![Effective filter weights versus atrous rate](https://img-blog.csdnimg.cn/db9a43e97425443ab85e632c45b3547e.png#pic_center)
    
      - When the atrous rate is small, all 9 filter weights are applied to mostly valid regions of the feature map; as the atrous rate grows, the 3×3 filter degenerates into a 1×1 filter, because only the center weight remains valid.
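The degeneration can be checked numerically. Below is a minimal Python sketch (our own illustration, not the paper's code) that computes, for a 65×65 feature map as in the paper's analysis, the average fraction of the 9 taps of a zero-padded 3×3 atrous convolution that land on valid (non-padded) positions; the fraction shrinks toward 1/9 as the rate approaches the feature map size.

```python
def valid_weight_ratio(feature_size: int, rate: int) -> float:
    """Average fraction of the 9 taps of a zero-padded 3x3 atrous convolution
    that fall on valid (non-padded) input positions."""
    offsets = [-rate, 0, rate]                 # tap offsets of the 3x3 kernel
    total = valid = 0
    for i in range(feature_size):              # every output position
        for j in range(feature_size):
            for di in offsets:
                for dj in offsets:
                    total += 1
                    if 0 <= i + di < feature_size and 0 <= j + dj < feature_size:
                        valid += 1
    return valid / total

for rate in (1, 6, 12, 24, 63):
    print(f"rate {rate:2d}: {valid_weight_ratio(65, rate):.3f}")
```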
    
    - Atrous Spatial Pyramid Pooling (ASPP)
    
      - Problem: when a 3×3 atrous convolution with an extremely large rate is applied, boundary effects prevent it from capturing long-range information and it effectively degenerates into a simple 1×1 convolution
      - Solution: incorporate image-level features into the ASPP module
    
    - The model uses spatial pyramid pooling to **capture context at multiple ranges**
    
    - > Example
      >
      > - ParseNet uses image-level features to obtain global context information.
      > - DeepLabv2 proposes atrous spatial pyramid pooling (ASPP), where **parallel atrous convolution layers with different rates capture multi-scale information**.
      > - LSTM-based models aggregate the global context
    
  • Standard convolution is the special case with rate r = 1; atrous convolution lets us adaptively modify the filter's field of view by changing the rate value


    • The introduced striding makes it easier to capture long-range information in the deeper blocks
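For illustration, here is a minimal PyTorch sketch (channel counts and feature-map size are our own choices): atrous convolution is exposed through the `dilation` argument of `nn.Conv2d`. Rate 1 recovers standard convolution, while a larger rate enlarges the field of view without adding parameters or reducing resolution.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 65, 65)   # dummy feature map (batch, channels, H, W)

# rate r = 1: ordinary 3x3 convolution
standard = nn.Conv2d(256, 256, kernel_size=3, padding=1, dilation=1)

# rate r = 12: the 3x3 kernel samples the input with gaps, enlarging the
# effective field of view from 3x3 to 25x25 with the same number of weights
atrous = nn.Conv2d(256, 256, kernel_size=3, padding=12, dilation=12)

print(standard(x).shape, atrous(x).shape)  # both keep the 65x65 spatial size
```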

  • Training strategy

    • Learning rate policy:

      • The paper employs a "poly" learning rate policy, where the initial learning rate is multiplied by (1 - iter/max_iter)**power with power = 0.9
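A minimal sketch of the "poly" policy as described (the function name and the example base learning rate are our own choices):

```python
def poly_lr(base_lr: float, it: int, max_iter: int, power: float = 0.9) -> float:
    """'Poly' learning rate policy: base_lr * (1 - iter / max_iter) ** power."""
    return base_lr * (1 - it / max_iter) ** power

# example: the learning rate decays smoothly toward 0 over 30k iterations
for it in (0, 10000, 20000, 29999):
    print(it, round(poly_lr(0.007, it, 30000), 6))
```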
    • Crop size(Caffe)

      • With crop_size, center and corner crops can be taken; mirror produces a mirrored image, compensating for small datasets.
      • A large crop size is required for atrous convolution with large rates to be effective; otherwise the filter weights with a large atrous rate are mostly applied to the padded zero region
    • Batch normalization

      • Batch normalization, similar to ordinary data normalization, is a way of putting scattered data on a unified scale; it is also a method for optimizing neural networks.
      • Data with a unified scale makes it easier for machine learning models to learn the underlying patterns in the data.
    • Upsampling logits

      • Odds(A) = the number of times A happens / the number of times A does not happen
      • The probability P(A) and Odds(A) have different value ranges.
      • Decomposing the word "Logit": take the Log of "it", where "it" is the Odds.
      • Usually we first apply the logit transform so the data is easy to fit (this is logistic regression), and then convert back to the familiar probability. Going back and forth like this makes data analysis convenient; in a sense, the transform acts much like a catalyst in chemistry.
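In formulas (standard definitions, not specific to this paper), with $P(A)$ the probability of event $A$:

$$
\mathrm{Odds}(A)=\frac{P(A)}{1-P(A)},\qquad
\mathrm{logit}\big(P(A)\big)=\log\frac{P(A)}{1-P(A)},\qquad
P(A)=\frac{1}{1+e^{-\mathrm{logit}(P(A))}}
$$

so $P(A)\in[0,1]$, $\mathrm{Odds}(A)\in[0,\infty)$, and the logit ranges over $(-\infty,\infty)$, which is what makes the transformed quantity easy to fit.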
    • Data augmentation

      • In this paper, data augmentation is applied by randomly scaling the input image (from 0.5 to 2.0) and randomly flipping it left-right.
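A minimal PyTorch-style sketch of this augmentation (our own illustration; the function name and tensor shapes are assumptions, not the authors' implementation):

```python
import random
import torch
import torch.nn.functional as F

def random_scale_and_flip(image: torch.Tensor, label: torch.Tensor):
    """Randomly rescale an (image, label) pair by a factor in [0.5, 2.0]
    and apply a random left-right flip."""
    scale = random.uniform(0.5, 2.0)
    h, w = image.shape[-2:]
    new_size = (int(h * scale), int(w * scale))
    image = F.interpolate(image[None], size=new_size, mode="bilinear",
                          align_corners=False)[0]
    # nearest-neighbour for the label so that class ids stay intact
    label = F.interpolate(label[None, None].float(), size=new_size,
                          mode="nearest")[0, 0].long()
    if random.random() < 0.5:                  # random left-right flip
        image = torch.flip(image, dims=[-1])
        label = torch.flip(label, dims=[-1])
    return image, label
```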
  • The improvement mainly comes from adding and fine-tuning batch normalization parameters in the model and from better encoding of multi-scale context.

  • To encode multi-scale information, the proposed cascaded module gradually doubles the atrous rate

  • The proposed atrous spatial pyramid pooling module, augmented with image-level features, probes features with filters at multiple sampling rates and effective fields of view
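A condensed PyTorch sketch of such an ASPP head (our own simplification of the module described above, assuming 2048 input channels, 256 output channels and rates (6, 12, 18); the authors' original implementation is in TensorFlow):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: a 1x1 branch, three atrous 3x3 branches,
    plus image-level (globally pooled) features, concatenated and projected."""
    def __init__(self, in_ch=2048, out_ch=256, rates=(6, 12, 18)):
        super().__init__()

        def conv_bn(k, rate):
            pad = 0 if k == 1 else rate
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, padding=pad, dilation=rate, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

        self.branches = nn.ModuleList([conv_bn(1, 1)] + [conv_bn(3, r) for r in rates])
        self.image_pool = nn.Sequential(            # image-level features
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.project = nn.Sequential(               # fuse the concatenated branches
            nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        img = F.interpolate(self.image_pool(x), size=(h, w),
                            mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [img], dim=1))

aspp = ASPP().eval()                                # eval() so BN accepts a single example
print(aspp(torch.randn(1, 2048, 33, 33)).shape)     # torch.Size([1, 256, 33, 33])
```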

  • A. Effect of hyper-parameters

    • There are three main differences (baseline: 77.21%, quantitative analysis)
      1. Larger crop size
        • With a smaller crop size the performance drops significantly to 67.22%, showing that the boundary effect caused by a small crop size hurts the performance of DeepLabv3, which employs large atrous rates in the Atrous Spatial Pyramid Pooling (ASPP) module.
      2. Upsampling logits during training
        • If the logits are not upsampled during training (and the ground truths are downsampled instead), performance drops to 76.01% (see the sketch after this list)
      3. Fine-tuning batch normalization
        • When DeepLabv3 is trained without fine-tuning batch normalization, performance drops to 75.95%
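A minimal illustration of the "upsample logits" choice (a sketch with assumed shapes, not the authors' code): the coarse logits are bilinearly upsampled to the ground-truth resolution before computing the loss, instead of downsampling the ground truth.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 21, 65, 65)           # coarse network output (21 classes)
labels = torch.randint(0, 21, (2, 513, 513))  # full-resolution ground truth

# upsample the logits to the label size (keeps the ground-truth annotations intact) ...
up = F.interpolate(logits, size=labels.shape[-2:], mode="bilinear", align_corners=False)
loss = F.cross_entropy(up, labels)

# ... rather than downsampling the ground truth to the logit size, which would
# discard fine annotation details during training.
```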
  • Highlights of DeepLabv3:

    • The use of atrous convolution, which gives a larger receptive field for capturing multi-scale information.

    • Improved ASPP module: it consists of atrous convolutions with different sampling rates and BN layers, laid out in a cascaded or parallel fashion.

      • BN layer

        • Batch Normalization: addresses the changing data distributions of the intermediate layers during training, preventing vanishing or exploding gradients and speeding up training

          • Normalization (Normalized Data) (why normalization preprocessing is needed for all data):
            • The essence of the neural network learning process is to learn the data distribution; once the training and test distributions differ, the generalization ability of the network drops sharply;
            • Once each batch of training data has a different distribution (batch gradient descent), the network has to adapt to a different distribution at every iteration, which greatly slows down training.
            • Training a deep network is a complex process: a slight change in the first few layers is cumulatively amplified in the later layers.
            • Once the input distribution of some layer changes, that layer has to adapt to the new distribution; if the distribution keeps changing during training, the training speed of the network suffers.
        • BN Training

          • Stochastic gradient descent (SGD) is simple and efficient for training deep networks, but it requires manually chosen hyper-parameters (learning rate, parameter initialization, weight decay coefficient, dropout ratio, etc.); with BN, the parameters no longer need to be tuned so carefully.
          • Once the neural network starts training, its parameters are updated. Apart from the input layer data (which we have already normalized per sample by hand), the input distribution of every later layer keeps changing, because updates to the parameters of earlier layers change the input distribution of the later layers. This change of the data distribution in the middle layers during training is called "Internal Covariate Shift".
            • Batch Normalization was born to solve this problem of changing data distributions in the intermediate layers during training.
          • BN's place in the network: just like activation layers, convolutional layers, fully connected layers, and pooling layers, BN (Batch Normalization) is itself a layer of the network.
          • The essence of BN: before the input of every network layer, insert a normalization layer, i.e., first normalize (to mean 0, variance 1), then pass the result to the next layer.
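Concretely (the standard BN formulation; $\gamma$ and $\beta$ are the parameters that DeepLabv3 fine-tunes), for a mini-batch $\mathcal{B}$ with mean $\mu_\mathcal{B}$ and variance $\sigma_\mathcal{B}^2$:

$$
\hat{x}=\frac{x-\mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2}+\epsilon}},\qquad
y=\gamma\hat{x}+\beta
$$

where $\gamma$ (scale) and $\beta$ (shift) are learnable, so the network can recover the original representation if that is optimal.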
        • Benefits of BN

          1. Improves gradient flow through the network
          2. Allows a higher learning rate, speeding up training
          3. Reduces the strong dependence on initialization
          4. Improves regularization: as a form of regularization, it slightly reduces the need for dropout
          5. Local response normalization layers are no longer needed (local response normalization was used in the AlexNet network and should be familiar to people working in vision), because BN itself is a normalization layer;
            • Regularization: a technique for avoiding overfitting
      • The essence of the neural network learning process is to learn the data distribution; once the training and test distributions differ, the generalization ability of the network drops sharply, so the input data must be normalized to make the training and test distributions the same.
        https://www.cnblogs.com/king-lps/p/8378561.html
        
  • Upsampling and downsampling

    • Upsampling: put simply, enlarging the image. In image recognition, the image must be classified at the pixel level, so after convolution extracts features, upsampling is needed to restore the feature map to the original image size.

      • Almost all image scaling is done by interpolation, i.e., new elements are inserted between the original image pixels using an appropriate interpolation algorithm.
      • Common upsampling methods include bilinear interpolation, transposed convolution, unsampling, and unpooling.
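A minimal PyTorch sketch contrasting two of these methods (assumed shapes, illustrative only): bilinear interpolation has no learnable parameters, while transposed convolution learns its upsampling filter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat = torch.randn(1, 256, 33, 33)          # a coarse feature map

# 1) bilinear interpolation: fixed, parameter-free upsampling
up_bilinear = F.interpolate(feat, scale_factor=2, mode="bilinear", align_corners=False)

# 2) transposed convolution: learnable upsampling (stride 2 doubles the size)
deconv = nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2)
up_learned = deconv(feat)

print(up_bilinear.shape, up_learned.shape)   # both are 1 x 256 x 66 x 66
```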
    • Downsampling:

      • Background: machine learning algorithms extract experience by computing over large datasets and then judge whether some data are normal. With an imbalanced dataset, however, the minority class is clearly too small and the model leans toward the majority class.

      • Common downsampling (undersampling) methods

        • Random undersampling

          • Randomly select some majority-class samples and discard them. The drawback is that the discarded samples may contain important information, so the learned model may perform poorly.
        • EasyEnsemble and BalanceCascade

          • EasyEnsemble and BalanceCascade use ensemble learning mechanisms to deal with the information loss of traditional random undersampling.
          • EasyEnsemble randomly splits the majority-class samples into n subsets, each containing as many samples as the minority class, which amounts to undersampling. Each subset is then combined with the minority samples to train a model, and the n models are finally ensembled; although each subset has fewer samples than the whole set, the total amount of information after ensembling is not reduced (see the sketch after this list).
          • BalanceCascade uses a supervised Boosting scheme (Boosting improves the accuracy of a weak classification algorithm by constructing a series of prediction functions and combining them into one prediction function in a certain way).
            • In the n-th round, the subset sampled from the majority class is combined with the minority samples to train a base learner H; after training, the majority samples that H classifies correctly are discarded.
            • In the (n+1)-th round, a new subset is drawn from the remaining majority samples and trained together with the minority samples; finally the different base learners are ensembled.
            • In BalanceCascade, the supervised base learner of each round selects which majority samples to keep, and its Boosting flavor lies in discarding the correctly classified samples every round, so that subsequent base learners focus more on the previously misclassified samples.
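A minimal NumPy sketch of the EasyEnsemble sampling step described above (our own illustration; the `imbalanced-learn` package offers a ready-made `EasyEnsembleClassifier`):

```python
import numpy as np

def easy_ensemble_indices(y: np.ndarray, n_subsets: int, seed: int = 0):
    """Split the majority class into balanced subsets, each paired with all
    minority samples (the sampling step of EasyEnsemble)."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)        # assume label 1 is the minority class
    majority = np.flatnonzero(y == 0)
    subsets = []
    for _ in range(n_subsets):
        picked = rng.choice(majority, size=len(minority), replace=False)
        subsets.append(np.concatenate([picked, minority]))
    return subsets  # train one base learner per subset, then ensemble them

y = np.array([0] * 90 + [1] * 10)
for idx in easy_ensemble_indices(y, n_subsets=3):
    print(len(idx))                          # each balanced subset has 20 samples
```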
        • NearMiss

          • NearMiss is essentially a prototype selection method: it selects the most representative samples from the majority class for training, mainly to alleviate the information loss of random undersampling.

          • NearMiss selects samples with heuristic rules and falls into 3 variants depending on the rule (see the NearMiss-1 sketch after this list):

            1. NearMiss-1: select the majority samples whose average distance to their K nearest minority samples is smallest
            2. NearMiss-2: select the majority samples whose average distance to their K farthest minority samples is smallest
            3. NearMiss-3: for each minority sample, select its K nearest majority samples, ensuring that every minority sample is surrounded by majority samples
        • https://blog.csdn.net/weixin_44451032/article/details/99974665
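A small NumPy sketch of the NearMiss-1 rule stated above (illustrative only; `imbalanced-learn` provides a `NearMiss` under-sampler covering all three variants):

```python
import numpy as np

def nearmiss1(X_major: np.ndarray, X_minor: np.ndarray, n_keep: int, k: int = 3):
    """NearMiss-1: keep the majority samples whose average distance to their
    k nearest minority samples is smallest (brute-force distances)."""
    d = np.linalg.norm(X_major[:, None, :] - X_minor[None, :, :], axis=2)
    avg_nearest = np.sort(d, axis=1)[:, :k].mean(axis=1)  # mean distance to k nearest minority points
    keep = np.argsort(avg_nearest)[:n_keep]
    return X_major[keep]

X_major = np.random.randn(100, 2)        # majority class
X_minor = np.random.randn(10, 2) + 3.0   # minority class, shifted
print(nearmiss1(X_major, X_minor, n_keep=10).shape)       # (10, 2)
```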
          
  • Bilinear interpolation: the extension of linear interpolation to a 2D rectangular grid, used to interpolate a function of two variables (e.g., x and y). Its core idea is to perform linear interpolation along each of the two directions in turn.
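In formulas, to estimate $f$ at a point $(x, y)$ inside the grid cell with corners $(x_1, y_1)$, $(x_1, y_2)$, $(x_2, y_1)$, $(x_2, y_2)$, one first interpolates along $x$ and then along $y$:

$$
f(x, y_1) \approx \frac{x_2 - x}{x_2 - x_1} f(x_1, y_1) + \frac{x - x_1}{x_2 - x_1} f(x_2, y_1), \qquad
f(x, y_2) \approx \frac{x_2 - x}{x_2 - x_1} f(x_1, y_2) + \frac{x - x_1}{x_2 - x_1} f(x_2, y_2)
$$

$$
f(x, y) \approx \frac{y_2 - y}{y_2 - y_1} f(x, y_1) + \frac{y - y_1}{y_2 - y_1} f(x, y_2)
$$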
  • Transposed convolution
    • A convolution operation is a many-to-one mapping.
    • We want to map one value in the input matrix to 9 values in the output matrix, which is a one-to-many mapping. This is like running convolution in reverse, and its core idea is transposed convolution.
  • Global features or contextual interactions help to classify pixels correctly in semantic segmentation
