
Semantic segmentation: related topics

2022-07-29 04:14:00 【ytusdc】

Why does image segmentation in deep learning encode before decoding?

Downsampling is a means, not an end:

It reduces GPU memory use and computation: smaller feature maps need less memory and fewer operations;
It enlarges the receptive field: the same 3x3 convolution extracts features over a larger region of the image. A large receptive field matters for segmentation. With a small receptive field the network cannot distinguish many categories, and the segmentation is coarse.
Adding several branches with different downsampling rates makes it easy to fuse multi-scale features. Fusing semantics from multiple levels makes classification more accurate.

On the theoretical significance of downsampling (I have only read about it briefly): it increases robustness to small perturbations of the input image, such as translation and rotation, reduces the risk of overfitting, reduces computation, and enlarges the receptive field. Related link: Why should image segmentation in deep learning be encoded before decoding?
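The receptive-field argument above can be checked with a little arithmetic. The sketch below is only an illustration: the layer stacks and the 2x2 stride-2 pooling layer are assumptions chosen for the example, not taken from any particular network.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    `layers` is a list of (kernel_size, stride) pairs, input to output.
    """
    rf = 1    # receptive field so far
    jump = 1  # spacing, in input pixels, between neighbouring units
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


conv3 = (3, 1)  # 3x3 convolution, stride 1
down2 = (2, 2)  # hypothetical 2x2 pooling, stride 2

# Three 3x3 convolutions, without and with downsampling in between.
no_downsampling = receptive_field([conv3, conv3, conv3])
with_downsampling = receptive_field([conv3, down2, conv3, down2, conv3])

print(no_downsampling, with_downsampling)  # 7 18
```

With the same three 3x3 convolutions, the downsampled stack sees an 18-pixel-wide region instead of 7, which is the effect described above.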

1. The difference between stuff and things in panoptic segmentation

Panoptic segmentation this year: the end-to-end road

2. Why does image segmentation in deep learning encode first and then decode?

Why should image segmentation in deep learning be encoded before decoding

3. Is BN commonly used in image segmentation?

Summary of common normalization methods: BN, LN, IN, GN

4. Segmentation results are often discontinuous. How do you deal with that?

Morphology in practice: image opening and closing

Morphology in practice: principles of image erosion and dilation

Use an opening or a closing operation. Set a threshold, then remove connected components smaller than the threshold and fill the smaller holes.

Opening = erosion first, then dilation (it appears to separate two targets joined by a thin connection)

Closing = dilation first, then erosion (it appears to merge two blocks separated by a thin gap)
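The two definitions above can be sketched in plain Python on a binary mask. This is a minimal illustration that assumes a 3x3 square structuring element and treats pixels outside the image as background. The helper names and the toy masks are made up for the example.

```python
def _morph(img, combine):
    """Apply `combine` (all or any) over each pixel's 3x3 neighbourhood.

    Pixels outside the image are treated as 0 (background).
    """
    h, w = len(img), len(img[0])

    def px(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0

    return [
        [int(combine(px(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)))
         for c in range(w)]
        for r in range(h)
    ]


def erode(img):
    return _morph(img, all)   # keep a pixel only if its whole neighbourhood is set


def dilate(img):
    return _morph(img, any)   # set a pixel if anything in its neighbourhood is set


def opening(img):
    return dilate(erode(img))  # erosion first, then dilation


def closing(img):
    return erode(dilate(img))  # dilation first, then erosion


# A 3x3 blob plus one isolated noise pixel: opening removes the noise.
speckled = [[0] * 7 for _ in range(7)]
for r in range(1, 4):
    for c in range(1, 4):
        speckled[r][c] = 1
speckled[5][5] = 1
opened = opening(speckled)

# A 5x5 blob with a one-pixel hole: closing fills the hole.
holed = [[0] * 9 for _ in range(9)]
for r in range(2, 7):
    for c in range(2, 7):
        holed[r][c] = 1
holed[4][4] = 0
closed = closing(holed)
```

In practice a library routine would normally be used instead, for example OpenCV's `cv2.morphologyEx` with `cv2.MORPH_OPEN` or `cv2.MORPH_CLOSE`.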

5. What is the biggest difference between FCN and CNN? Why does FCN use fully convolutional layers instead of fully connected layers?

Fully convolutional network FCN: detailed explanation

Fully convolutional network (FCN): detailed explanation

Fully convolutional network FCN: detailed explanation

A traditional CNN has several drawbacks:

High storage cost. The sliding windows are large, every window needs storage for its features and class scores, and with a fully connected structure the storage for the last few layers grows almost exponentially.
Low computational efficiency, with a lot of repeated computation.
The sliding windows are relatively independent of one another, so the fully connected layers at the end can only constrain local features.

FCN improvements:

Convolutionalization: the fully connected layers (6, 7, 8) all become convolutions, so the network accepts input of any size and outputs a low-resolution segmentation map.

Deconvolution: a deconvolution (transposed convolution) layer upsamples the feature map (heatmap) of the last convolution layer back to the size of the original image, which preserves the spatial information of the input. The upsampled feature map is then classified pixel by pixel, so a prediction is made for every pixel of the original image, and finally the softmax classification loss is computed pixel by pixel;

Skip structure: combine the upsampled output with the outputs of earlier convolution and pooling layers to refine the restored image.

Replacing full connection with full convolution

To solve some of the problems above, FCN converts the fully connected layers of a traditional CNN into convolution layers. For the corresponding CNN, FCN converts the last three fully connected layers into three convolution layers (4096, 4096, 1000).

Here we need to understand a more technical statement: if a convolution kernel's kernel_size equals the size of the input feature maps, the kernel takes in all the information in the feature maps, which is equivalent to a fully connected layer of size kernel_size∗1.

How can we understand this sentence?

It means that when the size of the input image matches the size of the convolution kernel, the convolution in effect builds a full connection, but there is a difference.

A fully connected structure is fixed: once training finishes, every connection has a weight. Convolution in effect learns the connection structure: it learns which pixels the target is related to, and pixels with small weights can be ignored.

A fully connected layer does not learn to filter. It assigns a weight to every connection and never changes which units are connected. Convolution learns the useful relationships and weakens or simply drops the rest. In this way convolution blocks share one set of weights, which reduces repeated computation and also reduces model complexity.
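The statement that a convolution whose kernel is as large as its input behaves like a full connection can be checked numerically. The sketch below is a toy, single-channel illustration with random weights. The sizes and helper names are assumptions made for the example.

```python
import random


def conv2d_valid(x, k):
    """'Valid' 2-D cross-correlation of one channel with one kernel."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def fully_connected(x_flat, w_flat):
    """One fully connected output unit: a dot product over all inputs."""
    return sum(xi * wi for xi, wi in zip(x_flat, w_flat))


random.seed(0)
size = 4
x = [[random.random() for _ in range(size)] for _ in range(size)]
w = [[random.random() for _ in range(size)] for _ in range(size)]

# Kernel size == input size: the convolution yields a single value that
# matches the fully connected unit using the same weights.
conv_out = conv2d_valid(x, w)
fc_out = fully_connected([v for row in x for v in row],
                         [v for row in w for v in row])

# On a larger input the same kernel slides and produces a small score map,
# whereas a fully connected layer would reject the changed input size.
big = [[random.random() for _ in range(size + 2)] for _ in range(size + 2)]
score_map = conv2d_valid(big, w)  # 3x3 output
```

The second half of the example is the reason the converted network adapts to any input size and outputs a low-resolution map rather than a single vector.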

FCN in effect classifies the image at the pixel level: it treats each pixel as a training sample, predicting its category and also computing its softmax classification loss. This advance solves image segmentation at the semantic level.
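A per-pixel softmax classification loss can be sketched as follows. This is a minimal pure-Python illustration, assuming the logits are laid out as class x height x width and the loss is averaged over all pixels. The function name and toy inputs are made up for the example.

```python
import math


def pixelwise_softmax_ce(logits, labels):
    """Mean softmax cross-entropy over all pixels.

    logits[c][i][j] is the score of class c at pixel (i, j);
    labels[i][j] is the true class index of that pixel.
    """
    n_classes, h, w = len(logits), len(labels), len(labels[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            scores = [logits[c][i][j] for c in range(n_classes)]
            m = max(scores)  # subtract the max for numerical stability
            log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
            total += log_sum - scores[labels[i][j]]  # -log softmax[true class]
    return total / (h * w)


labels = [[0, 1], [1, 0]]

# Uninformative logits: every pixel costs log(2) with two classes.
uniform = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
loss_uniform = pixelwise_softmax_ce(uniform, labels)

# Confident and correct logits: the loss is close to zero.
confident = [[[10.0, -10.0], [-10.0, 10.0]], [[-10.0, 10.0], [10.0, -10.0]]]
loss_confident = pixelwise_softmax_ce(confident, labels)
```

Each pixel contributes one term to the sum, which is what treating each pixel as a training sample means here.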

Shortcomings of FCN:

The segmentation is not fine enough. The output is too blurry or smooth, and fine details of the target are not segmented.
Because the model is adapted from a CNN, it still classifies each pixel separately even though full connection is replaced with convolution, so the relationships between pixels are not fully considered.

6. PSPNet

A thorough explanation of the Pyramid Scene Parsing Network (PSPNet)

PSPNet: semantic segmentation and scene parsing

PSPNet paper notes

The problem

Most current models are based on FCN and share several common problems:

Mismatched Relationship: context is not captured, although it is necessary for understanding complex scenes. For example, a boat is misclassified as a car because the water in the background is ignored;
Confusion Categories: many category labels are related, and some FCN-based models do not make full use of this relationship. For example, part of a skyscraper is misclassified as a building;
Inconspicuous Classes: the model lacks the ability to capture inconspicuous objects, for example small objects or objects that blend in with others.
These problems arise because the model does not make full use of global information. There are several common remedies:
Global average pooling: it loses spatial relationships and causes ambiguity, and cannot cover all the necessary information
Pyramid pooling: it produces features at different levels, and the paper borrows this approach
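The contrast between global average pooling and pyramid pooling can be sketched on a toy feature map. This is only the pooling step, written in plain Python. The bin sizes 1, 2, 3 and 6 follow the PSPNet paper, while the 6x6 single-channel input and the function name are assumptions for the example. The 1x1 convolutions, upsampling and concatenation of the full module are not shown.

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2-D feature map into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for bi in range(bins):
        r0 = (bi * h) // bins                    # floor of the region start
        r1 = ((bi + 1) * h + bins - 1) // bins   # ceil of the region end
        row = []
        for bj in range(bins):
            c0 = (bj * w) // bins
            c1 = ((bj + 1) * w + bins - 1) // bins
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


# Toy 6x6 feature map whose values increase from left to right, top to bottom.
fmap = [[float(r * 6 + c) for c in range(6)] for r in range(6)]

# One pooled map per pyramid level.
pyramid = {bins: adaptive_avg_pool(fmap, bins) for bins in (1, 2, 3, 6)}

print(pyramid[1])  # [[17.5]]: the global average, no spatial layout left
print(pyramid[2])  # 2x2 grid: coarse layout is preserved
```

The 1x1 level is exactly global average pooling. The finer levels keep progressively more of the spatial layout, which is the information global pooling alone discards.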

7. Mask R-CNN

The amazing Mask R-CNN

Mask R-CNN: detailed explanation

Mask R-CNN: algorithm and implementation details

Paper notes: Mask R-CNN

Semantic segmentation networks: FCN, UNet, Mask R-CNN

Why does the U-Net neural network perform well in medical image segmentation?

With the Mask R-CNN network structure diagram in mind, note the following points:

1) Although ResNet is divided into 5 stages, the features of Stage1 (P1) are not used. The official explanation is that the feature map for P1 is relatively large and time-consuming to compute, so it is discarded. Conversely, P6 is obtained by downsampling Stage5 (P5), so the feature maps at five different scales [P2 P3 P4 P5 P6] are fed into the RPN, and RoIs are generated from each of them.

2) On the five feature maps [P2 P3 P4 P5 P6], the RPN generates a number of anchor boxes. After non-maximum suppression (NMS), about 2000 RoIs are kept in total (2000 is a configurable parameter). Because the strides differ, RoIAlign is applied separately on the four feature maps [P2 P3 P4 P5] with their corresponding strides, and the resulting RoI features are concatenated. The network then splits into three branches: fully connected layers predict the class, fully connected layers predict the bounding box, and a fully convolutional branch predicts the pixel segmentation mask.

3) Loss function: classification error + detection error + segmentation error, that is, L = Lcls + Lbox + Lmask

Lcls, Lbox: fully connected layers predict the class of each RoI and the coordinates of its bounding box. See the description of Faster R-CNN.

Lmask:

① The mask branch uses an FCN. For each RoI, the segmentation output has dimension K*m*m (where m is the size of the feature map produced by RoIAlign), that is, K binary masks of size m*m, one per class. To preserve the m*m spatial layout, the pixel-to-pixel operation requires the RoI features to stay aligned when mapped back to the original image. This is why RoIAlign is used: it solves the alignment problem and reduces pixel-level alignment error.

Interpreting the K*m*m binary mask structure: the final FCN outputs a K-layer mask, one layer per class, with a logistic (sigmoid) output that is binarized with a threshold of 0.5 to produce the background/foreground segmentation mask.

In this way, Lmask lets the network output a mask for every class, with no competition between the masks of different classes. The classification branch predicts the object's class label, which is used to select the output mask: for each RoI, only the cross-entropy error of the branch for the class the RoI belongs to is used as the error value. (Example: with 3 classes (cat, dog, person), if the current RoI belongs to the "person" class, Lmask is computed from the mask of the "person" branch only. Giving every class its own mask effectively avoids competition between classes, and the other classes contribute nothing to the loss.)

② Apply a sigmoid to each pixel, then take the average cross-entropy over all pixels of the RoI as Lmask.
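The Lmask description above can be sketched in plain Python for a single RoI. This is a toy illustration, not Mask R-CNN's actual code: the function names, the K=3, m=2 sizes and the logit values are all assumptions for the example.

```python
import math


def bce_with_logit(z, t):
    """Numerically stable binary cross-entropy for logit z and target t (0 or 1)."""
    return max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))


def mask_loss(mask_logits, gt_mask, gt_class):
    """Average per-pixel sigmoid cross-entropy on the gt_class mask only.

    mask_logits: K x m x m logits, one m x m mask per class.
    gt_mask:     m x m binary ground-truth mask for this RoI.
    gt_class:    index of the class this RoI belongs to.
    """
    selected = mask_logits[gt_class]  # the other K-1 masks are ignored
    losses = [bce_with_logit(z, t)
              for logit_row, target_row in zip(selected, gt_mask)
              for z, t in zip(logit_row, target_row)]
    return sum(losses) / len(losses)


def predict_mask(mask_logits, predicted_class, threshold=0.5):
    """Binarize the sigmoid output of the selected class mask."""
    return [[int(1.0 / (1.0 + math.exp(-z)) > threshold) for z in row]
            for row in mask_logits[predicted_class]]


# K = 3 classes (cat, dog, person), m = 2.
mask_logits = [
    [[5.0, -5.0], [5.0, -5.0]],   # cat
    [[-3.0, 3.0], [0.0, 0.0]],    # dog
    [[0.0, 0.0], [0.0, 0.0]],     # person
]
gt_mask = [[1, 0], [0, 1]]
person = 2

loss = mask_loss(mask_logits, gt_mask, person)

# Changing the "cat" mask does not change the loss for a "person" RoI.
altered = [[[9.0, 9.0], [9.0, 9.0]]] + mask_logits[1:]
loss_altered = mask_loss(altered, gt_mask, person)

cat_mask = predict_mask(mask_logits, 0)  # [[1, 0], [1, 0]]
```

Because only the selected class's mask enters the loss, the masks of different classes do not compete, as described above.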

8. PANet

[Instance segmentation] PANet: brief notes

PANet: an upgraded Mask R-CNN

PANet algorithm notes

CVPR 2018 PANet

9. DeepLab series

Google: DeepLab v1

Close reading of deep learning papers (20): DeepLab V1

DeepLab V2 paper notes

DeepLab V3 paper notes

Semantic segmentation model DeepLabv3+

DeepLab V3 paper notes

10. Introduce DeepLabv3 and draw the backbone

11. Briefly describe what DeepLab v3 improves over the earlier v1 and v2 networks

12. The loss function of DeepLabv3

13. Introduce pyramid pooling, ASPP, depthwise separable convolution, dilated (atrous) convolution, and the PSP module in PSPNet

14. Both the serial and the parallel ASPP structures need to be drawn. Which of the two does the paper consider better?

The parallel structure is better. The serial structure produces the gridding effect.
Follow-up question: how do you avoid the gridding effect (grid effect, checkerboard effect)?

Google Brain: resizing in the CNN eliminates the "checkerboard effect" and improves the quality of neural network image generation

[Computer vision] The checkerboard effect

Dilated convolution and RFBNet: the gridding problem

Summary: dilated/atrous convolution (solutions to the checkerboard effect of dilated convolution)
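The gridding effect of stacked dilated convolutions can be made concrete by listing which input positions a single output unit actually reads. The sketch below is a 1-D illustration with assumed rates. The mixed rates 1, 2, 5 are one commonly cited remedy (hybrid dilated convolution) and are not taken from the notes above.

```python
def touched_offsets(dilation_rates, kernel_size=3):
    """Input offsets read by one output unit of stacked 1-D dilated convs."""
    half = kernel_size // 2
    offsets = {0}
    for rate in dilation_rates:
        taps = [rate * t for t in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return offsets


def coverage(dilation_rates):
    """Return (positions actually read, width of the receptive field)."""
    offsets = touched_offsets(dilation_rates)
    span = max(offsets) - min(offsets) + 1
    return len(offsets), span


same_rate = coverage([2, 2, 2])   # (7, 13): only every other pixel is read
mixed_rate = coverage([1, 2, 5])  # (17, 17): every pixel in the field is read

print(same_rate, mixed_rate)
```

With a repeated rate of 2, almost half of the pixels inside the receptive field never contribute, which produces the grid pattern. Mixing rates that share no common factor fills the holes.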

15. The semantic segmentation metric mIoU: describe it briefly and write out its formula

Image segmentation: evaluation metrics

Object detection (IoU) + semantic segmentation (mIoU) + NMS (ytusdc's blog on CSDN)
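As a short answer to the question above: for each class k, IoU_k = TP_k / (TP_k + FP_k + FN_k), which is the intersection of the predicted and ground-truth regions of that class divided by their union, and mIoU is the mean of IoU_k over the classes. The sketch below computes it on flattened label lists. The function name, the choice to skip classes absent from both prediction and ground truth, and the toy labels are assumptions for the example.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes, from flat label lists."""
    intersection = [0] * num_classes
    pred_area = [0] * num_classes
    target_area = [0] * num_classes
    for p, t in zip(pred, target):
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for k in range(num_classes):
        union = pred_area[k] + target_area[k] - intersection[k]
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection[k] / union)
    return sum(ious) / len(ious) if ious else 0.0


target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]

# Per-class IoU: class 0 -> 1/3, class 1 -> 2/3, class 2 -> 1/2.
miou = mean_iou(pred, target, num_classes=3)
print(miou)
```

Here the union is computed as predicted area + ground-truth area - intersection, which equals TP + FP + FN for that class.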

16. Experience with segmenting small targets

17. Common loss functions for semantic segmentation and their advantages and disadvantages

[Loss function collection] A highly detailed roundup of losses in semantic segmentation

Losses in image segmentation: handling extremely imbalanced data
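As one example of a loss aimed at imbalanced data, the Dice loss can be sketched as follows. It is a common choice rather than necessarily the one the linked articles recommend. The function name, the smoothing constant and the toy mask with 2 foreground pixels out of 100 are assumptions for the example.

```python
def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss for one binary mask, from flat lists.

    probs:  predicted foreground probabilities in [0, 1].
    target: ground-truth labels, 0 or 1.
    eps:    small constant that avoids division by zero.
    """
    intersection = sum(p * t for p, t in zip(probs, target))
    total = sum(probs) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


# 100 pixels, only 2 of them foreground.
target = [1, 1] + [0] * 98

# Predicting "all background" is 98% accurate per pixel ...
all_background = [0.0] * 100
accuracy = sum(int(round(p) == t) for p, t in zip(all_background, target)) / 100

# ... but the Dice loss is close to 1 because the foreground is missed entirely.
loss_missed = dice_loss(all_background, target)

# A perfect prediction gives a loss of 0.
loss_perfect = dice_loss([float(t) for t in target], target)
```

Because the loss depends on overlap with the foreground rather than on per-pixel accuracy, the large background area cannot hide a missed small target.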

Original site

Copyright notice
This article was written by [ytusdc]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/196/202207130552017409.html
