Semantic segmentation: related notes
2022-07-29 04:14:00 【ytusdc】
Why should image segmentation in deep learning encode before decoding?
Downsampling is a means, not an end:
- It reduces GPU memory use and computation: a smaller feature map needs less memory and fewer operations.
- It enlarges the receptive field, so the same 3x3 convolution extracts features from a larger region of the image. A large receptive field matters for segmentation: with a small one, the network struggles to separate many categories and the segmentation is coarse.
- Adding several branches that downsample to different degrees makes it easy to fuse multi-scale features. Fusing semantics from several levels makes classification more accurate.
In theory (from a brief reading), downsampling also increases robustness to small perturbations of the input image, such as translation and rotation, and reduces the risk of overfitting, in addition to reducing computation and enlarging the receptive field. Related link: Why should image segmentation in deep learning encode before decoding?
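The receptive-field argument above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes a plain chain of layers, each described by a hypothetical (kernel_size, stride) pair, and the function name `receptive_field` is made up for this note. It applies the usual recurrence r_out = r_in + (k - 1) * j and j_out = j * s, where j is the spacing of neighbouring outputs measured in input pixels.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    """
    r = 1  # receptive field of a raw input pixel
    j = 1  # "jump": spacing of adjacent outputs, in input pixels
    for kernel_size, stride in layers:
        r += (kernel_size - 1) * j
        j *= stride
    return r


# Three 3x3 convolutions with no downsampling.
plain = [(3, 1), (3, 1), (3, 1)]
# The same convolutions with a 2x2, stride-2 pooling after the first two.
pooled = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]

print(receptive_field(plain))   # 7
print(receptive_field(pooled))  # 18
```

With the two pooling layers, the same three 3x3 convolutions see an 18-pixel-wide region instead of a 7-pixel-wide one.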
1. The difference between stuff and things in panoptic segmentation
Panoptic segmentation this year: the road to end-to-end
2. Why should image segmentation in deep learning encode first and then decode?
Why should image segmentation in deep learning encode before decoding?
3. Is BN generally used in image segmentation?
A summary of common normalization methods: BN, LN, IN, GN
4. Segmentation results are often discontinuous. How should this be handled?
Morphology in practice: image opening and closing
Morphology in practice: the principles of image erosion and dilation
Apply an opening or a closing operation. Alternatively, set a threshold and remove connected components, and holes, that are smaller than the threshold.
Opening = erosion first, then dilation (in effect, it separates two objects that are thinly connected)
Closing = dilation first, then erosion (in effect, it closes up two blocks that are thinly connected)
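The two definitions above can be sketched in a few lines. This is a minimal illustration rather than a production routine: it assumes a binary mask stored as a list of lists of 0/1, a fixed 3x3 square structuring element, and background (0) outside the image border. All function names are made up for this note; an image library would normally be used instead.

```python
def _morph(mask, require_all):
    """Apply a 3x3 erosion (require_all=True) or dilation (require_all=False)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # 3x3 neighbourhood; pixels outside the image count as background.
            window = [
                mask[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0
                for yy in (y - 1, y, y + 1)
                for xx in (x - 1, x, x + 1)
            ]
            out[y][x] = int(all(window)) if require_all else int(any(window))
    return out


def erode(mask):
    return _morph(mask, True)


def dilate(mask):
    return _morph(mask, False)


def opening(mask):
    """Erosion then dilation: removes specks smaller than the 3x3 element."""
    return dilate(erode(mask))


def closing(mask):
    """Dilation then erosion: fills holes smaller than the 3x3 element."""
    return erode(dilate(mask))


# A 3x3 foreground block in a 7x7 image.
block = [[1 if 2 <= y <= 4 and 2 <= x <= 4 else 0 for x in range(7)]
         for y in range(7)]

noisy = [row[:] for row in block]
noisy[0][6] = 1          # isolated false-positive pixel
holed = [row[:] for row in block]
holed[3][3] = 0          # one-pixel hole inside the object

print(opening(noisy) == block)  # True: the speck is gone
print(closing(holed) == block)  # True: the hole is filled
```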
5. What is the biggest difference between FCN and CNN? Why does an FCN use fully convolutional layers instead of fully connected layers?
A detailed explanation of fully convolutional networks (FCN)
Traditional CNNs have several drawbacks:
- High storage cost: the sliding windows are large, each window needs storage for its features and class decision, and with a fully connected structure the storage needed by the last few layers grows almost exponentially.
- Low computational efficiency: much of the computation is repeated.
- The sliding windows are relatively independent of each other, and the fully connected layers at the end can only constrain local features.
FCN's improvements:
Convolutionalization: fully connected layers 6, 7 and 8 all become convolution layers, so the network accepts inputs of any size and outputs a low-resolution segmentation map.
Deconvolution: a deconvolution layer upsamples the feature map (heatmap) of the last convolution layer back to the size of the original image, which preserves the spatial information of the input. The upsampled feature map is then classified pixel by pixel, giving a prediction for every pixel of the original image, and finally the softmax classification loss is computed pixel by pixel.
Skip structure: combine the upsampled data with the output of earlier convolution and pooling layers to refine the restored image.
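The skip structure can be sketched as "upsample the coarse score map, then add the finer one". The code below is a rough illustration under stated assumptions: nearest-neighbour 2x upsampling stands in for the learned deconvolution layer, score maps are single-channel lists of lists, and the names `upsample2x` and `fuse` are hypothetical.

```python
def upsample2x(score):
    """Nearest-neighbour 2x upsampling of a 2D score map.

    Assumption: this replaces the learned deconvolution purely for illustration.
    """
    out = []
    for row in score:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse(coarse, fine):
    """Add an upsampled coarse score map to a finer score map of twice the size."""
    up = upsample2x(coarse)
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


coarse = [[1, 2],
          [3, 4]]                  # deep, low-resolution scores
fine = [[0] * 4 for _ in range(4)]
fine[0][1] = 10                    # a detail only the shallower layer sees

fused = fuse(coarse, fine)
print(fused[0])  # [1, 11, 2, 2]
```

The coarse map supplies the overall class evidence, and the finer map restores a detail that the downsampled path had lost.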
Replacing fully connected layers with convolutions
To address the problems above, FCN converts the fully connected layers of a traditional CNN into convolution layers: the last three fully connected layers of the corresponding CNN become three convolution layers (4096, 4096, 1000).
One more technical statement needs explaining: if the convolution kernel's kernel_size equals the size of the input feature maps, the kernel computes over all of the feature maps' information, which is equivalent to a fully connected layer of size kernel_size∗1.
How should we understand this?
It means that when the input image has the same size as the convolution kernel, the convolution effectively builds a full connection, but with a difference.
A fully connected structure is fixed: once training finishes, every connection has a weight. Convolution, by contrast, in effect learns the connection structure: it learns which pixels the target is related to, and pixels with weak weights can be ignored.
A fully connected layer does not learn to filter; it assigns a weight to every connection and never changes the connectivity. Convolution learns the useful relationships and weakens or drops the rest. Convolution blocks can therefore share one set of weights, which reduces repeated computation and lowers model complexity.
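The equivalence described above is easy to verify on a toy case. The sketch below assumes a single-channel feature map and one output unit; `conv2d_valid` and `fully_connected` are illustrative helper names, not library calls. When the kernel is as large as the feature map, the convolution produces one number equal to the fully connected result; on a larger input the same weights slide and produce a spatial map, which is what lets an FCN accept any input size.

```python
def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(ow)]
            for y in range(oh)]


def fully_connected(image, weights):
    """One fully connected output unit over the flattened image."""
    flat = [v for row in image for v in row]
    return sum(v * w for v, w in zip(flat, weights))


kernel = [[1, 0],
          [0, -1]]
fc_weights = [w for row in kernel for w in row]  # same weights, flattened

small = [[1, 2],
         [3, 4]]
large = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

print(conv2d_valid(small, kernel))         # [[-3]]
print(fully_connected(small, fc_weights))  # -3
print(conv2d_valid(large, kernel))         # [[-4, -4], [-4, -4]]
```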
FCN in effect classifies the image at the pixel level: each pixel is treated as a training sample, for which the network both predicts a category and computes a softmax classification loss. This advance solved image segmentation at the semantic level.
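Treating every pixel as a training sample means the loss is an ordinary softmax cross-entropy averaged over pixels. A minimal sketch follows, assuming logits indexed as logits[c][y][x] and integer labels indexed as labels[y][x]; this layout and the function name are choices made for this example only.

```python
import math


def pixelwise_softmax_loss(logits, labels):
    """Mean softmax cross-entropy over all pixels.

    logits[c][y][x] is the score of class c at pixel (y, x);
    labels[y][x] is the ground-truth class index of that pixel.
    """
    num_classes = len(logits)
    height, width = len(labels), len(labels[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            scores = [logits[c][y][x] for c in range(num_classes)]
            peak = max(scores)  # subtract the max for numerical stability
            log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
            total += log_norm - scores[labels[y][x]]  # -log softmax of true class
    return total / (height * width)


labels = [[0, 1]]                           # a 1x2 image with two classes
uninformed = [[[0.0, 0.0]], [[0.0, 0.0]]]   # equal scores everywhere
confident = [[[5.0, -5.0]], [[-5.0, 5.0]]]  # strongly favours the true class

print(pixelwise_softmax_loss(uninformed, labels))  # equals log(2)
print(pixelwise_softmax_loss(confident, labels))   # much smaller
```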
FCN's shortcomings:
- The segmentation is not fine enough. The output is blurry or over-smoothed, and details of the target are not segmented.
- Because the model is adapted from a CNN, it still classifies each pixel separately even with the fully connected layers replaced by convolutions, so the relationships between pixels are not fully considered.
6. PSPNet
A thorough explanation of the Pyramid Scene Parsing Network (PSPNet)
PSPNet: semantic segmentation and scene parsing
Problem statement
Most current models are based on FCN and share several common problems:
- Mismatched Relationship: a failure to capture context, which is necessary for understanding complex scenes. For example, a boat is misclassified as a car because the water in the background is ignored.
- Confusion Categories: many category labels are related, and some FCN-based models do not make full use of this relationship. For example, part of a skyscraper is misclassified as building.
- Inconspicuous Classes: a lack of ability to capture inconspicuous objects, for example small objects or objects that blend in with others.
These problems arise because the model does not make full use of global information. There are several common ways to address them:
- Global average pooling: loses spatial relationships and causes ambiguity, and cannot cover all the necessary information.
- Pyramid pooling: produces features at different levels; the paper also draws on this method.
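The contrast between the two remedies above can be made concrete: global average pooling is the 1x1 level of a pyramid, and the finer levels keep coarse spatial layout. The sketch below is illustrative only. It assumes a single-channel feature map as a list of lists, uses adaptive average pooling with floor/ceil bin boundaries, and takes the default bin sizes (1, 2, 3, 6) from the PSPNet paper rather than from these notes. The upsampling and concatenation that follow in the real module are omitted.

```python
import math


def adaptive_avg_pool(feature, bins):
    """Average-pool a 2D feature map into a bins x bins grid."""
    h, w = len(feature), len(feature[0])
    pooled = []
    for i in range(bins):
        y0, y1 = (i * h) // bins, math.ceil((i + 1) * h / bins)
        row = []
        for k in range(bins):
            x0, x1 = (k * w) // bins, math.ceil((k + 1) * w / bins)
            cells = [feature[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def pyramid_pool(feature, bin_sizes=(1, 2, 3, 6)):
    """Pool the same feature map at several scales, keyed by bin size."""
    return {b: adaptive_avg_pool(feature, b) for b in bin_sizes}


feature = [[1, 1, 2, 2],
           [1, 1, 2, 2],
           [3, 3, 4, 4],
           [3, 3, 4, 4]]

levels = pyramid_pool(feature, bin_sizes=(1, 2))
print(levels[1])  # [[2.5]]: global average, all layout lost
print(levels[2])  # [[1.0, 2.0], [3.0, 4.0]]: the four quadrants stay distinct
```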
7. Mask R-CNN
A detailed explanation of Mask R-CNN
Mask R-CNN: algorithm and implementation details
Semantic segmentation networks: FCN, U-Net, Mask R-CNN
Why do U-Net neural networks perform well in medical image segmentation?
With reference to the Mask R-CNN network structure diagram, note the following points:
1) Although ResNet is divided into 5 stages, the Stage 1 features (P1) are not used. The official explanation is that the feature map corresponding to P1 is relatively large and time-consuming to process, so it is discarded. Conversely, P6 is obtained by downsampling Stage 5 (P5), so five feature maps of different scales, [P2 P3 P4 P5 P6], are fed into the RPN, each generating RoIs.
2) The RPN generates a number of anchor boxes on each of the five feature maps [P2 P3 P4 P5 P6]. After NMS (non-maximum suppression), about 2000 RoIs are kept in total (2000 is a configurable parameter). Because the strides differ, RoIAlign is applied separately on the four feature maps [P2 P3 P4 P5], each with its own stride, and the resulting RoIs are concatenated. The network then splits into three branches: fully connected layers predicting the class, fully connected layers predicting the bounding box, and a fully convolutional branch predicting the pixel segmentation mask.
3) Loss function: classification error + detection error + segmentation error, that is, L = Lcls + Lbox + Lmask
Lcls, Lbox: fully connected layers predict the category of each RoI and the coordinates of its bounding box; see the description of the Faster R-CNN network.
Lmask:
① For each RoI, the mask branch uses an FCN whose segmentation output has dimension K*m*m (where m is the size of the RoIAlign feature map), that is, K binary masks of size m*m, one per category. Keeping the m*m spatial layout requires pixel-to-pixel operations, which in turn require the RoI features to stay aligned when mapped back to the original image. This is why RoIAlign is used: it solves the alignment problem and reduces pixel-level alignment error.
Interpreting the K*m*m binary mask structure: the final FCN outputs a K-layer mask, one layer per class, with a logistic (sigmoid) output; binarizing with a threshold of 0.5 produces the mask separating foreground from background.
In this way, Lmask lets the network output a mask for every class, with no competition between the masks of different classes. The classification branch predicts the object's category label, which selects the output mask: for each RoI, only the cross-entropy error of the branch for the class the RoI is detected as is used as the error value. (For example, with 3 classes (cat, dog, person), if the current RoI is detected as "person", the Lmask used is that of the "person" branch's mask.) Giving each class its own mask effectively avoids competition between classes (the other classes do not contribute to the loss).
② Apply a sigmoid to each pixel, then take the average of the cross entropy over all pixels in the RoI as Lmask.
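The Lmask description above (per-pixel sigmoid, mean binary cross-entropy, only the ground-truth class channel contributing) can be sketched as follows. This is a simplified illustration for a single RoI, not the reference implementation: logits are assumed to be laid out as mask_logits[k][y][x], the function names are hypothetical, and no clamping is done, so it is only suitable for moderate logit values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mask_loss(mask_logits, gt_class, gt_mask):
    """Mean per-pixel binary cross-entropy on the ground-truth class channel.

    mask_logits has shape K x m x m; channels of other classes are ignored,
    so they contribute nothing to the loss.
    """
    channel = mask_logits[gt_class]
    total, count = 0.0, 0
    for logit_row, target_row in zip(channel, gt_mask):
        for z, t in zip(logit_row, target_row):
            p = sigmoid(z)
            total -= t * math.log(p) + (1 - t) * math.log(1 - p)
            count += 1
    return total / count


def predict_mask(mask_logits, cls, threshold=0.5):
    """Binarize the sigmoid output of the selected class channel."""
    return [[int(sigmoid(z) > threshold) for z in row] for row in mask_logits[cls]]


# K = 3 classes (cat, dog, person), m = 2.
PERSON = 2
gt_mask = [[1, 0],
           [0, 1]]
logits = [
    [[0.0, 0.0], [0.0, 0.0]],    # cat channel
    [[0.0, 0.0], [0.0, 0.0]],    # dog channel
    [[2.0, -2.0], [-1.0, 3.0]],  # person channel
]

loss = mask_loss(logits, PERSON, gt_mask)
logits[0] = [[9.0, 9.0], [9.0, 9.0]]  # change the cat channel only
print(mask_loss(logits, PERSON, gt_mask) == loss)  # True: no cross-class competition
print(predict_mask(logits, PERSON))                # [[1, 0], [0, 1]]
```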
8. PANet
[Instance segmentation] Brief notes on PANet
PANet: an upgraded Mask R-CNN
9. DeepLab series
Close reading of deep learning papers (20): DeepLab V1
The semantic segmentation model DeepLabv3+
10. Introduce DeepLabv3 and draw its backbone
11. Briefly describe how DeepLab v3 improves on the earlier v1 and v2 networks
12. The loss function of DeepLabv3
13. Introduce pyramid pooling, ASPP, depthwise separable convolution, atrous (dilated) convolution, and the PSP module in PSPNet
14. Be able to draw both the cascaded and the parallel ASPP. Which of the two does the paper consider better?
The parallel form is better; the cascaded form produces the gridding effect.
Q: How can the gridding effect (also called the grid or checkerboard effect) be avoided?
[Computer vision] The checkerboard effect
Atrous convolution and RFBNet: the gridding problem
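The gridding effect can be seen by counting which input positions a stack of dilated convolutions can reach. The sketch below works in 1D with 3-tap kernels and a hypothetical helper `covered_offsets`. The mixed rates [1, 2, 5] are an assumption borrowed from the hybrid dilated convolution idea; they are one commonly cited remedy and are not stated in the notes above.

```python
def covered_offsets(dilation_rates):
    """Input offsets (1D) that can influence one output position after
    stacking 3-tap convolutions with the given dilation rates."""
    offsets = {0}
    for rate in dilation_rates:
        offsets = {o + tap * rate for o in offsets for tap in (-1, 0, 1)}
    return offsets


same_rate = covered_offsets([2, 2, 2])
mixed_rate = covered_offsets([1, 2, 5])

print(sorted(same_rate))   # only even offsets: every other pixel is never seen
print(sorted(mixed_rate))  # every offset from -8 to 8
```

Stacking the same rate leaves holes in the receptive field, whereas rates without a common factor fill them in.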
15. The semantic segmentation evaluation metric mIoU: briefly describe mIoU and write out its formula
Image segmentation: evaluation metrics
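As a worked answer to the question above: for each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), and mIoU is the mean of IoU_c over the classes. The sketch below computes it from a confusion matrix. It assumes flat lists of class indices, and it skips classes that appear in neither prediction nor ground truth, which is one common convention among several.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union from flat lists of class indices."""
    # confusion[t][p] counts pixels of true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                   # true c, predicted other
        fp = sum(row[c] for row in confusion) - tp    # predicted c, true other
        denom = tp + fp + fn
        if denom:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious)


pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
# Class 0: TP=1, FP=1, FN=0 -> 1/2.  Class 1: TP=2, FP=0, FN=1 -> 2/3.
print(mean_iou(pred, target, num_classes=2))  # (1/2 + 2/3) / 2 = 7/12
```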
16. Experience with segmenting small objects
17. Common loss functions for semantic segmentation and their advantages and disadvantages
[Loss function collection] A very detailed roundup of losses in semantic segmentation
Losses in image segmentation: handling extremely imbalanced data
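As one example of a loss aimed at imbalanced data, here is a soft Dice loss sketch. The notes above only name the topic, so the choice of Dice is an assumption made for illustration; the flat-list layout, the `eps` smoothing term and the function name are likewise choices made for this example.

```python
def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss for one binary mask.

    probs: predicted foreground probabilities (flat list, values in [0, 1]);
    target: ground-truth labels (flat list of 0/1).
    """
    intersection = sum(p * t for p, t in zip(probs, target))
    total = sum(probs) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


# 100 pixels, only one of which is foreground.
target = [1] + [0] * 99
all_background = [0.0] * 100
perfect = [1.0] + [0.0] * 99

accuracy = sum(int(p > 0.5) == t for p, t in zip(all_background, target)) / 100
print(accuracy)                           # 0.99, although the object is missed
print(dice_loss(all_background, target))  # close to 1: the miss is penalized
print(dice_loss(perfect, target))         # 0.0
```

Pixel accuracy barely notices the missed object, while the Dice loss, which depends only on the overlap with the foreground, stays near its maximum.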