[Reading papers] Comparison of the DeepLab v1-v3 series: a brief review
2022-06-13 02:24:00 【Shameful child】
The DeepLab series targets the semantic segmentation task. In semantic segmentation:
- Semantic segmentation is a dense prediction task: every pixel of the image is assigned to a specified category;
- The image is divided into several meaningful objects;
- Each object is assigned its specified category label.
In traditional semantic segmentation there are three challenges:
- In a traditional classification CNN, repeated pooling and downsampling reduce the spatial resolution.
  - Because semantic segmentation is pixel-level classification, highly abstract spatial features are not suitable for such a low-level task, so the size of the feature map and spatial invariance must be taken into account.
  - The feature map shrinks because of stride: stride > 1 is used to enlarge the receptive field. With stride = 1, keeping the same receptive field requires a larger convolution kernel, so the DeepLab v1 paper uses the hole algorithm to enlarge the kernel and reach the same receptive field. This is atrous (dilated) convolution.
  - Passing an image through a CNN is a process of gradual abstraction, and the original location information decreases or even disappears with depth. The conditional random field (CRF) is a smoothing method from traditional image processing: when deciding the value at a pixel it takes the values of the neighboring pixels into account, which removes some noise.
  - The specific operation: remove the last two pooling layers of the original network and use atrous convolution with rate = 2 instead.
- Objects appear at multiple scales. Rescaling the image and aggregating the resulting feature maps handles this, but at a high computational cost.
- An object-centered classifier needs invariance to spatial transformations, which works against precise localization.
Link: https://www.jianshu.com/p/9184455a4bd3
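The arithmetic behind the stride and rate discussion above can be checked with a small sketch. It is illustrative only: the function names and the 224-pixel input are assumptions, not taken from the DeepLab code. It relies on the standard relation that a kernel of size k with rate r spans k + (k - 1)(r - 1) positions, and on a classification network with five stride-2 pooling layers (as in VGG-16).

```python
def effective_kernel_size(kernel_size, rate):
    """Span of a dilated kernel: k + (k - 1) * (rate - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def feature_map_size(input_size, num_stride2_layers):
    """Side length after the given number of stride-2 layers (floor division)."""
    size = input_size
    for _ in range(num_stride2_layers):
        size //= 2
    return size


# A 3x3 kernel with rate=2 covers the same span as a dense 5x5 kernel,
# while still using only 9 weights.
print(effective_kernel_size(3, 2))  # 5

# Hypothetical 224x224 input: five stride-2 poolings versus only three
# (the last two no longer downsample).
print(feature_map_size(224, 5), feature_map_size(224, 3))  # 7 28
```

Keeping the last two pooling layers from downsampling leaves a 28×28 map instead of 7×7, and the rate-2 kernels compensate for the receptive field that the removed downsampling would have provided.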
ASPP structure
SPP: Spatial Pyramid Pooling (Kaiming He)
- Requirement: R-CNN needs input images of a fixed size, because the fully connected layers that follow the convolutional layers have a fixed structure.
- Reality: in practice, input images rarely have the required size. The usual workarounds are cropping (crop) and stretching (warp), but both distort the original features.
- Solution: the SPP layer divides the feature map of a candidate region into grids of several different sizes and applies max pooling inside each grid cell, so the following fully connected layers still receive a fixed-size input.
- Application: the ROI pooling layer in Fast R-CNN is actually a special case of spatial pyramid pooling. The idea is the same, except that ROI pooling uses a grid of only one size while spatial pyramid pooling uses several grid sizes at once.
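The fixed-length property of SPP can be illustrated with a minimal pure-Python sketch. The function name `spp_max_pool`, the pyramid levels (1, 2, 4) and the bin-edge rule (floor for the start, ceiling for the end) are assumptions chosen for illustration, not the exact layout of the original SPP implementation. A single 2D list stands in for one channel of a feature map.

```python
def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D map into an n x n grid per level and flatten the results."""
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for n in levels:
        for gy in range(n):
            # Bin edges: floor for the start, ceiling for the end,
            # so the bins cover the whole map and are never empty.
            y0 = (gy * height) // n
            y1 = ((gy + 1) * height + n - 1) // n
            for gx in range(n):
                x0 = (gx * width) // n
                x1 = ((gx + 1) * width + n - 1) // n
                pooled.append(
                    max(feature_map[y][x]
                        for y in range(y0, y1)
                        for x in range(x0, x1))
                )
    return pooled


# Two hypothetical feature maps of different sizes (5x7 and 13x9).
small = [[y * 7 + x for x in range(7)] for y in range(5)]
large = [[y * 9 + x for x in range(9)] for y in range(13)]

# Both yield 1 + 4 + 16 = 21 values, whatever the input size.
print(len(spp_max_pool(small)), len(spp_max_pool(large)))  # 21 21
```

Passing `levels=(2,)` gives a single grid size, which corresponds to the ROI pooling case described above.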
PPM: Pyramid Pooling Module
- Position: a module proposed in PSPNet, a network for semantic segmentation.
- Problem: in general, the deeper the network, the larger the receptive field, but there is a gap between the theoretical receptive field and the one the network actually achieves (the actual receptive field is smaller than the theoretical one). This prevents the network from fusing global feature information effectively.
- Finding: GAP (Global Average Pooling) can effectively fuse global context information, but its ability to fuse and extract information is limited, and simply using GAP to compress the information into one channel easily loses a lot of useful information. Fusing features from different receptive fields and sub-regions therefore strengthens the feature representation.
- Solution: the PPM module splits the feature map extracted by the preceding network into two branches. One branch is divided into several sub-regions that are each average-pooled (similar to the structure in the PSP module), then a 1×1 convolution adjusts the number of channels, and bilinear interpolation restores the size the map had before pooling. Finally the two branches are merged.
- Application: the PSP module aggregates the context of different regions to obtain the global context.
Atrous convolution (Atrous/Dilated Convolution)
We want the features extracted from the image to have a large receptive field, but that costs too much computation.
We want the resolution of the feature map not to drop too much (losing too much resolution loses a lot of detail about object boundaries).
These two goals conflict in practice: to get a larger receptive field you need either a larger convolution kernel or a larger stride when pooling. The former costs too much computation and the latter loses resolution.
Atrous convolution resolves this conflict: it gives a larger receptive field without losing much resolution.
Its advantage is that the receptive field grows without pooling and the information loss pooling causes, so each convolution output contains information from a wider range.
In image segmentation, the image is fed into a CNN (a typical network is FCN). Like a conventional CNN, FCN first convolves the image and then pools it, which reduces the image size and enlarges the receptive field. But because segmentation predictions are pixel-wise outputs, the smaller image after pooling must be upsampled back to the original size for prediction. The earlier pooling lets every pixel prediction see information from a large receptive field.
So FCN-style segmentation has two key steps: pooling, which reduces the image size and enlarges the receptive field, and upsampling, which enlarges the image size again.
Some information is inevitably lost when the size is first reduced and then enlarged. Can we design a new operation that also gets a larger receptive field and sees more information without pooling? The answer is dilated conv.
Dilated conv enlarges the receptive field without the information loss of pooling, so each convolution output covers a wider range of information.
Link: https://www.zhihu.com/question/54149221/answer/192025860
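A minimal 1D sketch may make the point concrete: with the same three weights, a larger rate spreads the taps further apart, and the output keeps the length of the input. The function `dilated_conv1d`, the zero padding at the borders and the impulse input are illustrative assumptions; real networks use 2D convolutions from a deep learning library.

```python
def dilated_conv1d(signal, kernel, rate=1):
    """Same-length 1D convolution (cross-correlation form) with zero padding."""
    center = len(kernel) // 2  # odd kernel length assumed
    output = []
    for i in range(len(signal)):
        total = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - center) * rate  # taps are `rate` positions apart
            if 0 <= idx < len(signal):     # positions outside count as zero
                total += weight * signal[idx]
        output.append(total)
    return output


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
kernel = [1, 1, 1]

print(dilated_conv1d(impulse, kernel, rate=1))
print(dilated_conv1d(impulse, kernel, rate=2))
```

With rate=1 the impulse reaches three adjacent outputs. With rate=2 it reaches outputs two positions apart, so the kernel spans five positions with only three weights, and the output still has nine elements because no pooling or striding was applied.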
In segmentation, images have a multi-scale problem: objects can be large or small.
The common approach is an image pyramid: resize the original image to different scales, feed each into the same network, obtain different feature maps and then fuse them. This does improve accuracy, but it is too slow.
To solve this problem, DeepLab v2 introduces the ASPP (atrous spatial pyramid pooling) module, in the same spirit as SPP and PPM: the feature map passes through parallel atrous convolution layers with different dilation rates to capture multi-scale information, and the outputs are fused to obtain the segmentation result.
DeepLab v3 improves ASPP. It uses one 1×1 convolution and three 3×3 atrous convolutions, each with 256 kernels and a BN layer.
- The 1×1 convolution is equivalent to an atrous convolution with a very large rate: the larger the rate, the fewer weights of the kernel are effective, so the 1×1 kernel corresponds to the center weight of a large-rate kernel.
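The claim about large rates can be checked by counting, for a 3×3 kernel, how many of its nine weights land on real (non-padding) positions. This is a brute-force sketch under assumptions: the helper name `average_valid_taps` and the 65×65 feature map size are chosen for illustration, and zero padding is assumed at the borders.

```python
def average_valid_taps(size, rate):
    """Average number of 3x3 kernel weights that hit the feature map itself.

    For every position of a size x size map, the nine taps sit at offsets
    -rate, 0, +rate in each direction; taps outside the map only see padding.
    """
    count = 0
    for y in range(size):
        for x in range(size):
            for dy in (-rate, 0, rate):
                for dx in (-rate, 0, rate):
                    if 0 <= y + dy < size and 0 <= x + dx < size:
                        count += 1
    return count / (size * size)


# Hypothetical 65x65 feature map with increasing rates.
for rate in (1, 6, 12, 18, 65):
    print(rate, average_valid_taps(65, rate))
```

The average falls as the rate grows, and once the rate reaches the feature map size only the center weight ever touches the map, which is the sense in which such a branch behaves like a 1×1 convolution.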
DeepLab V1
The base network is a modified VGG-16.
Problems, reasons and solutions:
- Problem: downsampling loses information. Reason: max pooling shrinks the feature map and detail is lost. Classification needs more "high-level" features, while segmentation classifies every pixel and needs more detail. Solution: atrous convolution.
- Problem: spatial invariance. Reason: the model is insensitive to the spatial position of the input, so it recognizes the image however it is rotated or translated. That is fine for classification but not for segmentation: once the image is rotated, the class of each pixel naturally changes. Solution: conditional random field (CRF).
What max pooling is for: on the one hand it reduces the feature map size, on the other hand it enlarges the receptive field of each element in the feature map.
What FCN does: after obtaining the feature map, it upsamples with deconv to restore the feature map to the original image size. But first downsampling with max pooling and then upsampling with deconv inevitably loses some information.
How DeepLab addresses FCN's problem: DeepLab introduces dilated conv (atrous convolution) to replace max pooling without giving up the receptive field.
- Dilated conv enlarges the receptive field without the information loss of pooling, so each convolution output covers a wider range of information. For tasks such as image segmentation that are sensitive to spatial location and detail, atrous convolution is a good choice.
Because of the inherent spatial invariance of CNNs, the features a CNN extracts are not fine enough for segmentation. So a CRF is added after feature extraction to refine the result.
DeepLab V2
- Improves on v1.
- The base network is a modified ResNet.
- ASPP: atrous spatial pyramid pooling, used to solve the multi-scale problem. That is, the same object should be recognized accurately whether it appears large or small in the image.
DeepLab V3
- Removes the CRF.
- Modifies ResNet, using atrous convolution and pyramid atrous convolution (ASPP) inside it.
- Uses atrous convolution and removes downsampling, which keeps both the receptive field and the feature map size.
  - Shrinking the feature map amounts to losing spatial information.
  - Image segmentation is very sensitive to spatial information, so keeping the feature map size is very important. The design idea of the whole DeepLab series is the same: keep the receptive field while keeping the feature map size unchanged.
  - With atrous convolution the receptive field is maintained and the feature map size is not reduced.
- Uses ASPP inside the residual block to ensure multi-scale sensitivity.
  - Inside the block, atrous convolutions with different rates run in parallel, and the resulting features are fused.
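The parallel-branch idea can be sketched in one dimension. This is an illustrative toy, not DeepLab's implementation: `aspp_1d`, the shared kernel, the rates (1, 2, 4) and fusion by stacking the branch outputs per position are all assumptions made for the example. In a real network each branch has its own learned weights.

```python
def dilated_conv1d(signal, kernel, rate):
    """Same-length 1D convolution with zero padding and the given rate."""
    center = len(kernel) // 2  # odd kernel length assumed
    return [
        sum(weight * signal[i + (j - center) * rate]
            for j, weight in enumerate(kernel)
            if 0 <= i + (j - center) * rate < len(signal))
        for i in range(len(signal))
    ]


def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Run one dilated branch per rate in parallel, then fuse the results."""
    branches = [dilated_conv1d(signal, kernel, rate) for rate in rates]
    # Fusion by stacking: each position gets one value per branch,
    # like concatenating the branches along a channel axis.
    return [list(values) for values in zip(*branches)]


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
fused = aspp_1d(impulse, [1, 1, 1])
for position, features in enumerate(fused):
    print(position, features)
```

Each position keeps its place in the sequence (the length is unchanged), while its fused feature vector combines a narrow view (rate 1) with progressively wider views (rates 2 and 4) of the input.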