当前位置:网站首页>Pspnet | semantic segmentation and scene analysis
Pspnet | semantic segmentation and scene analysis
2022-07-05 16:25:00 【Xiaobai learns vision】
Click on the above “ Xiaobai studies vision ”, Optional plus " Star standard " or “ Roof placement ”
Heavy dry goods , First time delivery This time , By the Chinese University of Hong Kong (CUHK) And Technology Shang Dynasty (SenseTime) Proposed pyramid scene parsing network (Pyramid Scene Parsing Network, PSPNet) Has been reviewed .
The goal of semantic segmentation is only to know the category label of each pixel of the known object .
Scene parsing is based on semantic segmentation , Its goal is to know the category labels of all pixels in the image .

Scene analysis
By using pyramid pooling modules (Pyramid Pooling Module), After integrating the context based on different regions ,PSPNet In effect, it exceeds FCN、DeepLab and DilatedNet The best way to wait .PSPNet Final :
get 2016 year ImageNet Champion of scene analysis challenge
stay PASCAL VOC 2012 and Cityscapes Get the best results at that time on the data set
The work has been published in 2017 year CVPR, More than 600 Time .(SH Tsang @ Medium )
Outline of this article
1. The need for global information
2. Pyramid pool module
3. Some details
4. Model simplification research
5. Comparison with the best method nowadays
1. The need for global information

(c) Original without context integration FCN,(d) Context integrated PSPNet
Relationship mismatch :FCN Predict the ship in the yellow box as “ automobile ”. But according to common sense , Cars rarely appear on the river .
Category confusion :FCN Predict a part of the objects in the box as “ skyscraper ”, Part of the prediction is “ building ”. These results should be excluded , In this way, the object as a whole will be divided into “ skyscraper ” or “ building ” In one category , It will not fall into two categories .
Categories of small objects : The appearance of pillows and sheets is similar . Ignoring the global scene category may cause parsing “ Pillow ” A kind of failure .
therefore , We need some global features of the image .
2. Pyramid pool module

Pyramid pool module after feature extraction ( Color is very important in this picture !)
(a) and (b)
(a) Input image for one of our .(b) Adopt extended network strategy (DeepLab / DilatedNet) The extracted features . stay DeepLab Then add extended convolution . features map The size of is the size of the image input here 1/8.
(C).1
stay (c) It's about , For each feature map Perform sub area average pooling .
Red : This is in every feature map The coarsest level of the global average pool on , Used to generate a single bin Output .
Orange : This is the second floor , Will feature map Divided into 2×2 Subarea , Then average pool each sub area .
Blue : This is the third level , Will feature map Divided into 3×3 Subarea , Then average pool each sub area .
green : This will feature map Divided into 6×6 The smallest level of the sub region , Then pool each sub region .
(c).2. 1×1 Convolution is used for dimensionality reduction
Then for each feature obtained map Conduct 1×1 Convolution , If the pyramid's hierarchical size is N, Reduce the context representation to the original 1/N( black ).
In this case ,N=4, Because there are 4 A level ( Red 、 Orange 、 Blue and green ).
If you enter a feature map The quantity of is 2048, Then output features map by (1/4)×2048 = 512, That is, output characteristics map The quantity of is 512.
(c).3. Bilinear interpolation is used for up sampling
Bilinear interpolation is used for each low dimensional feature map Sample up , Make it with original characteristics map The same size ( black ).
(c).4. Connection context aggregation features
All upsampling features at different levels map All with original features map( black ) come together . These feature maps are fused into a global a priori . This is the pyramid pool module (c) Termination of .
(d)
Last , The final predicted segmentation map is generated through the convolution layer (d).
The concept of sub area average pool is actually similar to SPPNet The spatial pyramid pooling in is very similar . First use 1×1 Convolution and then concatenation , And Xception or MobileNetV1 The depth convolution in the depth separable convolution used is very similar , Except that only bilinear interpolation is used to make all features map They are the same size .
3. Some training details

Auxiliary loss items in the middle
· Auxiliary loss items are used in the training process . Auxiliary loss items include 0.4 The weight of , To balance the final loss and auxiliary loss . At testing time , Will give up the auxiliary loss . This is a deep supervision training strategy for deep network training . The idea is similar to GoogLeNet / Inception-v1 Auxiliary classifier in (https://medium.com/coinmonks/paper-review-of-googlenet-inception-v1-winner-of-ilsvlc-2014-image-classification-c2b3565a64e7).
· “ multivariate ” Learning has replaced “ unit ” Study .
4. Model reduction test
ADE2K The dataset is ImageNet Scene analysis challenge 2016 Data set in . It is a more challenging data set , Contains up to 150 Class and 1,038 Image level labels . Yes 20K/2K/3K Images are used for training / verification / test .
Validation sets are used for model simplification testing .
4.1. Maximum pooling vs The average pooling , And dimensionality reduction (DR)

Different algorithms in ADE2K Verify the results on the set
ResNet50-Baseline: be based on ResNet50 Expansion FCN.
‘B1’ and ‘B1236’: bin The sizes are {1×1} and {1×1,2×2,3×3,6×6} Pooling characteristics of map.
‘MAX’ and ‘AVE’: Maximum pool operation and average pool operation
‘DR’: Dimension reduction .
The average pool always has better results . Using dimension reduction is better than not using dimension reduction .
4.2 Ancillary loss

The different weights of auxiliary loss items are ADE2K Verify the results on the set
α= 0.4 Get the best performance . therefore , Use weights α= 0.4.
4.3. Different network layers and different scales (MS) Test of

Networks of different layers and scales are ADE2K Verify the results on the set
As we know , Deeper models have better results . Multiscale testing helps to improve test results .
4.4. Data to enhance (DA) And the comparison with other algorithms

stay ADE2K Comparison results with the latest methods on the validation set ( Except for the last line , All methods are single scale ).
ResNet269+DA+AL+PSP: For a single scale test , If all the skills are combined , This algorithm has great advantages over the most advanced methods .
ResNet269+DA+AL+PSP+MS: At the same time, multi-scale tests were carried out , Good results have been achieved .
Here are some examples :

ADE2K Examples in
5. Comparison with the most advanced methods
5.1. ADE2K - ImageNet Scene analysis challenge 2016

ADE2K Test set results
PSPNet Won 2016 year ImageNet Champion of scene analysis challenge .
5.2. PASCAL VOC 2012
In the case of using data enhancement , Yes 10582/1449/1456 Images for training / verification / test .

PASCAL VOC 2012 Test set results
“+” Indicates that the model passes MS COCO Data pre training .
Again ,PSPNet Superior to all the most advanced methods , Such as FCN、DeconvNet、DeepLab and Dilation8.
Here are some examples :

PASCAL VOC 2012 Examples
5.3. Cityscapes
This dataset contains data from 50 Cities in different seasons 5000 A high-quality pixel level fine annotation image . There were 2975/500/1525 Images for training / verification / test . It defines the 19 Categories . Besides , We also provide 20000 Compare the roughly annotated image , namely , Use only fine data and both fine and coarse annotation data for training . Both training uses “++” Mark .

Cityscapes Test set results
Training with fine labeled data , Or use fine data and rough labeled data to train at the same time ,PSPNet Have achieved good results .
Here are some examples :

Cityscapes Examples

The author also uploaded Cityscapes Video of dataset , Very impressive :
Two other video examples :
https://www.youtube.com/watch?v=gdAVqJn_J2M
https://www.youtube.com/watch?v=HYghTzmbv6Q
Pyramid pool module is adopted , Get the global information of the image , Improved results .
The good news !
Xiaobai learns visual knowledge about the planet
Open to the outside world

download 1:OpenCV-Contrib Chinese version of extension module
stay 「 Xiaobai studies vision 」 Official account back office reply : Extension module Chinese course , You can download the first copy of the whole network OpenCV Extension module tutorial Chinese version , Cover expansion module installation 、SFM Algorithm 、 Stereo vision 、 Target tracking 、 Biological vision 、 Super resolution processing and other more than 20 chapters .
download 2:Python Visual combat project 52 speak
stay 「 Xiaobai studies vision 」 Official account back office reply :Python Visual combat project , You can download, including image segmentation 、 Mask detection 、 Lane line detection 、 Vehicle count 、 Add Eyeliner 、 License plate recognition 、 Character recognition 、 Emotional tests 、 Text content extraction 、 Face recognition, etc 31 A visual combat project , Help fast school computer vision .
download 3:OpenCV Actual project 20 speak
stay 「 Xiaobai studies vision 」 Official account back office reply :OpenCV Actual project 20 speak , You can download the 20 Based on OpenCV Realization 20 A real project , Realization OpenCV Learn advanced .
Communication group
Welcome to join the official account reader group to communicate with your colleagues , There are SLAM、 3 d visual 、 sensor 、 Autopilot 、 Computational photography 、 testing 、 Division 、 distinguish 、 Medical imaging 、GAN、 Wechat groups such as algorithm competition ( It will be subdivided gradually in the future ), Please scan the following micro signal clustering , remarks :” nickname + School / company + Research direction “, for example :” Zhang San + Shanghai Jiaotong University + Vision SLAM“. Please note... According to the format , Otherwise, it will not pass . After successful addition, they will be invited to relevant wechat groups according to the research direction . Please do not send ads in the group , Or you'll be invited out , Thanks for your understanding ~边栏推荐
- Background system sending verification code function
- Record the pits encountered in the raspberry pie construction environment...
- 漫画:什么是服务熔断?
- 【网易云信】超分辨率技术在实时音视频领域的研究与实践
- ES6 drill down - ES6 generator function
- List de duplication and count the number
- 一些认知的思考
- Batch update in the project
- Intel 13th generation Raptor Lake processor information exposure: more cores, larger cache
- Modify PyUnit_ Time makes it support the time text of 'xx~xx months'
猜你喜欢
![21. [STM32] I don't understand the I2C protocol. Dig deep into the sequence diagram to help you write the underlying driver](/img/f4/2c935dd9933f5cd4324c29c41ab221.png)
21. [STM32] I don't understand the I2C protocol. Dig deep into the sequence diagram to help you write the underlying driver

Background system sending verification code function

项目中批量update

数据湖(十四):Spark与Iceberg整合查询操作

【深度学习】深度学习如何影响运筹学?

Explain in detail the functions and underlying implementation logic of the groups sets statement in SQL

HiEngine:可媲美本地的云原生内存数据库引擎

Replknet: it's not that large convolution is bad, but that convolution is not large enough. 31x31 convolution. Let's have a look at | CVPR 2022

CISP-PTE之SQL注入(二次注入的应用)

服务器的数据库连不上了2003,10060“Unknown error“【服务已起、防火墙已关、端口已开、netlent 端口不通】
随机推荐
Batch update in the project
Seaborn draws 11 histograms
sql中set标签的使用
Six common transaction solutions, you sing, I come on stage (no best, only better)
Use of RLOCK lock
Intel 13th generation Raptor Lake processor information exposure: more cores, larger cache
Codasip adds verify safe startup function to risc-v processor series
abstract关键字和哪些关键字会发生冲突呢
16. [stm32] starting from the principle, I will show you the DS18B20 temperature sensor - four digit digital tube displays the temperature
[vulnerability warning] cve-2022-26134 conflict Remote Code Execution Vulnerability POC verification and repair process
Enterprise backup software Veritas NetBackup (NBU) 8.1.1 installation and deployment of server
How to use FRP intranet penetration +teamviewer to quickly connect to the intranet host at home when mobile office
用键盘输入一条命令
PSPNet | 语义分割及场景分析
Record the pits encountered in the raspberry pie construction environment...
Li Kou today's question -729 My schedule I
ES6 deep - ES6 class class
Using graylog alarm function to realize the regular work reminder of nail group robots
《21天精通TypeScript-3》-安装搭建TypeScript开发环境.md
Transaction rollback exception