FCN: Fully Convolutional Networks for Semantic Segmentation
2022-07-05 18:23:00 【00000cj】
paper: Fully Convolutional Networks for Semantic Segmentation
Innovation points
A fully convolutional structure is proposed: the final fully connected layers of the classification network are replaced with convolutional layers, so inputs of any size can be processed.
Upsampling by deconvolution (transposed convolution) or interpolation restores the output to the original input size.
Because the network is a classification network in which only the fully connected layers are replaced by convolutional layers, the pretrained weights of the earlier layers can be reused for fine-tuning.
A skip structure is proposed: by combining shallow and deep features, it accounts for both shallow spatial detail and deep semantic information, which makes the final segmentation result finer.
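To make the first point concrete, here is a minimal pure-Python sketch (not taken from the paper or from MMSegmentation). It shows that a fully connected layer over a flattened k x k feature map computes the same values as a convolution whose kernel covers the whole map, and that the convolutional form also accepts larger inputs. The helper names `fully_connected`, `conv2d_valid`, and `fc_row_to_kernel`, the toy weights, and the single input channel are all assumptions made for illustration.

```python
def fully_connected(x_flat, weights):
    """Dense layer without bias: one output per weight row."""
    return [sum(w * x for w, x in zip(row, x_flat)) for row in weights]


def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def fc_row_to_kernel(row, k):
    """Reshape one row of FC weights (length k*k) into a k x k kernel."""
    return [row[i * k:(i + 1) * k] for i in range(k)]


# Toy FC layer: 2 output units on a flattened 2x2 input (hypothetical weights).
fc_weights = [[1, 0, -1, 2], [0.5, 0.5, 0.5, 0.5]]
kernels = [fc_row_to_kernel(row, 2) for row in fc_weights]

# On a 2x2 input both forms agree: each kernel yields a single value.
small = [[1, 2], [3, 4]]
fc_out = fully_connected([v for row in small for v in row], fc_weights)
conv_out = [conv2d_valid(small, k)[0][0] for k in kernels]

# On a 3x3 input the FC layer no longer fits, but each kernel simply
# slides over the image and produces a 2x2 score map.
large = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
score_maps = [conv2d_valid(large, k) for k in kernels]
```

The score maps on the larger input are what a segmentation network upsamples back to the input size.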
Implementation details
Here we use MMSegmentation as an example. Compared with the original paper, the backbone is ResNet-50 instead of VGG-16, and the skip structure is replaced by dilated convolution; the official PyTorch implementation does the same.
Backbone
- In the original ResNet-50, the four stages use strides=(1, 2, 2, 2) and no dilated convolution, i.e. dilations=(1, 1, 1, 1). In FCN, the four stages use strides=(1, 2, 1, 1) and dilations=(1, 1, 2, 4).
- There is also a contract_dilation=True setting: when the dilation is greater than 1, the dilation of the first block in the stage is contracted. Here the first bottleneck of the third and fourth stages uses half of the stage's dilation rate, so the first bottleneck of the third stage uses dilation=2/2=1 (no dilated convolution) and the first bottleneck of the fourth stage uses dilation=4/2=2.
- In addition, ResNetV1c is used here, in which the 7x7 convolution in the stem is replaced by three 3x3 convolutions.
- Finally, pay attention to padding. In the original implementation, the 7x7 stem convolution uses padding=3 and the 3x3 convolutions use padding=1. In FCN, because of the dilated convolutions, the last two stages have stride=1; to keep the output resolution equal to the input resolution, padding must equal dilation. This follows from the output-size formula out = floor((in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1: with kernel_size=3 and stride=1 it reduces to out = in + 2*padding - 2*dilation, so out = in requires padding=dilation.
- Assuming batch_size=4 and a model input of shape (4, 3, 480, 480), the outputs of the four backbone stages are (4, 256, 120, 120), (4, 512, 60, 60), (4, 1024, 60, 60), and (4, 2048, 60, 60).
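The stage shapes listed above can be checked with a small pure-Python sketch. It is not MMSegmentation code: the function names are made up for illustration, the stem is modeled as a single stride-2 3x3 convolution followed by a stride-2 3x3 max pooling (an assumption about the overall stride of the stem), and only the first bottleneck of each stage is traced, since the remaining blocks use stride 1 with padding=dilation and leave the resolution unchanged.

```python
def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Standard output-size formula for a convolution or pooling layer."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def first_block_dilation(dilation, contract_dilation=True):
    """Dilation used by the first bottleneck of a stage."""
    if contract_dilation and dilation > 1:
        return dilation // 2
    return dilation


def backbone_shapes(batch, size, strides, dilations,
                    channels=(256, 512, 1024, 2048)):
    """Trace (N, C, H, W) after each of the four stages."""
    # Stem (assumed): stride-2 3x3 conv, then stride-2 3x3 max pooling.
    size = conv_out_size(size, kernel=3, stride=2, padding=1)
    size = conv_out_size(size, kernel=3, stride=2, padding=1)
    shapes = []
    for out_channels, stride, dilation in zip(channels, strides, dilations):
        # The stage's stride sits in the 3x3 conv of its first bottleneck,
        # which uses the contracted dilation and padding = dilation.
        d = first_block_dilation(dilation)
        size = conv_out_size(size, kernel=3, stride=stride,
                             padding=d, dilation=d)
        shapes.append((batch, out_channels, size, size))
    return shapes


fcn_shapes = backbone_shapes(4, 480, strides=(1, 2, 1, 1),
                             dilations=(1, 1, 2, 4))
plain_shapes = backbone_shapes(4, 480, strides=(1, 2, 2, 2),
                               dilations=(1, 1, 1, 1))
for fcn, plain in zip(fcn_shapes, plain_shapes):
    print("FCN backbone:", fcn, "| original ResNet-50:", plain)
```

With the original strides the last stage ends at 1/32 of the input resolution, while the dilated variant stays at 1/8, which is why the head in the next section works on 60 x 60 feature maps.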
FCN Head
- The output of the fourth ResNet stage, (4, 2048, 60, 60), passes through two conv-bn-relu blocks, Conv2d(2048, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) and Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False), giving (4, 512, 60, 60).
- The output of the previous step, (4, 512, 60, 60), is concatenated with the head input, (4, 2048, 60, 60), along the channel dimension, giving (4, 2560, 60, 60).
- One more conv-bn-relu block, Conv2d(2560, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False), gives (4, 512, 60, 60).
- Dropout is applied with dropout_ratio=0.1.
- Finally, Conv2d(512, num_classes, kernel_size=(1, 1), stride=(1, 1)) produces the final model output, (4, num_classes, 60, 60). Note that the number of classes here includes the background.
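The steps above can be traced with a short shape-bookkeeping sketch. It does not build real layers; the function `fcn_head_shapes` and its parameter names are illustrative choices for this sketch, and num_classes=2 is just an example value.

```python
def fcn_head_shapes(in_shape, channels, num_convs, concat_input, num_classes):
    """Return (step, shape) pairs for a head built from 3x3 conv-bn-relu blocks.

    Every 3x3 conv here has stride 1 and padding 1, so H and W never change;
    only the channel count does.
    """
    n, in_channels, h, w = in_shape
    steps = []
    for i in range(num_convs):
        steps.append((f"conv{i + 1} 3x3", (n, channels, h, w)))
    if concat_input:
        # Concatenate the conv output with the head input along channels.
        steps.append(("concat with head input",
                      (n, in_channels + channels, h, w)))
        steps.append(("fuse conv 3x3", (n, channels, h, w)))
    # Dropout does not change the shape; the 1x1 conv maps to class scores.
    steps.append(("classifier conv 1x1", (n, num_classes, h, w)))
    return steps


main_head = fcn_head_shapes((4, 2048, 60, 60), channels=512, num_convs=2,
                            concat_input=True, num_classes=2)
for step, shape in main_head:
    print(f"{step:24s} {shape}")
```

Calling the same function with in_shape=(4, 1024, 60, 60), channels=256, num_convs=1, and concat_input=False traces the auxiliary head described later.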
Loss
- The output of the previous step, (4, 2, 60, 60) (that is, num_classes=2 in this example), is resized to the input size by bilinear interpolation, giving (4, 2, 480, 480).
- CrossEntropy loss is used.
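As a rough illustration of these two steps, the sketch below resizes small per-class score maps with bilinear interpolation and then averages a per-pixel cross-entropy. It is a pure-Python toy, not the MMSegmentation implementation: the function names are made up, a single image is used instead of a batch, and the pixel-center alignment (the align_corners=False convention) is an assumption that the text does not state.

```python
import math


def bilinear_resize(grid, out_h, out_w):
    """Bilinear resize of a 2D list using pixel-center alignment
    (assumed convention, equivalent to align_corners=False)."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for r in range(out_h):
        # Map the output pixel center back into input coordinates.
        y = max((r + 0.5) * in_h / out_h - 0.5, 0.0)
        y0 = min(int(y), in_h - 1)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for c in range(out_w):
            x = max((c + 0.5) * in_w / out_w - 0.5, 0.0)
            x0 = min(int(x), in_w - 1)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def pixel_cross_entropy(logits, labels):
    """Mean cross-entropy over all pixels.

    logits: list of per-class 2D score maps; labels: 2D map of class indices.
    """
    total, count = 0.0, 0
    for r, label_row in enumerate(labels):
        for c, label in enumerate(label_row):
            scores = [class_map[r][c] for class_map in logits]
            peak = max(scores)  # subtract the max for numerical stability
            log_sum_exp = peak + math.log(
                sum(math.exp(s - peak) for s in scores))
            total += log_sum_exp - scores[label]
            count += 1
    return total / count


# Toy example: 2 classes, 2x2 score maps resized to 4x4 before the loss.
logits_small = [[[2.0, 0.0], [0.0, 2.0]],   # class 0 scores
                [[0.0, 2.0], [2.0, 0.0]]]   # class 1 scores
logits_full = [bilinear_resize(m, 4, 4) for m in logits_small]
labels = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
loss = pixel_cross_entropy(logits_full, labels)
print("mean cross-entropy:", loss)
```

Resizing the logits rather than the labels keeps the ground-truth mask at full resolution, so the loss is computed per input pixel.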
Auxiliary Head
- The output of the third ResNet stage, (4, 1024, 60, 60), passes through one conv-bn-relu block, Conv2d(1024, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False), giving (4, 256, 60, 60).
- Dropout is applied with dropout_ratio=0.1.
- Conv2d(256, num_classes, kernel_size=(1, 1), stride=(1, 1)) then produces the output of this branch, (4, num_classes, 60, 60).