Semantic segmentation | learning record (3) FCN
2022-07-08 02:09:00 【coder_ sure】
Note: this material comes from the video uploader "thunderbolt Wz"; I am just taking study notes.
Preface
Fully Convolutional Networks for Semantic Segmentation (FCN) is a paper published at CVPR 2015. Interested readers can download the paper themselves.
One 、Fully Convolutional Network
FCN (Fully Convolutional Network) was the first end-to-end fully convolutional network for pixel-level prediction. "Fully convolutional" here means that the fully connected layers of the classification network are replaced with convolutional layers throughout the network. It is a pioneering paper in the field of semantic segmentation.
Let's look at the semantic segmentation results of the FCN network:
We can see that the FCN-8s segmentation result is already very close to the ground truth, which shows that FCN performs quite well (at least by the standards of that year).
The authors also compared FCN with the mainstream algorithms of the time; it clearly beat them in both mean IoU and inference time.
Now look at the FCN-32s network structure from the paper. The structure is very simple, yet the segmentation results are very good, and that is where FCN excels.
By the way, after a series of convolutions and downsampling steps, the last feature layer has channel=21. This is because the dataset used is PASCAL VOC (20 categories + background). Upsampling then gives a feature map of the same size as the original image (channel=21). Softmax is applied to the 21 values at each pixel to obtain the predicted probability of that pixel for each category, and the category with the highest probability is taken as the prediction.
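The per-pixel softmax and argmax step described above can be sketched in plain Python. This is only an illustration: the function names and the toy scores are made up, and 3 classes are used instead of 21 to keep the example small.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def predict_classes(scores):
    """scores[c][y][x] holds the score of class c at pixel (y, x).

    Returns labels[y][x], the most probable class for each pixel.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            probs = softmax([scores[c][y][x] for c in range(num_classes)])
            row.append(max(range(num_classes), key=probs.__getitem__))
        labels.append(row)
    return labels

# Toy example (hypothetical values): 3 classes on a 2x2 "image".
toy_scores = [
    [[2.0, 0.1], [0.3, 0.0]],  # class 0
    [[0.5, 1.5], [0.2, 0.1]],  # class 1
    [[0.1, 0.2], [3.0, 0.4]],  # class 2
]
label_map = predict_classes(toy_scores)
print(label_map)
```

Since softmax preserves the ordering of the scores, taking the argmax of the raw scores would give the same label map; the softmax is only needed when the probabilities themselves are used.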
Two 、Convolutionalization
We know that a fully connected layer requires the input image to have a fixed size, so the input size has to be fixed when training a classification network.
If the fully connected layers are replaced with convolutional layers, there is no longer any restriction on the size of the input image.
Here we convolutionalize the fully connected layers. As shown in the figure below, the network can now take an image of any size as input. If the input image is larger than 224*224, the height and width of the last feature layer are greater than 1, so the data in each channel is 2-D. We can then unfold each channel into a plane and visualize it.
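A rough size calculation shows why larger inputs give a 2-D output per channel. This sketch assumes the standard VGG16 layout (3*3 convolutions that preserve size, five 2*2 max-pooling layers that halve it with rounding down) followed by a 7*7 convolution with padding=0; the function name is made up for the example.

```python
def vgg16_score_map_size(side):
    """Side length of the output after convolutionalized VGG16 (no padding on the 7x7 conv)."""
    for _ in range(5):   # five 2x2 max-pooling layers, each halves the side
        side //= 2
    return side - 7 + 1  # 7x7 convolution, stride 1, padding 0

for side in (224, 320, 448):
    print(side, "->", vgg16_score_map_size(side))
```

With a 224*224 input the result is 1*1, which matches the original classification network; anything larger leaves a small grid of predictions per channel.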
Three 、Principle of the Convolutionalization process
Take the VGG16 network as an example:
- Flatten the 7 * 7 * 512 feature layer and pass it through a fully connected layer to get a layer of length 4096. FC1 has 7 * 7 * 512 * 4096 = 102760448 parameters (biases ignored).
- Pass the 7 * 7 * 512 feature layer through conv(7*7, s1, 4096). The convolution also has 102760448 parameters.
From the description above we can see that the two operations are completely equivalent. This is the Convolutionalization process.
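To convince myself of the equivalence, here is a small check in plain Python. The parameter counts use the real VGG16 sizes, while the numerical comparison uses toy sizes (3 channels, 2*2 feature map, 4 outputs) chosen only for this example; the channel-major flattening order is also an assumption of the sketch.

```python
import random

# Parameter count with the real sizes (weights only, biases ignored).
fc1_params = (7 * 7 * 512) * 4096   # flattened input length x output length
conv_params = 7 * 7 * 512 * 4096    # kernel h x kernel w x in channels x out channels
print(fc1_params, conv_params)

# Toy-sized numerical check.
random.seed(0)
C, H, W, OUT = 3, 2, 2, 4
feature = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
fc_weight = [[random.random() for _ in range(C * H * W)] for _ in range(OUT)]

# Fully connected: flatten the feature map (channel-major), then dot product.
flat = [feature[c][y][x] for c in range(C) for y in range(H) for x in range(W)]
fc_out = [sum(w * v for w, v in zip(row, flat)) for row in fc_weight]

# Convolution: view each FC weight row as a C x H x W kernel that covers
# the whole feature map, so it produces a single 1x1 output per kernel.
def kernel_weight(o, c, y, x):
    return fc_weight[o][(c * H + y) * W + x]

conv_out = [
    sum(kernel_weight(o, c, y, x) * feature[c][y][x]
        for c in range(C) for y in range(H) for x in range(W))
    for o in range(OUT)
]

same = all(abs(a - b) < 1e-12 for a, b in zip(fc_out, conv_out))
print(same)
```

The only thing that changes is how the weights are indexed: the same numbers are read either as one long row or as a kernel shaped like the feature map.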
Four 、FCN Model details
The original paper gives a comparison table of performance metrics for FCN-32s-fixed, FCN-32s, FCN-16s and FCN-8s, and the results improve from one model to the next.
What exactly are these models?
- FCN-32s: upsamples the prediction by a factor of 32 to restore the original image size.
- FCN-16s: upsamples the prediction by a factor of 16 to restore the original image size.
- FCN-8s: upsamples the prediction by a factor of 8 to restore the original image size.
1. FCN-32s
The network structure is shown in the figure above. Here are some key points:
In the original paper, the authors set padding=100 on the first convolutional layer of the backbone. Why?
So that the FCN network can handle input images of different sizes.
What happens without padding=100?
If the image is smaller than 192 * 192, the last feature layer is smaller than 7*7. With padding=0, a feature layer smaller than 7 * 7 cannot go through the 7*7 convolution that is equivalent to the original fully connected layer.
In addition, if the input image is smaller than 32 * 32, the backbone raises an error before the forward pass has even finished.
From today's perspective, however, padding=100 is unnecessary.
First, hardly anyone runs semantic segmentation on images smaller than 32 * 32. Second, for images larger than 32 * 32, we can set padding=3 on the 7*7 convolution in FC6, which lets us train on images of any height and width larger than 32 * 32.
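The effect of padding on FC6 can be checked with the usual convolution output-size formula. This is a simplified sketch: it assumes the backbone downsamples by exactly 32 with rounding down, it does not model padding=100 on the first layer, and the function names are made up for the example.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output side length of a convolution; raises if the kernel does not fit."""
    padded = size + 2 * padding
    if padded < kernel:
        raise ValueError(f"input {size} (padded {padded}) is smaller than kernel {kernel}")
    return (padded - kernel) // stride + 1

def fc6_map_size(image_side, fc6_padding):
    """Side length after the 7x7 FC6 convolution for a square input image."""
    side = image_side // 32  # assumed 32x downsampling by the backbone
    if side == 0:
        raise ValueError("image smaller than 32: nothing is left after downsampling")
    return conv_out(side, kernel=7, padding=fc6_padding)

for image_side in (64, 160, 224):
    try:
        no_pad = fc6_map_size(image_side, fc6_padding=0)
    except ValueError:
        no_pad = None  # the 7x7 kernel does not fit without padding
    print(image_side, no_pad, fc6_map_size(image_side, fc6_padding=3))
```

With padding=3 the 7*7 convolution keeps the side length unchanged (side + 6 - 7 + 1 = side), so any input that survives the 32x downsampling can pass through FC6.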
In the original paper, the authors initialized the parameters of the transposed convolution with bilinear interpolation.
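A common way to build such a bilinear kernel is shown below. This is the widely used recipe rather than code taken from the paper, and the function names are my own.

```python
def bilinear_kernel_1d(kernel_size):
    """1-D bilinear interpolation weights for a transposed-convolution kernel."""
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    return [1 - abs(i - center) / factor for i in range(kernel_size)]

def bilinear_kernel_2d(kernel_size):
    """2-D kernel as the outer product of the 1-D weights."""
    w = bilinear_kernel_1d(kernel_size)
    return [[a * b for b in w] for a in w]

# A 2x upsampling layer is commonly paired with a 4x4 kernel.
print(bilinear_kernel_1d(4))
for row in bilinear_kernel_2d(4):
    print(row)
```

In a framework, this 2-D kernel would typically be copied into the transposed-convolution weights that map each channel to itself, with all cross-channel weights set to zero, so that the layer starts out as plain bilinear upsampling.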
2. FCN-16s
The differences from FCN-32s above are:
- The first transposed convolution here upsamples by a factor of 2.
- It additionally uses the feature map from maxpooling4 (downsampled by a factor of 16).
3. FCN-8s
In addition to the fusion operation above, FCN-8s also uses the output from maxpooling3.
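The three variants differ only in how many fusion steps happen before the final upsampling, which a small shape trace makes visible. This is a sketch under simplifying assumptions: single-channel maps with constant, made-up values stand in for the per-class scores derived from each layer, and nearest-neighbour upsampling stands in for the transposed convolution.

```python
def upsample(fmap, factor):
    """Nearest-neighbour upsampling, a stand-in for the transposed convolution."""
    return [[v for v in row for _ in range(factor)]
            for row in fmap for _ in range(factor)]

def add(a, b):
    """Element-wise sum of two equally sized maps (the fusion step)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def const_map(side, value):
    """Square map filled with one value, used as placeholder scores."""
    return [[value] * side for _ in range(side)]

image_side = 64
pool3 = const_map(image_side // 8, 1.0)     # scores from maxpooling3 (1/8 size)
pool4 = const_map(image_side // 16, 10.0)   # scores from maxpooling4 (1/16 size)
final = const_map(image_side // 32, 100.0)  # scores from the last layer (1/32 size)

fcn32s = upsample(final, 32)                # no fusion

fused16 = add(upsample(final, 2), pool4)    # 2x upsample, fuse with maxpooling4
fcn16s = upsample(fused16, 16)

fused8 = add(upsample(fused16, 2), pool3)   # 2x upsample again, fuse with maxpooling3
fcn8s = upsample(fused8, 8)

for name, out in (("FCN-32s", fcn32s), ("FCN-16s", fcn16s), ("FCN-8s", fcn8s)):
    print(name, len(out), len(out[0]), out[0][0])
```

All three outputs end up at the input resolution; the placeholder values simply show which layers contributed to each result.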
Summary
The defining feature of the FCN network is that it converts the ordinary fully connected layers into convolutional layers. This removes the restriction on input image size, and the segmentation results are very good as well.
There are three common FCN variants: FCN-32s, FCN-16s and FCN-8s, with results improving in that order. The improvement comes from fusing in information from the maxpooling layers.