Semantic segmentation | learning record (3) FCN
2022-07-08 02:09:00 【coder_sure】
Tips: this material follows the Bilibili uploader 霹雳吧啦Wz; I am just taking study notes.
Preface
Fully Convolutional Networks for Semantic Segmentation (FCN) was published at CVPR 2015. Interested readers can download the paper via the link themselves.
I. Fully Convolutional Network
FCN (Fully Convolutional Network) is the first end-to-end fully convolutional network for pixel-level prediction. "Fully convolutional" here means that all fully connected layers of the classification network are replaced with convolutional layers. It is a pioneering paper in the field of semantic segmentation.
Let's first look at FCN's semantic segmentation results:
We can see that the segmentation produced by FCN-8s is already very close to the ground truth, which shows that FCN works quite well (at least by the standards of its time).
The authors also compare FCN with the mainstream algorithms of the time; it clearly outperforms them in both mean IoU and inference time.
Now look at the FCN-32s network structure from the paper. You will find that the architecture is very simple, yet the segmentation results are very good; that is exactly what makes FCN remarkable.
Note that after a series of convolutions and downsampling operations the final feature layer has channel=21, because the dataset used is PASCAL VOC (20 classes + background). Upsampling then produces a feature map with the same height and width as the original image (still channel=21). For each pixel, a softmax is applied to its 21 values to obtain the predicted probability of each class, and the class with the highest probability is taken as the prediction.
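To make the last step concrete, here is a minimal sketch (my own PyTorch illustration, not the paper's code) of turning a hypothetical 21-channel score map into a per-pixel class prediction with softmax and argmax:

```python
# Minimal sketch (not the paper's code): per-pixel classification from a hypothetical
# 21-channel score map, assuming a PASCAL VOC setup (20 classes + background).
import torch

num_classes = 21
logits = torch.randn(1, num_classes, 224, 224)  # stand-in for the upsampled network output

probs = torch.softmax(logits, dim=1)            # per-pixel probabilities over the 21 classes
pred = probs.argmax(dim=1)                      # (1, 224, 224) map of predicted class indices
print(pred.shape)                               # torch.Size([1, 224, 224])
```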
II. Convolutionalization
We know that fully connected layers require the input image size to be fixed, so when training a classification network the input resolution must be fixed.
If the fully connected layers are replaced with convolutional layers, there is no longer any restriction on the input image size.
So here the fully connected layers are "convolutionalized". As shown in the figure below, the network can then take an image of any size. If the input image is larger than 224×224, the height and width of the final feature layer become greater than 1, so the data in each channel of the last layer is a 2-D map, which can be laid out and visualized as a heat map.
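Below is a minimal sketch of this idea, assuming PyTorch and torchvision's VGG16 backbone (random weights, 1000 ImageNet classes): once FC6/FC7/FC8 are rewritten as convolutions, an input larger than 224×224 simply yields a 2-D grid of class scores instead of a single vector.

```python
# Minimal sketch, assuming torchvision's VGG16 with random weights: with FC6/FC7/FC8
# rewritten as convolutions, an input larger than 224x224 yields a spatial score map.
import torch
import torch.nn as nn
from torchvision.models import vgg16

backbone = vgg16(weights=None).features             # conv layers up to pool5 (32x downsampling)

conv_head = nn.Sequential(
    nn.Conv2d(512, 4096, kernel_size=7), nn.ReLU(inplace=True),   # FC6 as a 7x7 convolution
    nn.Conv2d(4096, 4096, kernel_size=1), nn.ReLU(inplace=True),  # FC7 as a 1x1 convolution
    nn.Conv2d(4096, 1000, kernel_size=1),                         # FC8 as a 1x1 convolution
)

x = torch.randn(1, 3, 352, 480)                     # any size larger than 224x224
out = conv_head(backbone(x))
print(out.shape)                                    # torch.Size([1, 1000, 5, 9]): a 2-D map per class
```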
III. How Convolutionalization Works
Take the VGG16 network as an example:
- Flatten the 7×7×512 feature map and feed it into a fully connected layer with 4096 outputs: this FC1 layer has 102,760,448 weights.
- Pass the 7×7×512 feature map through a conv(7×7, s1, 4096) operation instead: the convolution also has 102,760,448 weights.
The two operations are therefore completely equivalent; this is what the convolutionalization process means.
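A minimal sketch (my own PyTorch illustration) that verifies this equivalence: the fully connected layer and the 7×7 convolution have the same number of weights, and with the weights copied across they produce identical outputs on a 7×7×512 feature map.

```python
# Minimal sketch: the FC layer on the flattened 7x7x512 feature map and a 7x7
# convolution with 4096 filters have the same weights and give the same output.
import torch
import torch.nn as nn

fc = nn.Linear(7 * 7 * 512, 4096)
conv = nn.Conv2d(512, 4096, kernel_size=7)

print(fc.weight.numel())                            # 102760448, the figure quoted above
print(conv.weight.numel())                          # 102760448

with torch.no_grad():                               # share the weights between the two layers
    conv.weight.copy_(fc.weight.view(4096, 512, 7, 7))
    conv.bias.copy_(fc.bias)

feat = torch.randn(1, 512, 7, 7)
out_fc = fc(feat.flatten(1))                        # shape (1, 4096)
out_conv = conv(feat).flatten(1)                    # shape (1, 4096, 1, 1) flattened to (1, 4096)
print(torch.allclose(out_fc, out_conv, atol=1e-5))  # True
```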
IV. FCN Model Details
The original paper gives a table comparing various performance metrics for FCN-32s-fixed, FCN-32s, FCN-16s and FCN-8s, and the results improve in that order.
So what exactly are these models?
- FCN-32s: upsamples the prediction 32× to restore the original resolution.
- FCN-16s: upsamples the prediction 16× to restore the original resolution.
- FCN-8s: upsamples the prediction 8× to restore the original resolution (see the sketch after this list).
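As a minimal sketch of what the suffix means (my own illustration; bilinear interpolation is used here for brevity, whereas the paper uses transposed convolutions), a 1/32-resolution score map upsampled 32× recovers the input resolution:

```python
# Minimal sketch: a 1/32-resolution 21-channel score map upsampled 32x recovers the
# input resolution (bilinear interpolation here; the paper uses transposed convolution).
import torch
import torch.nn.functional as F

H, W = 480, 320
score_32s = torch.randn(1, 21, H // 32, W // 32)    # prediction at 1/32 resolution

full_res = F.interpolate(score_32s, size=(H, W), mode="bilinear", align_corners=False)
print(full_res.shape)                               # torch.Size([1, 21, 480, 320])
```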

1. FCN-32s

The network structure is shown in the figure above. A few key points:
In the original paper the authors set padding=100 on the first convolution layer of the backbone. Why?
So that the FCN network can handle input images of different sizes.
What would happen without padding=100?
If the image were smaller than 192×192, the final feature map would be smaller than 7×7; with padding=0, the 7×7 convolution that stands in for the ordinary fully connected layer could no longer be applied to it.
Moreover, if the input image were smaller than 32×32, the backbone would throw an error before the forward pass even finishes.
From today's perspective, however, padding=100 is unnecessary.
First, hardly anyone runs semantic segmentation on images smaller than 32×32. Second, for images larger than 32×32 we can simply set padding=3 on the 7×7 convolution of FC6, and then images of any height and width above 32×32 can be trained.
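A minimal sketch of this alternative (my own illustration, not the paper's setting): with padding=3, the 7×7 convolution replacing FC6 preserves the spatial size of the pool5 feature map, so even a small input passes through without the padding=100 trick.

```python
# Minimal sketch (not the paper's setting): with padding=3, the 7x7 convolution
# replacing FC6 keeps the spatial size of the pool5 feature map.
import torch
import torch.nn as nn

fc6 = nn.Conv2d(512, 4096, kernel_size=7, padding=3)

pool5 = torch.randn(1, 512, 2, 3)   # pool5 output of a hypothetical 64x96 input
print(fc6(pool5).shape)             # torch.Size([1, 4096, 2, 3]): spatial size preserved
```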
In the original paper, the author used the bilinear difference method to initialize the parameters of transpose convolution
2. FCN-16s

The differences from FCN-32s above are:
- The first transposed convolution here upsamples by 2× rather than 32×.
- It additionally uses the feature map coming from maxpooling4 (downsampled 16×).
3. FCN-8s

In addition to the fusion operations above, FCN-8s also makes use of the output coming from maxpooling3.
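A minimal sketch of the FCN-8s fusion order (my own illustration; random tensors stand in for the real pool3/pool4/FC7 score maps, and bilinear upsampling replaces the learned transposed convolutions, so this only demonstrates the shapes and the order of fusion):

```python
# Minimal sketch of the FCN-8s fusion order (shapes only): random tensors stand in
# for the pool3 / pool4 / FC7 score maps; bilinear upsampling replaces the learned
# transposed convolutions.
import torch
import torch.nn.functional as F

H, W = 224, 224
num_classes = 21

score_fc7   = torch.randn(1, num_classes, H // 32, W // 32)   # 1/32-resolution scores
score_pool4 = torch.randn(1, num_classes, H // 16, W // 16)   # 1/16-resolution scores
score_pool3 = torch.randn(1, num_classes, H // 8,  W // 8)    # 1/8-resolution scores

def up2(t):
    return F.interpolate(t, scale_factor=2, mode="bilinear", align_corners=False)

fuse16 = up2(score_fc7) + score_pool4                          # fusion used by FCN-16s
fuse8 = up2(fuse16) + score_pool3                              # extra fusion used by FCN-8s
out = F.interpolate(fuse8, scale_factor=8, mode="bilinear", align_corners=False)
print(out.shape)                                               # torch.Size([1, 21, 224, 224])
```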
Summary
The most distinctive feature of FCN is converting the ordinary fully connected layers into convolutional layers, which both frees the input image size and produces very good segmentation results.
There are three common FCN variants: FCN-32s, FCN-16s and FCN-8s, whose results improve in that order. In essence, fusing information from the maxpooling layers is what yields the better results.