Semantic segmentation | learning record (3) FCN
2022-07-08 02:09:00 【coder_sure】
Tips: this material follows the Bilibili uploader 霹雳吧啦Wz; I am just taking study notes.
Preface
Fully Convolutional Networks for Semantic Segmentation (FCN) was published at CVPR 2015. Interested readers can download the paper via the link themselves.
I. Fully Convolutional Network
FCN (Fully Convolutional Network) is the first end-to-end fully convolutional network for pixel-level prediction. "Fully convolutional" here means that all fully connected layers of the classification network are replaced with convolutional layers. It is a pioneering paper in the field of semantic segmentation.
Let's first look at FCN's semantic segmentation results:
We can see that the segmentation produced by FCN-8s is already very close to the ground truth, which shows that FCN works quite well (at least by the standards of its time).
The authors also compare FCN with the mainstream algorithms of the time; it clearly outperforms them in both mean IoU and inference time.
Now look at the FCN-32s network structure from the paper. You will find that the architecture is very simple, yet the segmentation results are very good; that is exactly what makes FCN remarkable.
Note that after a series of convolutions and downsampling operations the final feature layer has channel=21, because the dataset used is PASCAL VOC (20 classes + background). Upsampling then produces a feature map with the same height and width as the original image (still channel=21). For each pixel, a softmax is applied to its 21 values to obtain the predicted probability of each class, and the class with the highest probability is taken as the prediction.
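To make the last step concrete, here is a minimal sketch (my own PyTorch illustration, not the paper's code) of turning a hypothetical 21-channel score map into a per-pixel class prediction with softmax and argmax:

```python
# Minimal sketch (not the paper's code): per-pixel classification from a hypothetical
# 21-channel score map, assuming a PASCAL VOC setup (20 classes + background).
import torch

num_classes = 21
logits = torch.randn(1, num_classes, 224, 224)  # stand-in for the upsampled network output

probs = torch.softmax(logits, dim=1)            # per-pixel probabilities over the 21 classes
pred = probs.argmax(dim=1)                      # (1, 224, 224) map of predicted class indices
print(pred.shape)                               # torch.Size([1, 224, 224])
```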
II. Convolutionalization
We know that fully connected layers require the input image size to be fixed, so when training a classification network the input resolution must be fixed.
If the fully connected layers are replaced with convolutional layers, there is no longer any restriction on the input image size.
So here the fully connected layers are "convolutionalized". As shown in the figure below, the network can then take an image of any size. If the input image is larger than 224×224, the height and width of the final feature layer become greater than 1, so the data in each channel of the last layer is a 2-D map, which can be laid out and visualized as a heat map.
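Below is a minimal sketch of this idea, assuming PyTorch and torchvision's VGG16 backbone (random weights, 1000 ImageNet classes): once FC6/FC7/FC8 are rewritten as convolutions, an input larger than 224×224 simply yields a 2-D grid of class scores instead of a single vector.

```python
# Minimal sketch, assuming torchvision's VGG16 with random weights: with FC6/FC7/FC8
# rewritten as convolutions, an input larger than 224x224 yields a spatial score map.
import torch
import torch.nn as nn
from torchvision.models import vgg16

backbone = vgg16(weights=None).features             # conv layers up to pool5 (32x downsampling)

conv_head = nn.Sequential(
    nn.Conv2d(512, 4096, kernel_size=7), nn.ReLU(inplace=True),   # FC6 as a 7x7 convolution
    nn.Conv2d(4096, 4096, kernel_size=1), nn.ReLU(inplace=True),  # FC7 as a 1x1 convolution
    nn.Conv2d(4096, 1000, kernel_size=1),                         # FC8 as a 1x1 convolution
)

x = torch.randn(1, 3, 352, 480)                     # any size larger than 224x224
out = conv_head(backbone(x))
print(out.shape)                                    # torch.Size([1, 1000, 5, 9]): a 2-D map per class
```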
III. How Convolutionalization Works
Take the VGG16 network as an example:
- Flatten the 7×7×512 feature map and feed it into a fully connected layer with 4096 outputs: this FC1 layer has 102,760,448 weights.
- Pass the 7×7×512 feature map through a conv(7×7, s1, 4096) operation instead: the convolution also has 102,760,448 weights.
The two operations are therefore completely equivalent; this is what the convolutionalization process means.
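A minimal sketch (my own PyTorch illustration) that verifies this equivalence: the fully connected layer and the 7×7 convolution have the same number of weights, and with the weights copied across they produce identical outputs on a 7×7×512 feature map.

```python
# Minimal sketch: the FC layer on the flattened 7x7x512 feature map and a 7x7
# convolution with 4096 filters have the same weights and give the same output.
import torch
import torch.nn as nn

fc = nn.Linear(7 * 7 * 512, 4096)
conv = nn.Conv2d(512, 4096, kernel_size=7)

print(fc.weight.numel())                            # 102760448, the figure quoted above
print(conv.weight.numel())                          # 102760448

with torch.no_grad():                               # share the weights between the two layers
    conv.weight.copy_(fc.weight.view(4096, 512, 7, 7))
    conv.bias.copy_(fc.bias)

feat = torch.randn(1, 512, 7, 7)
out_fc = fc(feat.flatten(1))                        # shape (1, 4096)
out_conv = conv(feat).flatten(1)                    # shape (1, 4096, 1, 1) flattened to (1, 4096)
print(torch.allclose(out_fc, out_conv, atol=1e-5))  # True
```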
IV. FCN Model Details
The original paper gives a table comparing various performance metrics for FCN-32s-fixed, FCN-32s, FCN-16s and FCN-8s, and the results improve in that order.
So what exactly are these models?
- FCN-32s: upsamples the prediction 32× to restore the original resolution.
- FCN-16s: upsamples the prediction 16× to restore the original resolution.
- FCN-8s: upsamples the prediction 8× to restore the original resolution (see the sketch after this list).
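As a minimal sketch of what the suffix means (my own illustration; bilinear interpolation is used here for brevity, whereas the paper uses transposed convolutions), a 1/32-resolution score map upsampled 32× recovers the input resolution:

```python
# Minimal sketch: a 1/32-resolution 21-channel score map upsampled 32x recovers the
# input resolution (bilinear interpolation here; the paper uses transposed convolution).
import torch
import torch.nn.functional as F

H, W = 480, 320
score_32s = torch.randn(1, 21, H // 32, W // 32)    # prediction at 1/32 resolution

full_res = F.interpolate(score_32s, size=(H, W), mode="bilinear", align_corners=False)
print(full_res.shape)                               # torch.Size([1, 21, 480, 320])
```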

1. FCN-32s

The network structure is shown in the figure above. A few key points:
In the original paper the authors set padding=100 on the first convolution layer of the backbone. Why?
So that the FCN network can handle input images of different sizes.
What would happen without padding=100?
If the image were smaller than 192×192, the final feature map would be smaller than 7×7; with padding=0, the 7×7 convolution that stands in for the ordinary fully connected layer could no longer be applied to it.
Moreover, if the input image were smaller than 32×32, the backbone would throw an error before the forward pass even finishes.
From today's perspective, however, padding=100 is unnecessary.
First, hardly anyone runs semantic segmentation on images smaller than 32×32. Second, for images larger than 32×32 we can simply set padding=3 on the 7×7 convolution of FC6, and then images of any height and width above 32×32 can be trained.
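A minimal sketch of this alternative (my own illustration, not the paper's setting): with padding=3, the 7×7 convolution replacing FC6 preserves the spatial size of the pool5 feature map, so even a small input passes through without the padding=100 trick.

```python
# Minimal sketch (not the paper's setting): with padding=3, the 7x7 convolution
# replacing FC6 keeps the spatial size of the pool5 feature map.
import torch
import torch.nn as nn

fc6 = nn.Conv2d(512, 4096, kernel_size=7, padding=3)

pool5 = torch.randn(1, 512, 2, 3)   # pool5 output of a hypothetical 64x96 input
print(fc6(pool5).shape)             # torch.Size([1, 4096, 2, 3]): spatial size preserved
```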
In the original paper, the author used the bilinear difference method to initialize the parameters of transpose convolution
2. FCN-16s

The differences from FCN-32s above are:
- The first transposed convolution here upsamples by 2× rather than 32×.
- It additionally uses the feature map coming from maxpooling4 (downsampled 16×).
3. FCN-8s

In addition to the fusion operations above, FCN-8s also makes use of the output coming from maxpooling3.
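A minimal sketch of the FCN-8s fusion order (my own illustration; random tensors stand in for the real pool3/pool4/FC7 score maps, and bilinear upsampling replaces the learned transposed convolutions, so this only demonstrates the shapes and the order of fusion):

```python
# Minimal sketch of the FCN-8s fusion order (shapes only): random tensors stand in
# for the pool3 / pool4 / FC7 score maps; bilinear upsampling replaces the learned
# transposed convolutions.
import torch
import torch.nn.functional as F

H, W = 224, 224
num_classes = 21

score_fc7   = torch.randn(1, num_classes, H // 32, W // 32)   # 1/32-resolution scores
score_pool4 = torch.randn(1, num_classes, H // 16, W // 16)   # 1/16-resolution scores
score_pool3 = torch.randn(1, num_classes, H // 8,  W // 8)    # 1/8-resolution scores

def up2(t):
    return F.interpolate(t, scale_factor=2, mode="bilinear", align_corners=False)

fuse16 = up2(score_fc7) + score_pool4                          # fusion used by FCN-16s
fuse8 = up2(fuse16) + score_pool3                              # extra fusion used by FCN-8s
out = F.interpolate(fuse8, scale_factor=8, mode="bilinear", align_corners=False)
print(out.shape)                                               # torch.Size([1, 21, 224, 224])
```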
Summary
The most distinctive feature of FCN is converting the ordinary fully connected layers into convolutional layers, which both frees the input image size and produces very good segmentation results.
There are three common FCN variants: FCN-32s, FCN-16s and FCN-8s, whose results improve in that order. In essence, fusing information from the maxpooling layers is what yields the better results.