FCN - the originator of semantic segmentation (with a tf-Keras reproduction)
2022-08-04 07:19:00 【hot-blooded chef】
1. What is semantic segmentation?
Image semantic segmentation, in short, classifies every pixel of an image, so that all pixels belonging to the same class of object receive the same label.
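To make "classify every pixel" concrete, here is a minimal sketch. The score values are made up for illustration; in a real network they would be the per-class outputs for each pixel.

```python
import numpy as np

# Hypothetical per-pixel class scores for a 2x3 image and 3 classes, shape (H, W, C).
scores = np.array([
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
    [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.3, 0.3, 0.4]],
])

# Semantic segmentation assigns each pixel the class with the highest score.
label_map = scores.argmax(axis=-1)
print(label_map)
```

The result is a 2x3 map of class indices, one per pixel, which is exactly what a segmentation mask is.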
What I'm going to introduce today is the first method of semantic segmentation using convolutional neural networks - FCN.
2. FCN (Fully Convolutional Networks for Semantic Segmentation)
A typical convolutional neural network appends several fully connected layers after its convolutional layers, mapping the feature map produced by convolution and downsampling to a fixed-length feature vector. This structure suits image-level classification and regression tasks, because what they need in the end is a single prediction for the whole input image, such as its class probabilities. Common CNNs such as VGG and ResNet, when trained on ImageNet, output a 1000-dimensional vector giving the probability that the input image belongs to each class.
FCN removes these fully connected layers, replaces them with ordinary convolutional layers, upsamples the result to the size of the original image, and outputs a prediction for every pixel.
For developers, the biggest advantage is that the changes are small: going from a classic classification network to FCN only requires replacing the fully connected layers. For example, VGG16_FCN in the paper only needs to change the final fully connected layers (4096, 1, 1), (4096, 1, 1), (1000, 1, 1) into regular convolutional layers (4096, 7, 7), (4096, 7, 7), (1000, 7, 7) and then upsample to the original image size. The number of output channels then equals the number of classes, and on each channel a value of 0 marks pixels that do not belong to that class while 1 marks pixels that do.
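Why can a fully connected layer be swapped for a convolution at all? A minimal NumPy sketch, under my own toy assumptions (tiny sizes instead of VGG's, random weights, a naive `conv2d_valid` helper written just for this example): a fully connected layer applied to a flattened h x w x c feature map computes the same thing as a convolution whose kernel covers the whole map, and the convolution keeps working on larger inputs.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Naive 'valid' convolution (cross-correlation, as used in deep learning).

    x: (H, W, C_in), kernel: (kh, kw, C_in, C_out) -> (H-kh+1, W-kw+1, C_out)
    """
    kh, kw, _, c_out = kernel.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
h, w, c, n_out = 3, 3, 2, 4                     # toy sizes, chosen only for illustration

dense_w = rng.normal(size=(h * w * c, n_out))   # weights of a fully connected layer
kernel = dense_w.reshape(h, w, c, n_out)        # the same weights viewed as a conv kernel

# On an input of exactly the expected size, both give the same vector.
feat = rng.normal(size=(h, w, c))
dense_out = feat.reshape(-1) @ dense_w
conv_out = conv2d_valid(feat, kernel)
assert np.allclose(conv_out[0, 0], dense_out)

# On a larger input the conv version still works and yields a spatial heatmap.
big_feat = rng.normal(size=(5, 6, c))
heatmap = conv2d_valid(big_feat, kernel)
print(heatmap.shape)  # (3, 4, 4)
```

The fully connected version would simply fail on `big_feat` because the flattened length no longer matches, while the convolutional version returns one prediction per spatial position.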
Accuracy loss
Of course, this also costs accuracy. After the backbone has downsampled the input by a factor of 32 (over 5 stages), directly upsampling the result inevitably produces discontinuous and incorrect predictions at object edges. Therefore the FCN authors also proposed fusing the outputs of different stages: shallow layers supply location information and deep layers supply semantic information, which makes up for the loss at the edges to some extent.
As shown in the figure above, upsampling the output of stage5 by 32 times directly gives FCN32, which has the lowest accuracy. In FCN16, the output of stage5 is upsampled by 2 times and summed with the output of stage4, and the sum is then upsampled by 16 times as the final output; this structure is more accurate than FCN32. FCN8, which additionally fuses the output of stage3, is a little more accurate still. The results will be posted later.
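The three variants differ only in how the stage outputs are combined, which a small shape-level sketch can show. Everything here is an assumption for illustration: the score maps are random arrays standing in for per-class outputs of each stage, the input size is made up, and `upsample` uses nearest-neighbour repetition instead of bilinear interpolation to keep the code short.

```python
import numpy as np

def upsample(x, factor):
    """Nearest-neighbour upsampling of an (H, W, C) map; a stand-in for bilinear."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

num_classes = 21
H, W = 64, 96                                   # hypothetical input size, divisible by 32
rng = np.random.default_rng(0)

# Hypothetical per-class score maps from stage3, stage4 and stage5
# (downsampled 8, 16 and 32 times relative to the input).
score3 = rng.normal(size=(H // 8, W // 8, num_classes))
score4 = rng.normal(size=(H // 16, W // 16, num_classes))
score5 = rng.normal(size=(H // 32, W // 32, num_classes))

fcn32 = upsample(score5, 32)                    # stage5 only
fused16 = upsample(score5, 2) + score4          # stage5 x2, summed with stage4
fcn16 = upsample(fused16, 16)
fused8 = upsample(fused16, 2) + score3          # previous sum x2, summed with stage3
fcn8 = upsample(fused8, 8)

for name, out in [("FCN32", fcn32), ("FCN16", fcn16), ("FCN8", fcn8)]:
    print(name, out.shape)
```

All three outputs come back at the input resolution; the difference is that FCN16 and FCN8 mix in progressively finer feature maps before the final upsampling step.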
How to upsample
At present, there are two common ways of upsampling: deconvolution (transposed convolution) and bilinear interpolation. According to the source code published by the original authors, they used bilinear interpolation, and they report that in their tests there was no significant difference in accuracy. Bilinear interpolation has no parameters to learn, so it is also faster.
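For readers who have not seen bilinear interpolation written out, here is a minimal NumPy sketch. It is my own illustration, not the authors' code: the function name `bilinear_upsample` is made up, and it uses the "align corners" convention, which may differ from the default of the library you use.

```python
import numpy as np

def bilinear_upsample(x, out_h, out_w):
    """Resize a 2-D array to (out_h, out_w) with bilinear interpolation.

    Uses the 'align corners' convention: the corner pixels of the input and
    the output coincide.
    """
    in_h, in_w = x.shape
    # Fractional source coordinates for every output row and column.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]        # vertical weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]        # horizontal weights, shape (1, out_w)

    # Interpolate horizontally on the two neighbouring rows, then vertically.
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bottom = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

small = np.array([[0.0, 2.0],
                  [4.0, 6.0]])
print(bilinear_upsample(small, 3, 3))
```

Each new pixel is a weighted average of its four nearest input pixels, so the 2x2 example becomes a 3x3 array whose middle value is the mean of all four inputs. There are no weights to train, which is why this is cheaper than a learned deconvolution.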
Loss function
FCN predicts pixel by pixel, so for each pixel and each class the ground truth is either 0 or 1. Even if the segmentation label is a single-channel image (each pixel stores a class index), it can be converted to one-hot form; for example, a VOC label becomes 21 channels.
Since every pixel is its own classification problem, cross entropy can be used as the loss. Later work has improved on this with dice loss, focal loss, and so on, which will not be discussed here.
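The conversion from a single-channel index label to one-hot form can be sketched in a couple of lines of NumPy. The label values below are made up; the 21 comes from VOC's 20 object classes plus background.

```python
import numpy as np

num_classes = 21                       # VOC: 20 object classes + background

# Hypothetical single-channel label: each pixel stores a class index.
label = np.array([[0, 0, 15],
                  [0, 7, 15]])

# Indexing an identity matrix with the label turns every index into a one-hot row.
one_hot = np.eye(num_classes, dtype=np.float32)[label]
print(one_hot.shape)                   # (2, 3, 21)
```

Real VOC masks also use the value 255 for pixels that should be ignored, which would have to be handled before this step.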
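A minimal NumPy sketch of a per-pixel cross entropy, assuming the network outputs raw scores (logits) of shape (H, W, C) and the label is already one-hot. The function name `pixelwise_cross_entropy` and the toy inputs are my own; in practice you would use the loss provided by your framework.

```python
import numpy as np

def pixelwise_cross_entropy(logits, one_hot, eps=1e-7):
    """Mean cross entropy over all pixels.

    logits:  (H, W, C) raw class scores
    one_hot: (H, W, C) ground truth in one-hot form
    """
    # Softmax over the class axis, shifted by the max for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Cross entropy per pixel: -sum_c y_c * log(p_c), with probabilities clipped
    # away from zero so the logarithm stays finite.
    per_pixel = -(one_hot * np.log(np.clip(probs, eps, 1.0))).sum(axis=-1)
    return per_pixel.mean()

labels = np.array([[0, 1], [2, 1]])
target = np.eye(3)[labels]

uniform = np.zeros((2, 2, 3))                  # equal score for every class
confident = target * 20.0                      # large score on the correct class

print(pixelwise_cross_entropy(uniform, target))    # close to log(3)
print(pixelwise_cross_entropy(confident, target))  # close to 0
```

A network that cannot tell the classes apart pays log(C) per pixel, and the loss approaches zero as the correct class dominates at every pixel.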
Prediction accuracy
The prediction accuracy of FCN is shown in the figure:
3. Summary
FCN was the first work to use deep learning for semantic segmentation. Compared with many newer networks, its results are indeed worse. However, most later networks follow the idea of FCN and add some tricks or new components on top of it. If you understand FCN, other segmentation networks will be much easier to pick up.
4. Implementation code
- Code published by the original authors: shelhamer/fcn.berkeleyvision.org
- Personal reproduction: Runist/FCN-keras
- FCN paper