当前位置:网站首页>[Deep Learning] Overview of Efficient and Lightweight Semantic Segmentation
[Deep Learning] Overview of Efficient and Lightweight Semantic Segmentation
2022-08-03 13:08:00 【Demeanor 78】
原文:On Efficient Real-Time Semantic Segmentation: A Survey
链接:https://arxiv.org/abs/2206.08605
1摘要
Semantic segmentation is an important part of visual understanding in autonomous driving.然而当前SOTAThe models are very complex and cumbersome,Therefore, it is not suitable for deployment on in-vehicle chip platforms with limited computing resources and low time-consuming requirements..This article delves into the more compact、More efficient model in order to solve the above problems,These models can be deployed on low memory embedded systems,At the same time meet the needs of real-time reasoning.This article discusses some excellent work in the field,Categorized according to their main contribution,Finally, this paper evaluates the inference speed of the model under the same hardware and software conditions,These conditions represents a typical high performanceGPUand low memory embeddedGPUactual deployment scenarios.本文的实验结果表明,Many jobs enable a balance of performance and time-consuming on resource-constrained hardware.
2数据集
Common segmentation datasets areCamVid[1], KITTI[2], Cityscapes[3], Berkeley DeepDrive[4], Audi Autonomous Driving[5], PASCAL VOC2012[6], NYU Depth V2[7]等.

3Efficient depth CNN 的技术
Downsampling and Upsampling
降采样:Significantly reduces computation by downsampling the input image、Increase inference speed,At the same time sacrificing the accuracy of the output.一般而言,Large and complex models need to use downsampling to improve the receptive field,Common downsampling operations are max/average pooling.
上采样:Segmentation tasks are different from other vision tasks,Usually want to keep the dimensions of the input and output,So upsampling must be used to restore the resolution,The common upsampling method is bilinear interpolation、Deconvolution etc..
高效卷积
The core idea of efficient convolution is to expand the receptive field of the model by stacking convolutional layers,Reduce the amount of model parameters and computation.Common efficient convolutions areDepthwise-Separable Convolution[8],Grouped Convolution[9],Asymmetric Convolution[10], Bottleneck[11], Dilated Convolution[12].

残差连接
residual connection pass[11]Often used in segmentation networks to improve gradient flow during backpropagation and reuse previous layer features.
主干网络
Many semantic segmentation models employ several widely used backbone networks as feature extractors,Common trunkResNet[11],Squeezenet[13],Shufflenet[14],Mobilenet[15],MobileNetV2[16],EfficientNet[17].

4Outstanding work
编码器-解码器
The core structure of semantic segmentation is the encoder-解码器.经典的模型有SegNet、U-net、Efficient Neural Network (ENet)、SQNet等等.

多分支
基于编码器-A major challenge of the decoder's approach is to preserve the high-resolution details extracted early in the network,So some multi-branch work feeds the original input image into the network at two or more scales.经典模型如Image Cascade Network (ICNet)、ContextNet、Guided Upsampling Network (GUN)等.

元学习
Most of the learning model in real-time semantic segmentation field are belong toNAS的范畴,is a method of automating the process of designing neural network structures.NAS Usually only the architectures that give the best results are involved,But under real-time requirements,Architecture size、Complexity and inference time constitute other factors that should be considered in optimizing functions.经典的算法有SqueezeNAS、FasterSeg、Graph-Guided Architecture Search (GAS) 等.

注意力
Attention mechanisms have been shown to be a key technique for vision tasks,But generally computationally cumbersome and inefficient.Although still not suitable for real-time inference,But some work likeDeep Feature Aggregation (DFANet)、Lightweight Encoder-Decoder (LEDNet)etc. reduce the complexity of attention.

训练优化
The last category is methods that take existing network structures and change the training process to improve model performance,Common as Knowledge DistillationStructured Knowledge Distillation、Knowledge Adaptation等

5评估
本文在Nvidia RTX 3090 GPU和嵌入式平台Nvidia Jetson Xavier AGX Developer KitThe time-consuming and performance of different algorithms are verified under the two platforms.

对比如下表所示:

6结论
This paper discusses a low-cost semantic segmentation algorithm for solving resource-constrained hardware,and discuss and categorize them according to their major contributions to the field.Finally, this paper conducts its own experiments,Analyze the speed and performance of the algorithm under the same hardware and software conditions,Provide a reference for model selection,Provide thinking direction for the optimization of future work.
7参考文献
[1] G. J. Brostow, J. Fauqueur and R. Cipolla, "Semantic object classes in video: A high-definition ground truth database," Pattern Recognit. Lett., vol. 30, p. 88–97, 2009.
[2] A. Geiger, P. Lenz, C. Stiller and R. Urtasun, "Vision meets robotics: The KITTI dataset," Int. J. Robotics Res., vol. 32, p. 1231–1237, 2013.
[3] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth and B. Schiele, "The Cityscapes Dataset for Semantic Urban Scene Understanding," in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, 2016.
[4] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan and T. Darrell, "BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling," CoRR, vol. abs/1805.04687, 2018.
[5] J. Geyer, Y. Kassahun, M. Mahmudi, X. Ricou, R. Durgesh, A. S. Chung, L. Hauswald, V. H. Pham, M. Mühlegg, S. Dorn, T. Fernandez, M. Jänicke, S. Mirashi, C. Savani, M. Sturm, O. Vorobiov, M. Oelker, S. Garreis and P. Schuberth, "A2D2: Audi Autonomous Driving Dataset," CoRR, vol. abs/2004.06320, 2020.
[6] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn and A. Zisserman, The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results.
[7] N. Silberman, P. Kohli and R. Fergus, "Indoor Segmentation and Support Inference from RGBD Images," in European Conference on Computer Vision, 2012.
[8] L. Sifre and S. Mallat, Rigid-Motion Scattering for Texture Classification, 2014.
[9] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems, 2012.
[10] M. Jaderberg, A. Vedaldi and A. Zisserman, Speeding up Convolutional Neural Networks with Low Rank Expansions, 2014.
[11] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[12] G. Papandreou, I. Kokkinos and P.-A. Savalle, "Modeling Local and Global Deformations in Deep Learning: Epitomic Convolution, Multiple Instance Learning, and Sliding Window Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[13] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally and K. Keutzer, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size, 2016.
[14] X. Zhang, X. Zhou, M. Lin and J. Sun, "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[15] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto and H. Adam, MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017.
[16] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov and L.-C. Chen, "MobileNetV2: Inverted Residuals and Linear Bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[17] M. Tan and Q. Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks," in Proceedings of the 36th International Conference on Machine Learning, 2019.
往期精彩回顾
适合初学者入门人工智能的路线及资料下载(图文+视频)机器学习入门系列下载机器学习及深度学习笔记等资料打印《统计学习方法》的代码复现专辑机器学习交流qq群955171419,加入微信群请扫码
边栏推荐
猜你喜欢
leetcode16最接近的三数之和 (排序+ 双指针)
基于php志愿者服务平台管理系统获取(php毕业设计)
基于php家具销售管理系统获取(php毕业设计)
【实战技能】单片机bootloader的CANFD,I2C,SPI和串口方式更新APP视频教程(2022-08-01)
nacos应用
An introduction to the skeleton tool
An introduction to basic tools for selecting line tools (package church)
Station B responded that "HR said that core users are all Loser": the interviewer was persuaded to quit at the end of last year and will learn lessons to strengthen management
Graphic animation and button animation of an animation basic component
BOM系列之sessionStorage
随机推荐
Filebeat 如何保持文件状态?
B站回应“HR 称核心用户都是 Loser”:该面试官去年底已被劝退,会吸取教训加强管理
数据库基础知识一(MySQL)[通俗易懂]
Feature dimensionality reduction study notes (pca and lda) (1)
An动画基础之元件的影片剪辑效果
Sogou news - dataset
Classes and Objects (lower middle)
leetcode16最接近的三数之和 (排序+ 双指针)
When Nodejs installation depends on cpnm, the install shows Error: Cannot find module 'fs/promises'
期货公司开户关注的关键点
An工具介绍之骨骼工具
An动画优化之遮罩层动画
博客记录生活
【R】用grafify搞定统计绘图、方差分析、干预比较等!
Use %Status value
An动画优化之传统引导层动画
Tinymce plugins [Tinymce扩展插件集合]
JS获得浏览器类型
Blog records life
使用工作队列管理器(四)