[Deep Learning] Overview of Efficient and Lightweight Semantic Segmentation
2022-08-03 13:08:00 【Demeanor 78】
Original paper: On Efficient Real-Time Semantic Segmentation: A Survey
Link: https://arxiv.org/abs/2206.08605
1 Abstract
Semantic segmentation is an important part of visual understanding in autonomous driving. However, current state-of-the-art (SOTA) models are large and computationally expensive, which makes them unsuitable for in-vehicle chip platforms with limited computing resources and strict latency requirements. The survey examines more compact and more efficient models that address this problem: models that can be deployed on low-memory embedded systems while still meeting real-time inference requirements. It reviews notable work in the field, categorized by main contribution, and then evaluates the inference speed of the models under identical hardware and software conditions that represent two practical deployment scenarios: a typical high-performance GPU and a low-memory embedded GPU. The experimental results show that many of the methods achieve a workable balance between accuracy and latency on resource-constrained hardware.
2 Datasets
Common segmentation datasets include CamVid [1], KITTI [2], Cityscapes [3], Berkeley DeepDrive [4], Audi Autonomous Driving [5], PASCAL VOC2012 [6], and NYU Depth V2 [7].

3 Techniques for Efficient Deep CNNs
Downsampling and Upsampling
Downsampling: Downsampling the input image significantly reduces computation and increases inference speed, at the cost of some output accuracy. In general, large and complex models rely on downsampling to enlarge the receptive field. Common downsampling operations are max pooling and average pooling.
Upsampling: Unlike many other vision tasks, segmentation usually requires the output to have the same spatial dimensions as the input, so upsampling is needed to restore the resolution. Common upsampling methods are bilinear interpolation and deconvolution (transposed convolution).
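The two operations can be sketched in plain Python on a single-channel image stored as a list of rows. This is a minimal illustration, not code from the survey: the function names `max_pool_2x2` and `bilinear_upsample` are made up for this example, the pooling window is fixed at 2x2 with stride 2, and the interpolation assumes that the corner pixels of input and output are aligned.

```python
def max_pool_2x2(image):
    """Downsample a 2D list by taking the max of each non-overlapping 2x2 window."""
    h, w = len(image), len(image[0])
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, w - 1, 2)]
        for r in range(0, h - 1, 2)
    ]


def bilinear_upsample(image, out_h, out_w):
    """Upsample a 2D list to (out_h, out_w) by bilinear interpolation (corners aligned)."""
    in_h, in_w = len(image), len(image[0])
    out = []
    for i in range(out_h):
        # Map the output row index to a fractional row in the input.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            # Map the output column index to a fractional column in the input.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Blend the four neighbouring input pixels.
            top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool_2x2(image)                # [[6, 8], [14, 16]]
restored = bilinear_upsample(pooled, 4, 4)  # back to 4x4, but smoother than the input
print(pooled)
print(restored)
```

The restored image has the original size but not the original values: the detail discarded by pooling is not recovered by interpolation, which is the accuracy cost mentioned above.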
Efficient Convolutions
The core idea of efficient convolution is to enlarge the model's receptive field by stacking convolutional layers while reducing the number of parameters and the amount of computation. Common efficient convolutions are depthwise-separable convolution [8], grouped convolution [9], asymmetric convolution [10], bottleneck blocks [11], and dilated convolution [12].
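The savings can be made concrete by counting weights. The sketch below uses the standard parameter-count formulas for each convolution type and ignores bias terms; the helper names are hypothetical, and the asymmetric variant assumes a k x 1 convolution followed by a 1 x k convolution with the intermediate channel count equal to the output channel count.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """One k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def grouped_conv_params(c_in, c_out, k, groups):
    """Each output channel only sees c_in / groups input channels."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return k * k * (c_in // groups) * c_out


def asymmetric_conv_params(c_in, c_out, k):
    """A k x 1 convolution followed by a 1 x k convolution (assumed layout)."""
    return k * c_in * c_out + k * c_out * c_out


def dilated_effective_kernel(k, dilation):
    """Side length of the region covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


c_in, c_out, k = 128, 128, 3
print("standard:           ", standard_conv_params(c_in, c_out, k))
print("depthwise-separable:", depthwise_separable_params(c_in, c_out, k))
print("grouped (4 groups): ", grouped_conv_params(c_in, c_out, k, groups=4))
print("asymmetric:         ", asymmetric_conv_params(c_in, c_out, k))
print("dilated 3x3, d=2 covers", dilated_effective_kernel(3, 2), "pixels per side")
```

Dilation is the odd one out: it leaves the parameter count unchanged and instead widens the area each kernel covers, so the receptive field grows without extra weights.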

Residual Connections
Residual connections [11] are often used in segmentation networks to improve gradient flow during backpropagation and to reuse features from earlier layers.
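The effect on gradient flow can be seen in a deliberately simplified toy model, assuming each layer is a scalar linear map f(x) = w * x (real layers are nonlinear and multi-dimensional). Without a skip connection the end-to-end gradient is the product of the weights; with one, each layer computes x + w * x, so each factor becomes 1 + w. All names below are illustrative.

```python
def residual_layer(x, w):
    """Forward pass of one toy residual layer: identity path plus learned path."""
    return x + w * x


def plain_stack_gradient(weights):
    """d(output)/d(input) for a stack of plain layers y = w * x."""
    grad = 1.0
    for w in weights:
        grad *= w
    return grad


def residual_stack_gradient(weights):
    """d(output)/d(input) for a stack of residual layers y = x + w * x."""
    grad = 1.0
    for w in weights:
        grad *= 1.0 + w
    return grad


weights = [0.1] * 10  # ten layers with small weights
print("plain stack gradient:   ", plain_stack_gradient(weights))
print("residual stack gradient:", residual_stack_gradient(weights))
```

With ten layers of weight 0.1, the plain gradient shrinks to roughly 1e-10, while the residual gradient stays above 1 because the identity path always contributes a factor of at least 1 per layer.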
Backbone Networks
Many semantic segmentation models use one of several widely adopted backbone networks as the feature extractor. Common backbones are ResNet [11], SqueezeNet [13], ShuffleNet [14], MobileNet [15], MobileNetV2 [16], and EfficientNet [17].

4 Notable Work
Encoder-Decoder
The encoder-decoder is the core structure of semantic segmentation networks. Classic models include SegNet, U-Net, Efficient Neural Network (ENet), and SQNet.

Multi-Branch
A major challenge for encoder-decoder approaches is preserving the high-resolution details extracted early in the network. Some multi-branch methods therefore feed the original input image into the network at two or more scales. Classic models include Image Cascade Network (ICNet), ContextNet, and Guided Upsampling Network (GUN).

Meta-Learning
Most meta-learning models in real-time semantic segmentation fall under neural architecture search (NAS), a method that automates the design of neural network structures. NAS usually considers only which architecture gives the best results, but under real-time requirements, architecture size, complexity, and inference time become additional factors that the optimization objective should take into account. Classic algorithms include SqueezeNAS, FasterSeg, and Graph-Guided Architecture Search (GAS).
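One simple way to fold inference time into the objective is to subtract a latency penalty from the accuracy score. The sketch below is a toy scalarized objective, not the search procedure of any method named above; the `Candidate` class, the penalty form, the `penalty` weight, and all candidate numbers are hypothetical and chosen only to show how the ranking changes.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float    # validation accuracy in [0, 1] (made-up values below)
    latency_ms: float  # inference time per image (made-up values below)


def latency_aware_score(candidate, target_ms, penalty=0.5):
    """Accuracy minus a penalty proportional to the relative latency overshoot."""
    overshoot = max(0.0, candidate.latency_ms / target_ms - 1.0)
    return candidate.accuracy - penalty * overshoot


def select_architecture(candidates, target_ms, penalty=0.5):
    """Pick the candidate with the best latency-aware score."""
    return max(candidates, key=lambda c: latency_aware_score(c, target_ms, penalty))


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("large", accuracy=0.78, latency_ms=60.0),
    Candidate("medium", accuracy=0.72, latency_ms=25.0),
    Candidate("small", accuracy=0.65, latency_ms=10.0),
]

best_by_accuracy = max(candidates, key=lambda c: c.accuracy)
best_real_time = select_architecture(candidates, target_ms=30.0)
print("accuracy only:", best_by_accuracy.name)
print("latency aware:", best_real_time.name)
```

With a 30 ms target, the accuracy-only ranking picks the largest candidate, while the latency-aware score prefers the medium one because the large model overshoots the target by a factor of two.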

Attention
Attention mechanisms have proven to be a key technique for vision tasks, but they are generally computationally heavy and inefficient. Although attention is still not well suited to real-time inference, some work such as Deep Feature Aggregation (DFANet) and Lightweight Encoder-Decoder (LEDNet) reduces its complexity.

Training Optimization
The last category covers methods that take an existing network structure and change the training process to improve model performance. Common examples are Knowledge Distillation, Structured Knowledge Distillation, and Knowledge Adaptation.
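As a rough illustration of the basic idea behind knowledge distillation, the sketch below computes the classic soft-target loss for one pixel: the KL divergence between temperature-softened teacher and student class distributions, scaled by the squared temperature. This is the generic formulation and an assumption on my part, not the specific loss of Structured Knowledge Distillation or Knowledge Adaptation; the logits are made-up values. In segmentation this term would typically be averaged over all pixels.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, times T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the large model
    q = softmax(student_logits, temperature)  # predictions of the compact model
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


# Made-up logits for a single pixel with three classes.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 0.2]
print("loss (student vs teacher):", distillation_loss(student, teacher))
print("loss (perfect match):     ", distillation_loss(teacher, teacher))
```

A higher temperature flattens both distributions, so the student also learns from the relative scores the teacher assigns to the wrong classes rather than only from its top prediction.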

5 Evaluation
The survey measures the latency and accuracy of the different algorithms on two platforms: an Nvidia RTX 3090 GPU and the embedded Nvidia Jetson Xavier AGX Developer Kit.
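A latency measurement of this kind usually follows a warm-up-then-time pattern. The harness below is a generic sketch, not the benchmarking code used in the survey: `benchmark` and `dummy_model` are hypothetical names, the warm-up and run counts are arbitrary, and the dummy workload merely stands in for a model's forward pass. When timing on a GPU, the device must also be synchronized before reading the clock (for example with `torch.cuda.synchronize()` in PyTorch), because kernel launches are asynchronous.

```python
import time


def benchmark(fn, warmup=10, runs=50):
    """Return (mean latency in ms, frames per second) for a zero-argument callable."""
    for _ in range(warmup):
        fn()  # warm-up iterations are executed but not timed
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    mean_ms = 1000.0 * elapsed / runs
    fps = runs / elapsed
    return mean_ms, fps


def dummy_model():
    # Stand-in workload; replace with a real forward pass on a fixed-size input.
    return sum(i * i for i in range(10_000))


mean_ms, fps = benchmark(dummy_model)
print(f"mean latency: {mean_ms:.3f} ms, throughput: {fps:.1f} FPS")
```

Keeping the input size, batch size, and software stack fixed across models is what makes the resulting numbers comparable between the two platforms.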

The comparison is shown in the table below:

6 Conclusion
The survey discusses low-cost semantic segmentation algorithms aimed at resource-constrained hardware and categorizes them by their major contributions to the field. It then runs its own experiments, analyzing the speed and accuracy of the algorithms under identical hardware and software conditions, to provide a reference for model selection and directions for optimizing future work.
7 References
[1] G. J. Brostow, J. Fauqueur and R. Cipolla, "Semantic object classes in video: A high-definition ground truth database," Pattern Recognit. Lett., vol. 30, pp. 88–97, 2009.
[2] A. Geiger, P. Lenz, C. Stiller and R. Urtasun, "Vision meets robotics: The KITTI dataset," Int. J. Robotics Res., vol. 32, pp. 1231–1237, 2013.
[3] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth and B. Schiele, "The Cityscapes Dataset for Semantic Urban Scene Understanding," in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016.
[4] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan and T. Darrell, "BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling," CoRR, vol. abs/1805.04687, 2018.
[5] J. Geyer, Y. Kassahun, M. Mahmudi, X. Ricou, R. Durgesh, A. S. Chung, L. Hauswald, V. H. Pham, M. Mühlegg, S. Dorn, T. Fernandez, M. Jänicke, S. Mirashi, C. Savani, M. Sturm, O. Vorobiov, M. Oelker, S. Garreis and P. Schuberth, "A2D2: Audi Autonomous Driving Dataset," CoRR, vol. abs/2004.06320, 2020.
[6] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn and A. Zisserman, The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results.
[7] N. Silberman, P. Kohli and R. Fergus, "Indoor Segmentation and Support Inference from RGBD Images," in European Conference on Computer Vision, 2012.
[8] L. Sifre and S. Mallat, Rigid-Motion Scattering for Texture Classification, 2014.
[9] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems, 2012.
[10] M. Jaderberg, A. Vedaldi and A. Zisserman, Speeding up Convolutional Neural Networks with Low Rank Expansions, 2014.
[11] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[12] G. Papandreou, I. Kokkinos and P.-A. Savalle, "Modeling Local and Global Deformations in Deep Learning: Epitomic Convolution, Multiple Instance Learning, and Sliding Window Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[13] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally and K. Keutzer, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size, 2016.
[14] X. Zhang, X. Zhou, M. Lin and J. Sun, "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[15] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto and H. Adam, MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017.
[16] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov and L.-C. Chen, "MobileNetV2: Inverted Residuals and Linear Bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[17] M. Tan and Q. Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks," in Proceedings of the 36th International Conference on Machine Learning, 2019.
