
Apple's MobileOne: a high-performance mobile backbone with sub-1 ms inference

2022-06-11 11:42:00 Zhiyuan community

Paper: https://arxiv.org/abs/2206.04040

 

Overview

Efficient neural network backbones for mobile devices are usually optimized for metrics such as FLOPs or parameter count. However, these metrics may not correlate well with inference latency once the network is actually deployed on a mobile device. This paper therefore conducts an extensive analysis of different metrics by deploying several efficient networks on a mobile device. By identifying and analyzing the architectural and optimization bottlenecks in efficient neural networks, the authors propose ways to alleviate them. On this basis they design an efficient backbone, MobileOne, whose variants run in under 1 millisecond on an iPhone 12 while reaching 75.9% top-1 accuracy on ImageNet. MobileOne achieves state-of-the-art performance within the regime of efficient architectures while being many times faster on mobile: the best model matches MobileFormer on ImageNet while being 38x faster, and MobileOne is 2.3% more accurate in ImageNet top-1 than EfficientNet at a similar latency. Moreover, MobileOne generalizes to multiple tasks, including image classification, object detection, and semantic segmentation, with significant improvements in latency and accuracy over existing efficient architectures deployed on mobile devices.

 

Contributions

Efficient networks have great practical value, but academic research tends to focus on reducing FLOPs or parameter count, and neither metric correlates strictly with inference efficiency. For example, FLOPs do not account for memory access cost or degree of parallelism: a parameter-free operation (such as the Add or Concat of a skip connection) contributes essentially no FLOPs, yet incurs significant memory access cost and lengthens inference time (a quick timing sketch follows).
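As a hedged illustration (a minimal PyTorch sketch with assumed shapes, not the paper's benchmark harness), the two stacks below run the same convolutions, but one adds parameter-free skip additions, whose cost is memory traffic rather than FLOPs:

```python
# Minimal PyTorch sketch (assumed shapes, not the paper's benchmark harness):
# both stacks run the same convolutions, but SkipStack adds parameter-free
# skip additions, which cost extra memory traffic rather than FLOPs.
import time

import torch
import torch.nn as nn

class PlainStack(nn.Module):
    def __init__(self, ch=64, depth=8):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=1, bias=False) for _ in range(depth))

    def forward(self, x):
        for conv in self.convs:
            x = conv(x)
        return x

class SkipStack(PlainStack):
    def forward(self, x):
        for conv in self.convs:
            x = x + conv(x)  # parameter-free Add: extra memory reads/writes
        return x

@torch.no_grad()
def bench_ms(model, x, iters=50):
    model.eval()
    model(x)  # warm-up
    t0 = time.perf_counter()
    for _ in range(iters):
        model(x)
    return (time.perf_counter() - t0) / iters * 1e3

x = torch.randn(1, 64, 56, 56)
print(f"plain stack: {bench_ms(PlainStack(), x):.2f} ms")
print(f"skip  stack: {bench_ms(SkipStack(), x):.2f} ms")
```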

To better analyze the bottlenecks of efficient networks, the authors use the iPhone 12 platform as the benchmark and analyze the "bottlenecks" along several dimensions (see the figure above). Two things stand out:

A model with a high parameter count can still have low latency, e.g., ShuffleNetV2;

a model with high FLOPs can also have low latency, e.g., MobileNetV1 and ShuffleNetV2.

The table above analyzes the same comparison in terms of the Spearman rank correlation coefficient (SRCC); a small computation sketch follows this list. It shows:

on mobile devices, latency correlates only weakly with FLOPs and with parameter count;

on a desktop CPU, the correlation is weaker still.
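As a hedged sketch of how such an SRCC analysis can be reproduced, one can rank-correlate FLOPs and parameter counts against measured latency with SciPy. The numbers below are invented placeholders, not measurements from the paper:

```python
# Hypothetical sketch of the SRCC analysis; the numbers are invented
# placeholders, not measurements from the paper.
from scipy.stats import spearmanr

flops_m    = [300, 150, 600, 390, 1200]  # MFLOPs per model (placeholder)
params_m   = [4.2, 1.4, 5.3, 3.5, 7.8]   # millions of parameters (placeholder)
latency_ms = [0.9, 1.1, 1.6, 1.0, 4.0]   # measured on-device latency (placeholder)

rho_flops, _ = spearmanr(flops_m, latency_ms)
rho_params, _ = spearmanr(params_m, latency_ms)
print(f"SRCC(FLOPs,  latency) = {rho_flops:+.2f}")
print(f"SRCC(params, latency) = {rho_params:+.2f}")
```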

 

Method

Based on these insights, the authors compare architectures along the two main efficiency "bottleneck" dimensions, then propose a corresponding remedy for each bottleneck.

  • Activation Functions: the table above compares the effect of different activation functions on latency. Even with an identical architecture, the latency induced by different activation functions varies greatly; this paper therefore defaults to the ReLU activation. (A timing sketch follows this list.)

  • Architectural Block: the table above analyzes the two main factors that affect latency, memory access cost and degree of parallelism. It shows that a model is faster when a single-branch structure is used at inference time. In addition, to preserve efficiency, the authors restrict the use of SE modules to the large model configurations.
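To make the activation comparison concrete, here is a minimal, hedged microbenchmark sketch: identical conv stacks that differ only in the activation. GELU and SiLU are picked here for illustration; the paper's own table covers its own set of activations, measured on an actual iPhone 12:

```python
# Hedged microbenchmark sketch: identical conv stacks that differ only in the
# activation function. GELU/SiLU are picked here for illustration; the paper's
# table measures its own set of activations on an actual iPhone 12.
import time

import torch
import torch.nn as nn

def make_stack(act_cls, ch=64, depth=8):
    layers = []
    for _ in range(depth):
        layers += [nn.Conv2d(ch, ch, 3, padding=1, bias=False), act_cls()]
    return nn.Sequential(*layers).eval()

@torch.no_grad()
def bench_ms(model, x, iters=50):
    model(x)  # warm-up
    t0 = time.perf_counter()
    for _ in range(iters):
        model(x)
    return (time.perf_counter() - t0) / iters * 1e3

x = torch.randn(1, 64, 56, 56)
for name, act in [("ReLU", nn.ReLU), ("GELU", nn.GELU), ("SiLU", nn.SiLU)]:
    print(f"{name}: {bench_ms(make_stack(act), x):.2f} ms")
```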

Based on the above analysis, the core block of MobileOne is designed on top of MobileNetV1 while absorbing the idea of structural re-parameterization, yielding the structure shown in the figure above. Note that the re-parameterization mechanism introduces a hyperparameter k that controls the number of re-parameterizable branches (experiments show that this over-parameterized variant benefits small models more). A minimal sketch of such a block follows.
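Below is a minimal sketch of a re-parameterizable unit for the depthwise 3x3 case, as one possible reading of the idea (this is not the authors' code, and the real block also carries a 1x1 "scale" branch, omitted here for brevity): k parallel conv+BN branches plus a BN-only skip are trained as a multi-branch block, then folded into a single conv for inference.

```python
# Minimal sketch of a MobileOne-style re-parameterizable unit (depthwise 3x3
# case). One reading of the idea, not the authors' code; the real block also
# carries a 1x1 "scale" branch that is omitted here for brevity.
import torch
import torch.nn as nn

class RepDepthwise(nn.Module):
    def __init__(self, ch, k=4):
        super().__init__()
        self.ch, self.k = ch, k
        # k over-parameterized conv+BN branches, trained in parallel
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1, groups=ch, bias=False),
                nn.BatchNorm2d(ch))
            for _ in range(k))
        self.skip_bn = nn.BatchNorm2d(ch)  # BN-only identity branch
        self.fused = None                  # single conv, set by reparameterize()

    def forward(self, x):
        if self.fused is not None:         # inference: single-branch path
            return self.fused(x)
        out = self.skip_bn(x)
        for branch in self.branches:
            out = out + branch(x)
        return out

    @staticmethod
    def _fuse_bn(weight, bn):
        # Fold a BatchNorm into the conv kernel that precedes it.
        scale = bn.weight / (bn.running_var + bn.eps).sqrt()
        return weight * scale.reshape(-1, 1, 1, 1), bn.bias - bn.running_mean * scale

    @torch.no_grad()
    def reparameterize(self):
        w_sum = torch.zeros(self.ch, 1, 3, 3)
        b_sum = torch.zeros(self.ch)
        for conv, bn in (tuple(branch) for branch in self.branches):
            w, b = self._fuse_bn(conv.weight, bn)
            w_sum, b_sum = w_sum + w, b_sum + b
        # The BN-only skip equals an identity depthwise kernel followed by BN.
        ident = torch.zeros(self.ch, 1, 3, 3)
        ident[:, 0, 1, 1] = 1.0
        w, b = self._fuse_bn(ident, self.skip_bn)
        w_sum, b_sum = w_sum + w, b_sum + b
        self.fused = nn.Conv2d(self.ch, self.ch, 3, padding=1,
                               groups=self.ch, bias=True)
        self.fused.weight.copy_(w_sum)
        self.fused.bias.copy_(b_sum)

# Sanity check: the folded conv must match the multi-branch output in eval mode.
m = RepDepthwise(8, k=4).eval()
x = torch.randn(1, 8, 16, 16)
y = m(x)
m.reparameterize()
assert torch.allclose(y, m(x), atol=1e-5)
```

The same folding applies to the pointwise 1x1 branches; after folding, the runtime graph is single-branch, which is exactly the property the latency analysis above favors.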

For model scaling, MobileOne follows a strategy similar to MobileNetV2; the table above lists the parameter information of the different MobileOne configurations.

On the training side, smaller models need less regularization, so the authors anneal the weight-decay coefficient over training (a 0.5% metric improvement); they also introduce a progressive learning mechanism (a further 0.4% improvement); finally, they use an EMA of the model weights. With all of these, the MobileOne-S2 model reaches 77.4%. A sketch of two of these tweaks follows.
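Here is a hedged sketch of two of the training-side tweaks (the schedule shape and constants are illustrative assumptions, not the paper's exact recipe): a cosine-annealed weight-decay coefficient plus an EMA copy of the weights kept for evaluation.

```python
# Hedged sketch of two training-side tweaks (schedule shape and constants are
# illustrative assumptions, not the paper's exact recipe): cosine-annealed
# weight decay plus an EMA copy of the weights kept for evaluation.
import math

import torch
from torch.optim.swa_utils import AveragedModel

model = torch.nn.Linear(128, 10)  # stand-in for a MobileOne variant
opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
# exponential moving average of the weights (decay 0.999 is an assumed value)
ema = AveragedModel(model, avg_fn=lambda e, p, n: 0.999 * e + 0.001 * p)

epochs, wd_max, wd_min = 300, 1e-4, 1e-5
for epoch in range(epochs):
    # anneal the weight-decay coefficient from wd_max down to wd_min
    wd = wd_min + 0.5 * (wd_max - wd_min) * (1 + math.cos(math.pi * epoch / epochs))
    for group in opt.param_groups:
        group["weight_decay"] = wd
    # per batch: loss.backward(); opt.step(); ema.update_parameters(model)
```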

 

Experiments

The table above compares the performance and efficiency of different lightweight schemes on the ImageNet dataset. We can see:

  • even the lightest Transformer needs at least 4 ms, whereas MobileOne-S4 reaches 79.4% accuracy in only 1.86 ms;
  • compared with EfficientNet-B0, MobileOne-S3 is not only 1% more accurate but also faster at inference;
  • compared with the other schemes, MobileOne keeps a very clear advantage on desktop CPUs as well.

The table above compares performance on MS-COCO detection, VOC segmentation, and ADE20K segmentation. Clearly:

  • on the MS-COCO detection task, MobileOne-S4 scores 27.8% higher than MNASNet and 6.1% higher than MobileViT;
  • on the VOC segmentation task, the proposed scheme is 1.3% higher than MobileViT and 5.8% higher than MobileNetV2;
  • on the ADE20K task, the best proposed model is 12% higher than MobileNetV2, and even MobileOne-S1 is still 2.9% higher than MobileNetV2.

At the end of the paper, the authors add a candid caveat: "Although, our models are state-of-the art within the regime of efficient architectures, the accuracy lags large models ConvNeXt and Swin Transformer".

Copyright notice: this article was created by [Zhiyuan community]; please include a link to the original when reposting: https://yzsam.com/2022/162/202206111127588581.html