【MnasNet】《MnasNet:Platform-Aware Neural Architecture Search for Mobile》
2022-07-02 07:48:00 【bryant_meng】
CVPR-2019
1 Background and Motivation
The authors aim to design a new mobile model that runs faster on resource-constrained platforms while maintaining accuracy.
2 Related Work
Compressing existing networks: quantization, pruning, NetAdapt, etc.; these methods do not focus on learning novel compositions of CNN operations.
Hand-crafted design: usually takes significant human effort.
NAS: based on a variety of learning algorithms, e.g., reinforcement learning / evolutionary search / differentiable search.
3 Advantages / Contributions
MnasNet is obtained via NAS, with two main innovations:
incorporate model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency (not just accuracy)
a novel factorized hierarchical search space that encourages layer diversity throughout the network (unlike NASNet, which searches at the cell level, MnasNet searches at the block level)
achieve new state-of-the-art results on both ImageNet classification and COCO object detection under typical mobile inference latency constraints
4 Method
4.1 Problem Formulation
The objective function used by previous methods:

$$\underset{m}{\text{maximize}} \quad ACC(m) \quad \text{subject to} \quad LAT(m) \leq T$$

where $m$ is the model, $ACC$ is accuracy, $LAT$ is inference latency, and $T$ is the target latency.

This objective only considers accuracy; latency appears only as a hard constraint and is never optimized.
The authors are more interested in finding multiple Pareto-optimal solutions in a single architecture search (trading off speed against accuracy), and therefore designed the following objective function:

$$\underset{m}{\text{maximize}} \quad ACC(m) \times \left[ \frac{LAT(m)}{T} \right]^{w}, \qquad w = \begin{cases} \alpha, & \text{if } LAT(m) \leq T \\ \beta, & \text{otherwise} \end{cases}$$
Depending on the values of $\alpha$ and $\beta$, there are the following soft and hard versions (hard: $\alpha = 0$, $\beta = -1$; soft: $\alpha = \beta = -0.07$).

In the figure, the horizontal axis is latency and the vertical axis is the objective value.
The $-0.07$ of the soft version is derived as follows:
we empirically observed doubling the latency usually brings about 5% relative accuracy gain
$$Reward(M_2) = a \cdot (1 + 5\%) \cdot (2l/T)^{\beta} \approx Reward(M_1) = a \cdot (l/T)^{\beta}$$
Solving the above equation gives $\beta = -\log_2(1.05) \approx -0.07$.
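To make the objective concrete, here is a minimal Python sketch of the reward computation. The function name and the example numbers are mine, not from the paper; the formula and the default $\alpha = \beta = -0.07$ follow the soft version above.

```python
import math

def mnasnet_reward(acc, lat, target, alpha=-0.07, beta=-0.07):
    """Soft multi-objective reward: ACC(m) * (LAT(m)/T)^w,
    where w = alpha if LAT(m) <= target else beta.
    Setting alpha=0, beta=-1 approximates the hard latency constraint."""
    w = alpha if lat <= target else beta
    return acc * (lat / target) ** w

# Deriving beta: doubling latency should be offset by ~5% relative
# accuracy gain, i.e. 1.05 * 2**beta = 1  =>  beta = -log2(1.05)
print(-math.log2(1.05))               # -0.0704... ~= -0.07
print(mnasnet_reward(0.75, 80, 75))   # ~0.7466 (over target, penalized)
print(mnasnet_reward(0.74, 60, 75))   # ~0.7516 (under target, rewarded)
```

The last two calls show the intended trade-off: a slightly less accurate but noticeably faster model can receive the higher reward.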
4.2 Factorized Hierarchical Search Space
allowing different layer architectures in different blocks
Within the same block, all $N$ layers are identical; the operations searched for each layer are as follows.
The search uses MobileNetV2 as a reference (see the sketch after this list):

the number of layers in each block is {0, +1, -1} relative to MobileNetV2;
the filter size per layer is {0.75, 1.0, 1.25}× that of MobileNetV2.
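As a concrete picture of one block's sub-search space, the sketch below lists the searched choices. The option values follow the paper's description of the search space; the dictionary layout and key names are hypothetical, for illustration only.

```python
# Hypothetical encoding of one block's sub-search space.  Option values
# follow the paper's search-space description; keys/names are illustrative.
BLOCK_SEARCH_SPACE = {
    "conv_op":     ["conv", "dconv", "mbconv"],   # regular / depthwise / mobile inverted bottleneck
    "kernel_size": [3, 5],                        # 3x3 or 5x5
    "se_ratio":    [0.0, 0.25],                   # squeeze-and-excitation ratio
    "skip_op":     ["pooling", "identity", "none"],
    "filter_mult": [0.75, 1.0, 1.25],             # filters relative to MobileNetV2
    "layer_delta": [-1, 0, +1],                   # layer count relative to MobileNetV2
}
```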
One of the resulting architectures found by the search:
The size of the search space is computed as follows:

Suppose there are $B$ blocks, each with a sub-search space of size $S$ and on average $N$ layers per block.

The factorized search space then has size $S^{B}$; if instead every layer could differ, the flat per-layer space would be of size $S^{B \cdot N}$.
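A quick worked example with illustrative numbers ($S = 432$, $B = 5$, $N = 3$, the magnitudes quoted as the paper's typical case):

$$S^{B} = 432^{5} \approx 1.5 \times 10^{13}, \qquad S^{B \cdot N} = 432^{15} \approx 3.4 \times 10^{39}$$

Factorizing by block rather than by layer shrinks the space by roughly 26 orders of magnitude while still allowing different blocks to differ.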
4.3 Search Algorithm
A sample-eval-update loop (reinforcement learning) maximizes the expected reward:

$$J = E_{P(a_{1:T}; \theta)}[R(m)]$$

The reward value $R(m)$ is the objective function defined in Section 4.1.
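Below is a minimal sketch of that loop. All four callables and their names are hypothetical stand-ins; in the paper, the controller is an RNN trained with Proximal Policy Optimization, and latency is measured on real phones.

```python
def search_loop(sample_model, train_and_eval, measure_latency,
                update_controller, target_lat=75.0, steps=1000):
    """Sample-eval-update sketch: maximize J = E[R(m)] by policy gradient.
    The callables are hypothetical stand-ins for the RNN controller,
    the proxy-task trainer, and on-device latency measurement."""
    baseline = 0.0  # moving average of rewards; reduces gradient variance
    for _ in range(steps):
        model, log_prob = sample_model()         # draw m ~ P(a_{1:T}; theta)
        acc = train_and_eval(model)              # proxy task: a few epochs on ImageNet
        lat = measure_latency(model)             # measured on a real phone
        w = -0.07                                # soft version: alpha = beta = -0.07
        reward = acc * (lat / target_lat) ** w   # R(m) from Section 4.1
        baseline = 0.95 * baseline + 0.05 * reward
        update_controller(log_prob, reward - baseline)  # policy-gradient step
```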
5 Experiments
5.1 Datasets
The architecture search is performed directly on the ImageNet training set, but with fewer training steps (5 epochs).

This differs from NASNet, which searches on CIFAR-10.
5.2 Results
1)ImageNet Classification Performance
With T = 75 ms, a single search yields multiple models: A1 / A2 / A3.
Compared with MobileNetV2, the SE (squeeze-and-excitation) module is introduced, and the impact of SE modules is discussed.
2)Model Scaling Performance
Here the depth multiplier refers to the channel width multiplier; MnasNet leads MobileNetV2 across the board.

The authors can also flexibly change the target latency T during NAS to control model size; as the table above shows, this is more effective than cutting the number of channels of a large model.
3)COCO Object Detection Performance
Not much to comment on here; joking aside, there is a definite improvement.
5.3 Ablation Study and Discussion
1)Soft vs. Hard Latency Constraint
The hard version focuses more on faster models to avoid the latency penalty (this is also visible from the objective function).

The soft version searches for models across a wider latency range.
2)Disentangling Search Space and Reward
The two innovations above (the multi-objective reward and the factorized search space) are decoupled to discuss each one's individual contribution.
3)Layer Diversity
6 Conclusion (own)
The search is based on MobileNetV2.
Pareto optimality (from Baidu Encyclopedia):

Pareto optimality (also called Pareto efficiency) refers to an ideal state of resource allocation: given a fixed group of people and a set of allocatable resources, a change from one allocation state to another that makes at least one person better off without making anyone worse off is a Pareto improvement (or Pareto optimization).

A Pareto-optimal state is one that admits no further Pareto improvement; in other words, Pareto improvement is the path and method toward Pareto optimality. Pareto optimality is the "ideal kingdom" of fairness and efficiency. It was proposed by Pareto.