[NAS1] (2021 CVPR) AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling (unfinished)
2022-07-05 08:21:00 【Three nights is no more than string Ichiro】
【Note】: It is recommended to first understand the concept of the Pareto front (PF) in multi-objective optimization, as well as the basic flow of SPOS (Single Path One-Shot).
One-sentence summary: this paper improves on the uniform sampling strategy used in SPOS-style supernet training (BestUp / WorstUp), effectively identifying the PF and further improving model accuracy.
Abstract
Background: NAS has achieved great success in designing accurate and efficient SOTA models. Currently, two-stage NAS methods such as BigNAS decouple model training from the search process and achieve strong results. Two-stage NAS requires sampling networks from the search space during training, and this sampling directly affects the accuracy of the finally searched models.
Problem: because of its simplicity, uniform sampling has been widely adopted in two-stage NAS training. However, it is agnostic to the performance PF of the models, and thus misses the opportunity to further improve model accuracy.
Contribution: the paper focuses on improving the sampling strategy and proposes AttentiveNAS, which effectively identifies networks on the PF.
Results: the discovered model family, called AttentiveNAS models, achieves ImageNet top-1 accuracies from 77.3% to 80.7%; in particular, it reaches 80.1% ImageNet accuracy with only 491 MFLOPs.
1. Introduction
Review of NAS development: NAS provides a useful tool for automating DNN design. It optimizes the model architecture and the model weights simultaneously, which creates a challenging nested optimization problem. Conventional NAS relies on evolutionary search or reinforcement learning, but these methods must train thousands of models in a single experiment, which is computationally prohibitive. Current NAS methods instead decouple weight training and architecture optimization into two independent stages:
- In the first stage, the weights of all candidate networks in the search space are optimized jointly through weight sharing, so that all networks achieve their best possible performance by the end of training;
- In the second stage, a standard search algorithm, e.g., an evolutionary algorithm, searches for the optimal model under various resource constraints.
This two-stage NAS paradigm makes the search very efficient while delivering excellent performance.
The scientific problem: the success of two-stage NAS depends strongly on how the candidate networks are trained in the first stage. To bring every candidate network to its best performance, existing methods sample networks from the search space and optimize each sample with a one-step stochastic gradient descent (SGD) update. The key is therefore deciding which network to sample at each SGD step. Existing methods sample networks uniformly, and experiments show that uniform sampling makes the training agnostic to the search stage: the training stage never considers how to improve the PF, so it cannot further improve network performance (a minimal sketch of this baseline follows).
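To make the uniform-sampling baseline concrete, here is a minimal PyTorch-style sketch of the stage-1 training loop described above. The `Supernet` object and its `sample_subnet` method are assumptions for illustration, not the paper's actual API.

```python
import torch.nn.functional as F

def train_supernet_uniform(supernet, loader, optimizer, epochs):
    """Stage-1 supernet training with the uniform-sampling baseline:
    at every SGD step, draw one random subnet from the search space
    and update the shared weights through it."""
    for _ in range(epochs):
        for images, labels in loader:
            # Uniformly sample one architecture (hypothetical Supernet API).
            subnet = supernet.sample_subnet(strategy="uniform")
            optimizer.zero_grad()
            loss = F.cross_entropy(subnet(images), labels)
            loss.backward()   # gradients flow into the shared weights
            optimizer.step()  # one SGD step per sampled subnet
```

Note how the sampling decision is made independently of how each subnet currently performs; this is exactly the PF-agnostic behavior the paper criticizes.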
This work: proposes AttentiveNAS, which improves on uniform sampling by paying more attention to the architectures that are more likely to produce a better PF. The paper specifically answers the following two questions:
- Which candidate networks should be sampled during training?
- How can these candidate networks be sampled efficiently, without introducing excessive computational cost?
For the first question, the paper explores two different sampling strategies. The first, BestUp, is a best-PF-aware sampling strategy that spends more of the training budget on improving the current best PF. The second, WorstUp, focuses on the candidate networks with the worst performance trade-offs, i.e., the worst Pareto set, similar in spirit to hard example mining: pushing up the worst Pareto set helps update the under-optimized parameters of the weight-sharing network, so that all parameters get fully trained.
For the second question, identifying the networks on the best/worst PF is not straightforward; the paper proposes two methods, based on training loss and on accuracy, respectively (a sketch follows).
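The sketch below shows how the two strategies could replace the uniform draw, using the mini-batch training loss as the cheap ranking proxy. The `sample_subnet` call and the exact candidate-pool procedure are assumptions for illustration, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def attentive_sample(supernet, images, labels, k=5, mode="best_up"):
    """Rank k candidate subnets by their mini-batch loss (a cheap proxy
    for their position relative to the Pareto front) and return the one
    to train: lowest loss for BestUp, highest loss for WorstUp."""
    candidates = [supernet.sample_subnet() for _ in range(k)]  # hypothetical API
    with torch.no_grad():  # proxy evaluation only, no weight update here
        losses = [F.cross_entropy(net(images), labels).item()
                  for net in candidates]
    if mode == "best_up":   # spend the SGD step on the most promising subnet
        idx = min(range(k), key=losses.__getitem__)
    else:                   # "worst_up": analogous to hard example mining
        idx = max(range(k), key=losses.__getitem__)
    return candidates[idx]
```

The chosen subnet would then receive the same one-step SGD update as in the uniform baseline; the accuracy-based variant replaces the extra forward passes with a learned proxy to keep the sampling cost low.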
2. Related Work and Background
A formal description of NAS:

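The equation image did not survive in this copy. Reconstructed from the surrounding description (a constrained bi-level problem under a single resource budget), the formulation referred to as Eq. (1) below is, in its standard form:

$$
\min_{a \in \mathcal{A}} \ \mathcal{L}_{\mathrm{val}}\left(a, w_a^{*}\right)
\quad \text{s.t.} \quad
w_a^{*} = \operatorname*{arg\,min}_{w_a} \mathcal{L}_{\mathrm{train}}(a, w_a),
\qquad \mathrm{FLOPs}(a) \le \tau,
$$

where $\mathcal{A}$ is the architecture search space, $w_a$ the weights of subnet $a$, and $\tau$ a resource budget.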
Early methods for solving NAS, i.e., Eq. (1): typically based on reinforcement learning or evolutionary algorithms. These methods must train a large number of networks from scratch to obtain accurate performance estimates, which is computationally very expensive.
Current methods use weight sharing: they train a single weight-sharing network (supernet) and obtain efficient subnet performance estimates directly through weight inheritance. This alleviates the computational cost of training every network from scratch and significantly accelerates the NAS process (an illustrative sketch follows).
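As an illustration of weight inheritance, a subnet can reuse a slice of the supernet's kernels instead of being retrained. The channel-slicing scheme below follows the slimmable/OFA style and is an assumption for illustration, not necessarily how any specific paper implements it.

```python
import torch.nn as nn
import torch.nn.functional as F

class SlicedConv(nn.Conv2d):
    """Conv layer whose subnets inherit a slice of the shared kernel:
    a subnet with `out_ch` output channels simply reuses the first
    `out_ch` filters instead of being trained from scratch."""
    def forward_subnet(self, x, out_ch):
        weight = self.weight[:out_ch, :x.size(1)]  # slice of shared weights
        bias = self.bias[:out_ch] if self.bias is not None else None
        return F.conv2d(x, weight, bias, self.stride, self.padding)
```

Because every subnet reads from the same underlying tensor, one SGD step through any sampled subnet also improves the weights that other subnets will inherit.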
Weight-sharing NAS via continuous differentiable relaxation: weight-sharing NAS often solves the constrained problem (1) through continuous differentiable relaxation and gradient descent. However, these methods are very sensitive to hyper-parameters such as random seeds and data splits; the performance ranking correlation among different DNNs also varies greatly across experiments, so many repeated runs are needed to obtain good performance. Moreover, the inherited weights are often suboptimal, so this approach usually requires retraining the discovered architectures from scratch, which introduces an additional computational burden.
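The relaxation mentioned above is typically the DARTS-style softmax mixture over candidate operations (a standard formula, shown here for context rather than taken from this paper):

$$
\bar{o}(x) = \sum_{o \in \mathcal{O}} \frac{\exp(\alpha_{o})}{\sum_{o' \in \mathcal{O}} \exp(\alpha_{o'})}\, o(x),
$$

where $\mathcal{O}$ is the set of candidate operations on an edge and the architecture parameters $\alpha$ are learned jointly with the weights by gradient descent.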
2.1 Two-stage NAS
Open issues: Eq. (1) restricts the search to a small subnetwork, yielding a challenging optimization problem that cannot benefit from over-parameterization. Moreover, Eq. (1) handles only a single resource constraint; optimizing DNNs under various resource constraints usually requires multiple independent searches.
Two-stage NAS: decomposes the optimization in Eq. (1) into two stages: (1) constraint-free pre-training, which jointly optimizes all possible candidate networks through weight sharing; and (2) resource-constrained search, which finds the best subnets under the given resource constraints. Recent work in this direction includes BigNAS, SPOS, FairNAS, OFA, and HAT. The decomposition is written out below.
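Written out (a standard formulation, reconstructed for clarity rather than copied from the paper), the two stages are:

Stage 1, constraint-free pre-training:

$$
\min_{W} \ \mathbb{E}_{a \sim p(\mathcal{A})}\left[\mathcal{L}_{\mathrm{train}}(a, W_a)\right];
$$

Stage 2, resource-constrained search:

$$
\min_{a \in \mathcal{A}} \ \mathcal{L}_{\mathrm{val}}(a, W_a^{*}) \quad \text{s.t.} \quad \mathrm{FLOPs}(a) \le \tau,
$$

where $W$ denotes the shared weights, $W_a$ the weights inherited by subnet $a$, and $p(\mathcal{A})$ the sampling distribution over architectures, which is exactly what this paper sets out to improve.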

3. NAS via Attentive Sampling