[NAS1] (2021 CVPR) AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling (unfinished)
2022-07-05 08:21:00 【Three nights is no more than string Ichiro】
[Note]: It is recommended to first understand the concept of the Pareto front (PF) in multi-objective optimization, as well as the basic SPOS workflow.
One-sentence summary: This paper improves on the uniform sampling strategy used during SPOS-style supernet training (with BestUp and WorstUp sampling), effectively identifying the Pareto front and further improving model accuracy.
Abstract
Problem background: NAS has achieved great success in designing accurate and efficient SOTA models. Currently, two-stage NAS methods such as BigNAS decouple model training from the search process and achieve good results. Two-stage NAS requires sampling networks from the search space during training, and this sampling directly affects the accuracy of the finally searched models.
Question raised: Because of its simplicity, uniform sampling has been widely used in two-stage NAS training. However, it is agnostic to the model-performance Pareto front, and thus misses the opportunity to further improve model accuracy.
What this paper does: It focuses on improving the sampling strategy and proposes AttentiveNAS, which effectively identifies networks on the Pareto front.
Experimental results: The discovered family of models, called AttentiveNAS models, achieves ImageNet top-1 accuracies ranging from 77.3% to 80.7%, reaching 80.1% accuracy on ImageNet with only 491 MFLOPs.
1. Introduction
Review of NAS development: NAS provides a useful tool for automating DNN design. It optimizes the model architecture and the model weights simultaneously, which creates a challenging nested optimization problem. Conventional NAS relies on evolutionary search or reinforcement learning, but these methods need to train thousands of models in a single experiment, which is computationally very expensive. Recent NAS methods decouple weight training and architecture optimization into two independent stages:
- In the first stage, the weights of all candidate networks in the search space are optimized jointly through weight sharing, so that all networks reach good performance by the end of training;
- In the second stage, a standard search algorithm, e.g., an evolutionary algorithm, searches for the optimal models under various resource constraints.
This two-stage NAS paradigm makes the search highly efficient while delivering excellent performance.
Identifying the scientific problem: The success of two-stage NAS depends strongly on how the candidate networks are trained in the first stage. To bring all candidate networks to their best performance, these methods sample networks from the search space and optimize each sample with one step of stochastic gradient descent (SGD). The key question is which network to sample at each SGD step. Existing methods sample networks uniformly, and experiments show that uniform sampling leaves NAS training agnostic to the search stage: the training stage never considers how to improve the Pareto front, so it cannot further improve network performance.
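To make the sampling step concrete, below is a minimal PyTorch-style sketch of one epoch of uniform-sampling training; the `supernet(x, arch)` call signature and the `search_space` list are hypothetical placeholders, not the actual SPOS/BigNAS API.

```python
import random

def train_supernet_uniform(supernet, loader, optimizer, criterion, search_space):
    """One epoch of weight-sharing training with uniform sampling (SPOS-style).

    `search_space` is assumed to be a list of candidate architecture configs,
    and `supernet(x, arch)` is assumed to run sub-network `arch` with the
    shared weights -- both are hypothetical placeholders, not the paper's API.
    """
    supernet.train()
    for images, targets in loader:
        # Uniform sampling: every candidate is equally likely, regardless of
        # where it sits relative to the performance Pareto front.
        arch = random.choice(search_space)
        optimizer.zero_grad()
        loss = criterion(supernet(images, arch), targets)
        loss.backward()   # one-step SGD on the sampled sub-network only
        optimizer.step()
```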
This paper's work: It proposes AttentiveNAS to improve on uniform sampling, paying more attention to the architectures that are more likely to produce a better Pareto front. Specifically, the paper answers the following two questions:
- Which candidate networks should be sampled during training?
- How can these candidate networks be sampled efficiently, without introducing too much extra computational cost?
For the first question, this paper explores two different sampling strategies. The first, BestUp, is a best-Pareto-front-aware sampling strategy that spends more training budget on improving the current best Pareto front. The second, WorstUp, focuses on improving the candidate networks with the worst performance trade-offs, i.e., the worst Pareto set, similar in spirit to hard example mining: pushing up the worst Pareto set helps update parameters of the weight-sharing network that would otherwise be under-optimized, so that all parameters are trained thoroughly.
For the second question, determining which networks lie on the best/worst Pareto front is not straightforward. The paper proposes two estimation methods, based on training loss and accuracy, respectively.
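As a rough illustration of both answers, the sketch below draws a small pool of candidates, ranks them with a mini-batch-loss proxy (the cheaper of the two estimation methods mentioned above), and spends the gradient step on the best or worst one; every name here is a hypothetical placeholder rather than the paper's code.

```python
import random
import torch

@torch.no_grad()
def proxy_loss(supernet, arch, images, targets, criterion):
    """Cheap performance estimate: mini-batch training loss of sub-network `arch`."""
    return criterion(supernet(images, arch), targets).item()

def attentive_step(supernet, images, targets, optimizer, criterion,
                   search_space, k=5, mode="best"):
    """One attentive-sampling SGD step (a sketch, not the paper's exact algorithm).

    mode="best"  ~ BestUp:  spend the gradient step on the strongest candidate.
    mode="worst" ~ WorstUp: push up the weakest candidate (hard-example flavor).
    """
    # Draw a small pool of candidates uniformly, then re-rank them by the proxy.
    candidates = random.sample(search_space, k)
    candidates.sort(key=lambda a: proxy_loss(supernet, a, images, targets, criterion))
    arch = candidates[0] if mode == "best" else candidates[-1]  # lowest vs. highest loss

    optimizer.zero_grad()
    loss = criterion(supernet(images, arch), targets)
    loss.backward()
    optimizer.step()
```

In the actual paper, the candidate pool is drawn under a sampled resource constraint so that "best" and "worst" are compared at similar FLOPs, and an accuracy predictor can replace the mini-batch-loss proxy; both details are omitted from this sketch.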
2. Related Work and Background
Formulation of NAS:
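Based on the description that follows (nested weight/architecture optimization under a single resource constraint), Eq. (1) refers to roughly the standard constrained form below; this is a reconstruction, not quoted from the paper.

```latex
\min_{a \in \mathcal{A}} \; \mathcal{L}_{\mathrm{val}}\left(w_a^{*};\, a\right)
\quad \text{s.t.} \quad
w_a^{*} = \arg\min_{w_a} \mathcal{L}_{\mathrm{train}}\left(w_a;\, a\right),
\qquad \mathrm{cost}(a) \le \tau
\tag{1}
```

Here $\mathcal{A}$ is the architecture search space, $w_a$ denotes the weights of candidate $a$, and $\tau$ is a resource budget such as FLOPs or latency.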
Early methods for solving NAS, i.e., Eq. (1): these are often based on reinforcement learning or evolutionary algorithms. They need to train a large number of networks from scratch to obtain accurate performance estimates, which is computationally very expensive.
Current methods use weight sharing: they train a weight-sharing network (supernet) and sample sub-networks from it; by inheriting the shared weights, efficient performance estimates of the sub-networks are obtained directly. This alleviates the cost of training every network from scratch and significantly accelerates the NAS process.
Weight-sharing NAS via continuous differentiable relaxation: weight-sharing NAS often solves the constrained problem (1) through continuous differentiable relaxation and gradient descent. However, these methods are very sensitive to random seeds and hyperparameters such as data splits, the performance-ranking correlation among different DNNs varies greatly across experiments, and many repeated runs are needed to obtain good performance. Moreover, the inherited weights are often suboptimal; therefore this approach usually requires retraining the discovered architectures from scratch, which introduces an additional computational burden.
2.1 Two-stage NAS
Question raised: Eq. (1) restricts the search to a single small sub-network, which yields a challenging optimization problem that cannot benefit from over-parameterization. In addition, Eq. (1) is limited to a single resource constraint, so optimizing DNNs under various resource constraints usually requires multiple independent searches.
Introduction to two-stage NAS: it decomposes the optimization in (1) into two stages: (1) constraint-free pre-training, which jointly optimizes all possible candidate networks through weight sharing; and (2) resource-constrained search, which finds the best sub-networks under given resource constraints. Recent work in this direction includes BigNAS, SPOS, FairNAS, OFA, HAT, etc.
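Spelled out, the decomposition described above looks roughly as follows (again a reconstruction, not quoted from the paper):

```latex
\text{Stage 1 (constraint-free pre-training):}\quad
W^{*} \;=\; \arg\min_{W}\; \mathbb{E}_{a \sim \pi}\big[\mathcal{L}_{\mathrm{train}}(W_a)\big]

\text{Stage 2 (resource-constrained search):}\quad
\min_{a \in \mathcal{A}}\; \mathcal{L}_{\mathrm{val}}\big(W^{*}_{a}\big)
\quad \text{s.t.}\quad \mathrm{cost}(a) \le \tau
```

Here $W_a$ denotes the shared weights $W$ restricted to sub-network $a$, and $\pi$ is the architecture sampling distribution, which is uniform in SPOS/BigNAS; the contribution of this paper in Section 3 is precisely to replace this uniform $\pi$ with an attentive one.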
3. NAS via Attentive Sampling