Paper recommendation: EfficientNetV2 - smaller models and faster training through NAS, scaling, and Fused-MBConv
2022-06-28 06:06:00 【deephub】
EfficientNetV2 is a paper by Google Research, Brain Team, published at ICML 2021. It combines NAS and scaling to optimize training speed and parameter efficiency, and adds new operations such as Fused-MBConv to the search space. EfficientNetV2 models train much faster than EfficientNetV1 while being up to 6.8x smaller.

The outline of the paper is as follows:
- Understanding and improving the training efficiency of EfficientNetV1
- NAS and scaling
- Progressive learning
- SOTA comparison
- Ablation studies
Understanding and improving the training efficiency of EfficientNetV1
1. Training with very large image sizes is slow
Large image sizes in EfficientNet lead to heavy memory usage. Since the total memory on a GPU/TPU is fixed, a smaller batch size must be used, which greatly slows down training.
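A rough back-of-the-envelope sketch of this memory/batch-size trade-off; the memory model and constants below are illustrative assumptions, not measurements:

```python
# Back-of-envelope estimate: activation memory grows roughly with
# batch_size * H * W, so halving the image side lets the batch grow ~4x
# under a fixed memory budget. Numbers are illustrative only.

def max_batch_size(memory_budget, image_size, bytes_per_pixel=4 * 64):
    """Largest batch that fits if activations cost ~bytes_per_pixel per
    input pixel (a crude stand-in for a real profiler measurement)."""
    per_image = image_size * image_size * bytes_per_pixel
    return memory_budget // per_image

budget = 16 * 1024**3  # 16 GiB, e.g. one V100
print(max_batch_size(budget, 512))  # large images -> small batch
print(max_batch_size(budget, 256))  # half the side -> ~4x the batch
```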

FixRes (from the paper FixRes: Fixing the Train-Test Resolution Discrepancy) can be applied by using a smaller image size for training than for inference. A smaller image size means less computation and allows a larger batch size, improving training speed by up to 2.2x while also improving accuracy.
2. Depthwise convolutions are slow in early layers but effective in later layers
Fused-MBConv, described in a Google AI blog post, replaces the depthwise 3×3 convolution and the 1×1 expansion convolution in MBConv with a single regular 3×3 convolution.

MBConv and Fused-MBConv structures
The original MBConv blocks in EfficientNet-B4 are progressively replaced with Fused-MBConv.

When Fused-MBConv is applied in the early stages 1-3, it speeds up training with only a small cost in parameters and FLOPs.
But if all blocks use Fused-MBConv (stages 1-7), parameters and FLOPs increase significantly and training actually becomes slower.
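A quick way to see this trade-off is to count parameters for both block types. The sketch below ignores BatchNorm and SE layers for simplicity, and the channel widths are illustrative:

```python
def mbconv_params(c_in, expand=4, k=3):
    """Parameter count of an MBConv block (BN/SE layers ignored):
    1x1 expand -> kxk depthwise -> 1x1 project."""
    c_mid = c_in * expand
    return c_in * c_mid + k * k * c_mid + c_mid * c_in

def fused_mbconv_params(c_in, expand=4, k=3):
    """Fused-MBConv replaces the 1x1 expand + kxk depthwise pair
    with a single regular kxk convolution."""
    c_mid = c_in * expand
    return k * k * c_in * c_mid + c_mid * c_in

# Narrow early stage: the absolute overhead of fusing is tiny.
print(mbconv_params(24), fused_mbconv_params(24))
# Wide late stage: the same fusion costs millions of extra parameters.
print(mbconv_params(256), fused_mbconv_params(256))
```

This is why fusing only the early, narrow stages buys a speedup cheaply, while fusing everything inflates parameters and FLOPs.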
3. Scaling up every stage equally is not optimal
EfficientNet uses a simple compound scaling rule to scale all stages equally. For example, with a depth coefficient of 2, every stage in the network doubles its number of layers. In practice, however, these stages do not contribute equally to training speed and parameter efficiency. EfficientNetV2 therefore uses a non-uniform scaling strategy that gradually adds more layers to the later parts of the model. EfficientNet also scales up the image size aggressively, leading to heavy memory consumption and slow training; to address this, EfficientNetV2 slightly modifies the scaling rule and caps the maximum image size at a smaller value.
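To make the difference concrete, here is a small sketch contrasting uniform compound scaling with a hypothetical non-uniform rule that favors later stages. The layer counts and the non-uniform formula are illustrative, not the paper's actual schedule:

```python
import math

base_layers = [2, 4, 4, 6, 9, 15]  # hypothetical per-stage layer counts

def uniform_scale(layers, depth_coeff):
    """EfficientNetV1-style compound scaling: every stage grows by the same factor."""
    return [math.ceil(n * depth_coeff) for n in layers]

def nonuniform_scale(layers, depth_coeff):
    """Hypothetical V2-style heuristic: later stages receive a larger
    share of the added depth than earlier stages."""
    k = len(layers)
    return [math.ceil(n * (1 + (depth_coeff - 1) * (i + 1) / k))
            for i, n in enumerate(layers)]

print(uniform_scale(base_layers, 2))     # all stages doubled
print(nonuniform_scale(base_layers, 2))  # early stages grow less
```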
NAS and scaling
1. NAS search
The neural architecture search (NAS) space is similar to PNASNet. The search covers design choices such as the convolution operation type {MBConv, Fused-MBConv}, the number of layers, the kernel size {3×3, 5×5}, and the expansion ratio {1, 4, 6}. The size of the search space is reduced by:
- removing unnecessary search options, such as pooling skip operations, since they were never used in the original EfficientNets;
- reusing the channel sizes already searched for in EfficientNets.
With reduced image sizes, about 1000 models were sampled and each trained for roughly 10 epochs. The search ranks candidates by model accuracy A, normalized training step time S, and parameter size P, combined via the simple weighted product A × S^w × P^v, where w = -0.07 and v = -0.05.
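The weighted-product objective can be written down directly; the candidate numbers below are made up for illustration:

```python
def search_reward(accuracy, train_step_time, param_size, w=-0.07, v=-0.05):
    """Weighted-product search objective from the paper: A * S^w * P^v.
    A is accuracy, S the normalized training step time, P the parameter
    size; the negative exponents penalize slow, large models."""
    return accuracy * (train_step_time ** w) * (param_size ** v)

# A slightly less accurate but much faster and smaller candidate can win:
baseline = search_reward(0.84, train_step_time=1.0, param_size=1.0)
candidate = search_reward(0.83, train_step_time=0.5, param_size=0.7)
print(baseline, candidate, candidate > baseline)
```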

There are several key differences between EfficientNetV2 and EfficientNetV1:
- EfficientNetV2 makes extensive use of both MBConv and the newly added Fused-MBConv in the early layers.
- EfficientNetV2 prefers smaller expansion ratios for MBConv, since smaller expansion ratios tend to have less memory access overhead.
- EfficientNetV2 prefers smaller kernel sizes (3×3), but adds more layers to compensate for the reduced receptive field caused by the smaller kernels.
- EfficientNetV2 entirely removes the last stride-1 stage of the original EfficientNet, probably because of its large parameter count and memory access overhead.
2. Scaling
EfficientNetV2-S is scaled up to EfficientNetV2-M/L with compound scaling similar to EfficientNet, plus some additional optimizations:
- the maximum inference image size is limited to 480, because very large images usually incur expensive memory and training speed overhead;
- as a heuristic, more layers are added to the later stages (e.g., stages 5 and 6) to increase network capacity without adding much runtime overhead.

With training-aware NAS and scaling, the proposed EfficientNetV2 models train much faster than other models.
Progressive Learning

Progressive learning improves the training process

EfficientNetV2 training settings

ImageNet top-1 accuracy
The model performs best when the image size is small and the augmentation is weak, but on larger images it performs better with stronger augmentation. Training starts with a small image size and weak regularization (epoch = 1), then gradually increases the difficulty with larger image sizes and stronger regularization: a larger dropout rate, RandAugment magnitude, and mixup ratio (e.g., epoch = 300).
The paper summarizes this process as pseudocode.
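A minimal Python sketch of such a progressive schedule, assuming linear interpolation per stage; the range values are illustrative, not the paper's actual settings:

```python
def progressive_schedule(num_stages, size_range=(128, 300),
                         dropout_range=(0.1, 0.3), randaug_range=(5, 15)):
    """Sketch of a progressive-learning schedule: linearly interpolate
    image size and regularization strength from easy to hard per stage."""
    def lerp(lo, hi, t):
        return lo + (hi - lo) * t

    stages = []
    for i in range(num_stages):
        t = i / (num_stages - 1) if num_stages > 1 else 1.0
        stages.append({
            "image_size": round(lerp(*size_range, t)),
            "dropout": round(lerp(*dropout_range, t), 3),
            "randaug_magnitude": round(lerp(*randaug_range, t), 1),
        })
    return stages

for stage in progressive_schedule(4):
    print(stage)  # image size and regularization grow together
```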

SOTA comparison
1. ImageNet

Model size, FLOPs, and inference latency; latency is measured on a V100 GPU with batch size 16

Models marked with 21k are pre-trained on ImageNet21k with 13M images; the other models are trained from scratch directly on ImageNet ILSVRC2012 with 1.28M images.
The EfficientNetV2 models are significantly faster than the previous ConvNets and Transformer models on ImageNet, while achieving better accuracy and parameter efficiency.
EfficientNetV2-M achieves accuracy comparable to EfficientNet-B7 while training 11x faster with the same computing resources.
The EfficientNetV2 models also significantly outperform all the recent RegNet and ResNeSt models in both accuracy and inference speed.
The first figure at the top shows these results.
With pre-training on ImageNet21k (32 TPU cores, two days), EfficientNetV2-L(21k) improves top-1 accuracy by 1.5% (85.3% vs. 86.8%), while using 2.5x fewer parameters and 3.6x fewer FLOPs, and training and inferring 6x to 7x faster.
2. Transfer learning
The following datasets are used for the transfer learning tests:

Each model is fine-tuned for a few steps. The EfficientNetV2 models outperform the previous ConvNets and Vision Transformers on all of these datasets.

On CIFAR-100, EfficientNetV2-L is 0.6% more accurate than the previous GPipe/EfficientNets and 1.5% more accurate than the previous ViT/DeiT models. These results suggest that EfficientNetV2's generalization ability goes well beyond ImageNet.
Ablation studies
1. Performance with the same training settings

Compared under the same learning settings, the EfficientNetV2 models still perform much better than EfficientNets: EfficientNetV2-M has 17% fewer parameters and 37% fewer FLOPs than EfficientNet-B7, while training 4.1x faster and inferring 3.1x faster.
2. Model scaling

Smaller models are compared by scaling down EfficientNetV2-S with EfficientNet compound scaling. All models are trained without progressive learning under the same settings. The EfficientNetV2 (V2) models are generally faster while maintaining comparable parameter efficiency.
3. Progressive learning for different networks

Progressive learning generally reduces training time while improving accuracy across all the different networks.
4. Importance of adaptive regularization

Adaptive regularization 
Because a TPU needs to recompile its computation graph for every new image size, the image size is randomly sampled every 8 epochs rather than every batch.
Adaptive regularization applies very little regularization to small images early in training, allowing the model to converge faster and reach better final accuracy.
References
https://www.overfit.cn/post/053825be64b64acfa9cbd527a4a1cab7
[2021 ICML] EfficientNetV2: Smaller Models and Faster Training
https://arxiv.org/abs/2104.00298